Special Fields
This page describes fields with unique roles and behaviors in Rockset documents.
Overview
Special fields are prefixed with an underscore and have important effects on the ingestion and querying behavior of documents in your Collections. Some are automatically generated by Rockset during data ingestion, while others can be specified from a source document or Ingest Transformation.
You can execute the following query on any collection to view some of its special fields:
SELECT _id, _meta, _event_time FROM mycollection LIMIT 5;
Below is an exhaustive list of all special fields in Rockset.
The _id
field
_id
fieldEvery document in a Rockset collection is uniquely identified by its _id
field.
- If the document source does not already have an
_id
field, Rockset populates it with an automatically generated uuid. - If the document source has
_id
specified, or an ingest transformation outputs an_id
field, its value is preserved. A newly ingested document will overwrite any existing document with the same_id
value.
For collections with Rollups, the
_id
field is populated by Rockset and cannot be specified by the user.
The _meta
field
_meta
fieldMetadata regarding each document is stored in a _meta
field of object type.
If the source of a document specifies a _meta
field, Rockset will ignore the field. Currently, _meta
holds information about the source from which the document was inserted into the collection (such as the bucket name and path in case of S3). If Rockset is unable to parse the source of a document, it will create a document without any of the source's fields and will have _meta
with a nested field named bad
.
The
_meta
field is never populated for collections with rollups.
The
_meta
field is immutable.The
_meta
special field is immutable. The value of this field can only be updated by usingREPLACE
orREPSERT
operations. See The_op
field section for more information.
The _event_time
field
_event_time
fieldRockset associates a timestamp with each document in a field named _event_time
, recorded as microseconds since the Unix epoch. By default, _event_time
is set as the time a document is inserted into a Rockset collection.
Users can specify their own _event_time
by including the field in their source records, or defining an ingest transformation with a mapping for _event_time
. User-specified _event_time
values must be of either INT (microseconds since epoch), TIMESTAMP or ISO8601 STRING type (with valid UTC timezone), otherwise the ingestion of the document will fail.
Rockset's time-based retention feature uses _event_time
to determine when a document has fallen outside the retention window and should be removed from a collection. Sometimes using the default document insertion time for retention makes perfect sense, but many use cases may want to trim records according to something else, in which case they need to define their own _event_time
.
If your collection has rollups and your rollup query does not contain an _event_time
mapping, this field is populated with the initial insertion time of the rolled up document. It does not change as more input documents are aggregated into the rolled up document.
The
_event_time
field is immutable.The
_event_time
special field is immutable. The value of this field can only be updated by usingREPLACE
orREPSERT
operations. See The_op
field section for more information.
The _stored
field
_stored
field_stored
reduces the hot storage size of your collections by excluding data from certain indexes. More specifically, we exclude _stored
and its children from the inverted and range indexes, but we still include them in our columnar and row indexes. You can explore the Rockset Storage Architecture to learn more about Rockset's Converged Indexing.
Leveraging _stored
can significantly reduce storage sizes by lowering the storage amplification associated with indexes. Though, the reduction depends on data distributions. For certain data distributions, the sizes of the columnar and row indexes can greatly exceed the sizes of the inverted and range indexes, limiting the relative impact of _stored
. We see this pattern with large text fields, since the inverted index only tracks prefixes of these fields and not the entire fields. Using _stored
with large text fields will have a limited impact on the overall storage sizes.
You must configure _stored
in your ingest transformation.
Special Field Tip
We recommend using
_stored
as an object, so you can consistently store and reference multiple fields. Though, you can still use_stored
as a scalar or array.
The following examples outline how to use _stored
in your ingest transformations.
SELECT foo, {'bar': bar} _stored FROM _input
SELECT * EXCEPT(bar), {'bar': bar} _stored FROM _input
Queries with predicates on
_stored
must use the columnar index during execution.This is because we exclude
_stored
and its subfields from the inverted index. Thus, you should not include fields in_stored
on which you expect to apply selective filters, as the associated queries will run much more efficiently with those fields in the inverted index. You can still efficiently project fields from_stored
after applying selective filters on other fields in your collections.
The _op
field
_op
fieldThe _op
field enables flexibly ingesting records into a Rockset collection. Each document ingested into Rockset may have an optional _op
field that will affect its ingestion behavior. The value of _op
can come directly from a source document, or from an ingest transformation. Unlike other special fields, _op
is purely an ingest-time concept that does not materialize in a Rockset collection and consequently can't be queried.
Here are the supported _op
values (case insensitive). If no _op
value is explicitly included in the document, the default operation is UPSERT
. Any value other than the supported ones below will lead to an ingestion error for the document.
INSERT
– If no document exists with the same_id
, insert this document. If another document with this_id
exists, do nothing._id
is optional for this operation and will be automatically generated if not specified.UPDATE
– If a document exists with the same_id
, overwrite top-level fields present in the new document, while leaving all other fields in the existing document unchanged. If no document exists with the same_id
, do nothing._id
is required for this operation and ingestion will error if it is not specified.Special fields
_event_time
and_meta
will not be updated with anUPDATE
operation.Values for the
_event_time
and [_meta
](# the-_meta-field) fields will not be updated when using anUPDATE
operation. If the_event_time
value is specified in this document, it will be ignored.If you would like to update either of these fields, use the
REPLACE
orREPSERT
operations described below.UPSERT
– If a document exists with the same_id
, perform anUPDATE
operation. If it does not exist, perform anINSERT
operation._id
is optional for this operation and will be automatically generated if not specified. Note: This is the default behavior if no_op
value is specified.DELETE
– Delete the document with this_id
if it exists. If no such document exists, do nothing._id
is required for this operation and ingestion will error if it is not specified.REPLACE
– If a document exists with the same_id
, delete the entire existing document and insert this one instead. If no such document exists, do nothing._id
is required for this operation and ingestion will error if it is not specified.All special fields will be updated with a
REPLACE
operation, including_event_time
and_meta
.Unlike an
UPDATE
operation, the values of_event_time
and_meta
will be changed when performing aREPLACE
operation.- If an
_event_time
value is specified in this document, that value will be used. If no_event_time
value is specified, the time of theREPLACE
operation will be used as the new value. - The value for
_meta
will be defined as described in The_meta
field section based on the contents of this document.
- If an
REPSERT
– If a document exists with the same_id
, perform aREPLACE
operation. If no such document exists, perform anINSERT
operation._id
is required for this operation and ingestion will error if it is not specified.
For implementation examples, refer to this _op
example in an ingest transformation and _op
example in an INSERT INTO
statement.
Not all collections support _op
. Namely, it is not supported for:
- Rollup collections
- Managed sources which have their own semantics for sending deletes (MongoDB and DynamoDB)
Creating a collection with one of these unsupported configurations with a mapping for _op
will lead to an error at collection creation time. If a record being ingested into a collection with an unsupported configuration contains _op
from the source, the document will error during ingestion.
To illustrate the behavior of _op, here are some sample documents with various _op
types explaining the behavior of each as they are applied sequentially on top of an empty collection.
{"_op": "INSERT", "_id": "abc", "x": 1, "y": "bar"} // _id="abc" is inserted with x=1, y="bar"
{"_op": "INSERT", "_id": "abc", "x": 2, "y": "baz"} // this is a no-op since _id="abc" already exists
{"_op": "UPDATE", "_id": "abc", "x": 2} // now x=2 for _id="abc" and y="bar" (unchanged)
{"_op": "UPDATE", "_id": "def", "x": 3} // this is a no-op since _id="def" does not exist
{"_op": "UPSERT", "_id": "def", "x": 3} // _id="def" is inserted with x=3
{"_op": "UPSERT", "_id": "def", "y": "baz"} // now y="baz" for _id="def" and x=3 (unchanged)
{"_op": "DELETE", "_id": "xyz"} // this is a no-op since _id="xyz" does not exist
{"_op": "DELETE", "_id": "def"} // this deletes _id="def" leaving only _id="abc" in the collection
{"_op": "REPLACE", "_id": "abc", "z": 4} // now _id="abc" is exactly {"z": 4} and fields x and y have been removed
{"_op": "REPLACE", "_id": "ghi", "z": 5} // this is a no-op since _id="ghi" does not exist
{"_op": "REPSERT", "_id": "ghi", "z": 6} // _id="ghi" is inserted with z=6
{"_op": "REPSERT", "_id": "ghi", "y": 7} // now _id="ghi" is exactly {"y": 7} and field z has been deleted
The _seq_no
field
_seq_no
fieldThe _seq_no
field ensures data consistency and prevents conflicting updates by persisting the most up-to-date version of a document.
By default, when the _seq_no
field is not set in a document, Rockset follows the last-write-wins strategy. However, there are scenarios when the application records are written to Rockset out-of-order in which case the last-write-wins policy of Rockset might not be what the user wants to achieve.
To address this, you can set your document _seq_no
field to an ever increasing value, incremented on every change. When a document with the same _id
already exists in the collection, Rockset will replace the existing document with the update only if the update has a higher _seq_no
value. Otherwise, the update is ignored to prevent overwriting more recent modifications.
_seq_no
value must be an integer, and can be set in the the source document or through ingest transformation.
To illustrate the behavior of _seq_no
and its interaction with _op
, here are some sample updates explaining behavior as they are applied sequentially on top of an empty collection. Please note that if no _op
value is specified, the default operation will be an UPSERT.
{"_id": "abc", "_seq_no": 1, "x": 1, "y": "bar"} // document is inserted
{"_id": "abc", "_seq_no": 3, "x": 8, "z": "foo"} // now x=8, z="foo", and y is unchanged; the default UPSERT logic when _op is not specified keeps y unchanged
{"_id": "abc", "_seq_no": 2, "x": 5} // no-op since _seq_no is less than previous update
{"_id": "abc", "_seq_no": 4, "x": 9, "_op": "REPSERT"} // now x=9 and y and z are dropped as REPSERT logic replaces the entire document
Updated 8 months ago