This page describes fields with unique roles and behaviors in Rockset documents.

## Overview

Special fields are prefixed with an underscore and have important effects on the ingestion and querying behavior of documents in your [<<glossary:Collections>>](🔗). Some are automatically generated by Rockset during data ingestion, while others can be specified from a source document or [<<glossary:Ingest Transformation>>](🔗).

Special fields are immutable.

Only specify your own values for special fields if you are certain those values will never need to be updated.

To view some of a collection's special fields, select them by name in a query, for example `SELECT _id, _event_time, _meta FROM my_collection`, where `my_collection` stands in for your collection name.

Below is an exhaustive list of all special fields in Rockset.

## The `_id` field

Every document in a Rockset collection is uniquely identified by its `_id` field.

  • If the document source does not already have an `_id` field, Rockset populates it with an automatically generated [UUID](🔗).

  • If the document source has `_id` specified, or an ingest transformation outputs an `_id` field, its value is preserved. A newly ingested document will overwrite any existing document with the same `_id` value.
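The two rules above can be sketched in a few lines of Python. This is only a model of the described behavior, not Rockset code: the `assign_id` and `ingest` helpers and the dictionary standing in for a collection are hypothetical, and the use of a random (version 4) UUID is an assumption, since the UUID variant is not stated here.

```python
import uuid


def assign_id(doc: dict) -> dict:
    """Return a copy of `doc` that is guaranteed to carry an `_id`."""
    out = dict(doc)
    if "_id" not in out:
        # Assumption: a random version-4 UUID rendered as a string.
        out["_id"] = str(uuid.uuid4())
    return out


def ingest(collection: dict, doc: dict) -> str:
    """Store `doc` keyed by `_id`; a matching `_id` overwrites the old document."""
    doc = assign_id(doc)
    collection[doc["_id"]] = doc
    return doc["_id"]


collection = {}
ingest(collection, {"_id": "user-1", "name": "Ada"})
ingest(collection, {"_id": "user-1", "name": "Grace"})  # same _id: overwrites
generated_id = ingest(collection, {"name": "Linus"})     # no _id: one is generated
```

After these three calls the model holds two documents: `user-1` (with the second name) and one document under a generated identifier.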

For collections with [<<glossary:Rollups>>](🔗), the `_id` field is populated by Rockset and cannot be specified by the user.

## The `_meta` field

Metadata regarding each document is stored in a `_meta` field of object type.

Rockset ignores any `_meta` field specified in a document's source. Currently, `_meta` holds information about the source from which the document was inserted into the collection (such as the bucket name and path for S3). If Rockset is unable to parse the source of a document, it creates a document that contains none of the source's fields and whose `_meta` has a nested field named `bad`.

The `_meta` field is never populated for collections with rollups.

## The `_event_time` field

Rockset associates a timestamp with each document in a field named `_event_time`, recorded as microseconds since the Unix epoch. By default, `_event_time` is set as the time a document is inserted into a Rockset collection.

You can specify your own `_event_time` by including the field in your source records or by defining an ingest transformation with a mapping for `_event_time`. A user-specified `_event_time` value must be either an int (microseconds since the Unix epoch) or a timestamp; otherwise, ingestion of the document fails.
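If you produce `_event_time` values in application code before sending records, the conversion to microseconds since the Unix epoch can be sketched as follows. The helper name `to_event_time_micros` is hypothetical, and treating timezone-naive datetimes as UTC is an assumption made for this example only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_event_time_micros(value) -> int:
    """Normalize an int or datetime to integer microseconds since the Unix epoch."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("_event_time must be an int or a timestamp")
    if isinstance(value, int):
        return value  # already interpreted as microseconds since epoch
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC.
            value = value.replace(tzinfo=timezone.utc)
        # Dividing one timedelta by another with // yields an exact integer.
        return (value - EPOCH) // timedelta(microseconds=1)
    raise TypeError("_event_time must be an int or a timestamp")


record = {
    "order_id": 17,
    "_event_time": to_event_time_micros(datetime(2024, 1, 1, tzinfo=timezone.utc)),
}
```

Rejecting other types up front mirrors the rule that a document with an invalid `_event_time` fails ingestion.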

Rockset's time-based retention feature uses `_event_time` to determine when a document has fallen outside the retention window and should be removed from a collection. The default insertion time is often the right basis for retention, but if you want to expire records according to a different timestamp (for example, when the underlying event occurred), you must define your own `_event_time`.
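As a rough illustration of how a retention window interacts with `_event_time`, the sketch below selects the documents that would fall outside the window. It models the idea only: the `expired_ids` helper is hypothetical, and the strict comparison at the window boundary is an assumption.

```python
DAY_MICROS = 24 * 60 * 60 * 1_000_000  # one day in microseconds


def expired_ids(docs, now_micros: int, retention_micros: int) -> list:
    """Return the `_id` of every document older than the retention window."""
    cutoff = now_micros - retention_micros
    # Assumption: a document exactly at the cutoff is still retained.
    return [d["_id"] for d in docs if d["_event_time"] < cutoff]


docs = [
    {"_id": "old", "_event_time": 2 * DAY_MICROS},
    {"_id": "fresh", "_event_time": 9 * DAY_MICROS},
]
to_remove = expired_ids(docs, now_micros=10 * DAY_MICROS, retention_micros=7 * DAY_MICROS)
```

With a seven-day window evaluated on day ten, only the document stamped on day two is selected. Which documents expire therefore depends entirely on whether `_event_time` holds the insertion time or a value you supplied.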

If your collection has rollups and your rollup query does not contain an `_event_time` mapping, this field is populated with the initial insertion time of the rolled up document. It does not change as more input documents are aggregated into the rolled up document.

## The `_stored` field

`_stored` reduces the hot storage size of your collections by excluding data from certain indexes. More specifically, we exclude `_stored` and its children from the inverted and range indexes, but we still include them in our columnar and row indexes. You can explore the [Rockset Storage Architecture](🔗) to learn more about Rockset's [Converged Indexing](🔗).

Leveraging `_stored` can significantly reduce storage sizes by lowering the storage amplification associated with indexes, although the size of the reduction depends on your data distribution. For certain data distributions, the sizes of the columnar and row indexes can greatly exceed the sizes of the inverted and range indexes, limiting the relative impact of `_stored`. Large text fields show this pattern, because the inverted index only tracks prefixes of these fields and not the entire fields, so using `_stored` with large text fields has a limited impact on overall storage size.

You must configure `_stored` in your ingest transformation.

Special Field Tip

We recommend using `_stored` as an object, so you can consistently store and reference multiple fields. However, you can still use `_stored` as a scalar or an array.
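To make the recommended object shape concrete, the Python sketch below models what a document looks like once selected fields have been grouped under `_stored`. In Rockset this grouping is configured in the ingest transformation, not in Python; the `move_to_stored` helper and the field names are hypothetical and serve only to show the resulting document structure.

```python
def move_to_stored(doc: dict, field_names: set) -> dict:
    """Return a copy of `doc` with the named top-level fields nested under `_stored`."""
    out = {k: v for k, v in doc.items() if k not in field_names}
    stored = {k: doc[k] for k in field_names if k in doc}
    if stored:
        out["_stored"] = stored
    return out


source_doc = {
    "_id": "1",
    "title": "Quarterly report",   # filtered on often: keep fully indexed
    "body": "long text ...",       # only projected: candidate for _stored
    "raw_payload": {"a": 1},       # only projected: candidate for _stored
}
shaped = move_to_stored(source_doc, {"body", "raw_payload"})
```

In the shaped document, `body` and `raw_payload` are addressed as `_stored.body` and `_stored.raw_payload`, while `title` stays at the top level.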

The following examples outline how to use `_stored` in your ingest transformations.

Queries with predicates on `_stored` must use the columnar index during execution.

This is because we exclude `_stored` and its subfields from the inverted index. Thus, you should not include fields in `_stored` on which you expect to apply selective filters, as the associated queries will run much more efficiently with those fields in the inverted index. You can still efficiently project fields from `_stored` after applying selective filters on other fields in your collections.

## The `_op` field

The `_op` field gives you flexible control over how records are ingested into a Rockset collection. Each document ingested into Rockset may include an optional `_op` field that affects its ingestion behavior. The value of `_op` can come directly from a source document or from an ingest transformation. Unlike other special fields, `_op` is purely an ingest-time concept: it is not materialized in the collection and consequently cannot be queried.

Here are the supported `_op` values (case insensitive). If no `_op` value is explicitly included in the document, the default operation is `UPSERT`. Any value other than the supported ones below will lead to an ingestion error for the document.

  • `INSERT` – If no document exists with the same `_id`, insert this document. If another document with this `_id` exists, do nothing. `_id` is optional for this operation and will be automatically generated if not specified.

  • `UPDATE` – If a document exists with the same `_id`, overwrite the top-level fields present in this document, leaving all other fields in the existing document unchanged. If no document exists with the same `_id`, do nothing. `_id` is required for this operation and ingestion will error if it is not specified. Special field `_event_time` cannot be changed and will be ignored if specified.

  • `UPSERT` – If a document exists with the same `_id`, do an `UPDATE`. If it does not exist, do an `INSERT`. `_id` is optional for this operation and will be automatically generated if not specified. This is the default behavior if no `_op` value is specified.

  • `DELETE` – Delete the document with this `_id` if it exists. If no such document exists, do nothing. `_id` is required for this operation and ingestion will error if it is not specified.

  • `REPLACE` – If a document exists with the same `_id`, delete the entire existing document and insert this one instead. If no such document exists, do nothing. `_id` is required for this operation and ingestion will error if it is not specified. Unlike `UPDATE` this will change the `_event_time` of the document.

  • `REPSERT` – If a document exists with the same `_id`, do a `REPLACE`. If no such document exists, do an `INSERT`. `_id` is required for this operation and ingestion will error if it is not specified.

For implementation examples, refer to the `_op` example in an ingest transformation and the `_op` example in an [`INSERT INTO`](🔗) statement.

Not all collections support `_op`. Namely, it is not supported for:

  • Rollup collections

  • Managed sources which have their own semantics for sending deletes (MongoDB and DynamoDB)

Creating a collection with one of these unsupported configurations with a mapping for `_op` will lead to an error at collection creation time. If a record being ingested into a collection with an unsupported configuration contains `_op` from the source, the document will error during ingestion.

To illustrate the behavior of `_op`, here are some sample documents with various `_op` types explaining the behavior of each as they are applied sequentially on top of an empty collection.
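One way to follow such a sequence is with a small Python model of the rules listed above. This is a sketch for reasoning about the semantics, not Rockset's implementation: `apply_op` and the dictionary used as a collection are hypothetical, the default insertion-time `_event_time` is not modeled, and the sample records are invented for illustration.

```python
import uuid

SUPPORTED_OPS = {"INSERT", "UPDATE", "UPSERT", "DELETE", "REPLACE", "REPSERT"}
REQUIRES_ID = {"UPDATE", "DELETE", "REPLACE", "REPSERT"}


def apply_op(collection: dict, record: dict) -> None:
    """Apply one record to `collection` (a dict keyed by `_id`) according to its `_op`."""
    fields = dict(record)
    # `_op` is consumed at ingest time and never stored; values are case insensitive.
    op = str(fields.pop("_op", "UPSERT")).upper()
    if op not in SUPPORTED_OPS:
        raise ValueError(f"unsupported _op: {op}")
    if "_id" not in fields:
        if op in REQUIRES_ID:
            raise ValueError(f"_id is required for {op}")
        fields["_id"] = str(uuid.uuid4())  # assumption: random UUID string
    doc_id = fields["_id"]
    exists = doc_id in collection

    # UPSERT and REPSERT resolve to a simpler operation based on existence.
    if op == "UPSERT":
        op = "UPDATE" if exists else "INSERT"
    elif op == "REPSERT":
        op = "REPLACE" if exists else "INSERT"

    if op == "INSERT":
        if not exists:
            collection[doc_id] = fields
    elif op == "UPDATE":
        if exists:
            fields.pop("_event_time", None)    # ignored on UPDATE
            collection[doc_id].update(fields)  # overwrite top-level fields only
    elif op == "REPLACE":
        if exists:
            collection[doc_id] = fields        # discard the old document entirely
    elif op == "DELETE":
        collection.pop(doc_id, None)


collection = {}
for record in [
    {"_id": "a", "_op": "INSERT", "x": 1},   # inserted
    {"_id": "a", "_op": "INSERT", "x": 2},   # ignored: "a" already exists
    {"_id": "a", "_op": "UPDATE", "y": 5},   # merges y into "a"
    {"_id": "b", "_op": "UPDATE", "y": 5},   # ignored: "b" does not exist
    {"_id": "a", "_op": "REPLACE", "z": 9},  # "a" now holds only z
    {"_id": "b", "_op": "REPSERT", "w": 1},  # "b" does not exist: inserted
    {"_id": "a", "x": 7},                    # no _op: UPSERT, resolves to UPDATE
    {"_id": "b", "_op": "delete"},           # case insensitive: "b" removed
]:
    apply_op(collection, record)
```

After the eight records, the model contains a single document `a` with fields `z` and `x`; the `y` field was dropped by the `REPLACE`, and `b` was inserted and then deleted.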

## The `_seq_no` field

The `_seq_no` field ensures data consistency and prevents conflicting updates by persisting the most up-to-date version of a document.

By default, when the `_seq_no` field is not set in a document, Rockset follows a last-write-wins strategy. However, when application records are written to Rockset out of order, last-write-wins can leave a stale version of a document in the collection.

To address this, you can set your document `_seq_no` field to an ever increasing value, incremented on every change. When a document with the same `_id` already exists in the collection, Rockset will replace the existing document with the update only if the update has a higher `_seq_no` value. Otherwise, the update is ignored to prevent overwriting more recent modifications.

The `_seq_no` value must be an integer and can be set in the source document or through an ingest transformation.

To illustrate the behavior of `_seq_no` and its interaction with `_op`, here are some sample updates explaining behavior as they are applied sequentially on top of an empty collection.
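The comparison rule can be modeled with a short Python sketch. It covers only the `_seq_no` check described above with whole-document replacement; the interaction with specific `_op` values is not modeled. The `apply_update` helper, the dictionary used as a collection, and the handling of documents that omit `_seq_no` (plain last-write-wins) are assumptions made for illustration.

```python
def apply_update(collection: dict, doc: dict) -> bool:
    """Write `doc` unless an equal or newer `_seq_no` is already stored.

    Returns True if the document was written, False if it was ignored.
    """
    seq_no = doc.get("_seq_no")
    # bool is a subclass of int in Python, so reject it explicitly.
    if seq_no is not None and (isinstance(seq_no, bool) or not isinstance(seq_no, int)):
        raise TypeError("_seq_no must be an integer")

    existing = collection.get(doc["_id"])
    if existing is not None and seq_no is not None:
        current = existing.get("_seq_no")
        # Only a strictly higher sequence number replaces the stored document.
        if current is not None and seq_no <= current:
            return False

    collection[doc["_id"]] = dict(doc)
    return True


collection = {}
results = [
    apply_update(collection, {"_id": "order-1", "_seq_no": 1, "status": "created"}),
    apply_update(collection, {"_id": "order-1", "_seq_no": 3, "status": "shipped"}),
    # Arrives late: older than what is stored, so it is ignored.
    apply_update(collection, {"_id": "order-1", "_seq_no": 2, "status": "paid"}),
]
```

The late update with `_seq_no` 2 is dropped, so the stored document keeps the `shipped` status written with `_seq_no` 3. Under plain last-write-wins, the same sequence would have ended with the stale `paid` status.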