Collections
A collection is a set of Rockset documents. All documents within a collection and all fields within a document are mutable. Similar to tables in traditional SQL databases, collections are referenced in the FROM clause of SQL queries.
#Creating Collections
Collections can be created in the Collections tab of the Rockset Console or by using the Rockset API.
To create a collection using a managed integration with an external data source (such as MongoDB or Amazon S3), you will first have to set up the respective integration by following the instructions in the Integrations section. We generally recommend mapping each data source (e.g., a MongoDB collection, DynamoDB table, or Kafka topic) to a single collection, and joining those collections at query time using JOIN when necessary.
Note that using an external data source is not required. You can always create collections using the Rockset Console (either empty or from a file upload) or by using the Rockset API directly without referencing any sources. You can read more about how to create a collection from a self-managed data source here.
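As a rough illustration of creating an empty collection through the API with no source attached, the sketch below builds an HTTP request but does not send it. Several details are assumptions modeled on Rockset's v1 REST API rather than facts from this page: the region-specific host `API_SERVER`, the `/v1/orgs/self/ws/{workspace}/collections` path, the `ApiKey` authorization scheme, and the body fields. The workspace and collection names are hypothetical. Check the API reference before relying on any of them.

```python
import json
import urllib.request

# Assumption: region-specific API server; substitute the one for your organization.
API_SERVER = "https://api.usw2a1.rockset.com"


def build_create_collection_request(api_key: str, workspace: str, name: str,
                                     description: str = "") -> urllib.request.Request:
    """Build (but do not send) a request that creates an empty collection.

    The path and body fields are assumptions modeled on Rockset's v1 REST API.
    """
    # No "sources" entry: the collection starts empty and is filled later
    # (e.g. by file upload or by writing documents through the API).
    body = {"name": name, "description": description}
    return urllib.request.Request(
        url=f"{API_SERVER}/v1/orgs/self/ws/{workspace}/collections",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace and collection names.
req = build_create_collection_request("YOUR_API_KEY", "commons", "orders")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it lets you inspect or log the exact payload before any network call is made.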
#Bulk Ingest Mode
BULK_INGEST_MODE is a feature available only to collections created using one of the following managed data sources: Amazon DynamoDB, MongoDB Atlas, Amazon S3, and Google Cloud Storage. When a collection is first created using one of these data sources, Rockset will scan your data source as soon as the collection is initialized to see whether it exceeds the minimum size of 5 GiB required to enter BULK_INGEST_MODE. If so, Rockset will change the collection status to BULK_INGEST_MODE. This prevents the collection from being queried, but it allows ingest to occur at speeds several orders of magnitude higher than typical streaming ingest.
Once the bulk ingest is completed, the collection will enter the READY state, at which point you can begin executing queries. Rockset will continue to actively scan your external data source, but any new documents will be added using the normal streaming ingest. A collection will only ever enter BULK_INGEST_MODE once, and only immediately following its creation.
If Rockset determines that your data source does not meet the requirements to enter BULK_INGEST_MODE, the collection will immediately enter the READY state, and documents will be added using the normal streaming ingest. You may execute queries while the initial ingest is still occurring, but some of the data may be unavailable until the initial ingest has fully completed.
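The rules above can be summarized as a small decision function. This is an illustrative model of the documented behavior, not Rockset code. The source identifiers (`"dynamodb"`, `"mongodb_atlas"`, `"s3"`, `"gcs"`) are hypothetical labels. The sketch also reads the 5 GiB threshold as "at least 5 GiB qualifies", which is an interpretation of "minimum size".

```python
from enum import Enum
from typing import List, Optional

GIB = 1024 ** 3
BULK_INGEST_MIN_BYTES = 5 * GIB  # 5 GiB minimum size, as described above

# Hypothetical labels for the managed sources that support bulk ingest.
BULK_ELIGIBLE_SOURCES = {"dynamodb", "mongodb_atlas", "s3", "gcs"}


class Status(Enum):
    BULK_INGEST_MODE = "BULK_INGEST_MODE"
    READY = "READY"


def initial_status(source: Optional[str], source_bytes: int) -> Status:
    """Status a newly created collection starts in (illustrative model only)."""
    if source in BULK_ELIGIBLE_SOURCES and source_bytes >= BULK_INGEST_MIN_BYTES:
        return Status.BULK_INGEST_MODE
    return Status.READY


def status_sequence(source: Optional[str], source_bytes: int) -> List[Status]:
    """Statuses after creation; bulk ingest happens at most once, right at the start."""
    first = initial_status(source, source_bytes)
    if first is Status.BULK_INGEST_MODE:
        return [Status.BULK_INGEST_MODE, Status.READY]  # then streaming ingest
    return [Status.READY]  # streaming ingest from the beginning


print([s.value for s in status_sequence("s3", 20 * GIB)])     # large S3 source
print([s.value for s in status_sequence("s3", 1 * GIB)])      # below the threshold
print([s.value for s in status_sequence("kafka", 20 * GIB)])  # not an eligible source
```

The first call goes through BULK_INGEST_MODE and then READY. The other two start in READY and rely on streaming ingest.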
#Field Mappings
Collections automatically create indexes on every field path that is included in that collection, including those of nested paths.
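To make "every field path, including nested paths" concrete, the sketch below lists the paths of a sample JSON document. It is not Rockset's indexing implementation. The dotted notation and the `[*]` marker for array elements are conventions invented for this illustration.

```python
from typing import Any, Set


def field_paths(doc: Any, prefix: str = "") -> Set[str]:
    """Collect every field path in a JSON-like document, including nested ones."""
    paths: Set[str] = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)                      # the field itself
            paths |= field_paths(value, path)    # plus anything nested under it
    elif isinstance(doc, list):
        for value in doc:
            # "[*]" stands for "any element of this array" in this sketch.
            paths |= field_paths(value, f"{prefix}[*]")
    return paths


doc = {
    "user": {"name": "Ada", "address": {"city": "London"}},
    "tags": [{"label": "vip"}],
    "total": 42,
}
print(sorted(field_paths(doc)))
# ['tags', 'tags[*].label', 'total', 'user', 'user.address', 'user.address.city', 'user.name']
```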
To transform incoming data by creating an index on a compound or derived field, configure field mappings. Field mappings let you create new fields by applying SQL expressions to fields of incoming documents.
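Conceptually, a field mapping takes some input fields of each incoming document and evaluates an expression to produce a new field before the document is indexed. The sketch below models that idea in plain Python. `FieldMapping` is a hypothetical stand-in, not Rockset's configuration format. A Python lambda plays the role of the SQL expression. Skipping the derived field when an input is missing is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


@dataclass
class FieldMapping:
    """Hypothetical stand-in for a field mapping: input fields -> derived output field."""
    output_field: str
    input_fields: List[str]
    expression: Callable[..., Any]  # plays the role of the SQL expression

    def apply(self, doc: Document) -> Document:
        out = dict(doc)  # copy so the incoming document is not mutated
        # Assumption of this sketch: skip the derived field if any input is missing.
        if all(f in doc for f in self.input_fields):
            out[self.output_field] = self.expression(*(doc[f] for f in self.input_fields))
        return out


# Roughly what a SQL expression such as `price * quantity` would compute at ingest time.
total_mapping = FieldMapping("order_total", ["price", "quantity"], lambda p, q: p * q)

incoming = {"_id": "o1", "price": 2.5, "quantity": 4}
print(total_mapping.apply(incoming))
# {'_id': 'o1', 'price': 2.5, 'quantity': 4, 'order_total': 10.0}
```

Because the derived field is computed once at ingest, `order_total` is indexed like any other field and does not need to be recomputed in every query.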
#Retention Duration
For each collection, you can set a custom retention duration, which determines how long each document is retained after being added to the collection. Low retention values keep the total amount of indexed data small while still ensuring that your applications query the latest data.
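The retention rule can be pictured as a sliding time window. In the sketch below, a document is visible only while less time than the retention duration has passed since it was added. This is a simplified model: whether the boundary is inclusive, and which timestamp Rockset actually compares against, are assumptions to confirm in the product documentation. `StoredDoc` and `retained` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class StoredDoc:
    doc_id: str
    added_at: datetime  # when the document was added to the collection


def retained(docs: List[StoredDoc], retention: timedelta, now: datetime) -> List[StoredDoc]:
    """Keep only documents still inside the retention window (simplified model)."""
    return [d for d in docs if now - d.added_at < retention]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
docs = [
    StoredDoc("a", now - timedelta(days=1)),
    StoredDoc("b", now - timedelta(days=6)),
    StoredDoc("c", now - timedelta(days=8)),  # older than one week: dropped
]
week = timedelta(days=7)
print([d.doc_id for d in retained(docs, week, now)])  # ['a', 'b']
print(int(week.total_seconds()))  # 604800, if your client expects seconds
```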
#Special Fields
Every document ingested into Rockset collections has several system-generated fields which are automatically created by Rockset and added to each document. Learn more about special fields here.
#The _events Collection
When your organization is created, Rockset will automatically create a collection named _events, which is used for audit logging to provide visibility into your account. Learn more about the _events collection here.
#Updating Collections
When a collection is created from a managed integration, Rockset will automatically sync your collection to remain up-to-date with its data source, usually within a matter of seconds (you can read more about individual source behavior in the Data Sources section).
If you choose not to create your collection using a managed integration, or wish to make manual changes to data in your collection after Rockset has synced it with your external data source, you can learn more about manually adding, deleting, or patching documents here.
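For the manual path, the sketch below builds (but does not send) add, patch, and delete requests against a collection's documents. Treat every API detail as an assumption modeled on Rockset's v1 REST API: the `.../collections/{collection}/docs` path, the HTTP methods, the `{"data": [...]}` body, and the JSON Patch style `patch` operations, including the exact operation-name casing. The workspace, collection, and `_id` values are hypothetical. Confirm against the API reference before use.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_SERVER = "https://api.usw2a1.rockset.com"  # assumption: use your org's region


def docs_request(method: str, api_key: str, workspace: str, collection: str,
                 data: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (not send) a documents request; path and body shape are assumptions."""
    return urllib.request.Request(
        url=f"{API_SERVER}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs",
        data=json.dumps({"data": data}).encode("utf-8"),
        headers={"Authorization": f"ApiKey {api_key}", "Content-Type": "application/json"},
        method=method,
    )


key, ws, coll = "YOUR_API_KEY", "commons", "orders"  # hypothetical values

# Add documents.
add = docs_request("POST", key, ws, coll, [{"_id": "o1", "status": "new"}])
# Patch a field in place with JSON Patch (RFC 6902) style operations.
patch = docs_request("PATCH", key, ws, coll, [
    {"_id": "o1", "patch": [{"op": "replace", "path": "/status", "value": "shipped"}]},
])
# Delete documents by _id.
delete = docs_request("DELETE", key, ws, coll, [{"_id": "o1"}])

for req in (add, patch, delete):
    print(req.get_method(), req.full_url)
```

Patching changes individual fields without resending the whole document. This suits small corrections after a managed sync has already loaded the data.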
#Querying Collections
Collections can be queried using SQL the same way tables are queried in traditional SQL databases.
You can write and execute queries on your collections in the Query Editor tab of the Rockset Console. You can also run queries against collections using the Rockset API and SDKs (Node.js, Python, Java, or Go). SQL queries can JOIN documents across different Rockset collections and workspaces.
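As an example of querying through the API, the sketch below prepares a SQL query that joins two collections in different workspaces, using workspace.collection names in FROM and JOIN. The collections, workspaces, and columns are hypothetical. The `/v1/orgs/self/queries` endpoint, the `{"sql": {"query": ...}}` body, and the `results` response field are assumptions modeled on Rockset's v1 REST API. The request is built but not sent.

```python
import json
import urllib.request

API_SERVER = "https://api.usw2a1.rockset.com"  # assumption: your org's region

# Hypothetical collections: orders in the "commons" workspace, customers in "crm".
SQL = """
SELECT c.name, SUM(o.total) AS lifetime_value
FROM commons.orders o
JOIN crm.customers c ON o.customer_id = c.id
GROUP BY c.name
ORDER BY lifetime_value DESC
LIMIT 10
"""


def build_query_request(api_key: str, sql: str) -> urllib.request.Request:
    """Build (not send) a query request; endpoint and body shape are assumptions."""
    return urllib.request.Request(
        url=f"{API_SERVER}/v1/orgs/self/queries",
        data=json.dumps({"sql": {"query": sql}}).encode("utf-8"),
        headers={"Authorization": f"ApiKey {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("YOUR_API_KEY", SQL)
print(req.get_method(), req.full_url)
# Sending it (response field name is an assumption):
#   rows = json.load(urllib.request.urlopen(req))["results"]
```

Keeping each source in its own collection and joining at query time, as recommended earlier on this page, is what makes a query like this possible without duplicating data at ingest.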
See our SQL Reference for the full list of supported data types, commands, functions, and more.