Collections

A collection is a set of Rockset documents. All documents within a collection and all fields within a document are mutable. Like tables in traditional SQL databases, collections are typically referenced in the FROM clause of SQL queries.

#Creating Collections

Collections can be created from external data sources after setting up the respective integration listed in the Data Sources section. They can also be created without an external integration by using the Rockset Console (either empty or from a file upload) or by using the REST API directly. You can read more about how to create a collection from a self-managed data source here.

We generally recommend mapping each data source (e.g. MongoDB collection, DynamoDB table, Kafka topic) to a single collection, and joining those collections at query time using JOIN when necessary.
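The one-source-per-collection pattern can be sketched locally. The example below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for Rockset, and the `users` and `orders` tables (and their columns) are hypothetical names playing the role of two collections fed by separate data sources.

```python
import sqlite3

# Local stand-in: two SQLite tables play the role of two collections,
# each mapped from its own (hypothetical) data source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 7.5);
""")

# The sources stay separate at ingest time and are combined only
# at query time with JOIN.
rows = conn.execute("""
    SELECT u.name, SUM(o.total) AS spent
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.user_id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()

print(rows)
```

Keeping each source in its own collection means neither source has to be reshaped at ingest time to fit the other; the relationship is expressed in the query instead.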

#Field Mappings

Collections automatically create indexes on every field path that is included in that collection, including those of nested paths.
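To make "every field path, including nested paths" concrete, here is a minimal sketch that lists the dotted paths of a nested document. It only illustrates which paths exist in a document; it is not a description of how Rockset builds its indexes, and the `field_paths` helper and sample document are hypothetical. Arrays are left out for brevity.

```python
def field_paths(doc, prefix=""):
    """Yield the dotted path of every field in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        # Descend into nested objects so their fields get paths too.
        if isinstance(value, dict):
            yield from field_paths(value, path)


doc = {
    "name": "Ada",
    "address": {"city": "Oslo", "geo": {"lat": 59.9, "lon": 10.7}},
}
paths = sorted(field_paths(doc))
print(paths)
```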

To create an index on a compound or derived field, you can use a field mapping. Field mappings provide the means to create new fields by applying SQL expressions to fields of incoming documents.
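The idea of deriving new fields from incoming documents can be sketched as follows. This is a conceptual illustration only: in Rockset the expressions are written in SQL, whereas here plain Python functions stand in for them, and `apply_field_mappings` and all field names are hypothetical.

```python
def apply_field_mappings(doc, mappings):
    """Return a copy of `doc` with one derived field added per mapping."""
    out = dict(doc)
    for output_field, expression in mappings.items():
        # Each expression reads the incoming document and produces a value.
        out[output_field] = expression(doc)
    return out


# Python stand-ins for what would be SQL expressions in a real mapping.
mappings = {
    # Compound field built from two input fields.
    "full_name": lambda d: f"{d['first_name']} {d['last_name']}",
    # Derived field computed from a single input field.
    "email_domain": lambda d: d["email"].split("@", 1)[1].lower(),
}

incoming = {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@Example.com"}
stored = apply_field_mappings(incoming, mappings)
print(stored)
```

Because the derived fields are stored as ordinary fields of the document, they can be indexed and queried like any other field.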

#Retention Duration

For each collection, you can set a custom retention duration. This determines how long each document is retained after being added to the collection. Low retention values can be used to keep the total amount of indexed data low, while still ensuring your applications are querying the latest data.
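The effect of a retention duration can be modeled in a few lines. The sketch below is a local simulation, not Rockset's implementation: the `retained` helper and the `added_at` field are hypothetical, and it simply keeps documents whose insertion time falls within the retention window.

```python
from datetime import datetime, timedelta, timezone


def retained(docs, retention, now):
    """Keep only documents added within the retention window ending at `now`."""
    cutoff = now - retention
    return [d for d in docs if d["added_at"] >= cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
docs = [
    {"id": "a", "added_at": now - timedelta(days=9)},   # older than 7 days
    {"id": "b", "added_at": now - timedelta(days=2)},
    {"id": "c", "added_at": now - timedelta(hours=1)},
]

# With a 7-day retention, only the two recent documents remain.
kept = [d["id"] for d in retained(docs, timedelta(days=7), now)]
print(kept)
```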

#Updating Collections

When a collection is created from a managed integration, Rockset will automatically sync your collection to remain up-to-date with its data source, usually within a matter of seconds (you can read more about individual source behavior in the Data Sources section).

You can also manually update your collections by using the Documents API or by manually uploading files in the Rockset Console.

#Using Collections

Collections can be queried using SQL the same way tables are queried in traditional SQL databases. Queries can be run against collections through our REST API and SDKs (Node.js, Python, Java, or Golang). SQL queries can also JOIN documents across different Rockset collections.
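As a rough sketch of what a query over the REST API might look like, the example below builds (but does not send) an HTTP request using only the Python standard library. The endpoint path, the `ApiKey` authorization scheme, and the request body shape are assumptions that should be checked against the REST API reference; the server URL, API key, and the `commons.users` collection name are placeholders.

```python
import json
import urllib.request


def build_query_request(api_server, api_key, sql, parameters=None):
    """Build a POST request carrying a SQL query.

    Assumed, not confirmed by this page: the `/v1/orgs/self/queries`
    path, the `ApiKey` header scheme, and the JSON body layout.
    """
    body = {"sql": {"query": sql, "parameters": parameters or []}}
    return urllib.request.Request(
        url=f"{api_server}/v1/orgs/self/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder server and key; substitute the values for your own account.
req = build_query_request(
    "https://api.example.invalid",
    "YOUR_API_KEY",
    "SELECT * FROM commons.users LIMIT 10",
)

# To actually run the query, send the request:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
```

The official SDKs wrap this kind of request, so in practice an SDK is usually the more convenient choice.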

#Aliases

An alias references a collection. You can use the alias name in your queries instead of the actual collection name. You can switch the alias to a different collection without any downtime for your queries.

This is useful for versioning your data during bulk refresh of a collection. You can create a new collection for every bulk refresh and switch the alias to the latest one without any downtime for your queries. In case of any issues during bulk refresh, you can switch the alias back to the previous collection.
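The bulk-refresh workflow described above can be modeled with a small in-memory registry. This is a conceptual sketch, not Rockset's API: the `AliasRegistry` class and the `products`, `products_v1`, and `products_v2` names are hypothetical.

```python
class AliasRegistry:
    """In-memory model of aliases that each reference one collection."""

    def __init__(self):
        self._targets = {}

    def point(self, alias, collection):
        """Point `alias` at `collection`; return the previous target, if any."""
        previous = self._targets.get(alias)
        self._targets[alias] = collection
        return previous

    def resolve(self, name):
        """Return the collection an alias references, or `name` if it is not an alias."""
        return self._targets.get(name, name)


aliases = AliasRegistry()
aliases.point("products", "products_v1")

# Bulk refresh: load the new data into a fresh collection, then switch the alias.
previous = aliases.point("products", "products_v2")
after_switch = aliases.resolve("products")

# If the refresh turns out to be bad, switch back to the previous collection.
aliases.point("products", previous)
after_rollback = aliases.resolve("products")
```

Queries keep using the name `products` throughout, which is why neither the switch nor the rollback requires any change on the query side.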

You can manage your aliases in the Rockset Console, or by using the REST API directly.

#Monitoring Collections

You can view the status of each collection, along with any data processing errors, in the Collections tab of the Rockset Console.