Data Sources

To add data to Rockset, you will need to:

  1. Add a new data source so that Rockset has the permissions and settings to connect to your data.
  2. Create a Collection with the data.

Rockset allows users to connect:

  • Data streams (Kafka, Kinesis)
  • OLTP databases (DynamoDB, MongoDB, MySQL, PostgreSQL)
  • Data lakes (S3, GCS)

As new data arrives in your data source, Rockset indexes it within seconds. Rockset ingests your data without requiring a schema ahead of time, so you can get set up quickly.

Fully-managed integrations for the following data sources are currently supported, meaning that Rockset automatically detects changes to your data source in real time and replicates those changes into Rockset within seconds:

Partially-managed integrations for the following data sources are currently supported, meaning that you must periodically export changes to your data source in batch to a fully-managed data source (such as Amazon S3), where Rockset will then replicate those changes within seconds:

Using an integration is optional. If you prefer to insert and sync your data manually, or if your desired data source is not currently supported, you can always use the Rockset API to create and update your collections. See the Rockset API documentation for more information about creating self-managed data sources.
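
As a rough illustration of the self-managed path, the sketch below builds (but does not send) an HTTP request that adds documents to an existing collection. The API server URL, the `/v1/orgs/self/ws/{workspace}/collections/{collection}/docs` path, the `ApiKey` authorization scheme, and the `{"data": [...]}` body shape are assumptions that should be checked against the Rockset API reference for your region. The workspace, collection, and key values are hypothetical.

```python
import json
import urllib.request

# Assumed regional API server; substitute the one for your organization.
API_SERVER = "https://api.usw2a1.rockset.com"


def build_add_documents_request(api_key, workspace, collection, documents):
    """Build (without sending) a request that adds documents to a collection."""
    # Assumed endpoint path; verify it against the API reference.
    url = f"{API_SERVER}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs"
    body = json.dumps({"data": documents}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only.
request = build_add_documents_request(
    api_key="YOUR_API_KEY",
    workspace="commons",
    collection="events",
    documents=[{"_id": "1", "type": "signup"}],
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any data leaves your machine.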

Creating Integrations

Admins in your Rockset organization can create integrations, either in the Rockset Console or directly through the Rockset API. Setup generally takes 10-15 minutes. Step-by-step instructions for each integration can be found in the documentation for each data source.

Data Permissions

You can read about the permissions Rockset requires and why Rockset requires them for each integration type in the Data Sources section. You can also read about these permissions in the Rockset Console during integration creation.

Since many integrations require advanced permissions and multi-step processes, we generally recommend setting these up in the Rockset Console for full context.

Using Integrations

Once an integration is set up, it can be used to create any number of collections. For each integration, you can see a list of the collections backed by that integration in the Integration page of the Rockset Console.

We generally recommend mapping each data source (such as a MongoDB collection, DynamoDB table, or Kafka topic) to a single collection, and joining those collections at query time using JOIN when necessary.
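
To illustrate a query-time join, the sketch below composes a SQL statement that joins two collections, each backed by its own data source. The collection names (`orders`, `customers`), the `commons` workspace, and the field names are hypothetical. The `{"sql": {"query": ...}}` payload shape is an assumption about the query API request body and should be checked against the API reference.

```python
# Hypothetical collections: `orders` (e.g. backed by a DynamoDB table) and
# `customers` (e.g. backed by a MongoDB collection), both in workspace `commons`.
JOIN_QUERY = """
SELECT c.name, COUNT(*) AS order_count
FROM commons.orders AS o
JOIN commons.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY order_count DESC
LIMIT 10
"""


def build_query_payload(sql):
    """Wrap a SQL string in the (assumed) request body for the query API."""
    return {"sql": {"query": sql.strip()}}


payload = build_query_payload(JOIN_QUERY)
```

Keeping one collection per source and joining in SQL means each collection stays in sync with exactly one upstream table or topic, and the relationship between them is expressed only in the query.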

Additional Syncing Costs

Depending on the data source, your data source provider may charge you for the frequent read requests Rockset sends to keep your data current in real time; for example, AWS charges for DynamoDB stream read requests. This cost generally remains very small (no more than a few US dollars per month) and does not grow exponentially as your data size scales.

Export Sample Data

If you are not ready to connect your full data source yet, you can easily export sample data to a CSV file and upload it to Rockset instead.
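
One simple way to produce such a sample, assuming your records are already available as a list of dictionaries in Python, is to write the first few rows with the standard `csv` module. The record fields and the `sample_size` parameter below are hypothetical.

```python
import csv
import io


def write_sample_csv(records, out, sample_size=100):
    """Write up to `sample_size` records to `out` as CSV; return rows written."""
    sample = records[:sample_size]
    if not sample:
        return 0
    # Use the union of keys, in first-seen order, so sparse records still fit.
    fieldnames = list(dict.fromkeys(key for row in sample for key in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(sample)
    return len(sample)


# Hypothetical records; in practice, read these from your data source.
records = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Oslo", "plan": "pro"},
    {"id": 3, "city": "Lima"},
]
buffer = io.StringIO()
rows_written = write_sample_csv(records, buffer, sample_size=2)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("sample.csv", "w", newline="")`.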