- Data Sources
Rockset provides pre-built native integrations for connecting external data sources such as DynamoDB, MongoDB, Kinesis, S3, Google Cloud Storage and more to your Rockset account. Rockset then automatically manages and syncs your collections so that they stay up to date with their respective data sources, usually within a matter of seconds.
Managed integrations for the following data sources are currently supported:
Note that using an integration is optional – if you prefer to insert and sync your data manually, or if your desired data source is not currently supported, you can always use the Rockset API to create and update your collections. You can read more about using the Rockset API to create self-managed data sources here.
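As a rough illustration of the self-managed path, the sketch below builds (but does not send) an HTTP request that writes documents into a collection. The endpoint path, the `ApiKey` authorization scheme, and the `{"data": [...]}` body shape are assumptions based on Rockset's REST API conventions and should be checked against the API reference; the host, workspace, collection, and helper name are hypothetical placeholders.

```python
import json
import urllib.request


def build_add_documents_request(api_server, api_key, workspace, collection, docs):
    """Build a POST request that adds documents to a collection.

    Assumption: the write endpoint has the form
    /v1/orgs/self/ws/{workspace}/collections/{collection}/docs
    and accepts a JSON body of the form {"data": [...]}.
    """
    url = f"{api_server}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs"
    body = json.dumps({"data": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: API keys are passed with the "ApiKey" scheme.
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical placeholders; substitute your own API server, key, and names.
req = build_add_documents_request(
    "https://api.example.invalid",
    "YOUR_API_KEY",
    "commons",
    "events",
    [{"_id": "1", "type": "click"}],
)
print(req.get_method(), req.full_url)
```

To actually send the request, pass `req` to `urllib.request.urlopen` once the placeholders are replaced with real values.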
Integrations can be created by admins in your Rockset organization, either in the Rockset Console or by calling the Rockset API directly. Setup generally takes around 10-15 minutes. Step-by-step instructions for each integration can be found in the documentation for the corresponding data source.
You can read about the permissions Rockset requires and why Rockset requires them for each integration type in the Data Sources section. You can also read about these permissions in the Rockset Console during integration creation.
Since many integrations require advanced permissions and multi-step processes, we generally recommend setting these up in the Rockset Console for full context.
Once an integration is set up, it can be used to create any number of collections. For each integration, you can see a list of all collections backed by that integration in the Rockset Console.
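To make the one-integration, many-collections relationship concrete, here is a minimal sketch that builds create-collection payloads for two collections sharing a single integration. The field names (`sources`, `integration_name`, `s3`, `bucket`) are assumptions about the request shape, and the integration, bucket, and collection names are hypothetical; consult the API reference for the exact schema of your data source.

```python
import json


def collection_payload(name, integration_name, bucket):
    """Return a create-collection body backed by an existing integration.

    Assumption: a source references its integration by name and carries
    source-specific settings (here, an S3 bucket) alongside it.
    """
    return {
        "name": name,
        "sources": [
            {
                "integration_name": integration_name,
                "s3": {"bucket": bucket},
            }
        ],
    }


# One integration (hypothetical name) reused for two collections,
# each mapped to its own data source.
integration = "my-s3-integration"
payloads = [
    collection_payload("web_logs", integration, "example-web-logs"),
    collection_payload("mobile_logs", integration, "example-mobile-logs"),
]

for payload in payloads:
    print(json.dumps(payload, indent=2))
```

Each payload describes a separate collection, but both point at the same integration, so the permissions configured during integration setup are reused.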
We generally recommend mapping each data source (e.g. MongoDB collection, DynamoDB table, Kafka
topic, etc.) to a single collection, and joining those collections at query time using
#Additional Syncing Costs
Note that, depending on the data source, your data source provider may charge you additional costs for the frequent read requests Rockset sends to keep your data up to date in real time (e.g. AWS charging you for DynamoDB stream read requests). This cost generally remains very small (usually no more than a few USD per month) and does not grow exponentially as your data size scales.