Updating Collections
The operations described below are available through the Rockset API. Some of them can also be performed in the Rockset console, as noted in each section.
Updating an Ingest Transformation
You can update the Ingest Transformation for a collection at any time using the Update Collection API Endpoint or the Rockset console. To do this in the console, navigate to the collection details page of the collection you wish to modify. Click the Ingest Transformation tab, then click the "Update" button to begin the Update Ingest Transformation flow. Alternatively, open the overflow menu in the top right corner of the screen and click "Update Ingest Transformation".
When you update the transformation, you may see a brief pause in ingestion for sources associated with the collection. The updated transformation usually takes effect within a few seconds, but with some sources it can take up to a few minutes.
When you update an ingest transformation, the new transformation is applied only to documents ingested from your sources after the update; previously ingested documents are not affected. There is also no built-in versioning to track which transformation was applied to which documents of a collection at ingest time. However, you can add this explicitly as part of your transformation, as in the following example:
SELECT
*,
1 AS version
FROM _input
There are some restrictions when updating a transformation:
- You cannot update the ingest transformation of a collection configured with rollups.
- You cannot add rollups to an existing ingest transformation.
- You cannot modify the clustering of an ingest transformation.
- Updating source ingest transformations is currently not supported.
Otherwise, you are free to add, remove, or change field projections and names, add or remove WHERE clause predicates, and so on. To remove the ingest transformation for a collection altogether, update the transformation to the default, SELECT * FROM _input.
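As an illustration of the API route, here is a minimal Python sketch that applies a versioned transformation, like the one above, through the Update Collection API Endpoint. It only builds the HTTP request with the standard library's urllib.request and does not send it. Several details are assumptions modeled on the Update Source example later on this page, so check them against the API reference before relying on them: the PATCH method, the collection URL, and the {"field_mapping_query": {"sql": ...}} body shape. The region, workspace, collection, and key values are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: URL pattern modeled on the Update Source example on this page.
BASE = "https://api.{region}.rockset.com/v1/orgs/self/ws/{workspace}/collections/{collection}"


def build_update_transformation(region: str, workspace: str, collection: str,
                                api_key: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a request that replaces the ingest transformation.

    Assumption: the endpoint accepts PATCH with a body of the form
    {"field_mapping_query": {"sql": "..."}}.
    """
    url = BASE.format(region=region, workspace=workspace, collection=collection)
    body = json.dumps({"field_mapping_query": {"sql": sql}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Bump the version tag on every transformation change so documents ingested
# under each transformation remain distinguishable in queries.
versioned_sql = "SELECT *, 2 AS version FROM _input"
req = build_update_transformation("example-region", "commons", "orders",
                                  "<your API key>", versioned_sql)
# urllib.request.urlopen(req) would send the request; omitted in this sketch.
```

Because the new transformation only affects documents ingested after the update, a query grouping on the version field (for example, counting documents per version) shows how much of the collection each transformation produced.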
Adding and Removing Sources
Rockset can ingest data from multiple sources into the same collection, whether the sources are of the same type or of different types.
To add new sources to an existing collection, you can use the Create Source API Endpoint or the Rockset console. To do this in the console, navigate to the collection details page of the collection you wish to modify. Open the overflow menu in the top right corner of the screen and click "Add Source".
To remove sources from an existing collection, you can use the Delete Source API Endpoint. You can also do so in the console by going to the Collection Details page, clicking the Sources tab, selecting the 3 dots on the right side of the source you want to remove, and clicking "Remove Source". When removing a source from a given collection, the collection will retain previously ingested data from the source, but the collection will no longer fetch updates from the removed source.
With the Create Source and Delete Source API Endpoints, you can perform advanced source-level operations. As an example, you could initially ingest data from an S3 source and then switch to ingesting data from a Kinesis source once the initial load completes.
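The S3-to-Kinesis switch can be sketched in Python as two requests: create the Kinesis source, then delete the S3 source once the initial load has completed. This sketch builds the requests with urllib.request without sending them. Several details are assumptions rather than confirmed API details: the sources URL (modeled on the Update Source example on this page), the POST/DELETE methods, and the source body fields (integration_name, kinesis, stream_name, aws_region). All names and IDs are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: source endpoints live under the collection URL, as in the
# Update Source example on this page.
SOURCES = ("https://api.{region}.rockset.com/v1/orgs/self/ws/{workspace}"
           "/collections/{collection}/sources")


def _headers(api_key: str) -> dict:
    return {"Authorization": f"ApiKey {api_key}", "Content-Type": "application/json"}


def build_create_source(region, workspace, collection, api_key, source: dict):
    """Create Source: POST a source definition to the collection (assumed shape)."""
    url = SOURCES.format(region=region, workspace=workspace, collection=collection)
    return urllib.request.Request(url, data=json.dumps(source).encode("utf-8"),
                                  method="POST", headers=_headers(api_key))


def build_delete_source(region, workspace, collection, api_key, source_id):
    """Delete Source: DELETE the source by ID (assumed path)."""
    url = SOURCES.format(region=region, workspace=workspace, collection=collection)
    return urllib.request.Request(f"{url}/{source_id}", method="DELETE",
                                  headers=_headers(api_key))


# Hypothetical Kinesis source definition; the field names are assumptions.
kinesis_source = {
    "integration_name": "my-kinesis-integration",
    "kinesis": {"stream_name": "orders-stream", "aws_region": "us-west-2"},
}

# After the initial S3 load completes: add Kinesis first, then remove S3.
create_req = build_create_source("example-region", "commons", "orders", "<key>", kinesis_source)
delete_req = build_delete_source("example-region", "commons", "orders", "<key>", "<s3-source-id>")
```

Adding the new source before removing the old one avoids a gap in ingestion. If the two sources overlap in the data they carry, the duplicate-ingestion limitation listed below applies.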
There are some limitations when adding sources to an existing collection:
- Collections can only enter Bulk Ingest mode directly after collection creation. If a source added after collection creation contains large volumes of data, standard streaming ingest limits apply.
- Collections configured with rollups can only contain sources that support rollups.
- If multiple sources point to the same data, the data will be ingested multiple times unless you include additional logic in the ingest transformation.
Temporarily Suspending Ingest
In certain situations, you may need to temporarily suspend data ingestion for specific sources and collections. Suspending ingest can assist with troubleshooting source-related issues and managing sources when they undergo maintenance. Rockset provides Suspend and Resume API Endpoints to help manage ingestion from your sources.
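A minimal Python sketch of calling the Suspend and Resume API Endpoints for one source. It builds the requests with urllib.request and does not send them. The paths are an assumption: POST calls on suspend and resume sub-paths of the source URL, which is taken from the Update Source example on this page. Verify the paths against the API reference. The identifiers are hypothetical placeholders.

```python
import urllib.request

# Source URL pattern from the Update Source example on this page.
SOURCE_URL = ("https://api.{region}.rockset.com/v1/orgs/self/ws/{workspace}"
              "/collections/{collection}/sources/{source_id}")


def build_source_action(region, workspace, collection, source_id, api_key, action):
    """Build a POST to .../sources/{source_id}/{action}.

    Assumption: Suspend and Resume are POST calls on 'suspend' and 'resume'
    sub-paths of the source URL.
    """
    if action not in ("suspend", "resume"):
        raise ValueError(f"unsupported action: {action!r}")
    url = SOURCE_URL.format(region=region, workspace=workspace,
                            collection=collection, source_id=source_id)
    return urllib.request.Request(f"{url}/{action}", method="POST",
                                  headers={"Authorization": f"ApiKey {api_key}"})


suspend = build_source_action("example-region", "commons", "orders",
                              "<source-id>", "<key>", "suspend")
resume = build_source_action("example-region", "commons", "orders",
                             "<source-id>", "<key>", "resume")
```

Restricting the action to the two known values keeps a typo from turning into a request against an unintended path.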
Source suspension should be used sparingly and only for short periods of time. After resuming a previously suspended source, Rockset will ingest all data generated during the suspension period.
Exceeding a source's data retention period during the suspension can lead to permanent error states for your collections. Some sources retain data only for a short time: for example, Amazon Kinesis data streams retain records for 24 hours by default, and DynamoDB Streams retain records for 24 hours. Suspending these sources for longer than their retention period can therefore put the associated collection in a permanent error state. Because data accumulates while a source is suspended, collections may also take additional time to catch up once ingestion resumes.
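To make the retention constraint concrete, here is a small Python helper that checks whether resuming at a given time stays inside a source's retention window. The 24-hour default mirrors the examples above; pass the stream's actual retention if it was configured differently. The 2-hour safety margin is a hypothetical value chosen to leave room for the catch-up time after resuming. It is not a Rockset recommendation.

```python
from datetime import datetime, timedelta, timezone


def suspension_is_safe(suspended_at: datetime, now: datetime,
                       retention: timedelta = timedelta(hours=24),
                       margin: timedelta = timedelta(hours=2)) -> bool:
    """True if resuming at `now` still leaves `margin` before the oldest
    unread record (written around `suspended_at`) ages out of retention.

    The default margin is a hypothetical buffer for catch-up time.
    """
    return (now - suspended_at) + margin <= retention


def latest_safe_resume(suspended_at: datetime,
                       retention: timedelta = timedelta(hours=24),
                       margin: timedelta = timedelta(hours=2)) -> datetime:
    """Latest resume time that still satisfies suspension_is_safe."""
    return suspended_at + retention - margin


start = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
print(suspension_is_safe(start, start + timedelta(hours=20)))  # True
print(suspension_is_safe(start, start + timedelta(hours=23)))  # False
print(latest_safe_resume(start))  # 2024-01-02 06:00:00+00:00
```

The margin exists because the backlog keeps aging while it is being read. Resuming just before the window closes can still lose the oldest records.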
Suspending a source during bulk ingest mode will not cancel the bulk ingest process. You must delete your collection to cancel an ongoing bulk ingest.
Updating Source Settings
Updating Source Settings is currently in Beta.
You can update settings for a source at any time using the Update Source API Endpoint.
This lets you adjust settings that affect the throughput, latency, and cost of accessing the source. Each setting has a default value as well as minimum and maximum values. We recommend consulting the page for the corresponding source in Data Sources for the relevant settings and best practices, to avoid unexpected cost or latency.
When you update source settings, you may see a brief pause in ingestion for sources associated with the collection. The updated settings usually take effect within a few seconds, but with some sources it can take up to a few minutes.
The following example sets the scan frequency of an Amazon S3 source to 30 minutes:
$ curl --request PUT \
--url https://api.{region}.rockset.com/v1/orgs/self/ws/{workspace}/collections/{collection}/sources/{sourceId} \
-H 'Authorization: ApiKey {yourAPIKey}' \
-H 'Content-Type: application/json' \
-d '{
"s3": {
"settings": {
"s3_scan_frequency": "PT30M"
}
}
}'
You can find the source ID using the List Sources in Collection API Endpoint.
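The same workflow in Python: look up the source ID, then send the Update Source request from the curl example above. The scan frequency is written as an ISO 8601 duration ("PT30M"), so the sketch includes a helper that formats a timedelta that way. The Update Source URL and body come from the curl example. The List Sources path (GET on the collection's sources URL) is an assumption. Requests are built with urllib.request and not sent, and all identifiers are hypothetical placeholders.

```python
import json
import urllib.request
from datetime import timedelta

SOURCES = ("https://api.{region}.rockset.com/v1/orgs/self/ws/{workspace}"
           "/collections/{collection}/sources")


def iso8601_duration(d: timedelta) -> str:
    """Format a positive timedelta as an ISO 8601 duration such as 'PT30M'.
    Sub-second precision is dropped."""
    total = int(d.total_seconds())
    if total <= 0:
        raise ValueError("duration must be at least one second")
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    units = ((hours, "H"), (minutes, "M"), (seconds, "S"))
    return "PT" + "".join(f"{v}{u}" for v, u in units if v)


def _headers(api_key):
    return {"Authorization": f"ApiKey {api_key}", "Content-Type": "application/json"}


def build_list_sources(region, workspace, collection, api_key):
    """List Sources in Collection (assumed: GET on the sources URL)."""
    url = SOURCES.format(region=region, workspace=workspace, collection=collection)
    return urllib.request.Request(url, method="GET", headers=_headers(api_key))


def build_s3_scan_frequency_update(region, workspace, collection, source_id, api_key, every):
    """Update Source: the same PUT request as the curl example above."""
    url = SOURCES.format(region=region, workspace=workspace, collection=collection)
    body = {"s3": {"settings": {"s3_scan_frequency": iso8601_duration(every)}}}
    return urllib.request.Request(f"{url}/{source_id}", data=json.dumps(body).encode("utf-8"),
                                  method="PUT", headers=_headers(api_key))


listing = build_list_sources("example-region", "commons", "orders", "<your API key>")
update = build_s3_scan_frequency_update("example-region", "commons", "orders",
                                        "<source-id>", "<your API key>", timedelta(minutes=30))
```

The helper does not enforce the setting's minimum and maximum values, because those depend on the source. Check the source's page in Data Sources for the allowed range.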
Updating Retention Period
The retention period of a collection cannot be updated after the collection has been created. This is because, at creation time, the collection is clustered by a time interval derived from its retention period. To change the retention period, you must recreate the collection so that its data is clustered on the new time interval.
To quickly recreate a collection, go to the Collection Details page and open the three-dot menu in the upper right corner. Click "Clone Collection Settings" to begin the recreation process. Be sure to change the retention period before clicking "Create".
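If you script this instead of using the console, the recreation amounts to creating a new collection with the new retention and the existing transformation. The Python sketch below builds such a request with urllib.request without sending it. Several details are assumptions, not confirmed API details: the Create Collection URL (POST on the workspace's collections URL) and the body fields name, retention_secs, and field_mapping_query. The collection name and SQL are hypothetical placeholders.

```python
import json
import urllib.request
from datetime import timedelta

# Assumption: collections URL derived from the Update Source example on this page.
COLLECTIONS = "https://api.{region}.rockset.com/v1/orgs/self/ws/{workspace}/collections"


def build_recreate_with_retention(region, workspace, api_key, new_name,
                                  retention: timedelta, transformation_sql: str):
    """Build a Create Collection request that reuses an existing transformation
    with a new retention period. Body field names are assumptions."""
    seconds = int(retention.total_seconds())
    if seconds <= 0:
        raise ValueError("retention must be positive")
    body = {
        "name": new_name,
        "retention_secs": seconds,
        "field_mapping_query": {"sql": transformation_sql},
    }
    url = COLLECTIONS.format(region=region, workspace=workspace)
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  method="POST",
                                  headers={"Authorization": f"ApiKey {api_key}",
                                           "Content-Type": "application/json"})


req = build_recreate_with_retention("example-region", "commons", "<key>", "orders_90d",
                                    timedelta(days=90),
                                    "SELECT *, 1 AS version FROM _input")
```

Sources are omitted from this sketch. Attach them to the new collection as described under Adding and Removing Sources.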