Collection Snapshot and Restore

Collection Snapshot and Restore is currently in Private Preview. Contact Rockset Support to enable this feature.

Snapshot and restore lets you efficiently freeze a copy of a collection and later restore it into a brand new collection. This is useful for backing up data that might be modified by unexpected ingestion pipeline changes.

Creating Snapshots

Creating a snapshot captures the current state of a collection, although the contents of the snapshot may be up to 10 minutes behind the collection's current contents. A snapshot does not make a full copy of the underlying files; it only tracks the changes since the previous snapshot. As a result, you are billed only for the incremental data between snapshots, not for the whole collection size each time a snapshot is created. Snapshots expire at most 7 days after they are taken.

Example Create Request

# Example Request
curl --request POST \
    --url https://$ROCKSET_SERVER/v1/orgs/self/snapshots \
    -H "Authorization: ApiKey $API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "collection_rrn": "rrn:col:usw2a1:a9999999-888c-77e6-55cb-f44444f3eb22",
    "description": "Collection backup snapshot",
    "expiration_time_millis": 1694472848129
  }'
# Example Response
{
  "data": {
    "rrn": "rrn:snap:usw2a1:1111a111-bb32-44a4-2222-911223eddda4",
    "created_by": "[email protected]",
    "created_by_apikey_name": "test_api_key",
    "description": "Collection backup snapshot",
    "created_at": "2023-09-13T00:36:21Z",
    "collection_rrn": "rrn:col:dev-usw2a1:a9970573-368c-48e6-99cb-f65399f4eb56",
    "status": "REQUESTED",
    "expiration_time_millis": 1694566333033,
    "size_bytes": 999876543
  }
}
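
The `expiration_time_millis` values in the example look like Unix epoch timestamps in milliseconds, and the text above states that snapshots expire at most 7 days after they are taken. Under those assumptions, a minimal Python sketch for computing an expiration and building the request body might look like the following. The helper names `expiration_millis` and `snapshot_request_body` are hypothetical, and capping the lifetime on the client side is a choice made for this sketch, not documented API behavior.

```python
import json
import time

# From the text: snapshots expire at most 7 days after they are taken.
MAX_SNAPSHOT_LIFETIME_DAYS = 7
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def expiration_millis(days, now_millis=None):
    """Return an expiration `days` from now, capped at 7 days.

    Assumption: expiration_time_millis is a Unix epoch timestamp in milliseconds.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    if now_millis is None:
        now_millis = int(time.time() * 1000)
    capped_days = min(days, MAX_SNAPSHOT_LIFETIME_DAYS)
    return now_millis + int(capped_days * MILLIS_PER_DAY)


def snapshot_request_body(collection_rrn, description, days):
    """Build the JSON body for POST /v1/orgs/self/snapshots, using the fields from the example."""
    return json.dumps({
        "collection_rrn": collection_rrn,
        "description": description,
        "expiration_time_millis": expiration_millis(days),
    })
```

The returned string can be passed as the `-d` argument of the curl request shown above.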

Creating a snapshot puts it into the REQUESTED state. You must wait until the snapshot's status changes to CREATED before you can restore from it. You can list all snapshots and their statuses by querying the snapshots endpoint.

Example List Request

# Example Request
curl --request GET \
    --url https://$ROCKSET_SERVER/v1/orgs/self/snapshots \
    -H "Authorization: ApiKey $API_KEY"
# Example Response
{
  "data": [
    {
      "rrn": "rrn:snap:usw2a1:1111a111-bb32-44a4-2222-911223eddda4",
      "created_by": "[email protected]",
      "created_by_apikey_name": "test_api_key",
      "description": "Collection backup snapshot",
      "created_at": "2023-09-13T00:36:21Z",
      "collection_rrn": "rrn:col:dev-usw2a1:a9970573-368c-48e6-99cb-f65399f4eb56",
      "status": "REQUESTED",
      "expiration_time_millis": 1694566333033,
      "size_bytes": 999876543
    },
    {
      "rrn": "rrn:snap:usw2a1:2222a222-bb33-55a6-7777-888888eddda9",
      "created_by": "[email protected]",
      "created_by_apikey_name": "test_api_key",
      "description": "Collection backup snapshot",
      "created_at": "2023-09-13T02:37:21Z",
      "collection_rrn": "rrn:col:usw2a1:a8888888-228c-55e6-99cb-f77788f5eb44",
      "status": "CREATED",
      "expiration_time_millis": 1694466193032,
      "size_bytes": 9999999
    }
  ]
}
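
Because a snapshot must reach the CREATED status before it can be restored, a client will typically poll the list endpoint. The following Python sketch shows one way to do that with only the standard library. The function names, the polling interval, and the timeout are assumptions made for illustration. Only the REQUESTED and CREATED statuses appear in this document, so the sketch treats every other status as "not ready yet".

```python
import json
import time
import urllib.request


def list_snapshots(server, api_key):
    """GET /v1/orgs/self/snapshots and return the parsed JSON response."""
    request = urllib.request.Request(
        f"https://{server}/v1/orgs/self/snapshots",
        headers={"Authorization": f"ApiKey {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def snapshot_status(list_response, snapshot_rrn):
    """Return the status of the snapshot with the given RRN, or None if it is not listed."""
    for snapshot in list_response.get("data", []):
        if snapshot.get("rrn") == snapshot_rrn:
            return snapshot.get("status")
    return None


def wait_until_created(fetch, snapshot_rrn, poll_seconds=30,
                       timeout_seconds=3600, sleep=time.sleep):
    """Call `fetch()` repeatedly until the snapshot reports CREATED.

    `fetch` is any zero-argument callable returning a list response, which
    keeps the loop testable without network access. Raises TimeoutError if
    the snapshot is not CREATED before the timeout elapses.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = snapshot_status(fetch(), snapshot_rrn)
        if status == "CREATED":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{snapshot_rrn} still {status!r} after {timeout_seconds}s"
            )
        sleep(poll_seconds)
```

A call might look like `wait_until_created(lambda: list_snapshots(server, api_key), snapshot_rrn)`, where `server` and `api_key` hold the same values as `$ROCKSET_SERVER` and `$API_KEY` in the curl examples.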

Restoring Collections

Restoring a snapshot materializes all the data referenced by the snapshot into a brand new collection that has its own copy of all the data. Changes made to the source collection will not affect the restored collection, and vice versa. The newly created collection will also not be connected to any streaming ingest sources.

It is possible to connect a streaming source to the restored collection. Note, however, that when you connect a streaming source, all the documents from that source are re-ingested in a streaming fashion. For instance, connecting to an S3 source will result in all the S3 objects in the path being re-ingested.

For collections with retention set, the retention settings cannot be modified when restoring. This is because a collection is clustered, at creation time, by a time interval relative to its retention period. To change the retention period, the collection must be recreated and reindexed so that the data is clustered on the new time interval, and restoring does not reindex. When a collection is restored, all documents whose _event_time is older than the retention period are dropped.
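
To make the retention rule concrete, here is a small Python model of which documents would survive a restore. It is an illustration only: the function name is hypothetical, `_event_time` is represented as a timezone-aware `datetime` rather than whatever encoding the service uses internally, and keeping a document that sits exactly on the cutoff is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def survives_restore(event_time, retention, now=None):
    """Return True if a document's _event_time falls within the retention window.

    Documents older than `now - retention` are modeled as dropped on restore.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return event_time >= now - retention


# Example: with a 7-day retention period, a 6-day-old document is kept
# and an 8-day-old document is dropped.
restore_time = datetime(2023, 9, 13, tzinfo=timezone.utc)
week = timedelta(days=7)
kept = survives_restore(restore_time - timedelta(days=6), week, restore_time)
dropped = not survives_restore(restore_time - timedelta(days=8), week, restore_time)
```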

Because restoring a collection requires making a full copy of the data, it is not instantaneous, but it is significantly faster than re-ingesting the collection you need to recover. To restore a collection larger than 10 TB, please contact Rockset Support first. Once fully restored, the new collection is queryable and modifiable like any other Rockset collection.

Example Restore Request

# Example Request
curl --request POST \
    --url https://$ROCKSET_SERVER/v1/orgs/self/ws/$WORKSPACE/collections \
    -H "Authorization: ApiKey $API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "name": "restored_collection",
    "sources": [
      {
        "snapshot": {
          "source_snapshot_rrn": "rrn:snap:usw2a1:22f3333a-f4ac-5555-6c77-8bbac99d0ad1"
        }
      }
    ]
  }'
# Example Response
{
  "data": {
    "created_at": "2023-09-13T00:59:26Z",
    "created_by": "[email protected]",
    "rrn": "rrn:col:usw2a1:c60f05fb-b08e-4715-b369-70d03b600e67",
    "name": "restored_collection",
    "workspace": "commons",
    "status": "CREATED",
    "sources": [
      {
        "s3": null,
        ...
        "snapshot": {
          "source_snapshot_rrn": "rrn:snap:usw2a1:22f3333a-f4ac-5555-6c77-8bbac99d0ad1"
        },
        "system": null,
        "status": null,
        "format": null,
        "format_params_csv": null,
        "format_params": null,
        "preview_source_id": null,
        "ingest_transformation": null,
        "integration_id": null,
        "initial_dump_done": null,
        "parent_id": null,
        "source_object": null,
        "ingester_version": null,
        "suspended_at": null,
        "resume_at": null,
        "seqno": null
      }
    ],
    "field_mappings": [],
    "field_mapping_query": {
      "sql": ""
    },
    "field_partitions": [],
    "clustering_key": [],
    "aliases": [],
    "field_schemas": [],
    "ingest_virtual_instance_id": "2bb1cccc-68dd-4e9a-83gg-h7i8j6jlmnop",
    "read_only": false,
    "storage_type": "HOT",
    "storage_compression_type": "LZ4",
    "created_by_apikey_name": "test-key",
    "insert_only": false
  }
}
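
The same restore request can be issued from a script. The sketch below mirrors the curl example using Python's standard library. The function names are hypothetical, and the request body contains only the fields shown in the example above, so treat it as a starting point rather than a complete client.

```python
import json
import urllib.request


def restore_request(server, workspace, api_key, new_name, snapshot_rrn):
    """Build (but do not send) the POST request that restores a snapshot into a new collection."""
    body = {
        "name": new_name,
        "sources": [{"snapshot": {"source_snapshot_rrn": snapshot_rrn}}],
    }
    return urllib.request.Request(
        f"https://{server}/v1/orgs/self/ws/{workspace}/collections",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def restore_snapshot(server, workspace, api_key, new_name, snapshot_rrn):
    """Send the restore request and return the parsed JSON response."""
    request = restore_request(server, workspace, api_key, new_name, snapshot_rrn)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything is submitted.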