Write API

This page covers how to use a self-managed data source by manually adding documents to your Rockset collections using the Write API.

What is the Write API?

The Write API refers to the subset of APIs in the Rockset API which are used to insert, update, upsert, or delete documents in a Rockset collection. You should use this option if either Rockset does not support managed integrations with your desired data source, or if you do not want Rockset to automatically sync your data and wish to manage syncing on your own.

🚧

If you choose not to use a managed integration, you will have to manage data syncing on your own.

This is in contrast to Rockset automatically syncing your data when using a managed integration, such as S3 or DynamoDB.

Write API Limits

Peak write requests per second (WPS) using the Write API are based on the Virtual Instance size, as listed below. These limits apply collectively to the Add/Patch/Delete Documents endpoints and to orgs using Kafka Connect as a source. These limits, along with the peak ingest throughput limit, determine how fast Rockset can receive data.

| Virtual Instance | Write Requests Per Second |
|------------------|---------------------------|
| FREE             | 1                         |
| NANO             | 5                         |
| MICRO            | 10                        |
| MILLI            | 10                        |
| XSMALL           | 15                        |
| SMALL            | 25                        |
| MEDIUM           | 50                        |
| LARGE            | 100                       |
| XLARGE           | 200                       |
| 2XLARGE          | 400                       |
| 4XLARGE          | 800                       |
| 8XLARGE          | 1600                      |
| 16XLARGE         | 2400                      |
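These per-second caps are enforced server-side, but a client-side limiter can avoid hitting them in the first place. As an illustrative sketch (not part of any Rockset SDK), a token bucket sized to your Virtual Instance's WPS limit might look like:

```python
import time

class WriteRateLimiter:
    """Token bucket sized to a Virtual Instance's WPS limit.

    Illustrative client-side sketch; not part of any Rockset SDK.
    """

    def __init__(self, wps, clock=time.monotonic):
        self.wps = float(wps)
        self.clock = clock
        self.allowance = float(wps)  # start with a full bucket
        self.last = clock()

    def try_acquire(self):
        """Return True if a write request may be sent now, else False."""
        now = self.clock()
        # Refill tokens in proportion to elapsed time, capped at wps.
        self.allowance = min(self.wps, self.allowance + (now - self.last) * self.wps)
        self.last = now
        if self.allowance >= 1.0:
            self.allowance -= 1.0
            return True
        return False

# Example: a NANO instance allows 5 write requests per second.
limiter = WriteRateLimiter(wps=5)
```

A caller would check `try_acquire()` before each Write API request and buffer or delay writes when it returns False.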

Response Error Codes

Invalid Input (400) and Payload Too Large (413)

Write API and Kafka Connect requests are capped at 10 MiB and 20,000 documents per request. If you see an error indicating "Payload size exceeds limit of 10,485,760 bytes" or "The number of documents specified in this request exceeds the maximum allowed limit of 20,000 documents", retry with a smaller payload, send fewer documents per request, or use one of our managed sources.
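To stay under both caps, a client can split its documents into request-sized batches before calling the Write API. A minimal sketch, assuming each document's size is approximated by its JSON encoding (the real request adds a small envelope, so headroom is left):

```python
import json

MAX_BYTES = 10 * 1024 * 1024   # 10 MiB request cap
MAX_DOCS = 20_000              # per-request document cap

def batch_documents(docs, max_bytes=MAX_BYTES, max_docs=MAX_DOCS):
    """Split docs into batches that respect both Write API caps."""
    budget = int(max_bytes * 0.95)  # headroom for the request envelope
    batches, current, current_size = [], [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8")) + 1  # +1 for the comma
        if current and (len(current) >= max_docs or current_size + size > budget):
            batches.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be sent as one Add Documents request.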

Too Many Requests (429)

To make sure your VI is sized appropriately to your ingest needs, monitor for the 429 Too Many Requests status code. The client can receive the 429 error code in two cases:

  • The client is sending data faster than the Virtual Instance peak throughput limit
    • The error message returned by the server is: Your account is configured with a maximum write rate limit and you have reached this limit.
    • Use appropriate retry, backoff and jitter strategies if the client hits this error.

      📘

      Here is a good guide on how to implement this on the client side.

    • If the application encounters 429 for a large retry count (10 or more), check the streaming ingest metrics. If the application requires high ingest throughput, consider increasing your VI size to avoid throttling.
  • The client is sending more writes per second than the Virtual Instance limit
    • The error message returned by the server is: Your account is configured with a maximum write requests per second limit and you have reached this limit.
    • Use appropriate retry, backoff and jitter strategies if the client hits this error.

      📘

      Here is a good guide on how to implement this on the client side.

      If the application encounters 429 for a large retry count (10 or more), reach out to Rockset Customer Support.

    • If the client needs to send more requests, consider buffering records on the client and sending a batch of records (>100 KB in size) per Write API request.
  • If the workload still requires a higher write rate, consider forwarding the documents to Amazon Kinesis or a managed Kafka service like Confluent or Amazon MSK, and then use that integration to sync data with Rockset. Since a managed integration, like Kinesis, is pull-based, the limits on how fast Rockset can pull data depend only on the source.
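The retry guidance above can be sketched with exponential backoff and "full jitter" (sleeping a random fraction of a doubling window). This is illustrative client code, not an SDK feature; `send_request` is a hypothetical callable that performs the write and returns the HTTP status code:

```python
import random
import time

def backoff_delays(max_retries=10, base=0.25, cap=30.0, rng=random.random):
    """Yield full-jitter delays: a random slice of min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * (2 ** attempt))

def write_with_retry(send_request, docs, max_retries=10):
    """Retry a write on 429 with exponential backoff and jitter.

    `send_request` is a hypothetical callable taking the documents and
    returning the HTTP status code of the Write API response.
    """
    status = send_request(docs)
    for delay in backoff_delays(max_retries):
        if status != 429:
            return status
        time.sleep(delay)
        status = send_request(docs)
    return status  # still 429 after max_retries: check metrics or VI size
```

If the final status is still 429 after the retry budget is spent, follow the guidance above: check streaming ingest metrics, consider a larger VI, or contact Rockset Customer Support.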

Create an Empty Collection

While you can directly add documents to any existing collection, you will need to first create an empty collection if you intend to use the Rockset API to add documents to a new collection.

You can create an empty collection by navigating to Collections > Create Collection > Write API in the Rockset Console.

The Rockset API also exposes a Create Collection endpoint enabling you to create an empty collection from your application code.
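For example, the Create Collection endpoint can be called with plain HTTP. The endpoint path below follows the Rockset REST API reference; the host is region-specific, so treat it as a placeholder for your own:

```python
import json
import urllib.request

# Region-specific API host; replace with the one for your org (placeholder).
ROCKSET_HOST = "https://api.usw2a1.rockset.com"

def create_collection_request(workspace, collection, api_key):
    """Build a Create Collection request for an empty, Write API-fed collection."""
    url = f"{ROCKSET_HOST}/v1/orgs/self/ws/{workspace}/collections"
    body = json.dumps({"name": collection}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )

# Sending the request (commented out so this sketch has no side effects):
# with urllib.request.urlopen(create_collection_request("commons", "my_events", API_KEY)) as resp:
#     print(json.load(resp))
```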

Document Manipulation

You can add, delete, and patch documents using the Write API. Please note that when performing these actions using the associated Rockset API endpoints, you must specify the collection directly; you cannot use a Collection Alias. This does not apply to performing document manipulation with INSERT INTO.

Add Documents

The Rockset API exposes an Add Documents endpoint so that you can insert data directly into your collections from your application code.

For your convenience, Rockset also maintains SDKs for Node.js, Python, Java, and Go. Each SDK has its own set of methods for using the REST API to add documents which you can find in its documentation.
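As a sketch of calling the Add Documents endpoint directly over HTTP (the host is a region-specific placeholder, and the {"data": [...]} body shape follows the REST API reference):

```python
import json
import urllib.request

def add_documents_request(workspace, collection, docs, api_key,
                          host="https://api.usw2a1.rockset.com"):
    """Build an Add Documents request; `host` is a region-specific placeholder."""
    url = f"{host}/v1/orgs/self/ws/{workspace}/collections/{collection}/docs"
    body = json.dumps({"data": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )

# Each element of `docs` becomes one document in the collection, e.g.:
# add_documents_request("commons", "my_events", [{"user": "a", "ts": 1}], API_KEY)
```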

🚧

Additions made via the Add Documents endpoint will always go through the ingest transformation.

Delete Documents

To delete existing documents from your collections, simply specify the _id fields of the documents you wish to remove and make a request to the Delete Documents endpoint.
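The delete body needs only the _id of each document; a small helper to build it might look like this (sent with the HTTP DELETE method to the same /docs path used for adds, per the REST API reference):

```python
def delete_documents_payload(ids):
    """Build a Delete Documents body: only the _id of each document is needed."""
    return {"data": [{"_id": doc_id} for doc_id in ids]}
```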

Patch Documents

To update existing documents in a collection using the Rockset API, you can make requests to the Patch Documents endpoint. For each existing document you wish to update, you will need to specify the following two parameters:

  1. _id holding the _id field (primary key) of the document which is being patched
  2. patch holding a list of patch operations to be applied to that document, following the
    JSON Patch standard.

Each patch operation is a dictionary with a key op (string) indicating the patch operation, and additional keys path (string), value (object), and from (string), which are used as required arguments for the operation. The required arguments differ from one operation type to another. The JSON Patch standard defines several types of patch operations, their arguments, and their behavior. Refer to the JSON Patch documentation for more details.

If a patch operation’s argument is a field path, then it is specified using the JSON Pointer standard defined by the IETF. In essence, field paths are represented as a string of tokens separated by / characters. These tokens either specify keys in objects or indexes into arrays, and arrays are 0-based.

For example, in this document:

{
  "biscuits": [{ "name": "Digestive" }, { "name": "Choco Leibniz" }]
}

The path "/biscuits" would point to the biscuits array, while the path "/biscuits/1/name" would point to "Choco Leibniz".

There are six supported JSON patch operations:

  1. add, which adds a value (specified by the value parameter) to an object or inserts it into an
    array (specified by the path parameter). In the case of an array, the value is inserted before the given index. The - character can be used instead of an index to insert at the end of an array. The parameters path and value are required for this operation.
  2. remove, which removes the value at a location (specified by the path parameter). The
    parameter path is required for this operation.
  3. replace, which replaces the value at a location (specified by the path parameter) with a new
    value (specified by the value parameter). This operation is equivalent to a remove operation immediately followed by an add operation. The parameters path and value are required for this operation.
  4. copy, which copies a value from one location (specified by the from parameter) to another
    location (specified by the path parameter) within the JSON document. The parameters path and from are required for this operation.
  5. move, which moves a value from one location (specified by the from parameter) to another
    location (specified by the path parameter) within the JSON document. The parameters path and from are required for this operation.
  6. test, which checks that the value at a location (specified by the path parameter) equals a
    given value (specified by the value parameter). If the test fails, the patch as a whole will not apply. The parameters path and value are required for this operation.
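Putting these operations together, a Patch Documents request body for the biscuits document above might look like the following (the _id is hypothetical):

```python
import json

patch_body = {
    "data": [
        {
            "_id": "biscuit-tin-1",  # hypothetical document _id
            "patch": [
                # Only apply the patch if the first biscuit is a Digestive.
                {"op": "test", "path": "/biscuits/0/name", "value": "Digestive"},
                # Replace the second biscuit's name (arrays are 0-based).
                {"op": "replace", "path": "/biscuits/1/name", "value": "Hobnob"},
                # Append a new biscuit at the end of the array.
                {"op": "add", "path": "/biscuits/-", "value": {"name": "Rich Tea"}},
            ],
        }
    ]
}

print(json.dumps(patch_body, indent=2))
```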

🚧

Patch Warning

Patches made via the Patch API endpoint will never go through the ingest transformation. Patches made using _op in an INSERT INTO query (refer to next section for more information) will always go through the ingest transformation.

INSERT INTO to Add, Delete, or Patch Documents

You can add, delete, or patch documents using an INSERT INTO statement, which allows you to add the results of a query into a collection. To patch documents, SELECT the _id field of an existing document in the query; the statement will then update that document rather than add a new one. To delete documents, select the _id and specify _op as DELETE.

Below is an example of how to delete documents using _op and an INSERT INTO statement.

INSERT INTO workspace.collection
SELECT
    _id,
    'DELETE' as _op
FROM
    workspace.collection
WHERE
    field_name = 'any_value'
"num_docs_inserted": 101

Below is an example of how to patch documents using _op and an INSERT INTO statement.

INSERT INTO workspace.collection
SELECT
    _id,
    'new_value' as field_name,
    'UPDATE' as _op
FROM
    workspace.collection
WHERE
    _id = 'id_value'
"num_docs_inserted": 1

🚧

This method of using INSERT INTO statements to add, patch, or delete documents is not recommended and should only be used to perform one-off fixes.

This is because this method will inefficiently occupy query execution resources not optimized for data ingest. Instead, we generally recommend that you use the Rockset API to regularly update data in your collections.

💡

Understanding "num_docs_inserted" and "status": "ADDED"

After ingesting a document with an _op field specified, the query results include "num_docs_inserted", and the API response will include "status": "ADDED" to signify that the document was added to the processing queue. This does not imply that the document was added to the collection, as the operation occurs only after the document has left the queue.

For example, sending a query with _op = 'DELETE' will return "status": "ADDED", signifying that the document was added to the queue. When the document leaves the queue, the operation is triggered and the corresponding document is deleted (assuming the _id is valid).

Upload a File

To manually create a collection using a file as your data source, you can do so from the Rockset Console by navigating to Collections > Create Collection > File Upload. You can also upload files to any existing collections (or to this one after it has been created). The file formats currently supported include JSON, CSV, XML, Parquet, XLS, and PDF.

Verify Collection is Updated

Before querying a collection, you can verify that specific documents have been added, deleted, or patched by using the Write API together with the Get Collection Commit API. The Write API returns written offsets as last_offset, which follows the encoding format below:

 f<version>:<timestamp>:<min-offset>:<max-delta>:<deltas-base32-varints>

 * <version> is the version number of the offset.
 * <timestamp> is the Unix timestamp.
 * <min-offset> is the smallest offset value across all partitions.
 * <max-delta> is the largest offset minus the smallest offset value.
 * <deltas-base32-varints> is a sequence of 5-bit varints representing Crockford's base 32 (https://www.crockford.com/base32.html) encoding of all offsets.

You can verify that the data in the returned offset is queryable by making requests to the Get Collection Commit API endpoint. Simply pass the last_offset in the name field and poll this endpoint until the passed field in the response returns true. This signifies that the collection has been updated with the data from the associated write request, so any subsequent queries are guaranteed to include the data from that Write API request.
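The polling loop described above can be sketched as follows. The endpoint path, the {"name": [...]} body, and the "passed" response field follow the REST API reference; the host is a region-specific placeholder:

```python
import json
import time
import urllib.request

def offset_committed(response_body):
    """True once a Get Collection Commit response reports the offset as queryable."""
    return bool(response_body.get("data", {}).get("passed"))

def wait_for_commit(workspace, collection, last_offset, api_key,
                    host="https://api.usw2a1.rockset.com",
                    poll_interval=1.0, timeout=60.0):
    """Poll the Get Collection Commit endpoint until `last_offset` is queryable."""
    url = (f"{host}/v1/orgs/self/ws/{workspace}/collections/"
           f"{collection}/offsets/commit")
    body = json.dumps({"name": [last_offset]}).encode("utf-8")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={"Authorization": f"ApiKey {api_key}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            if offset_committed(json.load(resp)):
                return True
        time.sleep(poll_interval)
    return False
```

Once `wait_for_commit` returns True, queries against the collection will reflect the writes behind that offset.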