Async Queries
Queries on Rockset typically must complete within 2 minutes. Use async
mode to run longer queries of up to 30 minutes.
The workflow for Async Queries:
- Send a query request with async enabled.
- Receive back a query ID.
- Poll to check the status of the query, which can be QUEUED, RUNNING, COMPLETED, ERROR, or CANCELLED.
- Fetch results when the query completes.
Rockset does not recommend using async queries for latency-sensitive workloads due to the overhead of additional network requests.
Sending an Async Query Request
To send the request asynchronously, set the async parameter to true in either a Query Lambda or query request.
Example request:
curl --request POST \
--url "https://$ROCKSET_SERVER/v1/orgs/self/queries" \
-H "Authorization: ApiKey $API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"sql": {
"query": "SELECT * FROM foo;"
},
"async": true
}'
This query request returns immediately with a query ID that can be used to poll for status and retrieve results.
Example response:
{
"query_id": "db3044b9-ea4e-43f2-8cd5-138092ab9b96:lii9i5B:0",
"status": "QUEUED",
...
}
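For clients that do not shell out to curl, the same request can be sketched in Python with only the standard library. This is an illustration, not an official client: the function names build_async_request and submit_async_query are hypothetical, and the code assumes the response carries a top-level query_id field as in the example response above.

```python
import json
import urllib.request


def build_async_request(server, api_key, sql):
    """Build (but do not send) a POST request that runs `sql` in async mode."""
    body = {"sql": {"query": sql}, "async": True}
    return urllib.request.Request(
        url=f"https://{server}/v1/orgs/self/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_async_query(server, api_key, sql):
    """Send the request and return the query ID from the response.

    Assumes a top-level "query_id" field, as in the example response.
    """
    request = build_async_request(server, api_key, sql)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["query_id"]
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.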
Polling for Query Status
After submitting an async query request, periodically retrieve the query status to find out if the query has completed.
Example request:
curl --request GET \
--url "https://$ROCKSET_SERVER/v1/orgs/self/queries/$QUERY_ID" \
-H "Authorization: ApiKey $API_KEY"
The response contains a status field, which has the possible values of QUEUED, RUNNING, COMPLETED, ERROR, and CANCELLED.
If the status of the query is ERROR, the error will be available in the query_errors field.
Example response:
{
"data": {
"query_id": "5139dcbc-5abc-4c8c-ad69-6feff96f17ef:BBQg6lF:0",
"status": "ERROR",
...
"query_errors": [
{
"type": "QUERY_TIMEOUT",
"message": "Query timeout reached. The resources allocated for your Virtual Instance are not sufficient to run this query. Please upgrade to a larger Virtual Instance or contact Rockset customer support for assistance constructing a more efficient query.",
"status_code": 408
}
]
}
}
When the status of the query is COMPLETED, you may retrieve results.
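A polling loop over these statuses might look like the sketch below. It is a minimal illustration under stated assumptions: fetch_status is a hypothetical callable that performs the GET request and returns the parsed JSON, the status is assumed to sit under a data key as in the example response, and the 2-second interval is an arbitrary choice. The 30-minute ceiling mirrors the async query limit.

```python
import time


def wait_for_query(fetch_status, query_id, poll_interval_s=2.0,
                   max_wait_s=1800.0, sleep=time.sleep):
    """Poll until the query reaches a terminal status.

    `fetch_status(query_id)` is a hypothetical callable returning the parsed
    status response, assumed to look like {"data": {"status": ...}}.
    Returns the "data" object on COMPLETED; raises on ERROR or CANCELLED.
    """
    waited = 0.0
    while True:
        data = fetch_status(query_id)["data"]
        status = data["status"]
        if status == "COMPLETED":
            return data
        if status in ("ERROR", "CANCELLED"):
            errors = data.get("query_errors", [])
            raise RuntimeError(f"query {query_id} ended with {status}: {errors}")
        # QUEUED or RUNNING: keep waiting, but not forever.
        if waited >= max_wait_s:
            raise TimeoutError(f"query {query_id} still {status} after {waited}s")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Passing sleep as a parameter makes the loop easy to exercise without real delays.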
Retrieving Query Results
Example request for 10,000 documents starting at offset 290,000:
curl --request GET \
--url "https://$ROCKSET_SERVER/v1/orgs/self/queries/$QUERY_ID/pages?offset=290000&docs=10000" \
-H "Authorization: ApiKey $API_KEY"
The docs parameter is optional. If you omit it, the default is 10,000 documents. The maximum value for docs is 100,000.
There is also an offset query parameter, which specifies the offset from the cursor of the first document to be returned. The maximum value for offset is 1,000,000,000. If not specified, offset defaults to 0.
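The limits on docs and offset can be checked client-side before a request is sent. The helper below is a hypothetical sketch (pages_url is not part of any Rockset client); it only builds the URL and leaves out either parameter when it is not given, so the server-side defaults apply.

```python
from urllib.parse import quote, urlencode

MAX_DOCS = 100_000           # documented maximum for `docs`
MAX_OFFSET = 1_000_000_000   # documented maximum for `offset`


def pages_url(server, query_id, docs=None, offset=None):
    """Build the results-page URL, validating the documented limits."""
    params = {}
    if docs is not None:
        if not 0 < docs <= MAX_DOCS:
            raise ValueError(f"docs must be between 1 and {MAX_DOCS}")
        params["docs"] = docs
    if offset is not None:
        if not 0 <= offset <= MAX_OFFSET:
            raise ValueError(f"offset must be between 0 and {MAX_OFFSET}")
        params["offset"] = offset
    # Keep ':' literal, since query IDs in the examples contain colons.
    url = f"https://{server}/v1/orgs/self/queries/{quote(query_id, safe=':')}/pages"
    return f"{url}?{urlencode(params)}" if params else url
```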
Example response:
{
"results": [
{
"Field1": "value1"
},
...
],
"results_total_doc_count": 10000000,
"pagination": {
"current_page_doc_count": 500,
"next_cursor_offset": 1500, // This number is the number of documents before the current page.
"next_cursor": "fds23jurzjsa31" // This value will be null if there are no more results.
}
}
If there is more than one page of results, use the next_cursor field returned in the response to request the next page of results. Alternatively, you can use the offset parameter to go back and forth between results pages.
Example request:
curl --request GET \
--url "https://$ROCKSET_SERVER/v1/orgs/self/queries/$QUERY_ID/pages?cursor=fds23jurzjsa31&docs=10000" \
-H "Authorization: ApiKey $API_KEY"
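Cursor-based pagination can be wrapped in a generator that keeps requesting pages until next_cursor is null. The sketch below assumes a hypothetical fetch_page callable that issues the GET request for one page (with the cursor parameter omitted when it is None) and returns the parsed JSON in the shape of the example response.

```python
def iter_results(fetch_page, docs=10_000):
    """Yield every result document, following next_cursor page by page.

    `fetch_page(cursor=..., docs=...)` is a hypothetical callable that
    returns one parsed page; cursor=None requests the first page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor=cursor, docs=docs)
        yield from page["results"]
        cursor = page["pagination"].get("next_cursor")
        if cursor is None:
            # A null next_cursor means there are no more results.
            return
```

Because it is a generator, documents are processed as each page arrives rather than after the whole result set has been downloaded.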
Advanced Usage
Setting a Client Timeout
To avoid the additional network requests for short queries, you can optionally set async_options.client_timeout_ms. If the query completes before the client timeout, the results will be returned in-band without needing to poll and retrieve results later. For example, if you want queries under 60 seconds to return results in-band, you can make this request:
# Queries under 60 seconds will return results in-band
curl --request POST \
--url "https://$ROCKSET_SERVER/v1/orgs/self/queries" \
-H "Authorization: ApiKey $API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"sql": {
"query": "SELECT * FROM foo;"
},
"async": true,
"async_options": {
"client_timeout_ms": 60000
}
}'
Response for a query that completes before the client timeout (same as a non-async query):
{
"stats": {
"elapsed_time_s": 1218,
"throttled_time_micros": 0
},
"results": [
{
"Field1": "value1"
},
...
],
"status": "COMPLETED"
}
Since the additional overhead to store query results is avoided when a query completes before the client timeout, the query status and query results will not be available in the GET .../queries/{queryId} and GET .../queries/{queryId}/pages APIs described above. Instead, the status and results will be returned in-band, as they are for non-async queries.
Make sure to handle both the short and long-running query cases when setting the client timeout, since you will not know when the query will complete. A query that completes before the client timeout will have a status of COMPLETED in the initial response, while a query that continues to run after the client timeout will have a status of QUEUED or RUNNING.
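One way to handle both cases is to branch on the status of the initial response. The function below is a sketch with a hypothetical name; it assumes that a response for a query still in progress includes a query_id field, as in the earlier async example, and that any other status should be treated as a failure.

```python
def handle_initial_response(response):
    """Decide what to do with the first response to a client-timeout query.

    Returns ("in_band", results) when the query finished before the client
    timeout, or ("poll", query_id) when it is still queued or running.
    """
    status = response.get("status")
    if status == "COMPLETED":
        # Results are carried in the initial response itself.
        return ("in_band", response.get("results", []))
    if status in ("QUEUED", "RUNNING"):
        # Assumed: the response includes the query ID needed for polling.
        return ("poll", response["query_id"])
    raise RuntimeError(f"unexpected initial status: {status!r}")
```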
Client Timeout Note
Queries with a client timeout set will return a default of 10,000 results in the initial response, with the remaining results to be retrieved through the pagination API.