Query Results Pagination

Pagination refers to the practice of dividing a large dataset into smaller, more manageable subsets called pages and retrieving them one at a time. It is commonly used to improve the performance and user experience when dealing with large result sets in queries.

๐Ÿ’ก

Pagination Tip

We recommend using pagination when the query result contains more than 100,000 rows or 100 MB of data.

When a query returns a large number of records, fetching all of them at once can be inefficient and resource-intensive. Pagination allows you to retrieve and display a specific subset of the results, typically based on a defined page size (e.g. 100 records per page).

Rockset implements a cursor-based approach to pagination where, instead of relying on offsets, a cursor is used as a reference point to identify the position from which to retrieve the next set of results.

With this approach, an initial query is executed to fetch all of the data at once and store it temporarily in object storage so that the result set can be revisited over time. As the initial query often scans through large datasets, we recommend using async queries with pagination to exceed the 2 minute default timeout on queries. With async queries, query SLA times default to 30 minutes and that time can be customized by the user.

There are several benefits to this pagination approach:

  • Reduced processing needs as you only query once
  • Improved latency for large result sets
  • Consistent analytics

๐Ÿ“˜

Check out this blog to learn more about our pagination method.

๐Ÿ’ก

Pagination Tip

If you are concerned about a pagination query impacting the performance of ingestion or other queries running on the same virtual instance, you should consider running pagination queries on a separate Virtual Instance.

Sending a Query Request

To use pagination, set a limit on the number of initial results using the query request parameter max_initial_results. For example, if your application displays results to users in groups of 1000, set max_initial_results to 1000.

๐Ÿ’ก

Pagination Tip

If you expect your query to exceed the 2 minute query timeout, use an async query to run queries up to 30 minutes.

Example request:

curl --request POST \
    --url https://$ROCKSET_SERVER/v1/orgs/self/queries \
    -H 'Authorization: ApiKey $API_KEY' \
  -H 'Content-Type: application/json' \
    -d '{
    "sql": {
      "query": "SELECT * FROM foo;"
    },
    "max_initial_results": 1000
  }'

max_initial_results must be a value between 0 and 100,000.

Example response:

{
  "query_id": "5b596206-c632-4a08-8343-0c560f7ef7f1",
  "results": [
    {
      ...
    }
  ],
  ...
  "results_total_doc_count": 10000000, // This is the total number of documents returned by the query
  "pagination": {
    "current_page_doc_count": 1000,
    "start_cursor": null
    "next_cursor": "aabawe153wtea352"  // This will be null if there are no more results than those returned in this response.
  }
}

The results collection contains the actual result set, while the query_id and pagination fields contain all the information necessary to fetch the next results page. In order to fetch the first set of results from the start, you can omit the cursor parameter and call the /orgs/self/queries/{query_id}/pages?docs=[number of docs] endpoint with the optional docs parameter.

Fetching the Next Page of Results

Invoke the /orgs/self/queries/{query_id}/pages?cursor={next_cursor}&docs=[number of docs] endpoint to fetch the next page of results. Retrieve the values from the query_id and next_cursor fields from the response above. Then, pass them into the respective query parameters in the endpoint.

The docs parameter is optional. If you choose not to add a docs parameter, the default will be 10,000 documents. The maximum value for docs is 100,000.

Alternatively, you can use the offset query parameter, which specifies the offset from the cursor of the first document to be returned. The maximum value for offset is 1,000,000,000. offset will default to 0 if not specified.

Example request:

curl --request GET \
    --url https://$ROCKSET_SERVER/v1/orgs/self/queries/5b596206-c632-4a08-8343-0c560f7ef7f1/pages?cursor=aabawe153wtea352&docs=500 \
    -H 'Authorization: ApiKey $API_KEY' \
  -H 'Content-Type: application/json'

Or using the offset parameter:

curl --request GET \
    --url https://$ROCKSET_SERVER/v1/orgs/self/queries/5b596206-c632-4a08-8343-0c560f7ef7f1/pages?offset=290000&docs=500 \
    -H 'Authorization: ApiKey $API_KEY' \
  -H 'Content-Type: application/json'

Example response:

{
  "results": [
    {
      โ€œField1โ€: โ€value1โ€
    },
    ...
  ],
  "results_total_doc_count": 10000000
  "pagination": {
    "current_page_doc_count": 500,
    "next_cursor_offset": 1500,    // This number is the number of documents before the current page.
    "next_cursor": fds23jurzjsa31  // This value will be null if there are no more results.
  }
}

๐Ÿšง

If there are no more documents after the current page, the value for next_cursor will be null.

Obtaining the Status of Paginated Queries

Results of paginated queries will expire in 24 hours. To get the expiration time of a query, use the GET /orgs/self/queries/{queryId} endpoint.

Example request:

curl --request GET \
    --url https://$ROCKSET_SERVER/v1/orgs/self/queries \
    -H 'Authorization: ApiKey $API_KEY' \
  -H 'Content-Type: application/json'

Example response:

{
  "data": {
    "query_id": "5b596206-c632-4a08-8343-0c560f7ef7f1",
    "state": "COMPLETED"
    "run_by": "[email protected]",
    "run_at": "2001-08-28T00:23:41Z",
    "expires_at": "2001-08-28T00:33:41Z",
    "stats" : {
      "elapsed_time_ms": 12000,
      "throttled_time_ms": 126,
      "doc_count": 27842,
      "size_bytes": 455223422
    },
    "results_total_doc_count": 10000,
	  "pagination": {
	    "current_page_doc_count": 1000,
	    "start_cursor": null
	  }
  }
}

๐Ÿ“˜

Check out this Office Hours video from one of our solutions engineers to learn more about using Pagination on Rockset.