INSERT INTO s3

🛠️

INSERT INTO s3 is currently in Beta.

Query Structure

INSERT INTO s3_uri
  credentials
  select_query

INSERT INTO s3_uri exports query results from Rockset and writes them directly to Amazon S3.

  • s3_uri specifies which bucket the data is uploaded to (e.g. 's3://bucket/prefix').
  • credentials define the credentials Rockset uses to write to the S3 bucket. The credentials must include one of the following references:
    • INTEGRATION = 'integration_name' to use the credentials of an existing S3 integration.
    • CREDENTIALS = (AWS_ROLE = 'aws role arn', AWS_EXTERNAL_ID = 'externalid')
  • select_query must represent a valid SELECT command.

Query Results

Rockset exports the query results to the S3 destination path and encodes the data in JSON format as specified in Data Types. Date, time, and geography data types are encoded in a special format with the __rockset_type wrapping.

Rockset chunks the results and generates multiple files to optimize parallelized consumption downstream. Rockset targets for each file to contain approximately 1,000 documents by default. This is configurable using s3_sync_op_output_chunk_size hint. See example below for details.

You can monitor the query status to track the status of the export. Once the query executes successfully, the query results will contain:

  • num_docs_inserted: the result count of the SELECT command (equivalent toresult_set_document_count in the Rockset API).
  • num_files_written: the number of files written to S3.

Associated Permissions and Credentials

Users must have the EXPORT_DATA_GLOBAL and LIST_INTEGRATIONS_GLOBAL privileges to run the command. The built-in admin and member roles have these privileges by default. The read-only role does not have the EXPORT_DATA_GLOBAL privilege. If you are creating a custom role, you must add the EXPORT_DATA_GLOBAL and LIST_INTEGRATIONS_GLOBAL privileges through the Create a Role or Update a Role endpoints or through the Console (note that Console support for adding the EXPORT_DATA_GLOBAL privilege will be available after the Public Preview release).

The associated S3 integration and credentials must have the s3:PutObject permission, and you must configure the role to have your Rockset account as a trusted entity. Follow the S3 integration step-by-step guide for creating the role and add the s3:PutObject privilege during the process:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:List*", "s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::<your-bucket>",
        "arn:aws:s3:::<your-bucket>/*"]
    }
  ]
}

Other Considerations

  • TIMESTAMP values are encoded as a special type to match behavior with other date-time types. A timestamp value will be encoded like {"__rockset_type": "timestamp", value":"1513108026000000"} where value is microseconds since unix epoch.
  • The query must execute within 30 minutes, which represents the maximum query execution time limit within Rockset. We recommend using async mode to export large amounts of data.
  • Rockset only supports exporting data to S3 buckets residing in the same region as the Rockset collection.

INSERT INTO s3 Examples

Export an entire collection using the credentials of an S3 Integration with s3:PutObject permissions:

INSERT INTO 's3://analyticsdata/query1'
  INTEGRATION = 's3export'
SELECT * FROM commons.analytics

To export the data, but target 2,000 documents per file:

INSERT INTO 's3://analyticsdata/query1'
  INTEGRATION = 's3export'
SELECT * FROM commons.analytics
HINT(s3_sync_op_output_chunk_size=2000)

Export the results of an aggregation query to S3 using a hardcoded AWS Role:

INSERT INTO 's3://companyanalysis/cities'
  CREDENTIALS = (
    AWS_ROLE = 'arn:aws:iam::012345678901:role/rocksetexport',
    AWS_EXTERNAL_ID = '79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be'
  )
SELECT
    offices.value.state,
    ARRAY_AGG(offices.value.city) AS cities
FROM
    companies,
    UNNEST(companies.offices AS value) AS offices
GROUP BY
    offices.value.state