Amazon Redshift

This page covers how to use an Amazon Redshift cluster as a data source in Rockset. This includes:

  • Creating an Amazon Redshift integration to securely connect Redshift clusters in your AWS account with Rockset.
  • Creating a collection that syncs your data from a table in Amazon Redshift into Rockset in real time.

For the following steps, you must have access to an AWS account and be able to manage AWS IAM policies and IAM users within it. If you do not have access, please invite your AWS administrator to Rockset.

Create a Redshift Integration

The steps below show how to set up an Amazon Redshift integration using Redshift access privileges and AWS Access Keys. An integration provides access to a Redshift cluster within your AWS account. You can use an integration to create collections that sync data from your tables in the cluster.

Step 1: Create Amazon S3 Bucket for Unload

Rockset unloads Amazon Redshift data to an Amazon S3 bucket in your AWS account. This S3 bucket should be in the same region as the Redshift cluster. For more details, refer to the AWS documentation on Unloading Data to Amazon S3.

  1. Navigate to the S3 Service in the AWS Management Console.
  2. Set up a new S3 bucket by clicking Create bucket. If you already have an S3 bucket you want to use, you can provide the path to that bucket instead. For more details, refer to the AWS documentation on creating an S3 bucket.
  3. Record the S3 bucket path in the Rockset Console within a new Redshift integration.
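
The same-region requirement above can be checked before you record the bucket path. The following is a minimal Python sketch, assuming the cluster endpoint follows the common `<cluster>.<id>.<region>.redshift.amazonaws.com` pattern. The function names and sample values are hypothetical, and the bucket's region is passed in by hand rather than looked up from AWS.

```python
def region_from_redshift_endpoint(endpoint: str) -> str:
    """Extract the AWS region from a Redshift cluster endpoint.

    Assumes the form <cluster>.<id>.<region>.redshift.amazonaws.com,
    optionally followed by :<port> and /<database>.
    """
    # Drop any trailing /<database> and :<port> so only the hostname remains.
    host = endpoint.split("/", 1)[0].split(":", 1)[0]
    labels = host.split(".")
    if len(labels) < 6 or labels[-3:] != ["redshift", "amazonaws", "com"]:
        raise ValueError(f"not a recognized Redshift endpoint: {endpoint!r}")
    return labels[-4]


def same_region(endpoint: str, bucket_region: str) -> bool:
    """True when the bucket region matches the region in the cluster endpoint."""
    return region_from_redshift_endpoint(endpoint) == bucket_region.strip().lower()


# Hypothetical endpoint, for illustration only.
endpoint = "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev"
print(same_region(endpoint, "us-west-2"))
```

Endpoints that do not match the assumed pattern raise a ValueError instead of returning a guess.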

Step 2: Configure AWS IAM Policy

  1. Navigate to the IAM Service in the AWS Management Console.
  2. Set up a new policy by navigating to Policies and clicking Create policy. If you already have a policy set up for Rockset, you may update that existing policy. For more details, refer to the AWS documentation on IAM Policies.
  3. Set up access to your S3 bucket. You can switch to the “JSON” tab and paste the policy shown below. You must replace <your-bucket> with the name of your S3 bucket. If you already have a Rockset policy set up, you can add the body of the Statement attribute to it.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::<your-bucket>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:List*",
      "Resource": [
        "arn:aws:s3:::<your-bucket>/*",
        "arn:aws:s3:::<your-bucket>"
      ]
    }
  ]
}
  4. Save the newly created or updated policy and give it a descriptive name. You will attach this policy to a user or role in the next step.

Why these permissions?

  • s3:PutObject - Required. Used to unload Redshift data to S3.
  • s3:GetObject - Required. Used by Rockset to retrieve objects from your Amazon S3 bucket.
  • s3:List* - Required. The Redshift UNLOAD command uses the s3:ListBucket and s3:ListAllMyBuckets permissions to read bucket and object metadata.
  • s3:DeleteObject - Optional. Used by Rockset to clean up Redshift unload artifacts from your Amazon S3 bucket.
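
If you manage several buckets, it may help to generate the policy document rather than edit it by hand. The sketch below rebuilds the policy from Step 2 for a given bucket name using only the Python standard library. The function name `build_unload_policy` and its `allow_delete` flag are hypothetical helpers, not part of any Rockset or AWS tooling. The flag reflects the note above that s3:DeleteObject is optional.

```python
import json


def build_unload_policy(bucket: str, allow_delete: bool = True) -> dict:
    """Build the IAM policy document from Step 2 for the given bucket name."""
    object_actions = ["s3:PutObject", "s3:GetObject"]
    if allow_delete:
        # Optional: lets Rockset clean up unload artifacts.
        object_actions.append("s3:DeleteObject")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:List*",
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            },
        ],
    }


# "my-unload-bucket" is a placeholder bucket name.
print(json.dumps(build_unload_policy("my-unload-bucket"), indent=2))
```

The printed JSON can be pasted into the "JSON" tab of the policy editor.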

Step 3: Configure AWS Access Key

In this step, you will grant Rockset permission to access your AWS resources using AWS Access Keys.

  1. Navigate to the IAM service in the AWS Management Console.

  2. Create a new user by navigating to Users and clicking Add User. If you have already created a user for Rockset, you can attach the policy created in the previous step to that user.

  3. Enter a name for the user and check the Programmatic access option. Click to continue.

  4. Choose Attach existing policies directly, then select the policy you created in Step 2. Click through the remaining steps to finish creating the user.

  5. When the new user is successfully created, you should see the Access key ID and Secret access key displayed on the screen.

If you are attaching the policy to an existing IAM user, you can navigate to “Security Credentials” under the IAM user and generate a new access key.

  6. Record both of these values in the Rockset Console within a new Redshift integration.

Step 4: Configure Redshift Cluster Access

Rockset needs the Redshift cluster endpoint and port to connect to the cluster. If your Redshift cluster is inside a VPC, you must also add Rockset IPs to your VPC security group inbound rules to allow Rockset access to the cluster.

  1. Navigate to the Redshift service in the AWS Management Console and select the cluster you want Rockset to connect to.
  2. Record the Redshift host and port values in the Rockset Console within a new Redshift integration.

  3. Click the VPC security group. This takes you to the AWS VPC security group page.

  4. If the security group from the previous step is not selected automatically, select it, then click Inbound followed by Edit.

  5. Create three Custom TCP rules that allow inbound access from the Rockset IPs on your cluster's port.
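
The inbound rules can also be written down as data before you enter them in the console. The sketch below builds one Custom TCP rule per IP address, in the shape used by the `IpPermissions` parameter of the EC2 AuthorizeSecurityGroupIngress API. It makes several assumptions: the addresses shown are reserved documentation placeholders (not real Rockset IPs), the port 5439 is the Redshift default and should be replaced with the port you recorded above, and the function name `inbound_rules` is hypothetical.

```python
import ipaddress


def inbound_rules(source_ips, port=5439):
    """Build one Custom TCP inbound rule per source IPv4 address.

    port defaults to 5439 (the Redshift default); pass your cluster's port.
    """
    rules = []
    for ip in source_ips:
        # Raises ValueError if the string is not a valid IPv4 address.
        addr = ipaddress.IPv4Address(ip)
        rules.append(
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [
                    {"CidrIp": f"{addr}/32", "Description": "Rockset access"}
                ],
            }
        )
    return rules


# Placeholder addresses from the 203.0.113.0/24 documentation range.
placeholder_ips = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
for rule in inbound_rules(placeholder_ips):
    print(rule["IpRanges"][0]["CidrIp"], rule["FromPort"])
```

Using a /32 range for each address keeps the rule limited to that single IP.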

Step 5: Grant Redshift database user permissions

  1. Navigate to the Redshift service in the AWS Management Console, select the cluster, and go to Query Editor.
  2. Run the following CREATE USER and GRANT SQL commands to give Rockset read-only access.
Create a user named "rockset", or choose any name you like:
  CREATE USER rockset PASSWORD '<password>';

Grant access to a schema:
  GRANT USAGE ON SCHEMA <schema> TO rockset;

Grant access to a specific table:
  GRANT SELECT ON TABLE <table> TO rockset;

Grant access to all tables in a specific schema:
  GRANT SELECT ON ALL TABLES IN SCHEMA <schema> TO rockset;
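
When the same grants are needed for several schemas, the statements can be generated instead of typed. The following Python sketch renders the GRANT commands from this step. The helper names `quote_ident` and `grant_statements` are hypothetical. Identifiers are wrapped in double quotes, which is the standard SQL way to quote names (single quotes are for string literals such as the password). The CREATE USER statement is deliberately left out so that no password passes through the script.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(user, schema, tables=None):
    """Render read-only GRANT statements for one schema.

    With tables=None, SELECT is granted on all tables in the schema;
    otherwise only on the listed tables.
    """
    u, s = quote_ident(user), quote_ident(schema)
    statements = [f"GRANT USAGE ON SCHEMA {s} TO {u};"]
    if tables is None:
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {u};")
    else:
        for table in tables:
            statements.append(
                f"GRANT SELECT ON TABLE {s}.{quote_ident(table)} TO {u};"
            )
    return statements


# "public" and "orders" are placeholder schema and table names.
for statement in grant_statements("rockset", "public", ["orders"]):
    print(statement)
```

The output can be pasted into the Query Editor and run as is.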

Create a Collection

Once you create a collection backed by Amazon Redshift, Rockset unloads the Redshift table to S3 and ingests it. Optionally, you can specify a timestamp field in your Redshift table, such as created_at, that Rockset monitors for new rows. The sync latency is no more than a few seconds when the source is updated continuously and no more than 5 minutes when the source is updated infrequently. Note that only appends are currently supported; updates and deletes are not.
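
To see why a timestamp field fits append-only syncing, consider the following conceptual sketch. It is not Rockset's implementation. It only illustrates the general high-watermark idea: remember the largest timestamp seen so far and fetch rows newer than it on each poll. The function name `new_rows` and the sample rows are hypothetical.

```python
from datetime import datetime


def new_rows(rows, watermark, ts_field="created_at"):
    """Return rows strictly newer than the watermark, plus the next watermark."""
    fresh = sorted(
        (row for row in rows if row[ts_field] > watermark),
        key=lambda row: row[ts_field],
    )
    next_watermark = fresh[-1][ts_field] if fresh else watermark
    return fresh, next_watermark


# An in-memory stand-in for a Redshift table.
table = [
    {"id": 1, "created_at": datetime(2024, 1, 1, 12, 0)},
    {"id": 2, "created_at": datetime(2024, 1, 1, 12, 5)},
]

# First poll picks up everything.
first, mark = new_rows(table, datetime.min)

# A row is appended, and the second poll picks up only that row.
table.append({"id": 3, "created_at": datetime(2024, 1, 1, 12, 10)})
second, mark = new_rows(table, mark)

print([row["id"] for row in first], [row["id"] for row in second])
# prints: [1, 2] [3]
```

In this scheme, a row that is changed or deleted in place without a newer timestamp is never picked up, which is consistent with the append-only behavior described above.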

In the Rockset Console, you can create a collection from Workspace > Collections > Create Collection.