Amazon Kinesis
This page covers how to use an Amazon Kinesis stream as a data source in Rockset. This includes:
- Creating an Amazon Kinesis Integration to securely connect Kinesis streams in your AWS account with Rockset.
- Creating a Collection that syncs your data from an Amazon Kinesis stream into Rockset.
For the following steps, you must have access to an AWS account and be able to manage AWS IAM policies and IAM users within it.
If you do not have access, please invite your AWS administrator to Rockset.
Create a Kinesis Integration
The steps below show how to set up an Amazon Kinesis integration using either AWS Cross-Account IAM Roles or AWS Access Keys (deprecated). An integration can provide access to one or more Kinesis streams within your AWS account. You can use an integration to create collections that sync data from your Kinesis streams.
Step 1: Configure AWS IAM Policy
- Navigate to the IAM Service in the AWS Management Console.
- Set up a new policy by navigating to Policies and clicking "Create policy".
If you already have a policy set up for Rockset, you may update that existing policy.
For more details, refer to AWS Documentation on IAM Policies.
- Set up read-only access to your Kinesis stream. You can switch to the JSON tab and paste the policy shown below. You must replace <your-stream> with the name of your Kinesis stream. If you already have a Rockset policy set up, you can add the body of the Statement attribute to it.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:ListShards",
        "kinesis:DescribeStream",
        "kinesis:GetRecords",
        "kinesis:GetShardIterator"
      ],
      "Resource": ["arn:aws:kinesis:*:*:stream/<your-stream>"]
    }
  ]
}
- Save the newly created or updated policy and give it a descriptive name. You will attach this policy to a user or role in the next step.
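If you prefer to generate the policy document rather than edit it by hand, the following is a minimal sketch in Python that produces the same JSON for a given stream name. The helper name `build_read_policy` and the stream name `clickstream-events` are illustrative assumptions, not part of Rockset or AWS tooling.

```python
import json

# Read-only Kinesis actions listed in the policy above.
READ_ONLY_ACTIONS = [
    "kinesis:ListShards",
    "kinesis:DescribeStream",
    "kinesis:GetRecords",
    "kinesis:GetShardIterator",
]


def build_read_policy(stream_name: str) -> dict:
    """Return the read-only policy document for one stream (hypothetical helper)."""
    # Guard against pasting the unreplaced <your-stream> placeholder.
    if not stream_name or "<" in stream_name or ">" in stream_name:
        raise ValueError("stream_name must be a real stream name, not a placeholder")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(READ_ONLY_ACTIONS),
                "Resource": [f"arn:aws:kinesis:*:*:stream/{stream_name}"],
            }
        ],
    }


if __name__ == "__main__":
    # "clickstream-events" is a made-up stream name for illustration.
    print(json.dumps(build_read_policy("clickstream-events"), indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor in place of the hand-edited version.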
Why these permissions?
- kinesis:ListShards: Required for listing and getting metadata on each shard.
- kinesis:DescribeStream: Required for fetching metadata about the Kinesis stream.
- kinesis:GetRecords: Required for reading records from a shard.
- kinesis:GetShardIterator: Required for iterating over records in a shard.
Advanced Permissions
You can set up permissions for multiple streams by modifying the Resource ARNs. The format of the ARN for Kinesis is as follows: arn:aws:kinesis:region:account-id:stream/stream-name.
You can substitute the following resources in the policy above to grant access to multiple streams as shown below:
- All streams in us-west-2
arn:aws:kinesis:us-west-2:*:stream/*
- All streams starting with "sales"
arn:aws:kinesis:*:*:stream/sales*
- All streams in your account
arn:aws:kinesis:*:*:stream/*
For more details on how to specify a resource path, refer to AWS documentation on Kinesis ARNs.
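To see which streams a given Resource pattern would cover, the sketch below approximates wildcard matching with Python's `fnmatch` module. This is only an approximation of how IAM evaluates `*` (for example, `fnmatch` also understands `[...]` character sets, which IAM does not), and the account ID `123456789012` and stream names are made-up examples.

```python
from fnmatch import fnmatchcase
from typing import List


def stream_arn(region: str, account_id: str, stream_name: str) -> str:
    """Build a Kinesis stream ARN in the format described above."""
    return f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"


def covered(resource_pattern: str, arns: List[str]) -> List[str]:
    """Return the ARNs that the wildcard pattern matches (approximation of IAM)."""
    return [arn for arn in arns if fnmatchcase(arn, resource_pattern)]


if __name__ == "__main__":
    # Hypothetical streams in a hypothetical account.
    arns = [
        stream_arn("us-west-2", "123456789012", "sales-orders"),
        stream_arn("us-east-1", "123456789012", "sales-returns"),
        stream_arn("us-east-1", "123456789012", "inventory"),
    ]
    for pattern in (
        "arn:aws:kinesis:us-west-2:*:stream/*",
        "arn:aws:kinesis:*:*:stream/sales*",
        "arn:aws:kinesis:*:*:stream/*",
    ):
        print(pattern, "->", covered(pattern, arns))
```

Checking patterns this way before saving the policy can help confirm that a wildcard is no broader than intended.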
Step 2: Configure Role / Access Key
There are two mechanisms by which you can grant Rockset permissions to access your AWS resource. Although Access Keys are supported, Cross-Account roles are strongly recommended as they are more secure and easier to manage.
AWS Cross-Account IAM Role
The most secure way to grant Rockset access to your AWS account involves giving Rockset's account cross-account access to your AWS account. To do so, you'll need to create an IAM Role that Rockset can assume, with your newly created policy attached.
You'll need information from the Rockset Console to create and save this integration.
- Navigate to the IAM service in the AWS Management Console.
- Set up a new role by navigating to Roles and clicking "Create role".
If you already have a role for Rockset set up, you may re-use it and either add or update the above policy directly.
- Select "Another AWS account" as the type of trusted entity, and tick the box for "Require External ID". Fill in the Account ID and External ID fields with the values (Rockset Account ID and External ID respectively) found on the Integration page of the Rockset Console (under the Cross-Account Role Option). Click to continue.
- Choose the policy created for this role in Step 1 (or follow Step 1 now to create the policy if needed). Then, click to continue.
- Optionally, add any tags and click "Next". Name the role descriptively, e.g. 'rockset-role', and once finished record the Role ARN for the Rockset integration in the Rockset Console.
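For reference, selecting "Another AWS account" with "Require External ID" results in a trust policy on the role along the lines of the one generated below. This is a sketch of the standard AWS cross-account trust policy shape; the helper name `build_trust_policy` is hypothetical, and the account ID and External ID shown are placeholders that must be replaced with the values from the Rockset Console.

```python
import json


def build_trust_policy(rockset_account_id: str, external_id: str) -> dict:
    """Return a cross-account trust policy document (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted account that is allowed to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{rockset_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Only allow the assumption when the caller supplies the External ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values: use the Account ID and External ID from the Rockset Console.
    print(json.dumps(build_trust_policy("123456789012", "example-external-id"), indent=2))
```

Comparing this output with the "Trust relationships" tab of the role can help verify that both the Account ID and External ID were entered correctly.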
AWS Access Key (deprecated)
- Navigate to the IAM service in the AWS Management Console.
- Create a new user by navigating to Users and clicking "Add User".
- Enter a name for the user and check the "Programmatic access" option. Click to continue.
- Choose "Attach existing policies directly" then select the policy you created in Step 1. Click through the remaining steps to finish creating the user.
- When the new user is successfully created you should see the Access key ID and Secret access key displayed on the screen.
- Record both these values in the Rockset Console.
Create a Collection
Once you create a Collection backed by an Amazon Kinesis stream, Rockset continuously ingests objects from your stream and updates your collection automatically as new objects are added.
You can create a collection from a Kinesis source in the Collections tab of the Rockset Console.
These operations can also be performed using any of the Rockset client libraries, the Rockset API, or the Rockset CLI.
Best Practices
Amazon Kinesis allows up to 5 read requests per second per shard and a read throughput of 2 MB/sec per shard. We recommend that no more than one collection read from a given Kinesis stream. If multiple collections read from the same Kinesis stream, they will encounter rate limits imposed by Kinesis, which results in throttling. When throttled, Rockset retries with backoff, which can increase data ingestion latency. Read more about the limits here.
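As a rough illustration of why multiple readers are discouraged, the sketch below divides the per-shard limits evenly among readers. The even split is a simplifying assumption (Kinesis does not allocate capacity per reader), and `per_reader_budget` is a hypothetical helper, not part of any Rockset or AWS library.

```python
from typing import Tuple

# Per-shard read limits stated above.
MAX_READS_PER_SEC_PER_SHARD = 5
MAX_READ_MB_PER_SEC_PER_SHARD = 2.0


def per_reader_budget(num_readers: int) -> Tuple[float, float]:
    """Return (reads/sec, MB/sec) available to each reader of one shard,
    assuming the shard's limits are shared evenly (a simplification)."""
    if num_readers < 1:
        raise ValueError("num_readers must be at least 1")
    return (
        MAX_READS_PER_SEC_PER_SHARD / num_readers,
        MAX_READ_MB_PER_SEC_PER_SHARD / num_readers,
    )


if __name__ == "__main__":
    for n in (1, 2, 4):
        reads, mb = per_reader_budget(n)
        print(f"{n} reader(s): {reads:.2f} reads/sec, {mb:.2f} MB/sec per shard each")
```

Under this assumption, four collections reading the same stream would each get about 1.25 reads per second and 0.5 MB/sec per shard before throttling begins.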