The [Rockset Terraform Provider](🔗) lets you deploy and manage Rockset resources programmatically using the [Terraform](🔗) framework. It allows you to idempotently deploy Rockset [<<glossary:Integrations>>](🔗), [<<glossary:Collections>>](🔗), and other resources alongside any third-party dependencies, which simplifies and automates the deployment process. The steps below show how to integrate Rockset into your existing CI/CD pipelines with Terraform.

## Installation

First, we need to make sure Terraform is installed. We can follow the official installation steps [here](🔗). After that, we can verify that Terraform is ready to use by running its help command, `terraform -help`.

## Rockset Provider

A Terraform provider adds a set of [resources](🔗) and/or [data sources](🔗) that Terraform can then deploy. You can read more about how providers work [here](🔗). To import Rockset's provider, we first set it as a required dependency and then declare the provider.

In a file called **** we add:
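A minimal sketch of such a file is shown here. It assumes the provider is published in the Terraform Registry under the source address `rockset/rockset`, and it leaves out a version constraint that you would normally pin.

```hcl
terraform {
  required_providers {
    rockset = {
      # Assumed registry address for the Rockset provider.
      source = "rockset/rockset"
    }
  }
}

# With an empty block, the provider is expected to read its API key and
# API server from the ROCKSET_APIKEY and ROCKSET_APISERVER environment
# variables.
provider "rockset" {}
```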

Now from the same directory we can call `terraform init` which will initialize our Terraform state and pull any needed dependencies.

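We will also need to set the environment variables `ROCKSET_APIKEY` and `ROCKSET_APISERVER` based on your account. You can view your region endpoints and API keys in the [API Keys tab of the Rockset console](🔗). On Unix-based systems, export both variables in your shell before running Terraform, for example `export ROCKSET_APIKEY=<your API key>`.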

## Example: Amazon RDS (Hosted PostgreSQL)

To connect to Amazon RDS, a few configurations and resources have to be set up. Doing all of this manually is tedious, error-prone, and not easily repeatable, which makes it a perfect use case for Terraform. We will need to set up our Terraform configuration so that the deploy will:

  • [Configure PostgreSQL Server](🔗)

  • [Create an AWS Kinesis Stream](🔗)

  • [Set up the AWS Data Migration Service (DMS)](🔗)

  • Finally create the Rockset integration and collection

### Download AWS Provider

First, let's update our **** to also import the AWS Terraform provider. You can also set any AWS Terraform configurations needed here. After updating this file you will need to run `terraform init` again.

Variables of the form var.\* are [Terraform input variables](🔗), which allow us to create generic Terraform templates. You can define these variables as Terraform command-line arguments or in a separate file such as **terraform.tfvars**.
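As a rough sketch, the updated provider file could look like the following. The variable name `aws_region` is a hypothetical example of an input variable, and the source address `rockset/rockset` is an assumption about where the Rockset provider is published.

```hcl
terraform {
  required_providers {
    rockset = {
      source = "rockset/rockset" # assumed registry address
    }
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Hypothetical input variable. Set it in terraform.tfvars or pass it
# on the command line with -var.
variable "aws_region" {
  type        = string
  description = "AWS region that hosts the RDS instance"
}

provider "aws" {
  region = var.aws_region
}

provider "rockset" {}
```

With this in place, a line such as `aws_region = "us-west-2"` in **terraform.tfvars** would supply the value at plan and apply time.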

### Configure RDS

Configuration Warning

In this step we bring the existing RDS instance into our Terraform environment in order to enable backups and replication. Terraform will blindly apply any changes from the configuration file to the existing database. You can alternatively update this [manually](🔗) in the console and skip this step in the Terraform workflow.

Before updating your RDS service using Terraform you will need to make sure that you are properly authenticated with AWS. You can check [here](🔗) to learn more about AWS authentication and [here](🔗) to learn how AWS authentication interacts with Terraform.

Once we have our authentication set up, we can import our RDS state and apply an update to allow CDC streaming. We need to add some Terraform configuration for our AWS RDS instance. We will create a new file ****:

The most important thing above is that we have a parameter group with `rds.logical_replication` set to 1 and that we have set up backups on our RDS instance. Everything else you will want to inherit from your existing instance. To pull the state for your existing instance down into your local Terraform state, you will have to perform a Terraform import using your [RDS identifier](🔗), which you can find in the AWS console. You can learn more about Terraform imports [here](🔗).
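The two essentials named above, the parameter group and backups, can be sketched as follows. All variable names, the resource names `cdc` and `source`, and the seven-day retention value are illustrative assumptions. The remaining `aws_db_instance` arguments must be copied from your existing instance.

```hcl
# Hypothetical inputs describing the existing instance.
variable "rds_identifier" { type = string }
variable "rds_instance_class" { type = string }
variable "parameter_group_family" { type = string } # e.g. "postgres13"

resource "aws_db_parameter_group" "cdc" {
  name   = "rockset-cdc" # hypothetical name
  family = var.parameter_group_family

  parameter {
    name  = "rds.logical_replication"
    value = "1"
    # Static parameter: it takes effect only after a reboot.
    apply_method = "pending-reboot"
  }
}

# Import the existing instance into state before planning, e.g.:
#   terraform import aws_db_instance.source <your RDS identifier>
resource "aws_db_instance" "source" {
  identifier     = var.rds_identifier
  instance_class = var.rds_instance_class

  parameter_group_name    = aws_db_parameter_group.cdc.name
  backup_retention_period = 7 # any value above 0 enables backups

  # Mirror every other argument (engine, storage, networking, ...)
  # from the existing instance so that nothing is lost on apply.
}
```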


Now that we have the Terraform state for our RDS instance, we can update our Terraform file to better represent the RDS instance we already have. We can see a diff of these configurations by running `terraform plan`.

This should give us an idea of what is different between our current Terraform file and what is already deployed (in our local state). Make any necessary changes to the Terraform file so that existing configuration is not lost. Note that the plan will mention that the RDS instance has to be restarted. This is because `rds.logical_replication` is a static parameter, which only takes effect after a restart.

At this point, if you're feeling comfortable with your changes, run `terraform apply`.

This will carry out the actual changes and complain if anything breaks during the deploy.

### Create Kinesis Streams

Now that our RDS instance is configured correctly we need to create a Kinesis stream which will serve as the glue between our RDS instance and Rockset. In a separate file called **** we can add the following:

There's a lot going on above, and it's worth briefly talking about what it does. First, we create the Kinesis resource itself. You can read more about what Kinesis is [here](🔗), but just think of it as a streaming database that will keep RDS in sync with Rockset.

Next, we create two new roles with different policies. **kinesis_dms** is the role that will be used by DMS in the next step to provide the stream with updates. You can see from its associated policy at the top that it needs permission to put new records into Kinesis. **kinesis_rockset** is the role that will be used by the Rockset integration to read records from Kinesis. You can see that Rockset only needs to get records and does not have "PutRecords" permissions.
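The stream and the two roles described above might be sketched like this. The stream name, shard count, and the variables `rockset_account_id` and `rockset_external_id` are assumptions for illustration. The exact account ID, external ID, and list of read actions should be taken from the Rockset console and documentation.

```hcl
# Hypothetical inputs for the cross-account trust with Rockset.
variable "rockset_account_id" { type = string }
variable "rockset_external_id" { type = string }

resource "aws_kinesis_stream" "rds" {
  name             = "rds-to-rockset" # hypothetical name
  shard_count      = 1
  retention_period = 24
}

# Role assumed by DMS to write change records into the stream.
resource "aws_iam_role" "kinesis_dms" {
  name = "kinesis_dms"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "dms.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "kinesis_dms" {
  name = "kinesis_dms_write"
  role = aws_iam_role.kinesis_dms.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["kinesis:DescribeStream", "kinesis:PutRecord", "kinesis:PutRecords"]
      Resource = aws_kinesis_stream.rds.arn
    }]
  })
}

# Role assumed by Rockset to read from the stream (no write actions).
resource "aws_iam_role" "kinesis_rockset" {
  name = "kinesis_rockset"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = "arn:aws:iam::${var.rockset_account_id}:root" }
      Condition = { StringEquals = { "sts:ExternalId" = var.rockset_external_id } }
    }]
  })
}

resource "aws_iam_role_policy" "kinesis_rockset" {
  name = "kinesis_rockset_read"
  role = aws_iam_role.kinesis_rockset.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["kinesis:DescribeStream", "kinesis:ListShards", "kinesis:GetShardIterator", "kinesis:GetRecords"]
      Resource = aws_kinesis_stream.rds.arn
    }]
  })
}
```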

### Set up DMS

Now that we have a Kinesis resource set up let's stitch it together with our RDS instance. Create a new file called ****:

Now that's a lot of Terraform. Let's see why we need all of this. In order to set up our DMS replication instance the following roles are needed:

  • dms-vpc-role

  • dms-cloudwatch-logs-role

  • dms-access-for-endpoint

To read more about these roles and why they are needed for DMS you can view the [AWS Data Migration Service Guide](🔗).
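One compact way to declare the three roles listed above is a `for_each` over a map from role name to AWS managed policy. This is a sketch rather than the only layout. The local name `dms_roles` and the resource label `dms` are arbitrary choices.

```hcl
locals {
  # Role name => AWS managed policy attached to it.
  dms_roles = {
    "dms-vpc-role"             = "arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole"
    "dms-cloudwatch-logs-role" = "arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole"
    "dms-access-for-endpoint"  = "arn:aws:iam::aws:policy/service-role/AmazonDMSRedshiftS3Role"
  }
}

# Trust policy that lets the DMS service assume each role.
data "aws_iam_policy_document" "dms_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["dms.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "dms" {
  for_each           = local.dms_roles
  name               = each.key
  assume_role_policy = data.aws_iam_policy_document.dms_assume_role.json
}

resource "aws_iam_role_policy_attachment" "dms" {
  for_each   = local.dms_roles
  role       = aws_iam_role.dms[each.key].name
  policy_arn = each.value
}
```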

Next, we define the DMS subnet group, which is simply a collection of subnets that will be used by the DMS replication instance. Along with the subnet group we also define the DMS replication instance itself. The replication instance is an EC2 instance that performs the actual data migration. It serves as a buffer between the DMS source and target databases: it reads from the source database and then applies any desired transformations for the target database. The replication instance must be attached to two replication endpoints, which define the source database and the target database. Finally, the `aws_dms_replication_task` defines the actual task to be performed, including what data should be read and what mappings should take place.
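The pieces described above could be wired together roughly as follows. Every variable, identifier, instance size, and the `public` schema selection rule is a hypothetical placeholder. The sketch also assumes the required DMS IAM roles already exist.

```hcl
# Hypothetical inputs; supply them via terraform.tfvars.
variable "subnet_ids" { type = list(string) }
variable "rds_host" { type = string }
variable "rds_database" { type = string }
variable "rds_username" { type = string }
variable "kinesis_stream_arn" { type = string }
variable "kinesis_dms_role_arn" { type = string }

variable "rds_password" {
  type      = string
  sensitive = true
}

resource "aws_dms_replication_subnet_group" "this" {
  replication_subnet_group_id          = "rockset-dms-subnets"
  replication_subnet_group_description = "Subnets for the DMS replication instance"
  subnet_ids                           = var.subnet_ids
}

resource "aws_dms_replication_instance" "this" {
  replication_instance_id     = "rockset-dms"
  replication_instance_class  = "dms.t3.micro"
  allocated_storage           = 20
  replication_subnet_group_id = aws_dms_replication_subnet_group.this.id
}

# Source endpoint: the existing PostgreSQL RDS instance.
resource "aws_dms_endpoint" "source" {
  endpoint_id   = "rds-source"
  endpoint_type = "source"
  engine_name   = "postgres"
  server_name   = var.rds_host
  port          = 5432
  database_name = var.rds_database
  username      = var.rds_username
  password      = var.rds_password
}

# Target endpoint: the Kinesis stream that Rockset reads from.
resource "aws_dms_endpoint" "target" {
  endpoint_id   = "kinesis-target"
  endpoint_type = "target"
  engine_name   = "kinesis"

  kinesis_settings {
    stream_arn              = var.kinesis_stream_arn
    service_access_role_arn = var.kinesis_dms_role_arn
    message_format          = "json"
  }
}

resource "aws_dms_replication_task" "this" {
  replication_task_id      = "rds-to-kinesis"
  migration_type           = "full-load-and-cdc"
  replication_instance_arn = aws_dms_replication_instance.this.replication_instance_arn
  source_endpoint_arn      = aws_dms_endpoint.source.endpoint_arn
  target_endpoint_arn      = aws_dms_endpoint.target.endpoint_arn

  # Select every table in the "public" schema.
  table_mappings = jsonencode({
    rules = [{
      "rule-type"      = "selection"
      "rule-id"        = "1"
      "rule-name"      = "1"
      "rule-action"    = "include"
      "object-locator" = { "schema-name" = "public", "table-name" = "%" }
    }]
  })
}
```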

We now have the major parts of this integration set up. Let's run `terraform apply` to make sure there are no hiccups in setting up DMS between our RDS instance and Kinesis.

### Create Rockset Integration and Collections

It's now time to create the Rockset integration and a collection. Make sure that your Rockset API server endpoint and API key are properly set in the provider or in your local environment. Let's define one last Terraform file ****:
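A hedged sketch of that file follows. The resource types `rockset_kinesis_integration` and `rockset_kinesis_collection` and their arguments are written from memory of the provider's schema, so check them against the provider documentation for your version. The variable names, the `commons` workspace, the collection name, and the `id` primary key are all hypothetical.

```hcl
# Hypothetical inputs pointing at the AWS resources created earlier.
variable "kinesis_rockset_role_arn" { type = string }
variable "kinesis_stream_name" { type = string }

# Integration that lets Rockset assume the read-only Kinesis role.
resource "rockset_kinesis_integration" "rds" {
  name         = "rds-kinesis" # hypothetical name
  aws_role_arn = var.kinesis_rockset_role_arn
}

# One collection per replicated table; repeat this block (or use
# for_each) for each table in the RDS instance.
resource "rockset_kinesis_collection" "example_table" {
  workspace = "commons"
  name      = "example_table"

  source {
    integration_name = rockset_kinesis_integration.rds.name
    stream_name      = var.kinesis_stream_name
    format           = "postgres" # assumed value for DMS PostgreSQL records
    dms_primary_key  = ["id"]     # assumed primary key column
  }
}
```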

One more `terraform apply` and all done! The above gives Rockset the permission to start tailing from the Kinesis streams we just created. If you navigate to the Rockset console you should now see a new Amazon Kinesis integration and corresponding collections for each table in our RDS instance.

Check out our [blog](🔗) on "How to Use Terraform with Rockset" for more info!