Google Cloud Storage

This page covers how to use a Google Cloud Storage bucket as a data source in Rockset. This includes:

  • Creating a Google Cloud Storage integration to securely connect buckets in your GCP account with Rockset.
  • Creating a collection which syncs your data from a Google Cloud Storage bucket into Rockset.

For the following steps, you must have access to a Google Cloud account and be able to manage Google Cloud Service Accounts and Roles. If you do not have access, please invite your GCP account administrator to Rockset.

Creating a Google Cloud Storage Integration

These instructions explain how to set up a Google Cloud Storage integration using a GCP Service Account. An integration can provide access to one or more GCS buckets within your GCP account. You can use an integration to create collections that sync data from your GCS buckets.

Step 1: Set Up Your GCP Service Account

To access your GCP resources, Rockset uses a GCP Service Account that has been granted access to the GCS buckets you want to sync. You can either use an existing service account or create a new one for Rockset to use. Once you complete these steps, you can use the JSON key associated with the service account to create the integration in the Rockset Console.

Creating a New Service Account

If you don’t have an existing service account, or want to use a new one, navigate to the “IAM & Admin” section in the Google Cloud Console sidebar and select the “Service Accounts” tab within that section.

From there, create a new service account by selecting the “Create Service Account” button at the top, and then follow the instructions on the page to complete its creation.

[Figure: Create GCP Service Account]

For more details, see the GCP documentation on creating and managing service accounts.

Creating a New Key Pair

On the service accounts home page in the Google Cloud Console, select the service account you want to use (if you just created one for Rockset above, select that account) to view its details. Under the “Keys” section, select “Add Key”, and then “Create New Key”. Select “JSON” for the key type and then click “Create”.

[Figure: Create GCP Service Account key]

Once the key is created successfully, your browser should automatically download a JSON file containing the key. You will need the contents of this file to create the GCP integration in the Rockset Console.
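Before pasting the key into Rockset, it can help to confirm that the downloaded file looks like a service account key. The following is a minimal sketch using only the Python standard library; it is not part of Rockset's tooling. The helper names `check_service_account_key` and `load_key` and the file name `rockset-key.json` are hypothetical, while the field names are the ones Google writes into service account JSON keys.

```python
import json

# Fields that a Google service account JSON key is expected to contain.
EXPECTED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(key: dict) -> list:
    """Return a list of problems found in a parsed key; an empty list means it looks OK."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if not key.get(name)]
    kind = key.get("type")
    if kind and kind != "service_account":
        problems.append(f"unexpected type: {kind!r}")
    return problems


def load_key(path: str) -> dict:
    """Read and parse a key file downloaded from the Google Cloud Console."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `check_service_account_key(load_key("rockset-key.json"))` returns an empty list when nothing looks wrong. The `client_email` value in the same file is the full service account email that you enter when granting bucket permissions in Step 2.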

Step 2: Configure Your GCS Bucket Permissions

To let Rockset read from your Google Cloud Storage buckets, you must grant the service account roles that allow access to those specific buckets. To do so, navigate to the “Storage” section in the Google Cloud Console sidebar and select the “Browser” tab within that section.

Find the GCS bucket whose data you want to sync into your Rockset collection, and then click the three dots on the right-hand side to select “Edit Bucket Permissions”.

[Figure: Setting up per-bucket permissions]

From here, select the “Add Member” button to give the service account the appropriate permissions. When adding the service account as a new member, be sure to enter the account's full email address (e.g. test@test-project.iam.gserviceaccount.com).

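For a set of standard roles, you can refer to the GCP IAM permissions documentation. For example, you can use the Storage Object Viewer role, which grants read access to objects. Granted at the project level, it applies to all your GCS buckets; granted on a single bucket, it applies only to that bucket.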

You can also configure individual buckets to be accessible by the service account you created. The permissions that Rockset needs are:

  • storage.objects.get - Required to retrieve an object from Google Cloud Storage.
  • storage.objects.list - Required to list objects within a given bucket in Google Cloud Storage.

You can grant a role that provides these permissions to the service account across your project, or grant it only on a specific bucket.

Creating a Collection

Once you have set up an integration, you can create a collection sourced from Google Cloud Storage. When creating the collection, you can choose which paths to include by adding multiple sources, each with a distinct path.

In the Rockset Console, you can create a collection from Workspace > Collections > Create Collection > Add Source > Google Cloud Storage.

[Figure: Create GCS Collection]

Using the CLI, you can run the following:

$ rock create collection my-gcs-collection \
    gs://my-bucket/my-path-1 \
    --integration=my-gcp-integration

Collection "my-gcs-collection" was created successfully.

Note that any of the above operations can also be performed using Rockset Client libraries or REST APIs.
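As a rough illustration of the REST route, the sketch below splits `gs://` paths like the one in the CLI example into a bucket and a prefix, and builds a request body with one source per path. The helper names are hypothetical, and the body's field names (`sources`, `integration_name`, `gcs`, `bucket`, `prefix`) are assumptions about the create-collection request shape that should be checked against the Rockset API reference before use. The sketch only builds the body; it does not send a request.

```python
def parse_gcs_path(path: str) -> tuple:
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    scheme = "gs://"
    if not path.startswith(scheme):
        raise ValueError(f"not a gs:// path: {path!r}")
    bucket, _, prefix = path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {path!r}")
    return bucket, prefix


def build_collection_request(name: str, integration: str, paths: list) -> dict:
    """Build a create-collection body with one source per GCS path.

    ASSUMPTION: the field names used here are illustrative and must be
    verified against the Rockset API reference.
    """
    sources = []
    for path in paths:
        bucket, prefix = parse_gcs_path(path)
        sources.append(
            {"integration_name": integration, "gcs": {"bucket": bucket, "prefix": prefix}}
        )
    return {"name": name, "sources": sources}
```

For example, `build_collection_request("my-gcs-collection", "my-gcp-integration", ["gs://my-bucket/my-path-1"])` mirrors the CLI command above, and passing several paths produces several sources in the same collection.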
