Google Cloud Storage

This page covers how to use a Google Cloud Storage bucket as a data source in Rockset. This includes:

  • Creating a Google Cloud Storage integration to securely connect buckets in your GCP account with Rockset.
  • Creating a collection which syncs your data from a Google Cloud Storage bucket into Rockset.

For the following steps, you must have access to a Google Cloud account and be able to manage Google Cloud Service Accounts and Roles. If you do not have access, please invite your GCP account administrator to Rockset.

Creating a Google Cloud Storage Integration

These instructions explain how to create a GCP Service Account and grant it the permissions Rockset needs. Once you complete these steps, you can use the JSON key associated with the service account to create the Rockset integration in the Rockset Console.

Creating a GCP Service Account

You can create a new service account to provide Rockset with access to your GCP resources. Creating a new service account is covered in detail in the GCP documentation.

Create GCP Service Account

Once you create the service account, create a new key and make sure that you download the JSON file associated with that key. This key is required to create the GCP integration in the Rockset Console.
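Before pasting the key into the Rockset Console, it can help to confirm that the downloaded file really is a service-account key. The following is a minimal sketch, not part of Rockset's tooling: the helper names and the sample values are hypothetical, and the field list assumes the standard layout of a GCP service-account JSON key.

```python
import json

# Fields that a standard GCP service-account JSON key is expected to carry.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key: dict) -> list:
    """Return a list of problems found in a parsed key; an empty list means it looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems


def load_key(path: str) -> dict:
    """Read and parse a downloaded key file (the path is whatever you saved it as)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical, redacted key used only to show the expected shape.
sample_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "rockset-reader@my-project.iam.gserviceaccount.com",
}
print(check_service_account_key(sample_key))  # prints []
```

For a real file, `check_service_account_key(load_key("path/to/key.json"))` would run the same checks. Treat the key as a secret and avoid committing it to source control.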

Setting up permissions

To access Google Cloud Storage buckets, you must grant the service account roles that allow access to specific buckets. For a set of standard roles, you can refer to the GCP IAM permissions documentation. For example, you can use the Storage Object Viewer role (roles/storage.objectViewer), which, when granted at the project level, gives read access to the objects in all GCS buckets in that project.

You can also configure individual buckets to be accessible by the service account you created. The permissions that Rockset needs are:

  • storage.objects.get - Required to retrieve an object from Google Cloud Storage.
  • storage.objects.list - Required to list objects within a given bucket in Google Cloud Storage.

You can either attach a role that provides these permissions to the service account that you created, or grant the role to the service account on a specific bucket only.
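One way to verify the setup before creating the integration is to ask GCS which of the two required permissions the service account actually holds on a bucket. The sketch below assumes an object that behaves like `google.cloud.storage.Bucket`, whose `test_iam_permissions()` method returns the subset of the requested permissions that are granted. The `missing_permissions` helper and the `FakeBucket` stand-in are hypothetical and exist only so the example runs without GCP access.

```python
# The two permissions this page lists as required by Rockset.
REQUIRED_PERMISSIONS = ["storage.objects.get", "storage.objects.list"]


def missing_permissions(bucket) -> list:
    """Return the required permissions that the current credentials lack on `bucket`.

    `bucket` is assumed to expose test_iam_permissions(permissions), which
    returns the permissions from the input list that are granted.
    """
    granted = set(bucket.test_iam_permissions(REQUIRED_PERMISSIONS))
    return [perm for perm in REQUIRED_PERMISSIONS if perm not in granted]


class FakeBucket:
    """Hypothetical stand-in for a real bucket object, used for illustration only."""

    def __init__(self, granted):
        self._granted = set(granted)

    def test_iam_permissions(self, permissions):
        return [perm for perm in permissions if perm in self._granted]


print(missing_permissions(FakeBucket(["storage.objects.get"])))
# prints ['storage.objects.list']
```

With the google-cloud-storage package installed, a real bucket object could be obtained with `storage.Client.from_service_account_json("path/to/key.json").bucket("my-bucket")`, so that the check runs as the service account itself. An empty result suggests that both permissions are in place.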

Setting up per-bucket permission

Create a Collection

Once you have set up an integration, you can create a collection sourced from Google Cloud Storage. When you create the collection, you can choose which paths to include by adding multiple sources, each with a distinct path.
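If you keep the list of source paths in a script, a small check can catch malformed or repeated paths before you add them as sources. This is an illustrative sketch only: the helper names and the second path (`my-path-2`) are hypothetical, and the check simply treats each `gs://` URI as a bucket name plus a path prefix.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri: str):
    """Split a gs:// URI into (bucket, path prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_distinct_sources(uris):
    """Return (bucket, prefix) pairs, raising if the same source appears twice."""
    seen = set()
    sources = []
    for uri in uris:
        source = split_gcs_uri(uri)
        if source in seen:
            raise ValueError(f"duplicate source: {uri!r}")
        seen.add(source)
        sources.append(source)
    return sources


print(check_distinct_sources(["gs://my-bucket/my-path-1", "gs://my-bucket/my-path-2"]))
# prints [('my-bucket', 'my-path-1'), ('my-bucket', 'my-path-2')]
```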

In the Rockset Console, you can create a collection from Workspace > Collections > Create Collection > Add Source > Google Cloud Storage.

Create GCS Collection

Using the CLI, you can run the following:

$ rock create collection my-gcs-collection \
    gs://my-bucket/my-path-1 \
    --integration=my-gcp-integration

Collection "my-gcs-collection" was created successfully.

Note that any of the above operations can also be performed using Rockset Client libraries or REST APIs.