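This page covers how to use a Google Cloud Storage (GCS) bucket as a data source in Rockset. This includes creating a GCP service account for Rockset, granting it access to your buckets, setting up the GCP integration, and creating a collection from a GCS bucket.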
For the following steps, you must have access to a Google Cloud account and be able to manage Google Cloud service accounts and roles. If you do not have this access, invite your GCP account administrator to Rockset.
These instructions explain how to create a GCP service account and grant it the permissions Rockset needs. Once you complete these steps, you can use the JSON key associated with the service account to create the Rockset integration in the Rockset console.
Create a new service account to give Rockset access to your GCP resources. Creating a service account is covered in detail in the GCP documentation.
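When you create the service account you choose an account ID, and GCP derives the account's email address from it; that email is what you later grant roles to. The sketch below assumes the usual rule for user-managed service account IDs (6 to 30 lowercase letters, digits, or hyphens, starting with a letter and not ending with a hyphen); the function name and the example values `rockset-reader` and `my-project` are hypothetical.

```python
import re

# Assumed rule for user-managed service account IDs: 6-30 characters of
# lowercase letters, digits, and hyphens, starting with a letter and not
# ending with a hyphen.
ACCOUNT_ID_RE = re.compile(r"[a-z](?:[-a-z0-9]{4,28}[a-z0-9])")


def service_account_email(account_id: str, project_id: str) -> str:
    """Return the email address of a user-managed service account."""
    if not ACCOUNT_ID_RE.fullmatch(account_id):
        raise ValueError(f"invalid service account ID: {account_id!r}")
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"


# "rockset-reader" and "my-project" are hypothetical example values.
print(service_account_email("rockset-reader", "my-project"))
# prints rockset-reader@my-project.iam.gserviceaccount.com
```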
Once you create the service account, create a new key for it and make sure that you download the JSON file associated with that key. You need this key to create the GCP integration in the Rockset console.
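Before uploading the key in the Rockset console, it can help to confirm that you downloaded the right file. This is a minimal sketch, assuming the standard layout of a GCP service account JSON key; it checks only a subset of the fields such a key contains, and the function name is hypothetical.

```python
import json

# A subset of the fields a GCP service account JSON key normally contains.
REQUIRED_KEY_FIELDS = {"type", "project_id", "private_key_id", "private_key", "client_email"}


def check_service_account_key(raw: str) -> str:
    """Sanity-check a downloaded key file's contents and return its client email.

    Raises ValueError if the JSON does not look like a service account key.
    """
    key = json.loads(raw)
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = REQUIRED_KEY_FIELDS - set(key)
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["client_email"]
```

For example, `check_service_account_key(open("my-key.json").read())` (with a hypothetical file name) returns the service account's email without printing the private key.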
To let Rockset read from Google Cloud Storage buckets, you must grant the service account roles that allow access to those buckets. For a set of standard roles, refer to the GCP IAM permissions documentation. For example, you can use the Storage Object Viewer role, which gives read access to all of your GCS buckets.
Alternatively, you can make only individual buckets accessible to the service account you created. Rockset needs the following permissions:
storage.objects.get: Required to retrieve an object from Google Cloud Storage.
storage.objects.list: Required to list objects within a given bucket in Google Cloud Storage.
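To check whether a given role covers both permissions, you can compare its permission list against the two listed above. This is a sketch under the assumption that the role is available as JSON with an `includedPermissions` list, for instance from `gcloud iam roles describe roles/storage.objectViewer --format=json`; the helper names are hypothetical.

```python
import json
from typing import Iterable

# The permissions Rockset needs, per the list above.
REQUIRED_PERMISSIONS = frozenset({"storage.objects.get", "storage.objects.list"})


def missing_permissions(granted: Iterable[str]) -> set:
    """Return the required permissions that are absent from `granted`."""
    return set(REQUIRED_PERMISSIONS) - set(granted)


def missing_from_role_json(role_json: str) -> set:
    """Check a role definition in JSON form that has an "includedPermissions" list."""
    role = json.loads(role_json)
    return missing_permissions(role.get("includedPermissions", []))
```

An empty result means the role grants everything Rockset needs; extra permissions in the role are ignored.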
You can grant these permissions by assigning the service account a role that includes them, either across your project or on a specific bucket only.
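For the bucket-level option, `gcloud storage buckets add-iam-policy-binding gs://BUCKET --member=serviceAccount:EMAIL --role=ROLE` adds the binding in one step. The sketch below shows what that binding looks like inside a bucket IAM policy, assuming the JSON shape returned by `gcloud storage buckets get-iam-policy gs://BUCKET --format=json` (a `bindings` list of `role`/`members` entries); the function name is hypothetical, and the result could be saved to a file and applied with `gcloud storage buckets set-iam-policy`.

```python
import copy


def add_bucket_binding(policy: dict, role: str, service_account_email: str) -> dict:
    """Return a copy of `policy` with the service account bound to `role`.

    Other fields such as "etag" are preserved unchanged.
    """
    updated = copy.deepcopy(policy)
    member = f"serviceAccount:{service_account_email}"
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for this role, if any.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    # No suitable binding yet: add a new one for this role.
    bindings.append({"role": role, "members": [member]})
    return updated
```

Working on a copy keeps the fetched policy intact, and the function is idempotent: applying it twice does not duplicate the member.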
Once you have set up an integration, you can create a collection sourced from Google Cloud Storage. When creating a collection, you can choose which paths to include by adding multiple sources with distinct path names.
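If you script collection creation, a quick local check that each source path is a well-formed `gs://` URI and that no path is repeated can catch mistakes early. This is a minimal sketch; it only enforces that paths are distinct, as described above, and makes no assumption about how Rockset treats overlapping prefixes. The helper names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def parse_gcs_path(uri: str) -> tuple[str, str]:
    """Split "gs://bucket/prefix" into (bucket, prefix)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, prefix


def check_distinct_sources(uris: Iterable[str]) -> list[tuple[str, str]]:
    """Parse each source path in order and reject duplicates."""
    seen = set()
    sources = []
    for uri in uris:
        source = parse_gcs_path(uri)
        if source in seen:
            raise ValueError(f"duplicate source path: {uri!r}")
        seen.add(source)
        sources.append(source)
    return sources
```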
In the Rockset Console, you can create a collection from Workspace > Collections > Create Collection > Add Source > Google Cloud Storage.
Using the CLI, you can run the following:
$ rock create collection my-gcs-collection \
    gs://my-bucket/my-path-1 \
    --integration=my-gcp-integration
Collection "my-gcs-collection" was created successfully.
Any of the above operations can also be performed using the Rockset client libraries or the REST API.
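As a rough illustration of the REST route, the sketch below builds (but does not send) a request equivalent to the CLI example. Everything about the API shape here is an assumption to verify against the Rockset REST API reference: the `/v1/orgs/self/ws/{workspace}/collections` path, the `ApiKey` authorization scheme, the `integration_name` and `gcs` (`bucket`, `prefix`) source fields, and the `commons` workspace. The API server URL and key are hypothetical placeholders.

```python
import json
import urllib.request


def build_create_collection_request(*, api_server: str, api_key: str, workspace: str,
                                    name: str, integration: str, bucket: str,
                                    prefix: str) -> urllib.request.Request:
    """Build a POST request that would create a GCS-sourced collection.

    The endpoint path, header scheme, and body fields are assumptions.
    """
    body = {
        "name": name,
        "sources": [
            {"integration_name": integration, "gcs": {"bucket": bucket, "prefix": prefix}},
        ],
    }
    return urllib.request.Request(
        url=f"{api_server}/v1/orgs/self/ws/{workspace}/collections",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"ApiKey {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_collection_request(
    api_server="https://api.example.rockset.com",  # hypothetical placeholder
    api_key="MY_API_KEY",  # hypothetical placeholder
    workspace="commons",
    name="my-gcs-collection",
    integration="my-gcp-integration",
    bucket="my-bucket",
    prefix="my-path-1",
)
```

Passing `req` to `urllib.request.urlopen` would send it; the official client libraries are usually a better choice than hand-built requests.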