This page describes how to create collections from CSV files.

Rockset can parse raw CSV data.

# Using Console

In this section, we create a collection from a dataset hosted on Amazon S3. Click Create Collection in the Overview tab to begin.

Enter a name and, optionally, a description for the collection, then select Amazon S3 as the source from the Add Source dropdown. Provide the S3 bucket name and prefix (if any), and select the integration from the Integration Name dropdown, or choose None if the bucket is public.

Select 'CSV' from the Format dropdown. This reveals additional options specific to the CSV format. Configure them as follows:

  • Header

    • First line of file as column names - Select this option if the CSV source contains column names in its first line.
    • Specify Columns manually - Select this option to provide custom names for each column in the CSV data source. You will be asked to provide a name and data type for each column.
    • Generate column names automatically - Rockset automatically generates unique column names (c1, c2, ...) for the CSV data source.

  • Separator - The character that separates fields in the CSV data source (default value is Comma).

  • Encoding - The character encoding of the CSV data source. Supported encodings are UTF-8, UTF-16, and ISO 8859-1.

  • Quote Character - A single character used to quote fields that contain special characters, such as the separator, the quote character itself, or new-line characters (default value is ").
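
To check which settings fit a file before creating the collection, the options above can be tried locally. The following is a minimal sketch using Python's standard `csv` module. It is not Rockset's parser, and the function name `parse_csv` and its parameters are hypothetical names chosen for this illustration. It assumes that manually specified columns are given as (name, type) pairs and that all other values are kept as strings.

```python
import csv
import io


def parse_csv(raw_bytes, header="first_line", columns=None,
              separator=",", encoding="utf-8", quote_char='"'):
    """Parse CSV bytes into a list of dicts (illustrative only, not Rockset's parser).

    header: "first_line" (use the first row as column names),
            "manual"     (use the (name, type) pairs given in `columns`),
            "auto"       (generate names c1, c2, ...).
    """
    # Encoding: decode the raw bytes, e.g. "utf-8", "utf-16", or "iso-8859-1".
    text = raw_bytes.decode(encoding)

    # Separator and Quote Character map onto csv.reader's delimiter/quotechar.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=separator, quotechar=quote_char)
    rows = list(reader)
    if not rows:
        return []

    if header == "first_line":
        names, rows = rows[0], rows[1:]
        casts = [str] * len(names)
    elif header == "manual":
        names = [name for name, _ in columns]
        casts = [cast for _, cast in columns]
    elif header == "auto":
        names = [f"c{i}" for i in range(1, len(rows[0]) + 1)]
        casts = [str] * len(names)
    else:
        raise ValueError(f"unknown header mode: {header!r}")

    # Pair each value with its column name, converting to the column's type.
    return [{n: cast(v) for n, cast, v in zip(names, casts, row)}
            for row in rows]


# A quoted field may contain the separator itself.
sample = b'name,comment\nalice,"likes a, b"\n'
print(parse_csv(sample))

# Headerless, semicolon-separated data with manually specified columns.
print(parse_csv(b"1;Oslo\n2;Lima\n", header="manual",
                columns=[("id", int), ("city", str)], separator=";"))
```

If the parsed rows look wrong locally (for example, a single column containing the whole line), the separator or quote character is probably not the one the file uses.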

Click Create at the top right to create the collection. The new collection first appears in the Created state, and it can take up to a minute to become Ready.

