Data Format > Spreadsheet Data

Spreadsheet Data

This page describes how to create collections from Excel spreadsheets (.XLS or .XLSX files).

Document Format

When a XLS/XLSX file is ingested into a Rockset collection, each row of each spreadsheet is parsed as one document. Each document will have the following fields, based on the row it corresponds to:

  • rownum - the index of the row (starting at 1)
  • sheet.number - the index of the sheet the row is in (starting at 1)
  • sheet.name - the name of the sheet the row is in

Additionally, Rockset will add a field for each of the row’s cells, with field name corresponding to the column’s header (i.e., "A", "B", and so on).

For example, a collection that ingested data from Excel files could have the schema shown below.

> DESCRIBE spreadsheet_data

+--------------------------------------------+---------------+---------+-----------+
| field                                      | occurrences   | total   | type      |
|--------------------------------------------+---------------+---------+-----------|
| ['B']                                      | 1             | 10860   | string    |
| ['B']                                      | 5384          | 10860   | float     |
| ['C']                                      | 10856         | 10860   | string    |
| ['D']                                      | 10855         | 10860   | string    |
| ['E']                                      | 10852         | 10860   | string    |
| ['F']                                      | 10852         | 10860   | string    |
| ['G']                                      | 5468          | 10860   | string    |
| ['G']                                      | 5385          | 10860   | float     |
| ['_event_time']                            | 10860         | 10860   | timestamp |
| ['_id']                                    | 10860         | 10860   | string    |
| ['_meta']                                  | 10860         | 10860   | object    |
| ['_meta', 'file_upload']                   | 10860         | 10860   | object    |
| ['_meta', 'file_upload', 'file']           | 10860         | 10860   | string    |
| ['_meta', 'file_upload', 'file_upload_id'] | 10860         | 10860   | string    |
| ['_meta', 'file_upload', 'upload_time']    | 10860         | 10860   | string    |
| ['rownum']                                 | 10860         | 10860   | int       |
| ['sheet']                                  | 10860         | 10860   | object    |
| ['sheet', 'name']                          | 10860         | 10860   | string    |
| ['sheet', 'number']                        | 10860         | 10860   | int       |
+--------------------------------------------+---------------+---------+-----------+

Creating a Collection

Using CLI

To create a collection based on an XLS file, specify the format as XLS. For example, the command below creates a collection based on an S3 bucket containing spreadsheet files.

$ rock create collection spreadsheet_data s3://example/excel-files --format CSV

Collection "spreadsheet_data" was created successfully.

Using Console

To configure a collection to ingest XLS data, select the “XLS/XLSX” option as the format.

Console Create XLS

Note that you can also upload an Excel file into an existing collection:

Console Upload XLS