This page describes how to create collections from Excel spreadsheets (.XLS or .XLSX files).
When a XLS/XLSX file is ingested into a Rockset collection, each row of each spreadsheet is parsed as one document. Each document will have the following fields, based on the row it corresponds to:
rownum
- the index of the row (starting at 1)sheet.number
- the index of the sheet the row is in (starting at 1)sheet.name
- the name of the sheet the row is inAdditionally, Rockset will add a field for each of the row’s cells, with field name corresponding to the column’s header (i.e., "A", "B", and so on).
For example, a collection that ingested data from Excel files could have the schema shown below.
> DESCRIBE spreadsheet_data
+--------------------------------------------+---------------+---------+-----------+
| field | occurrences | total | type |
|--------------------------------------------+---------------+---------+-----------|
| ['B'] | 1 | 10860 | string |
| ['B'] | 5384 | 10860 | float |
| ['C'] | 10856 | 10860 | string |
| ['D'] | 10855 | 10860 | string |
| ['E'] | 10852 | 10860 | string |
| ['F'] | 10852 | 10860 | string |
| ['G'] | 5468 | 10860 | string |
| ['G'] | 5385 | 10860 | float |
| ['_event_time'] | 10860 | 10860 | timestamp |
| ['_id'] | 10860 | 10860 | string |
| ['_meta'] | 10860 | 10860 | object |
| ['_meta', 'file_upload'] | 10860 | 10860 | object |
| ['_meta', 'file_upload', 'file'] | 10860 | 10860 | string |
| ['_meta', 'file_upload', 'file_upload_id'] | 10860 | 10860 | string |
| ['_meta', 'file_upload', 'upload_time'] | 10860 | 10860 | string |
| ['rownum'] | 10860 | 10860 | int |
| ['sheet'] | 10860 | 10860 | object |
| ['sheet', 'name'] | 10860 | 10860 | string |
| ['sheet', 'number'] | 10860 | 10860 | int |
+--------------------------------------------+---------------+---------+-----------+
To create a collection based on an XLS file, specify the format as XLS
. For example, the command below creates a collection based on an S3 bucket containing spreadsheet files.
$ rock create collection spreadsheet_data s3://example/excel-files --format CSV
Collection "spreadsheet_data" was created successfully.
To configure a collection to ingest XLS data, select the “XLS/XLSX” option as the format.
Note that you can also upload an Excel file into an existing collection: