This guide helps you get up to speed with writing queries. Rockset has fully-featured SQL support, including aggregations, filtering, windowing, and joins. See the [SQL Reference](🔗) for a comprehensive list. If you've used a SQL database like Postgres before, you should feel at home writing queries with Rockset.

Rockset has several distinct features that we cover in this guide.

  • Rockset [does not require your data to have a pre-built schema](🔗). We instead infer the schema at read time.

  • Rockset has syntax to make it [much easier to query nested data](🔗).

  • There are also some [extra tips and answers to common questions](🔗) at the end.

## Fixed schemas are not required

Rockset ingests your data without the need for a pre-built schema. Whereas every row in a MySQL or Postgres table has the same set of fields with a predefined type for each field, you can think of each Rockset document as a JSON dictionary with its own fields and values. In this sense, Rockset is dynamically typed, while traditional databases are statically typed.

Individual fields in documents are strongly typed: each value for a given field in a given document has one specific type and values of incompatible types don't automatically convert to each other. This means that you still have the same SQL query semantics as in traditional databases. If you were to query for all documents where field `x` equals the number `10`, for example, documents without that field or documents where `x` was equal to the string `'10'` would not be returned. (One other note with strong typing: queries would also throw a runtime error if you tried to evaluate an expression by passing in arguments with incompatible types: adding a datetime and a string, for example.)

### More about DESCRIBE

`DESCRIBE "collection"` allows you to see what types exist in a Rockset collection or field. The difference from a traditional database is that we infer the schema at read time by looking over the contents of your Rockset collection to derive the current shape of your data.

The output of `DESCRIBE` has the following fields:

  • **field**: Every distinct field name in the collection.

  • **type**: The data type of the field.

  • **occurrences**: The number of documents that have this field in the given type.

  • **total:** Total number of documents in the collection for top level fields, and total number of documents that have the parent field for nested fields.

See the [`DESCRIBE` documentation](🔗) for examples of `DESCRIBE` for different data distributions.

## Querying Nested Data

Rockset makes it much easier than usual to query highly-nested data:

  • To query arrays, use square bracket notation (`[]`). For example, `SELECT a[1]` selects the first entry in the array field `a`.

  • To query objects, use dot notation (`.`). For example, `SELECT a.b` selects the value of the key `b` in the object field `a`.

In any nested field path, Rockset assumes the first token is the collection name, and will return an error if missing. So for example, to access field `a.b` in collection ``, refer to the field as `` in the query.

One other special command that's very useful when working with arrays is `UNNEST`. `UNNEST` treats each entry in an array value as a separate row. Or, more formally, UNNEST is a row-generating function that takes an array as input and outputs one row per element of the input array. It's different than other functions in that it outputs multiple rows (a relation) rather than one value (a scalar). That means it's used in queries in the same places a collection appears: a `FROM` or `JOIN` clause.

Refer to our [`UNNEST` documentation](🔗) for more details.


(Script tags will be stripped)

## More Advice

  • Remembering which quotes to use is always confusing if you don't write SQL regularly. Single quotes are used for strings, and double quotes are used for field and collection names. A useful mnemonic is: "**S**ingle quotes are for **S**trings, **D**ouble quotes are for **D**atabase identifiers"

  • If you're writing a complex query, the `WITH` statement can be very useful in breaking it down into something more readable and modular.

  • When certain characters, such as `-`, are part of collection org field names the entire name must always be wrapped in double quotes. For example, `SELECT id, age, "full-name" from "people-list"`

  • Rockset is strongly typed, and will evaluate to false if you attempt to compare values of different types (the way the string '10' would not equal the integer 10 in many programming languages). Depending on your goal, you could write a query that casts everything to the same type: `SELECT field FROM c where cast(field as int) = 10`, or one that only fetches rows that match a specific type `SELECT field FROM c WHERE field = '10'`

You can inspect what types exist in a collection or field in the collection's console page or with a call to `DESCRIBE "collection"`.

  • As mentioned above, in any nested field path, Rockset assumes the first token is the collection name. To access field `a.b` in collection ``, refer to the field as `` in the query.

  • SQL has two different syntaxes for joining tables: an implicit syntax that just uses the FROM clause `FROM table, table2`, and an explicit syntax of the form `FROM table JOIN table2 ON ...`. Either works, but try not to mix them both in the same query (it will work, but error messages will be confusing.)

  • There is a full list of SQL commands and functions in the [SQL Reference](🔗)