SQL Guide
This guide helps you get up to speed with writing queries. Rockset has fully-featured SQL support, including aggregations, filtering, windowing, and joins. See the SQL Reference for a comprehensive list. If you've used a SQL database like Postgres before, you should feel at home writing queries with Rockset.
Rockset has several distinct features that we cover in this guide:
- Rockset does not require your data to have a pre-built schema. We instead infer the schema at read time.
- Rockset has syntax to make it much easier to query nested data.
- There are also some extra tips and answers to common questions at the end.
Fixed schemas are not required
Rockset ingests your data without the need for a pre-built schema. Whereas every row in a MySQL or Postgres table has the same set of fields with a predefined type for each field, you can think of each Rockset document as a JSON dictionary with its own fields and values. In this sense, Rockset is dynamically typed, while traditional databases are statically typed.
Individual fields in documents are strongly typed: each value for a given field in a given document has one specific type and values of incompatible types don't automatically convert to each other. This means that you still have the same SQL query semantics as in traditional databases. If you were to query for all documents where field x
equals the number 10
, for example, documents without that field or documents where x
was equal to the string '10'
would not be returned. (One other note with strong typing: queries would also throw a runtime error if you tried to evaluate an expression by passing in arguments with incompatible types: adding a datetime and a string, for example.)
More about DESCRIBE
DESCRIBE "collection"
allows you to see what types exist in a Rockset Collection or field. The difference from a traditional database is that we infer the schema at read time by looking over the contents of your Rockset collection to derive the current shape of your data.
The output of DESCRIBE
has the following fields:
- field: Every distinct field name in the collection.
- type: The data type of the field.
- occurrences: The number of documents that have this field in the given type.
- total: Total number of documents in the collection for top level fields, and total number of documents that have the parent field for nested fields.
See the DESCRIBE
documentation for examples of DESCRIBE
for different data distributions.
Querying Nested Data
Rockset makes it much easier than usual to query highly-nested data:
- To query arrays, use square bracket notation (
[]
). For example,SELECT a[1]
selects the first entry in the array fielda
. - To query objects, use dot notation (
.
). For example,SELECT a.b
selects the value of the keyb
in the object fielda
.
In any nested field path, Rockset assumes the first token is the collection name, and will return an error if missing. So for example, to access field a.b
in collection commons.foo
, refer to the field as foo.a.b
in the query.
One other special command that's very useful when working with arrays is UNNEST
. UNNEST
treats each entry in an array value as a separate row. Or, more formally, UNNEST is a row-generating function that takes an array as input and outputs one row per element of the input array. It's different than other functions in that it outputs multiple rows (a relation) rather than one value (a scalar). That means it's used in queries in the same places a collection appears: a FROM
or JOIN
clause.
Refer to our UNNEST
documentation for more details.
Example:
SELECT * FROM UNNEST(array ['Alice', 'Bob'] AS name)
More Advice
-
Remembering which quotes to use is always confusing if you don't write SQL regularly. Single quotes are used for strings, and double quotes are used for field and collection names. A useful mnemonic is: "Single quotes are for Strings, Double quotes are for Database identifiers"
-
If you're writing a complex query, the
WITH
statement can be very useful in breaking it down into something more readable and modular.
WITH recent_users AS (
SELECT ... from giant_user_database ...
),
critical_app_events AS (
SELECT ... from event_stream ...
)
SELECT
...
FROM recent_users
LEFT JOIN critical_app_events ...
WHERE ...
-
When certain characters, such as
-
, are part of collection org field names the entire name must always be wrapped in double quotes. For example,SELECT id, age, "full-name" from "people-list"
-
Rockset is strongly typed, and will evaluate to false if you attempt to compare values of different types (the way the string '10' would not equal the integer 10 in many programming languages). Depending on your goal, you could write a query that casts everything to the same type:
SELECT field FROM c where cast(field as int) = 10
, or one that only fetches rows that match a specific typeSELECT field FROM c WHERE field = '10'
You can inspect what types exist in a collection or field in the collection's console page or with a call to DESCRIBE "collection"
.
-
As mentioned above, in any nested field path, Rockset assumes the first token is the collection name. To access field
a.b
in collectioncommons.foo
, refer to the field ascommons.foo.a.b
in the query. -
SQL has two different syntaxes for joining tables: an implicit syntax that just uses the FROM clause
FROM table, table2
, and an explicit syntax of the formFROM table JOIN table2 ON ...
. Either works, but try not to mix them both in the same query (it will work, but error messages will be confusing.) -
There is a full list of SQL commands and functions in the SQL Reference
Updated 6 months ago