SELECT

[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_query ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count | ALL ] ]
[ OFFSET count ]</div>

Returns result set from one or more collections. May include various other clauses:

  • with_query is of the form alias_name AS select_query.
  • select_expr is a valid SQL expression.
  • from_item is one of:
    • collection_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ] [ TABLESAMPLE BERNOULLI (percentage) ]
      • percentage is the percentage of input to be sampled (ranges from 0 to 100)
    • from_item join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ] where
      join_type is one of:
      • [ INNER ] JOIN
      • LEFT [ OUTER ] JOIN
      • RIGHT [ OUTER ] JOIN
      • CROSS JOIN (which should not be followed by ON/USING, all other should)
  • condition is a SQL expression that evaluates to a boolean. See the
    section on conditional operators.
  • grouping_element is one of:
    • ()
    • expression
    • GROUPING SETS ( ( column [, ...] ) [, ...] )
    • CUBE ( column [, ...] )
    • ROLLUP ( column [, ...] )
  • select_query is itself a SELECT command with the same recursive format.
  • expression is a valid SQL expression.
  • count is the number of results to be returned.

Below we cover common uses for some clauses in more detail.

To return all available fields, instead of listing them out as select expressions, specify a single select expression *.

SELECT * FROM _events

+--------------------------------------+---------------+--------+---------------+------+ | _id | eventTime | kind | label | type | +--------------------------------------+---------------+--------+---------------+------+ | cf0e193d-9ee0-4c6a-9f2b-cab9893e97e3 | 1536164080248 | QUERY | QUERY_SUCCESS | INFO | | 4a5e941e-7683-4db2-b8e0-ee0e4807d6a7 | 1536164079995 | QUERY | QUERY_SUCCESS | INFO | | 6a9c5d49-ee3b-4b78-b653-006c540f70c3 | 1536162113586 | QUERY | QUERY_ERROR | INFO | +--------------------------------------+---------------+--------+---------------+------+

To return all available fields except some use the EXCEPT clause in conjunction with *.

SELECT * EXCEPT(label, type) FROM _events

+--------------------------------------+---------------+--------+ | _id | eventTime | kind | +--------------------------------------+---------------+--------+ | cf0e193d-9ee0-4c6a-9f2b-cab9893e97e3 | 1536164080248 | QUERY | | 4a5e941e-7683-4db2-b8e0-ee0e4807d6a7 | 1536164079995 | QUERY | | 6a9c5d49-ee3b-4b78-b653-006c540f70c3 | 1536162113586 | QUERY | +--------------------------------------+---------------+--------+

To return only unique results, include DISTINCT after SELECT.

SELECT DISTINCT label FROM _events

+---------------+ | label | +---------------+ | QUERY_SUCCESS | | QUERY_ERROR | +---------------+

To return all available subfields of an object, use the .* notation. As with SELECT * you may use EXCEPT to ignore some subfields of the object.

WITH foo as (
    SELECT {'a': 1, 'b': 2} x
)
SELECT foo.x.* FROM foo
+---+---+ | a | b | +---+---| | 1 | 2 | +---+---+

WITH Clause

WITH The WITH clause can be used to chain queries by defining a named subquery that can be then used within the main query.

Using WITH clauses can improve the readability and maintainability of complex queries.

WITH x AS (SELECT _id, label FROM _events)
SELECT _id, label FROM x
+--------------------------------------+---------------+ | _id | label | |--------------------------------------+---------------| | 23658547-1ae5-4be8-97e5-6e043f682c31 | QUERY_SUCCESS | | 85e16122-d749-444d-ab63-bb5a0544368b | QUERY_SUCCESS | | 972485f2-e3b0-43c4-bb7e-c70b7df3e86c | QUERY_SUCCESS | +--------------------------------------+---------------+

There can be multiple subqueries in a WITH clause, and one can use any of the previous ones.

WITH
    x AS (SELECT _id, label, kind FROM _events),
    y AS (SELECT _id, kind AS c FROM x)
SELECT _id, c FROM y
+--------------------------------------+-------+ | _id | c | |--------------------------------------+-------| | 589eaee9-0c62-46f9-8f7a-0a5a3fcf2e2c | QUERY | | 3e2e99b9-896f-4852-9669-ab5d1a1a6ebf | QUERY | | a2b5d818-cad6-49b8-8f9d-bcc77146e060 | QUERY | +--------------------------------------+-------+

FROM Clause

When querying a single collection, the FROM clause consists simply of the collection name. However, the FROM clause can also be used to reshape the data that is being queried.

UNNEST

UNNEST UNNEST is a function that can be used to expand arrays into several rows.

UNNEST is a row generating function which takes as input an array and outputs one row per element of the input array. Unlike other functions, it outputs multiple rows (a relation) rather than one value (a scalar). That means it appears in queries in the same places a collection appears: a FROM or JOIN clause.

This query takes an array of 3 numbers, and produces 3 rows. An UNNEST can have two aliases - an outer alias which names the relation, and an inner alias which names the field within the relation.

SELECT
    numbers.aNumber
FROM
    UNNEST(array [1, 2, 3] AS aNumber) AS numbers
+---------+ | aNumber | |---------| | 1 | | 2 | | 3 | +---------+

We can also UNNEST arrays of objects. We can access fields of each object with the usual dot notation.

SELECT
    myFruits.someFruit.name, myFruits.someFruit.color
FROM
    UNNEST(JSON_PARSE('[{"name": "strawberry", "color": "red"}, {"name": "lime", "color": "green"}]') AS someFruit) AS myFruits
+------------+---------+ | name | color | |------------+---------| | strawberry | red | | lime | green | +------------+---------+

UNNEST is particularly useful when we have an array field within a collection. Suppose collection companies contains the following documents:

{
    "make": "Ford",
    "models": ["Focus", "Mustang", "F-150"],
    "offices": [
        {"city": "Dearborn", "state": "MI"},
        {"city": "Denver", "state": "CO"}
    ]
},
{
    "make": "Toyota",
    "models": ["Prius", "Highlander", "Camry"],
    "offices": [
        {"city": "Jacksonville", "state": "FL"},
        {"city": "Ann Arbor", "state": "MI"}
    ]
}

For each company, we want to generate one row for each model of car. This will let us manipulate the models more easily, for instance by aggregating or filtering them. We do this by joining the companies collection with the output of UNNEST. For each company, UNNEST generates one row per model. We then perform a correlated cross join. It is correlated because the input to the right side of the join depends on the current row being processed on the left side. It is a cross join because we take every output from the right side and join it with the current row from the left. There is no join condition.

SELECT
    companies.make, models.*
FROM
    companies
    CROSS JOIN UNNEST(companies.models AS model) AS models
+---------+------------+ | make | model | |---------+------------| | Ford | Focus | | Ford | Mustang | | Ford | F-150 | | Toyota | Prius | | Toyota | Highlander | | Toyota | Camry | +---------+------------+

In SQL, we can cross join two relations implicitly by just listing them with a comma in between them, so the following query is equivalent to the above:

SELECT
    companies.make, models.*
FROM
    companies,
    UNNEST(companies.models AS model) AS models

Note: There is an important caveat with this shorthand though. Mixing implicit joins (i.e.
SELECT * FROM t1, t2 WHERE t1.x = t2.x) with explicit joins (i.e.
SELECT * FROM t1 JOIN t2 ON t1.x = t2.x) can cause confusing results. If you want to unnest a
field from a collection and join with another collection, perform all joins explicitly:

SELECT
  companies.*, models.model, makeInfo.*
FROM companies
CROSS JOIN UNNEST(companies.models AS model) AS models
JOIN makeInfo ON companies.make = makeInfo.make

The index within the original array can also be extracted by appending WITH ORDINALITY AS alias_name inside the UNNEST function.

SELECT
    models.*
FROM
    companies,
    UNNEST(companies.models AS name WITH ORDINALITY AS index) AS models

+---------+------------+ | index | name | |---------+------------| | 1 | Prius | | 2 | Highlander | | 3 | Camry | | 1 | Focus | | 2 | Mustang | | 3 | F-150 | +---------+------------+

When the unnested array contains objects, the values returned by the aliased field are objects, and so they can be dereferenced using the usual dot-notation syntax.

SELECT
    offices.value.state,
    ARRAY_AGG(offices.value.city) AS cities
FROM
    companies,
    UNNEST(companies.offices AS value) AS offices
GROUP BY offices.value.state

+---------------------------+---------+ | cities | state | |---------------------------+---------| | ['Denver'] | CO | | ['Jacksonville'] | FL | | ['Dearborn', 'Ann Arbor'] | MI | +---------------------------+---------+

If all elements of an array are objects, you can also omit the UNNEST alias, which allows you to query object's fields directly. Note that this doesn't work for non-object array elements; those will produce empty rows in the UNNESTed table.

SELECT
    offices.state,
    ARRAY_AGG(offices.city) AS cities
FROM
    companies,
    UNNEST(companies.offices) AS offices
GROUP BY offices.state

+---------------------------+---------+ | cities | state | |---------------------------+---------| | ['Denver'] | CO | | ['Jacksonville'] | FL | | ['Dearborn', 'Ann Arbor'] | MI | +---------------------------+---------+

JOIN

ARRAY_JOIN(array, delimiter, nullReplacement) Concatenates the elements of array using delimiter and an optional nullReplacement string to replace nulls. Accepts only string types.

  • An INNER JOIN returns only results where there is a match on both sides.
  • A LEFT (or RIGHT) OUTER JOIN returns all results from the left (or right) side, populating
    the other side's columns when possible and with null otherwise.
  • A CROSS JOIN returns the Cartesian product of the two sides' results (i.e all combinations of documents in two collections).
SELECT collection1.name, collection2.name
FROM collection1 JOIN collection2
ON collection1._id = collection2._id

Cross joins can either be specified using the explicit CROSS JOIN syntax or by specifying multiple collections in the FROM clause. The queries below are equivalent.

SELECT collection1.id, collection2.id
FROM collection1, collection2
SELECT collection1.id, collection2.id
FROM collection1
CROSS JOIN collection2

Full outer joins are not directly supported but they can be imitated by unioning a LEFT and RIGHT JOIN as done in the example below.

SELECT collection1.name, collection2.name
FROM collection1 LEFT JOIN collection2
ON collection1._id = collection2._id
UNION 
SELECT collection1.name, collection2.name
FROM collection1 RIGHT JOIN collection2
ON collection1._id = collection2._id

TABLESAMPLE

Rockset only supports BERNOULLI sampling.

SELECT id, name FROM collection1 TABLESAMPLE BERNOULLI (50)

This returns approximately 50% of the results that would have been otherwise obtained.

SELECT avg(height) FROM people TABLESAMPLE BERNOULLI (25)

This returns the average of the height field based on approximately 25% of rows in the table.

GROUP BY Clause

Often, interesting results are obtained by grouping records together on some similar attributes.

GROUP BY GROUP BY is the mechanism by which the result of a SELECT is partitioned based on a particular field.

Non-aggregated fields can only be included in aggregate functions in select expressions with the presense of a GROUP BY clause.

A simple GROUP BY clause may contain any expression composed of input columns.

SELECT count(*) AS c, label FROM _events GROUP BY label

+--------+------------------------------+ | c | label | |--------+------------------------------| | 41 | API_KEY_DELETED | | 35 | API_KEY_CREATED | | 88256 | QUERY_SUCCESS | | 133 | QUERY_INVALID | | 24 | API_KEY_ERROR | | 72 | INGEST_WARNING | | 16 | COLLECTION_DROPPED | | 121988 | INGEST_INFO | +--------+------------------------------+

HAVING

HAVING is used to filter results, based on the output of aggregate functions.

SELECT count(*) AS c, label FROM _events
GROUP BY label
HAVING count(*) < 50
+--------+------------------------------+ | c | label | |--------+------------------------------| | 41 | API_KEY_DELETED | | 35 | API_KEY_CREATED | | 24 | API_KEY_ERROR | | 16 | COLLECTION_DROPPED | +--------+------------------------------+

For more complex aggregations, Rockset supports GROUPING SETS, CUBE and ROLLUP. These can be used to perform aggregations on multiple fields in a single query.

Functions are not supported in GROUP BY clause. Consider using a subquery instead.

GROUPING SETS

GROUPING SETS Grouping sets allow users to specify multiple lists of fields to group on. The fields that are not part of a given sublist of grouping fields are set to null. It can logically be thought of as the UNION of a single SELECT query with a simple GROUP BY for each GROUPING SET.

WITH
    examples AS (
        SELECT
            'API_KEY_DELETED' AS label,
            'API_KEY' AS kind,
        UNION ALL
        SELECT
            'COLLECTION_CREATED',
            'COLLECTION'
        UNION ALL
        SELECT
            'QUERY_SUCCESS',
            'QUERY'
        UNION ALL
        SELECT
            'QUERY_SUCCESS',
            'QUERY'
        UNION ALL
        SELECT
            'QUERY_COLLECTION_NOT_READY',
            'QUERY'
        UNION ALL
        SELECT
            'INGEST_INFO',
            'INGEST'
        UNION ALL
        SELECT
            'INGEST_INITIALIZED',
            'INGEST'
    )
SELECT
    GROUPING(label, kind) group_id,
    label,
    kind,
    COUNT(*)
FROM
    examples
GROUP BY
    GROUPING SETS((kind), (label, kind))
ORDER BY
    group_id,
    label,
    kind;
+----------+------------------------------+--------------+--------+ | group_id | label | kind | ?COUNT | +----------+------------------------------+--------------+--------+ | 0 | "API_KEY_DELETED" | "API_KEY" | 1 | | 0 | "COLLECTION_CREATED" | "COLLECTION" | 1 | | 0 | "INGEST_INFO" | "INGEST" | 1 | | 0 | "INGEST_INITIALIZED" | "INGEST" | 1 | | 0 | "QUERY_COLLECTION_NOT_READY" | "QUERY" | 1 | | 0 | "QUERY_SUCCESS" | "QUERY" | 2 | | 2 | null | "API_KEY" | 1 | | 2 | null | "COLLECTION" | 1 | | 2 | null | "INGEST" | 2 | | 2 | null | "QUERY" | 3 | +----------+------------------------------+--------------+--------+

CUBE

CUBE CUBE is a higher level operator that can be used to generate GROUPING SETS. It generates all possible GROUPING SETS given a set of fields.

SELECT label, kind FROM _events
GROUP BY CUBE (label, kind)

This is equivalent to:

SELECT label, kind FROM _events
GROUP BY GROUPING SETS (
    (label, kind),
    (label),
    (kind),
    ()
)

ROLLUP

ROLLUP The ROLLUP operator is similar to CUBE in that it's a higher order abstraction over GROUPING SETS. It generates all possible starting subsequences for a given set of fields.

SELECT label, kind FROM _events
GROUP BY ROLLUP (label, kind)

This is equivalent to:

SELECT label, kind FROM _events
GROUP BY GROUPING SETS (
    (label, kind),
    (label),
    ()
)

ORDER BY Clause

ORDER BY The ORDER BY clause is used to sort a result set by one or more output expressions. By default ordering for an expression is ascending. Use DESC to get result set in descending order.

SELECT * FROM users
ORDER BY users.username DESC

Use multiple expressions, each with a different ordering, by separating with commas.

SELECT * FROM users
ORDER BY users.first_name ASC, users.last_name DESC

You can parameterize the sort order as shown in the following query. Here :asc is the parameter for sort order: When set to true the order will be ascending and when set to false the order will be descending.

SELECT * FROM users
ORDER BY IF(:asc,1,-1) * users.username

You can also parameterize the attribute you would like to order by. When doing this in addition to parameterizing the sort order, you must check for and transform certain data types. This is because the parameterized sort order works by performing multiplication on the attribute.

The following query uses the same :asc parameter from the previous query, as well as the new :sort parameter which represent the attribute you are ordering by.

SELECT * FROM users
ORDER BY
    CASE
        WHEN TYPEOF(FIELD(:sort)) = 'int' THEN FIELD(:sort)
				WHEN TYPEOF(FIELD(:sort)) = 'float' THEN CAST(FIELD(:sort) AS int)
				WHEN TYPEOF(FIELD(:sort)) = 'bool' THEN CAST(FIELD(:sort) AS int)
        WHEN TYPEOF(FIELD(:sort))  = 'string' THEN ASCII(FIELD(:sort))
				WHEN TYPEOF(FIELD(:sort)) = 'null' THEN ASCII(FIELD(:sort))
				WHEN TYPEOF(FIELD(:sort)) = 'u256' THEN CAST(FIELD(:sort) AS int)
				WHEN TYPEOF(FIELD(:sort))  = 'date' THEN UNIX_DATE(FIELD(:sort))
				WHEN TYPEOF(FIELD(:sort)) = 'datetime' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
				WHEN TYPEOF(FIELD(:sort)) = 'time' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
				WHEN TYPEOF(FIELD(:sort)) = 'timestamp' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
				WHEN TYPEOF(FIELD(:sort)) = 'month_interval' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
				WHEN TYPEOF(FIELD(:sort)) = 'microsecond_interval' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
    END * IF(:asc,1,-1)

LIMIT Clause

LIMIT The LIMIT clause restricts the number of rows in the result.

The following query limits the result set to contain a maximum of 10 rows. The result set represents the top 10 customers based on customer rating.

SELECT * FROM customers
ORDER BY customers.customer_rating DESC
LIMIT 10

OFFSET Clause

OFFSET The OFFSET clause skips the first n rows of results.

The following query selects the 6th through the 15th rows.

SELECT * FROM customers
ORDER BY customers.customer_rating DESC
LIMIT 10
OFFSET 5

CLUSTER BY Clause

CLUSTER BY The CLUSTER BY clause is used in an ingest transformation to configure data clustering on a collection. It is not allowed during normal query execution.

See the ingest transformation docs for more.

Combining Result Sets

ARRAY_UNION(array1, array2) Returns a union of the two arrays, with all duplicates removed.

SELECT id, name FROM collection1
UNION
SELECT userId, name FROM collection2

ARRAY_INTERSECT(array1, array2) Returns an intersection of the two arrays, with all duplicates removed.

SELECT id, name FROM collection1
INTERSECT
SELECT userId, name FROM collection2

ARRAY_EXCEPT(array1, array2) Returns an array of elements in array1 but not in array2, without duplicates.

SELECT id, name FROM collection1
EXCEPT
SELECT userId, name FROM collection2