[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_query ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count | ALL ] ]
[ OFFSET count ]</div>
Returns result set from one or more collections. May include various other clauses:
with_query
is of the formalias_name AS select_query
.select_expr
is a valid SQL expression.from_item
is one of:collection_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ] [ TABLESAMPLE BERNOULLI (percentage) ]
percentage
is the percentage of input to be sampled (ranges from 0 to 100)
from_item join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]
where
join_type
is one of:[ INNER ] JOIN
LEFT [ OUTER ] JOIN
RIGHT [ OUTER ] JOIN
CROSS JOIN
(which should not be followed byON
/USING
, all other should)
condition
is a SQL expression that evaluates to a boolean. See the
section on conditional operators.grouping_element
is one of:()
expression
GROUPING SETS ( ( column [, ...] ) [, ...] )
CUBE ( column [, ...] )
ROLLUP ( column [, ...] )
select_query
is itself aSELECT
command with the same recursive format.expression
is a valid SQL expression.count
is the number of results to be returned.
Below we cover common uses for some clauses in more detail.
To return all available fields, instead of listing them out as select expressions, specify a single select expression *
.
SELECT * FROM _events
To return all available fields except some use the EXCEPT
clause in conjunction with *
.
SELECT * EXCEPT(label, type) FROM _events
To return only unique results, include DISTINCT
after SELECT
.
SELECT DISTINCT label FROM _events
To return all available subfields of an object, use the .*
notation. As with SELECT *
you may use EXCEPT
to ignore some subfields of the object.
WITH foo as (
SELECT {'a': 1, 'b': 2} x
)
SELECT foo.x.* FROM foo
WITH Clause
WITH
The WITH
clause can be used to chain queries by defining a named subquery that can be then used within the main query.
Using WITH
clauses can improve the readability and maintainability of complex queries.
WITH x AS (SELECT _id, label FROM _events)
SELECT _id, label FROM x
There can be multiple subqueries in a WITH
clause, and one can use any of the previous ones.
WITH
x AS (SELECT _id, label, kind FROM _events),
y AS (SELECT _id, kind AS c FROM x)
SELECT _id, c FROM y
FROM Clause
When querying a single collection, the FROM
clause consists simply of the collection name. However, the FROM
clause can also be used to reshape the data that is being queried.
UNNEST
UNNEST
UNNEST
is a function that can be used to expand arrays into several rows.
UNNEST
is a row generating function which takes as input an array and outputs one row per element of the input array. Unlike other functions, it outputs multiple rows (a relation) rather than one value (a scalar). That means it appears in queries in the same places a collection appears: a FROM
or JOIN
clause.
This query takes an array of 3 numbers, and produces 3 rows. An UNNEST
can have two aliases - an outer alias which names the relation, and an inner alias which names the field within the relation.
SELECT
numbers.aNumber
FROM
UNNEST(array [1, 2, 3] AS aNumber) AS numbers
We can also UNNEST
arrays of objects. We can access fields of each object with the usual dot notation.
SELECT
myFruits.someFruit.name, myFruits.someFruit.color
FROM
UNNEST(JSON_PARSE('[{"name": "strawberry", "color": "red"}, {"name": "lime", "color": "green"}]') AS someFruit) AS myFruits
UNNEST
is particularly useful when we have an array field within a collection. Suppose collection companies
contains the following documents:
{
"make": "Ford",
"models": ["Focus", "Mustang", "F-150"],
"offices": [
{"city": "Dearborn", "state": "MI"},
{"city": "Denver", "state": "CO"}
]
},
{
"make": "Toyota",
"models": ["Prius", "Highlander", "Camry"],
"offices": [
{"city": "Jacksonville", "state": "FL"},
{"city": "Ann Arbor", "state": "MI"}
]
}
For each company, we want to generate one row for each model of car. This will let us manipulate the models more easily, for instance by aggregating or filtering them. We do this by joining the companies
collection with the output of UNNEST
. For each company, UNNEST
generates one row per model. We then perform a correlated cross join. It is correlated because the input to the right side of the join depends on the current row being processed on the left side. It is a cross join because we take every output from the right side and join it with the current row from the left. There is no join condition.
SELECT
companies.make, models.*
FROM
companies
CROSS JOIN UNNEST(companies.models AS model) AS models
In SQL, we can cross join two relations implicitly by just listing them with a comma in between them, so the following query is equivalent to the above:
SELECT
companies.make, models.*
FROM
companies,
UNNEST(companies.models AS model) AS models
Note: There is an important caveat with this shorthand though. Mixing implicit joins (i.e.
SELECT * FROM t1, t2 WHERE t1.x = t2.x
) with explicit joins (i.e.
SELECT * FROM t1 JOIN t2 ON t1.x = t2.x
) can cause confusing results. If you want to unnest a
field from a collection and join with another collection, perform all joins explicitly:SELECT companies.*, models.model, makeInfo.* FROM companies CROSS JOIN UNNEST(companies.models AS model) AS models JOIN makeInfo ON companies.make = makeInfo.make
The index within the original array can also be extracted by appending WITH ORDINALITY AS alias_name
inside the UNNEST
function.
SELECT
models.*
FROM
companies,
UNNEST(companies.models AS name WITH ORDINALITY AS index) AS models
When the unnested array contains objects, the values returned by the aliased field are objects, and so they can be dereferenced using the usual dot-notation syntax.
SELECT
offices.value.state,
ARRAY_AGG(offices.value.city) AS cities
FROM
companies,
UNNEST(companies.offices AS value) AS offices
GROUP BY offices.value.state
If all elements of an array are objects, you can also omit the UNNEST alias, which allows you to query object's fields directly. Note that this doesn't work for non-object array elements; those will produce empty rows in the UNNESTed table.
SELECT
offices.state,
ARRAY_AGG(offices.city) AS cities
FROM
companies,
UNNEST(companies.offices) AS offices
GROUP BY offices.state
JOIN
ARRAY_JOIN(array, delimiter, nullReplacement)
Concatenates the elements of array
using delimiter
and an optional nullReplacement
string to replace nulls. Accepts only string types.
- An
INNER JOIN
returns only results where there is a match on both sides. - A
LEFT
(orRIGHT
)OUTER JOIN
returns all results from the left (or right) side, populating
the other side's columns when possible and withnull
otherwise. - A
CROSS JOIN
returns the Cartesian product of the two sides' results (i.e all combinations of documents in two collections).
SELECT collection1.name, collection2.name
FROM collection1 JOIN collection2
ON collection1._id = collection2._id
Cross joins can either be specified using the explicit CROSS JOIN
syntax or by specifying multiple collections in the FROM
clause. The queries below are equivalent.
SELECT collection1.id, collection2.id
FROM collection1, collection2
SELECT collection1.id, collection2.id
FROM collection1
CROSS JOIN collection2
Full outer joins are not directly supported but they can be imitated by unioning a LEFT and RIGHT JOIN as done in the example below.
SELECT collection1.name, collection2.name
FROM collection1 LEFT JOIN collection2
ON collection1._id = collection2._id
UNION
SELECT collection1.name, collection2.name
FROM collection1 RIGHT JOIN collection2
ON collection1._id = collection2._id
TABLESAMPLE
Rockset only supports BERNOULLI
sampling.
SELECT id, name FROM collection1 TABLESAMPLE BERNOULLI (50)
This returns approximately 50% of the results that would have been otherwise obtained.
SELECT avg(height) FROM people TABLESAMPLE BERNOULLI (25)
This returns the average of the height field based on approximately 25% of rows in the table.
GROUP BY Clause
Often, interesting results are obtained by grouping records together on some similar attributes.
GROUP BY
GROUP BY
is the mechanism by which the result of a SELECT
is partitioned based on a particular field.
Non-aggregated fields can only be included in aggregate functions in select expressions with the presense of a GROUP BY
clause.
A simple GROUP BY
clause may contain any expression composed of input columns.
SELECT count(*) AS c, label FROM _events GROUP BY label
HAVING
HAVING
is used to filter results, based on the output of aggregate functions.
SELECT count(*) AS c, label FROM _events
GROUP BY label
HAVING count(*) < 50
For more complex aggregations, Rockset supports GROUPING SETS
, CUBE
and ROLLUP
. These can be used to perform aggregations on multiple fields in a single query.
Functions are not supported in
GROUP BY
clause. Consider using a subquery instead.
GROUPING SETS
GROUPING SETS
Grouping sets allow users to specify multiple lists of fields to group on. The fields that are not part of a given sublist of grouping fields are set to null
. It can logically be thought of as the UNION
of a single SELECT
query with a simple GROUP BY
for each GROUPING SET
.
WITH
examples AS (
SELECT
'API_KEY_DELETED' AS label,
'API_KEY' AS kind,
UNION ALL
SELECT
'COLLECTION_CREATED',
'COLLECTION'
UNION ALL
SELECT
'QUERY_SUCCESS',
'QUERY'
UNION ALL
SELECT
'QUERY_SUCCESS',
'QUERY'
UNION ALL
SELECT
'QUERY_COLLECTION_NOT_READY',
'QUERY'
UNION ALL
SELECT
'INGEST_INFO',
'INGEST'
UNION ALL
SELECT
'INGEST_INITIALIZED',
'INGEST'
)
SELECT
GROUPING(label, kind) group_id,
label,
kind,
COUNT(*)
FROM
examples
GROUP BY
GROUPING SETS((kind), (label, kind))
ORDER BY
group_id,
label,
kind;
CUBE
CUBE
CUBE
is a higher level operator that can be used to generate GROUPING SETS
. It generates all possible GROUPING SETS
given a set of fields.
SELECT label, kind FROM _events
GROUP BY CUBE (label, kind)
This is equivalent to:
SELECT label, kind FROM _events
GROUP BY GROUPING SETS (
(label, kind),
(label),
(kind),
()
)
ROLLUP
ROLLUP
The ROLLUP
operator is similar to CUBE
in that it's a higher order abstraction over GROUPING SETS
. It generates all possible starting subsequences for a given set of fields.
SELECT label, kind FROM _events
GROUP BY ROLLUP (label, kind)
This is equivalent to:
SELECT label, kind FROM _events
GROUP BY GROUPING SETS (
(label, kind),
(label),
()
)
ORDER BY Clause
ORDER BY
The ORDER BY
clause is used to sort a result set by one or more output expressions. By default ordering for an expression is ascending. Use DESC
to get result set in descending order.
SELECT * FROM users
ORDER BY users.username DESC
Use multiple expressions, each with a different ordering, by separating with commas.
SELECT * FROM users
ORDER BY users.first_name ASC, users.last_name DESC
You can parameterize the sort order as shown in the following query. Here :asc
is the parameter for sort order: When set to true
the order will be ascending and when set to false
the order will be descending.
SELECT * FROM users
ORDER BY IF(:asc,1,-1) * users.username
You can also parameterize the attribute you would like to order by. When doing this in addition to parameterizing the sort order, you must check for and transform certain data types. This is because the parameterized sort order works by performing multiplication on the attribute.
The following query uses the same :asc
parameter from the previous query, as well as the new :sort
parameter which represent the attribute you are ordering by.
SELECT * FROM users
ORDER BY
CASE
WHEN TYPEOF(FIELD(:sort)) = 'int' THEN FIELD(:sort)
WHEN TYPEOF(FIELD(:sort)) = 'float' THEN CAST(FIELD(:sort) AS int)
WHEN TYPEOF(FIELD(:sort)) = 'bool' THEN CAST(FIELD(:sort) AS int)
WHEN TYPEOF(FIELD(:sort)) = 'string' THEN ASCII(FIELD(:sort))
WHEN TYPEOF(FIELD(:sort)) = 'null' THEN ASCII(FIELD(:sort))
WHEN TYPEOF(FIELD(:sort)) = 'u256' THEN CAST(FIELD(:sort) AS int)
WHEN TYPEOF(FIELD(:sort)) = 'date' THEN UNIX_DATE(FIELD(:sort))
WHEN TYPEOF(FIELD(:sort)) = 'datetime' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
WHEN TYPEOF(FIELD(:sort)) = 'time' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
WHEN TYPEOF(FIELD(:sort)) = 'timestamp' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
WHEN TYPEOF(FIELD(:sort)) = 'month_interval' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
WHEN TYPEOF(FIELD(:sort)) = 'microsecond_interval' THEN UNIX_DATE(CAST(FIELD(:sort) AS date))
END * IF(:asc,1,-1)
LIMIT Clause
LIMIT
The LIMIT
clause restricts the number of rows in the result.
The following query limits the result set to contain a maximum of 10 rows. The result set represents the top 10 customers based on customer rating.
SELECT * FROM customers
ORDER BY customers.customer_rating DESC
LIMIT 10
OFFSET Clause
OFFSET
The OFFSET
clause skips the first n
rows of results.
The following query selects the 6th through the 15th rows.
SELECT * FROM customers
ORDER BY customers.customer_rating DESC
LIMIT 10
OFFSET 5
CLUSTER BY Clause
CLUSTER BY
The CLUSTER BY
clause is used in an ingest transformation to configure data clustering on a collection. It is not allowed during normal query execution.
See the ingest transformation docs for more.
Combining Result Sets
ARRAY_UNION(array1, array2)
Returns a union of the two arrays, with all duplicates removed.
SELECT id, name FROM collection1
UNION
SELECT userId, name FROM collection2
ARRAY_INTERSECT(array1, array2)
Returns an intersection of the two arrays, with all duplicates removed.
SELECT id, name FROM collection1
INTERSECT
SELECT userId, name FROM collection2
ARRAY_EXCEPT(array1, array2)
Returns an array of elements in array1
but not in array2
, without duplicates.
SELECT id, name FROM collection1
EXCEPT
SELECT userId, name FROM collection2