General Aggregations

ARRAY_AGG

ARRAY_AGG(x) Returns an array created from all the elements in x.

This function accepts a set of values and returns an array in which each value in the set is assigned to an element of the array.

Use LIMIT to enforce the maximum number of elements in the array. Use ORDER BY to specify how elements should be ordered in the array. Use OFFSET to add an offset to the ordered array result.

WITH example AS (
      SELECT 'red' AS color, 'primary' AS type
      UNION ALL SELECT 'blue', 'primary'
      UNION ALL SELECT 'yellow', 'secondary'
      UNION ALL SELECT 'green', 'primary'
      UNION ALL SELECT 'white', 'mix'
      UNION ALL SELECT 'black', null
  )
  SELECT
      t.type,
      ARRAY_AGG(t.color) AS all_colors
  FROM
      example t
  GROUP BY
      t.type
+-----------+--------------------------+ | type | all_colors | |-----------+--------------------------| | primary | ["red", "blue", "green"] | | mix | ["white"] | | secondary | ["yellow"] | | null | ["black"] | +-----------+--------------------------+
WITH example AS (
      SELECT 'blue' AS color, 'primary' AS type, undefined AS weight,
      UNION ALL
      SELECT 'red', 'primary', 5
      UNION ALL
      SELECT 'yellow', 'secondary', 8
      UNION ALL
      SELECT 'green', 'primary', 1
      UNION ALL
      SELECT 'white', 'mix', 9
      UNION ALL
      SELECT 'purple', 'mix', 10
      UNION ALL
      SELECT 'black', undefined, 3
  )
  SELECT ARRAY_AGG(
      color
      ORDER BY
        type ASC NULLS FIRST,
        weight DESC
      LIMIT 5
      OFFSET 1
  ) AS all_colors FROM example
+---------------------------------------------+ | all_colors | |---------------------------------------------| | ["purple", "white", "blue", "red", "green"] | +---------------------------------------------+
WITH example AS (
      SELECT 'red' AS color
      UNION ALL SELECT 'blue'
      UNION ALL SELECT 'yellow'
      UNION ALL SELECT null
  )
SELECT
    ARRAY_AGG(e.color except nulls)
FROM
    example e
["red", "blue", "yellow"]

MAP_AGG

MAP_AGG(keys, values) Creates an object where the keys are from the first input and the values are from the second input.

WITH fruits AS (
      SELECT 'apple' as key, 'red' as value
      UNION ALL SELECT 'banana' AS key, 'yellow' as value
)
SELECT
    MAP_AGG(key, value)
FROM
    fruits
{"apple":"red","banana":"yellow"}

MAP_AGG can also be used as a Counter function in the example below.

WITH fruits AS (
    SELECT
        'apple' AS key,
    UNION ALL
    SELECT
        'banana'
    UNION ALL
    SELECT
        'apple'
)
SELECT
    MAP_AGG(x.key, x.counter)
FROM
    (SELECT key, COUNT(key) AS counter FROM fruits GROUP BY 1) x
{"apple":2,"banana":1}

ARBITRARY

ARBITRARY(x) Returns an arbitrary non-null element from the input. Returns null if the input is empty or all null.

This function returns an arbitrary element. It will return some non-null element from the input. It only returns null if there are no inputs or if all inputs are null.

WITH example AS (
      SELECT ['red'] AS color
      UNION ALL SELECT ['blue']
      UNION ALL SELECT ['yellow']
      UNION ALL SELECT ['green']
      UNION ALL SELECT ['white']
      UNION ALL SELECT ['black']
  )
  SELECT
      arbitrary(t.color) AS arbitrary_color
  FROM
      example t
["red"]

COUNT

COUNT(*) Returns the number of input rows.

WITH examples AS (
      SELECT 'John' AS name
      UNION ALL SELECT 'Jane'
      UNION ALL SELECT 'Angela'
      UNION ALL SELECT null
      UNION ALL SELECT 'James'
  )
  SELECT COUNT(*)
  FROM examples
5

COUNT(x) Returns the number of rows with non-null values for field x.

WITH examples AS (
      SELECT 'John' AS name
      UNION ALL SELECT 'Jane'
      UNION ALL SELECT 'Angela'
      UNION ALL SELECT null
      UNION ALL SELECT 'James'
  )
  SELECT COUNT(name)
  FROM examples
4
WITH examples AS (
      SELECT 'John' AS name
      UNION ALL SELECT 'Jane'
      UNION ALL SELECT 'Angela'
      UNION ALL SELECT 'Aaron'
      UNION ALL SELECT 'james'
  )
  SELECT UPPER(SUBSTR(name, 1, 1)) first_letter, COUNT(*) count
  FROM examples
  GROUP BY first_letter
+--------------+-----------+ | first_letter | count | |--------------+-----------| | J | 3 | | A | 2 | +--------------+-----------+

APPROX_DISTINCT

APPROX_DISTINCT(x[, e]) Returns the approximate number of distinct elements with a non-null value for field x.

This function calcuates an approximate count of the number of distinct values. It is similar to COUNT(DISTINCT x), but it consumes a bounded amount of memory while DISTINCT consumes memory linear in the number of distinct items being counted.

Note: The result it yields is not guaranteed to be exact. If the parameter e is specified,
the standard error of the estimate will be no more than e. If it is not specified, e will
default to 1%. e must be in the range [0.0011, 0.26]. APPROX_DISTINCT is implemented with the
improved estimator for HyperLogLog proposed by Otmar Ertl.

WITH examples AS (
    SELECT 'John' AS name
    UNION ALL SELECT 'John'
    UNION ALL SELECT 'Angela'
    UNION ALL SELECT null
    UNION ALL SELECT 'Angela'
)
SELECT APPROX_DISTINCT(name)
FROM examples
2

GROUPING

GROUPING(col1, col1, ... col_n) Returns the grouping mask, which is a bitmask associating one bit with every column (the first column in the list of arguments corresponds to the most significant bit in the result).

A column's bit is 0 if the column is included in the current grouping and 1 if it is not included.

Most useful with GROUPING SETS, ROLLUP, or CUBE.

/*
Using GROUPING_SETS((), (x), (y), (x,y)) leads to 4 aggregations,
one by x, one by y, one by the pair (x, y) and one by () which means
all rows. The groupId will identify which of these aggregations the
sum belongs to. The groupIds are:
  3– in binary '11' -> x=1, y=1 -> grouped by ()
  2- in binary '10' -> x=1, y=0 -> grouped by y
  1- in binary '01' -> x=0, y=1 -> grouped by x
  0- in binary '00' -> x=0, y=0 -> grouped by (x, y)
*/
WITH examples AS (
      SELECT 1 AS x, 'a' AS y
      UNION ALL SELECT 2, 'b'
      UNION ALL SELECT 1, 'c'
      UNION ALL SELECT 5, 'a'
  )

  SELECT GROUPING(x, y) group_id, SUM(x) sum
  FROM examples
  GROUP BY GROUPING SETS((), (x), (y), (x,y))
  ORDER BY group_id DESC
+-----------+-------+ | group_id | sum | |-----------+-------| | 3 | 9 | | 2 | 1 | | 2 | 6 | | 2 | 2 | | 1 | 2 | | 1 | 2 | | 1 | 5 | | 0 | 2 | | 0 | 1 | | 0 | 5 | | 0 | 1 | +-----------+-------+
/*
This is equivalent to the example above but
uses the command CUBE instead of GROUPING SETS
*/
WITH examples AS (
      SELECT 1 AS x, 'a' AS y
      UNION ALL SELECT 2, 'b'
      UNION ALL SELECT 1, 'c'
      UNION ALL SELECT 5, 'a'
  )

  SELECT GROUPING(x, y) group_id, SUM(x) sum
  FROM examples
  GROUP BY CUBE(x, y)
  ORDER BY group_id DESC
+-----------+-------+ | group_id | sum | |-----------+-------| | 3 | 9 | | 2 | 1 | | 2 | 6 | | 2 | 2 | | 1 | 2 | | 1 | 2 | | 1 | 5 | | 0 | 2 | | 0 | 1 | | 0 | 5 | | 0 | 1 | +-----------+-------+
/*
Almost identical to the previous two examples, but
there is no group (y) since ROLLUP only groups by
subsequences of the provided columns, so in this case
(x, y), (x), and ().
*/
WITH examples AS (
      SELECT 1 AS x, 'a' AS y
      UNION ALL SELECT 2, 'b'
      UNION ALL SELECT 1, 'c'
      UNION ALL SELECT 5, 'a'
  )

  SELECT GROUPING(x, y) group_id, SUM(x) sum
  FROM examples
  GROUP BY ROLLUP(x, y)
  ORDER BY group_id DESC
+-----------+-------+ | group_id | sum | |-----------+-------| | 3 | 9 | | 1 | 2 | | 1 | 2 | | 1 | 5 | | 0 | 2 | | 0 | 1 | | 0 | 5 | | 0 | 1 | +-----------+-------+