DESCRIBE

DESCRIBE collection_name

DESCRIBE The output of DESCRIBE has the following fields: 1. field: Every distinct field name in the collection. 2. type: The data type of the field. 3. occurrences: The number of documents that have this field in the given type. 4. total: Total number of documents in the collection for top level fields, and total number of documents that have the parent field for nested fields.

Note: We use the delimiter * (glob) in the field name to indicate descending into an array.
It can be replaced by an integer index to construct an instance of the field path. If multiple
data types appear for a certain field path, DESCRIBE will return one entry for each type.

Suppose a collection names has the following documents:

{
    "name": "John",
    "age": 31,
    "city": "New York"
},
{
    "name": "Michael",
    "age": "notfound",
    "experiences": [
        {
            "title": "XYZ",
            "years": 4
        }
    ]
}
DESCRIBE names
+--------------------------------------+---------------+---------+-----------+ | field | occurrences | total | type | |--------------------------------------+---------------+---------+-----------| | ['name'] | 2 | 2 | string | | ['age'] | 1 | 2 | string | | ['age'] | 1 | 2 | int | | ['city'] | 1 | 2 | string | | ['experiences'] | 4 | 4 | string | | ['experiences', '*'] | 1 | 1 | object | | ['experiences', '*', 'title'] | 1 | 1 | string | | ['experiences', '*', 'years'] | 1 | 1 | int | +--------------------------------------+---------------+---------+-----------+

Let's look at how DESCRIBE works with collections that have documents with each of the following properties:

  • Mixed Types
  • Nested Objects
  • Sparse Fields and Null Values
  • Nested Arrays

Mixed Types

Consider the collection zipcodes with the following documents:

{"zipcode": 94542},
{"zipcode": "91126"},
{"zipcode": 94110.0},
{"zipcode": "94020"}

Every document in this collection has the same field called zipcode, but its type is different across the collection. This is captured below.

DESCRIBE zipcodes
+-------------+-------------+-------+--------+ | field | occurrences | total | type | |-------------+-------------+-------+--------| | ['zipcode'] | 1 | 4 | int | | ['zipcode'] | 2 | 4 | string | | ['zipcode'] | 1 | 4 | float | +-------------+-------------+-------+--------+

We can use the typeof SQL function to get all zipcodes of type string:

SELECT
    zipcode
FROM
    zipcodes
WHERE
    TYPEOF(zipcode) = 'string'

+-----------+ | zipcode | |-----------| | 94020 | | 91126 | +-----------+

Then to cast field zipcode as string as we retrieve it, we can do the following:

SELECT
    CAST(zipcode AS string) zipcode
FROM
    zipcodes

+-----------+ | zipcode | |-----------| | 94020 | | 94110 | | 94542 | | 91126 | +-----------+

Nested Objects

Documents can have objects with scalars, nested arrays, or nested objects. Consider a simple collection of documents with names of Turing Award winners, which we will call turing_award_winners.

{"name": {"first": "John", "last": "HopCroft"}},
{"name": {"first": "Robert", "last": "Tarjan"}},
{"name": {"first": "Alan", "last": "Kay"}},
{"name": {"first":"Edsger", "last": "Dijkstra"}}

Each document has an object with a field called name that has nested fields first and last. To access nested fields inside objects, we concatenate the field names with a . (dot) as separator. For example, use name.first and name.last to access the first and last names, respectively, in each of these documents. The fields name.first and name.last are present in all documents as scalars of type 'string'. Hence, the occurrences and total for each field in this document will be the same as the total number of documents.

DESCRIBE turing_award_winners
+--------------------------------------+---------------+---------+-----------+ | field | occurrences | total | type | |--------------------------------------+---------------+---------+-----------| | ['name'] | 4 | 4 | object | | ['name', 'first'] | 4 | 4 | string | | ['name', 'last'] | 4 | 4 | string | +--------------------------------------+---------------+---------+-----------+

Say we want to list all the first and last names from this collection.

SELECT
    turing_award_winners.name.first,
    turing_award_winners.name.last
FROM
    turing_award_winners

+---------+----------+ | first | last | |---------+----------| | Alan | Kay | | John | HopCroft | | Robert | Tarjan | | Edsger | Dijkstra | +---------+----------+

Sparse Fields and Null Values

Suppose collection turing_award_winners now has the following documents.

{"name": {"first": "John", "last": "HopCroft"}},
{"name": {"first": "Robert", "last": "Tarjan"}},
{"name": {"first": "Alan", "middle": "Curtis", "last": "Kay"}},
{"name": {"first": "Edsger", "last": "Dijkstra"}}

Notice that this is similar to the previous collection with an additional nested field middle added to the third document. The field name.middle is a sparse field which is a string in 1 document. It is treated as undefined in the other 3 documents that it is absent in. Note that this is different from null. This is clarified in further detail in the data-types page. DESCRIBE helps capture such sparse fields as well.

DESCRIBE turing_award_winners
+--------------------------------------+---------------+---------+-----------+ | field | occurrences | total | type | |--------------------------------------+---------------+---------+-----------| | ['name'] | 4 | 4 | object | | ['name', 'first'] | 4 | 4 | string | | ['name', 'last'] | 4 | 4 | string | | ['name', 'middle'] | 1 | 4 | string | +--------------------------------------+---------------+---------+-----------+

While fields with type undefined are not captured in the smart schema, those with type null are. Say the last document in the collection was {"name": {"first": "Edsger", "middle": null, "last": "Dijkstra"}} instead.

DESCRIBE turing_award_winners
+-------------------+----------------+---------+-----------+ | field | occurrences | total | type | |------------------------------------+---------------------+ | ['name'] | 4 | 4 | object | | ['name', 'first'] | 4 | 4 | string | | ['name', 'last'] | 4 | 4 | string | | ['name', 'middle'] | 1 | 4 | null | | ['name', 'middle'] | 1 | 4 | string | +----------------------------------------------+-----------+

Both null and undefined types behave the same way in almost all situations. The predicate IS NULL will return true for both null and undefined. You can filter the fields that are undefined with a special predicate IS UNDEFINED.

SELECT
    COUNT(*)
FROM
    turing_award_winners
WHERE
    turing_award_winners.name.middle IS NOT NULL

+----------+ | ?count | |----------| | 1 | +----------+
SELECT
    turing_award_winners.name.middle
FROM
    turing_award_winners
WHERE
    turing_award_winners.name.middle IS NOT NULL

+----------------------+ | middle | |----------------------| | Curtis | +----------------------+

If you want to also include the fields with a null value, you can use the IS NOT UNDEFINED predicate:

SELECT
     turing_award_winners.name.middle
 FROM
     turing_award_winners
 WHERE
     turing_award_winners.name.middle IS NOT UNDEFINED

+----------------------+ | middle | |----------------------| | Curtis | | null | +----------------------+