DESCRIBE collection_name
DESCRIBE
The output of DESCRIBE has the following fields: 1. field: Every distinct field name in the collection. 2. type: The data type of the field. 3. occurrences: The number of documents that have this field in the given type. 4. total: Total number of documents in the collection for top level fields, and total number of documents that have the parent field for nested fields.
Note: We use the delimiter
*
(glob) in the field name to indicate descending into an array.
It can be replaced by an integer index to construct an instance of the field path. If multiple
data types appear for a certain field path,DESCRIBE
will return one entry for each type.
Suppose a collection names
has the following documents:
{
"name": "John",
"age": 31,
"city": "New York"
},
{
"name": "Michael",
"age": "notfound",
"experiences": [
{
"title": "XYZ",
"years": 4
}
]
}
DESCRIBE names
Let's look at how DESCRIBE works with collections that have documents with each of the following properties:
- Mixed Types
- Nested Objects
- Sparse Fields and Null Values
- Nested Arrays
Mixed Types
Consider the collection zipcodes
with the following documents:
{"zipcode": 94542},
{"zipcode": "91126"},
{"zipcode": 94110.0},
{"zipcode": "94020"}
Every document in this collection has the same field called zipcode, but its type is different across the collection. This is captured below.
DESCRIBE zipcodes
We can use the typeof
SQL function to get all zipcodes of type string
:
SELECT
zipcode
FROM
zipcodes
WHERE
TYPEOF(zipcode) = 'string'
Then to cast field zipcode
as string as we retrieve it, we can do the following:
SELECT
CAST(zipcode AS string) zipcode
FROM
zipcodes
Nested Objects
Documents can have objects with scalars, nested arrays, or nested objects. Consider a simple collection of documents with names of Turing Award winners, which we will call turing_award_winners
.
{"name": {"first": "John", "last": "HopCroft"}},
{"name": {"first": "Robert", "last": "Tarjan"}},
{"name": {"first": "Alan", "last": "Kay"}},
{"name": {"first":"Edsger", "last": "Dijkstra"}}
Each document has an object with a field called name
that has nested fields first
and last
. To access nested fields inside objects, we concatenate the field names with a . (dot) as separator. For example, use name.first
and name.last
to access the first and last names, respectively, in each of these documents. The fields name.first
and name.last
are present in all documents as scalars of type 'string'. Hence, the occurrences and total for each field in this document will be the same as the total number of documents.
DESCRIBE turing_award_winners
Say we want to list all the first and last names from this collection.
SELECT
turing_award_winners.name.first,
turing_award_winners.name.last
FROM
turing_award_winners
Sparse Fields and Null Values
Suppose collection turing_award_winners
now has the following documents.
{"name": {"first": "John", "last": "HopCroft"}},
{"name": {"first": "Robert", "last": "Tarjan"}},
{"name": {"first": "Alan", "middle": "Curtis", "last": "Kay"}},
{"name": {"first": "Edsger", "last": "Dijkstra"}}
Notice that this is similar to the previous collection with an additional nested field middle added to the third document. The field name.middle
is a sparse field which is a string in 1 document. It is treated as undefined
in the other 3 documents that it is absent in. Note that this is different from null
. This is clarified in further detail in the data-types page. DESCRIBE
helps capture such sparse fields as well.
DESCRIBE turing_award_winners
While fields with type undefined
are not captured in the smart schema, those with type null
are. Say the last document in the collection was {"name": {"first": "Edsger", "middle": null, "last": "Dijkstra"}}
instead.
DESCRIBE turing_award_winners
Both null
and undefined
types behave the same way in almost all situations. The predicate IS NULL
will return true
for both null
and undefined
. You can filter the fields that are undefined
with a special predicate IS UNDEFINED
.
SELECT
COUNT(*)
FROM
turing_award_winners
WHERE
turing_award_winners.name.middle IS NOT NULL
SELECT
turing_award_winners.name.middle
FROM
turing_award_winners
WHERE
turing_award_winners.name.middle IS NOT NULL
If you want to also include the fields with a null
value, you can use the IS NOT UNDEFINED
predicate:
SELECT
turing_award_winners.name.middle
FROM
turing_award_winners
WHERE
turing_award_winners.name.middle IS NOT UNDEFINED