This page covers the type system in Rockset.
Every value stored in Rockset is strongly typed, having one of the types below. A Rockset document itself has type object
.
int
float
bool
string
bytes
null
u256
array
object
date
datetime
time
timestamp
month_interval
microsecond_interval
geography
These are the type names used in type functions and in the result of the DESCRIBE
command. More information regarding each type is given below.
JSON Formatting
Check out our documentation on Rockset Extended JSON for information on how Rockset parses data types in JSON.
Basic Types
Integer
int
Integers are numeric values that do not have fractional components. In Rockset, integers must be in the range between -2^63
and 2^63-1
, inclusive.
Aliases are integer
and int64
.
SELECT CAST()
Float
float
Floats are numeric values with fractional components. In Rockset, floats have 64-bit precision.
Aliases are double
and double precision
.
Boolean
bool
A boolean is either true
or false
.
Alias is boolean
.
String
string
A string represents character data of any length.
In SQL statements, string literals should be enclosed with single quotes (such as 'hello'
), not double quotes, which are interpreted as identifiers like field or collection names.
You can escape a single quote in a string literal by doubling up the single quote (ie. SELECT 'it''s'
will return the string literal 'it's'
).
Aliases are char
, varchar
, and text
.
Bytes
bytes
Values of type bytes
contain arbitrary byte data of any length.
Aliases are binary
and varbinary
.
Null
null
We distinguish between two different forms of nulls: null
and undefined
. undefined
is returned when selecting a collection field that does not exist, dereferencing an object field that does not exist, or when subscripting an array with an out-of-range index. null
is returned when selecting a field whose value has been explicitly set to null
.
Both null
and undefined
behave exactly the same in almost all cases, with the following exceptions:
- Functions such as
typeof
andjson_format
. - A predicate
IS UNDEFINED
returnstrue
forundefined
andfalse
fornull
.IS NULL
returns true for both. - A distinction between
undefined
andnull
is preserved in the result set and in the JSON
serialization. - Objects where fields have a
null
value are not equal to objects with a missing fields, i.e.
{"a": 1, "b": null} != {"a": 1}
.
Most functions will return null
when one of the arguments is null
or undefined
. Most aggregate functions will ignore both null
and undefined
inputs.
To emphasize the difference between null
and undefined
, suppose we have a collection with the following documents:
{"name": {"first": "John", "last": "HopCroft"}},
{"name": {"first": "Robert", "last": "Tarjan"}},
{"name": {"first": "Alan", "middle": "Curtis", "last": "Kay"}},
{"name": {"first": "Edsger", "middle": null, "last": "Dijkstra"}}
In that case, the field name.middle
is a null
in the last document as opposed to undefined
in the first 2 documents that it is absent in.
u256
u256
u256s are unsigned integers with 256 bits. In Rockset, u256s must be in the range between 0
and 2^256 - 1
, inclusive.
u256 supports the following operations and functions:
- Arithmetic operations
+
,-
,*
,/
, and%
.- u256 does not support arithmetic operations with other numeric types.
- Comparison operations
=
,<>
,!=
,<
,>=
,>
, and>=
.- Comparing other numeric types with u256 will return false for equality and produce an error for inequality. The other comparison operations have certain behaviors outlined in the next section.
- Bitwise operations
AND
,OR
,NOT
, andXOR
. - Aggregation functions
AVG
andSUM
.- If the input mixes u256 and other numeric types in a single grouping,
AVG
andSUM
will throw an exception.
- If the input mixes u256 and other numeric types in a single grouping,
- Aggregation functions
MIN
,MAX
,MIN_BY
,MAX_BY
, andCOUNT
.- If the values are entirely int/float or entirely u256, these functions compare the values arithmetically.
- If the values are a mixture of int/float and u256, u256 values are considered greater than all int and float values for
MAX
andMAX_BY
and less than forMIN
andMIN_BY
.
U256_FROM_BASE(string, base)
creates u256 values from any base strings between 2 and 36, inclusive. We also support casting from base 10 strings.PACK_U256(int, int, int, int)
creates u256 values from 4 integers.
u256 has certain behaviors depending on the commands and operations:
GROUP BY
andJOIN
s will consider u256s different from ints or floats, even if they are arithmetically the same.ORDER BY
orders all u256s after all ints and floats, while ints and floats are interleaved.- Casting between
u256
andint
,float
, andstring
.int
: Value must be in the range of the target type. Otherwise, casting will throw a domain error.float
: Value must be in the range of the target type. Otherwise, casting will throw a domain error. Casting can cause some potential loss of precision in either direction.string
: Casting assumes base 10 encoding.
select * from foo where x = U256_FROM_BASE('5', 10)
yields documents wherex
is a u256 with a value of 5. If the column contains ints or floats, the query will not return those documents.select * from foo where x = 5
yields documents wherex
is an int or float with a value of 5. If the column contains u256s, the query will not return those documents.select * from foo where x > U256_FROM_BASE('5', 10)
yields documents wherex
is a u256 with a value greater than 5. If the column contains ints or floats, the query will not return those documents or throw an invalid comparison error.select * from foo where x > 5
yields documents wherex
is an int or float with a value greater than 5. If the column contains u256s, the query will not return those documents or throw an invalid comparison error.
Nested Types
Rockset supports arrays and objects, which can contain any value and hence be nested recursively. The number of levels of a nesting of a field in Rockset is capped at 32. Any given level may have cardinality (keys in object or elements in array) of up to 2^32.
Array
array
An array represents an ordered list of zero or more values, each of any of Rockset type. For example, []
, [1, 2, 3]
, and [false, 42, ['hello', 'world']]
are all valid arrays.
Arrays can be created inline with ARRAY [1,2,3]
or simply [1,2,3]
.
Values inside an array foo
can be accessed in SQL statements using the square bracket notation shown below.
-- foo = [false, 42, ['hello', 'world']]
foo[2] -- has value 42
foo[3][1] -- has value 'hello'
In Rockset, array indexing starts at 1.
Object
object
An object represents a dictionary whose keys are strings and values are of any Rockset type. For example, {}
, {'a': 10, 'b': 20}
, and {'a': 10, 'b': {'c': true}}
are all valid objects.
Values inside an object can be accessed in SQL statements using the dot notation shown below.
-- foo = {'a': 10, 'b': {'c': true}}
foo.a -- has value 10
foo.b -- has value {'c': true}
foo.b.c -- has value true
foo."b".c -- also works, field names can be individually escaped with double quotes
Alias is map
.
Date and Time Types
Refer also to the page on date and time functions for functions to construct and manipulate date and time values.
When data is ingested into or returned from Rockset in JSON form, these types can be specified in a special format shown below.
{
"_id": "foo"
"_event_time": {"__rockset_type": "timestamp", "value": "42"}
}
Note that field _event_time
is parsed not as object
but rather timestamp
with value 42
.
Date
date
A date value represents a logical calendar date (year, month, day) independent of time zone. A date does not represent a specific time period; it can differ based on timezones. To represent an absolute point in time, use a timestamp instead.
A date literal in SQL syntax is formatted as follows.
DATE 'YYYY-[M]M-[D]D'
This consists of:
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day
DATE '2018-01-01' -- example literal
DATE(2018, 1, 1) -- constructor function
Datetime
datetime
A datetime value represents a point in time (year, month, day, hour, minute, second, microsecond). It does not refer to an absolute instance in time, unlike timestamp. Instead, it is the civil time; the time that a user would see on a watch or calendar.
A date literal in SQL syntax is formatted as follows.
DATETIME 'YYYY-[M]M-[D]D[( )[H]H:[M]M:[S]S[.DDDDDD]]'
This consists of:
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day( )
: A space separator[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)[.DDDDDD]
: Up to six fractional digits
DATETIME '2018-01-01 9:30:45.456' -- example literal
DATETIME(2018, 1, 1, 9, 30, 45, 456) -- constructor function
Time
time
A time value represents the time of the day (hour, minute, second, millisecond) independent of a specific date.
A time literal in SQL syntax is formatted as follows.
TIME '[H]H:[M]M:[S]S[.DDDDDD]'
This consists of:
[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)[.DDDDDD
: Up to six fractional digits
TIME '09:30:45.456' -- example literal
TIME(9, 30, 45, 456) -- constructor function
Timestamp
timestamp
A timestamp value represents absolute date and time values independent of any time zone.
A timestamp literal in SQL syntax is formatted as follows.
TIMESTAMP 'YYYY-[M]M-[D]D[( |T)[H]H:[M]M[:[S]S[.DDDDDD]]][time zone]'
This consists of:
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day( )
: A space separator[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)[.DDDDDD]
: Up to six fractional digits[time zone]
: Offset from Coordinated Universal Time (UTC). When a time zone is not explicitly
specified, the default time zone, UTC, is used. The offset is formatted as(+|-)H[H][:M[M]]
,Z
(synonym for UTC), orUTC
. When using this format, there is a space between the time zone and the rest of the timestamp only for theUTC
suffix.
TIMESTAMP '2018-01-01 09:30:45.456-05:00' -- example literal
Month Interval
month_interval
A month interval refers to a specific number of months.
As months have different lengths, month intervals may only be added to or subtracted from date or datetime values.
Examples of month intervals are shown below.
INTERVAL 3 MONTH;
INTERVAL 2 YEAR;
INTERVAL '2-3' YEAR TO MONTH;
Microsecond Interval
microsecond_interval
A microsecond interval refers to a fixed amount of time with microsecond precision.
Microsecond intervals may be added to or subtracted from dates, times, datetimes, and timestamps. Also, you get a microsecond interval when you subtract two dates, times, datetimes, or timestamps (indicating the length of time between the two time points).
Examples of microsecond intervals are shown below.
INTERVAL 2 DAY;
INTERVAL 3 HOUR;
INTERVAL 5 MINUTE;
INTERVAL 10 SECOND;
INTERVAL '2 10:23:45.56' DAY TO SECOND;
Geography
geography
A geography refers to a shape on the surface of a sphere (Earth). It can be either a point (latitude and longitude), a linestring (an ordered list of connected points), or a polygon (a set of loops which enclose some interior space). Rockset internally uses Google's S2 Geometry library to manipulate geographies. A few aspects of using geographies can be counterintuitive:
- Polygons consist of one or more outer shells, which may contain holes. For more details, read
about GeoJSON. - Polygons contain whichever side of their boundary which has less area. For instance, an arc making
a circle around the earth at 1 degree north latitude would contain most of the northern hemisphere, and an arc making a circle around the earth at 1 degree south latitude would contain most of the southern hemisphere. It is not possible to construct a polygon which contains more than half the earth. - Polygons must follow all the
restrictions described by S2. They may not intersect themselves, contain duplicate points, or be otherwise degenerate. - All segments connecting points are geodesics. A geodesic is the shortest path over the surface of
a sphere, which is not a straight line in latitude/longitude space. - The earth is modeled as a sphere, not an ellipsoid. Because of this, results may be distorted by
up to 0.56%. - Geographies are not stored internally as latitude/longitude pairs, but rather S2's internal 3
dimensional representation. Therefore importing and exporting geographies may lead to small rounding errors due to floating point precision. - Do not construct geographies vulnerable to numerical precision issues. For example, edges with
nearly antipodal endpoints may form a geodesic in a different direction than you expect.
This is how you would create a point at 34N 12E:
SELECT ST_GEOGPOINT(12, 34)
Here is another way to construct the same point from a string in well known text format.
SELECT ST_GEOGFROMTEXT('POINT(12 34)')
This example creates a linestring from 34N 12E to 78N 56E to 42N 42E:
SELECT ST_GEOGFROMTEXT('LINESTRING(12 34, 56 78, 42 42)')
Here is an example of a roughly square polygon around the west coast of Africa:
SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 10))')
You can import geography types into Rockset by applying either ST_GEOGPOINT
or ST_GEOGFROMTEXT
in an ingest transformation. Refer also to the geographic functions page for functions to construct and manipulate geographic values.
Depending upon the data source, your geographic data may not be validated by default.
Syntactically correct, but semantically invalid geographic values may be inserted into your collections.
Ingest transformations provide a mechanism to control the behavior of semantically invalid geographic data written to your collections.