Data Types

This page covers the type system in Rockset.

Every value stored in Rockset is strongly typed, having one of the types below. A Rockset document itself has type object.

  • int
  • float
  • bool
  • string
  • bytes
  • null
  • array
  • object
  • date
  • datetime
  • time
  • timestamp
  • month_interval
  • microsecond_interval
  • geography

These are the type names used in type functions and in the result of the DESCRIBE command. More information regarding each type is given below.

#Basic Types

#Integer

int

Integers are numeric values that do not have fractional components. In Rockset, integers must be in the range between -2^63 and 2^63-1, inclusive.
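As a rough illustration of the range above, the following Python sketch checks whether a value fits in a signed 64-bit integer before it is written to a document. The helper name fits_rockset_int is hypothetical and is not part of any Rockset client.

```python
# Bounds of a signed 64-bit integer, matching the range stated above.
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def fits_rockset_int(value: int) -> bool:
    """Return True if value lies in [-2^63, 2^63 - 1] (hypothetical helper)."""
    return INT64_MIN <= value <= INT64_MAX


print(fits_rockset_int(9223372036854775807))  # largest value in range
print(fits_rockset_int(9223372036854775808))  # one past the upper bound
```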

Aliases are integer and int64.

#Float

float

Floats are numeric values with fractional components. In Rockset, floats have 64-bit precision.

Aliases are double and double precision.

#Boolean

bool

A boolean is either true or false.

Alias is boolean.

#String

string

A string represents character data of any length.

In SQL statements, string literals should be enclosed with single quotes (e.g. 'hello'), not double quotes (which are interpreted as identifiers like field or collection names).

Aliases are char, varchar, and text.

#Bytes

bytes

Values of type bytes contain arbitrary byte data of any length.

Aliases are binary and varbinary.

#Null

null

We distinguish between two different forms of nulls: null and undefined. undefined is returned when selecting a collection field that doesn't exist, dereferencing an object field that doesn't exist, or when subscripting an array with an out-of-range index. null is returned when selecting a field whose value has been explicitly set to null.

null and undefined behave exactly the same in almost all cases. There are a few exceptions:

  • Functions such as typeof and json_format distinguish between the two.
  • The predicate IS UNDEFINED returns true for undefined and false for null. IS NULL returns true for both.
  • The distinction between undefined and null is preserved in the result set and in the JSON serialization.
  • Objects where fields have a null value are not equal to objects where those fields are missing, i.e. {"a": 1, "b": null} != {"a": 1}.

Most functions will return null when one of the arguments is null or undefined. Most aggregate functions will ignore both null and undefined inputs.

To emphasize the difference between null and undefined, consider the following example.

Suppose we have a collection with the following documents:

{"name": {"first": "John", "last": "Hopcroft"}},
{"name": {"first": "Robert", "last": "Tarjan"}},
{"name": {"first": "Alan", "middle": "Curtis", "last": "Kay"}},
{"name": {"first": "Edsger", "middle": null, "last": "Dijkstra"}}

Here, the field name.middle is null in the last document, whereas it is undefined in the first two documents, where the field is absent.
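The behavior described above can be modeled outside Rockset. The Python sketch below is only an illustration: the UNDEFINED sentinel and the helpers get_path, is_null, and is_undefined are hypothetical names, with None standing in for null.

```python
# Sentinel standing in for Rockset's undefined (an illustration, not Rockset's internals).
class _Undefined:
    def __repr__(self):
        return "undefined"


UNDEFINED = _Undefined()


def get_path(doc, path):
    """Follow a dotted path; return UNDEFINED as soon as a field is absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return UNDEFINED
        current = current[key]
    return current


def is_undefined(value):
    """Models IS UNDEFINED: true only for undefined."""
    return value is UNDEFINED


def is_null(value):
    """Models IS NULL: true for both null and undefined."""
    return value is None or value is UNDEFINED


docs = [
    {"name": {"first": "John", "last": "Hopcroft"}},
    {"name": {"first": "Robert", "last": "Tarjan"}},
    {"name": {"first": "Alan", "middle": "Curtis", "last": "Kay"}},
    {"name": {"first": "Edsger", "middle": None, "last": "Dijkstra"}},
]

for doc in docs:
    middle = get_path(doc, "name.middle")
    print(doc["name"]["first"], repr(middle), is_null(middle), is_undefined(middle))

# An object with a null field differs from an object missing that field.
print({"a": 1, "b": None} != {"a": 1})
```

In this model, only the last document yields a value for which is_null is true and is_undefined is false.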

#Nested Types

Rockset supports arrays and objects, which can contain any value and hence be nested recursively. A field in Rockset can be nested at most 100 levels deep. Any given level may have a cardinality (keys in an object or elements in an array) of up to 2^32.

#Array

array

An array represents an ordered list of zero or more values, each of any Rockset type. For example, [], [1, 2, 3], and [false, 42, ['hello', 'world']] are all valid arrays.

Arrays can be created inline with ARRAY [1,2,3] or simply [1,2,3].

Values inside an array foo can be accessed in SQL statements using the square bracket notation shown below.

-- foo = [false, 42, ['hello', 'world']]
foo[2]     -- has value 42
foo[3][1]  -- has value 'hello'

In Rockset, array indexing starts at 1.
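For readers used to 0-based languages, the following Python sketch models 1-based subscripting. The subscript helper and the UNDEFINED sentinel are hypothetical. Returning undefined for an out-of-range index follows the description in the Null section, while treating index 0 and negative indexes as out of range is an assumption of this sketch.

```python
UNDEFINED = object()  # stand-in for Rockset's undefined


def subscript(array, index):
    """Model 1-based subscripting: index 1 is the first element."""
    if 1 <= index <= len(array):
        return array[index - 1]
    # Assumption: anything outside 1..len(array) counts as out of range.
    return UNDEFINED


foo = [False, 42, ["hello", "world"]]
print(subscript(foo, 2))                # 42
print(subscript(subscript(foo, 3), 1))  # hello
```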

#Object

object

An object represents a dictionary whose keys are strings and values are of any Rockset type. For example, {}, {'a': 10, 'b': 20}, and {'a': 10, 'b': {'c': true}} are all valid objects.

Values inside an object can be accessed in SQL statements using the dot notation shown below.

-- foo = {'a': 10, 'b': {'c': true}}
foo.a      -- has value 10
foo.b      -- has value {'c': true}
foo.b.c    -- has value true
foo."b".c  -- also works, field names can be individually escaped with double quotes

Alias is map.

#Date and Time Types

Refer also to the page on date and time functions for functions to construct and manipulate date and time values.

When data is ingested into or returned from Rockset in JSON form, these types can be specified in the special format shown below.

{
  "_id": "foo",
  "_event_time": {"__rockset_type": "timestamp", "value": "42"}
}

Note that the field _event_time is parsed not as an object but as a timestamp with value 42.
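A client reading such JSON might want to recognize these wrappers. The Python sketch below is one possible approach, not a Rockset API: decode_typed is a hypothetical helper that replaces each wrapper with a (type name, raw value) pair. It does not interpret the raw value, and it lowercases the type name because the examples on this page show both "timestamp" and "GEOGRAPHY".

```python
import json


def decode_typed(value):
    """Recursively replace {"__rockset_type": ..., "value": ...} wrappers
    with (type_name, raw_value) tuples; everything else is left as-is."""
    if isinstance(value, dict):
        if "__rockset_type" in value and "value" in value:
            # Normalize case so "timestamp" and "TIMESTAMP" compare equal.
            return (value["__rockset_type"].lower(), value["value"])
        return {key: decode_typed(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_typed(item) for item in value]
    return value


raw = '{"_id": "foo", "_event_time": {"__rockset_type": "timestamp", "value": "42"}}'
doc = decode_typed(json.loads(raw))
print(doc)
```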

#Date

date

A date value represents a logical calendar date (year, month, day) independent of time zone. A date does not represent a specific period of time; the period it covers differs from one time zone to another. To represent an absolute point in time, use a timestamp instead.

A date literal in SQL syntax is formatted as follows.

DATE 'YYYY-[M]M-[D]D'

This consists of:

  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day

DATE '2018-01-01' -- example literal
DATE(2018, 1, 1) -- constructor function
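To see why a date is not an absolute point in time, consider how one instant maps to calendar dates in different time zones. The Python sketch below uses the standard datetime module; the instant and the two fixed offsets are chosen purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# One absolute instant, expressed in UTC.
instant = datetime(2018, 1, 1, 2, 0, tzinfo=timezone.utc)

# Two fixed offsets chosen for illustration.
utc_minus_5 = timezone(timedelta(hours=-5))
utc_plus_9 = timezone(timedelta(hours=9))

# The same instant falls on different calendar dates depending on the zone.
print(instant.astimezone(utc_minus_5).date())  # 2017-12-31
print(instant.astimezone(utc_plus_9).date())   # 2018-01-01
```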

#Datetime

datetime

A datetime value represents a point in time (year, month, day, hour, minute, second, microsecond). Unlike a timestamp, it does not refer to an absolute instant in time. Instead, it is the civil time: the time that a user would see on a watch or calendar.

A datetime literal in SQL syntax is formatted as follows.

DATETIME 'YYYY-[M]M-[D]D[( )[H]H:[M]M:[S]S[.DDDDDD]]'

This consists of:

  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( ): A space separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits

DATETIME '2018-01-01 9:30:45.456' -- example literal
DATETIME(2018, 1, 1, 9, 30, 45, 456) -- constructor function

#Time

time

A time value represents the time of the day (hour, minute, second, microsecond) independent of a specific date.

A time literal in SQL syntax is formatted as follows.

TIME '[H]H:[M]M:[S]S[.DDDDDD]'

This consists of:

  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits

TIME '09:30:45.456' -- example literal
TIME(9, 30, 45, 456) -- constructor function

#Timestamp

timestamp

A timestamp value represents absolute date and time values independent of any time zone.

A timestamp literal in SQL syntax is formatted as follows.

TIMESTAMP 'YYYY-[M]M-[D]D[( |T)[H]H:[M]M[:[S]S[.DDDDDD]]][time zone]'

This consists of:

  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or the letter T as a separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits
  • [time zone]: Offset from Coordinated Universal Time (UTC). When a time zone is not explicitly specified, the default time zone, UTC, is used. The offset is formatted as (+|-)H[H][:M[M]], Z (synonym for UTC), or UTC. When using this format, there is a space between the time zone and the rest of the timestamp only for the UTC suffix.

TIMESTAMP '2018-01-01 09:30:45.456-05:00' -- example literal
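Because a timestamp is an absolute point in time, a literal with an offset identifies the same instant as its UTC equivalent. The Python sketch below shows this for the example literal using the standard datetime module. It illustrates the offset arithmetic only and is not a parser for the full TIMESTAMP grammar.

```python
from datetime import datetime, timezone

# The example literal above, without the TIMESTAMP keyword.
text = "2018-01-01 09:30:45.456-05:00"

# fromisoformat accepts this particular shape; it does not cover every
# variant of the grammar described above (for example the " UTC" suffix).
local = datetime.fromisoformat(text)
as_utc = local.astimezone(timezone.utc)

print(as_utc.isoformat())  # 2018-01-01T14:30:45.456000+00:00
```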

#Month Interval

month_interval

A month interval refers to a specific number of months.

As months have different lengths, month intervals may only be added to or subtracted from date or datetime values.

Examples of month intervals are shown below.

INTERVAL 3 MONTH
INTERVAL 2 YEAR
INTERVAL '2-3' YEAR TO MONTH
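The Python sketch below illustrates two points from this section. It assumes the standard SQL reading of a 'Y-M' string as years and months, so '2-3' amounts to 27 months, and it lists the month lengths of one year to show why a month is not a fixed duration. The helper year_to_month is hypothetical and handles only non-negative values.

```python
import calendar


def year_to_month(text: str) -> int:
    """Convert a 'Y-M' string, as in INTERVAL '2-3' YEAR TO MONTH, to months."""
    years, months = text.split("-")
    return int(years) * 12 + int(months)


print(year_to_month("2-3"))  # 27

# Days in each month of 2018: the lengths differ, so a month has no fixed duration.
print([calendar.monthrange(2018, month)[1] for month in range(1, 13)])
```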

#Microsecond Interval

microsecond_interval

A microsecond interval refers to a fixed amount of time with microsecond precision.

Microsecond intervals may be added to or subtracted from dates, times, datetimes, and timestamps. Also, you get a microsecond interval when you subtract two dates, times, datetimes, or timestamps (indicating the length of time between the two time points).

Examples of microsecond intervals are shown below.

INTERVAL 2 DAY
INTERVAL 3 HOUR
INTERVAL 5 MINUTE
INTERVAL 10 SECOND
INTERVAL 10.123 SECOND -- fractional seconds allowed
INTERVAL '2 10:23:45.56' DAY TO SECOND
INTERVAL '10:23' DAY TO MINUTE -- (etc)
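Python's timedelta also has microsecond resolution, which makes it a convenient model for this type. The sketch below is an analogy rather than Rockset code: it builds a rough equivalent of INTERVAL '2 10:23:45.56' DAY TO SECOND and shows that subtracting two points in time produces such an interval. The two datetime values are chosen for illustration.

```python
from datetime import datetime, timedelta

# Rough equivalent of INTERVAL '2 10:23:45.56' DAY TO SECOND.
interval = timedelta(days=2, hours=10, minutes=23, seconds=45, milliseconds=560)
print(interval // timedelta(microseconds=1))  # 210225560000 microseconds

# Subtracting two points in time yields a fixed-length interval.
start = datetime(2018, 1, 1, 9, 30, 45)
end = datetime(2018, 1, 3, 19, 54, 30, 560000)
print(end - start == interval)  # True
```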

#Geography

geography

A geography refers to a shape on the surface of a sphere (Earth). It can be a point (latitude and longitude), a linestring (an ordered list of connected points), or a polygon (a set of loops which enclose some interior space).

Rockset internally uses Google's S2 Geometry library to manipulate geographies, and reading the S2 documentation is highly recommended. A few aspects of using geographies can be counterintuitive:

  • Polygons consist of one or more outer shells, which may contain holes. For more details, read about GeoJSON.
  • Polygons contain the side of their boundary that has less area. For instance, an arc making a circle around the earth at 1 degree north latitude would contain most of the northern hemisphere, and an arc making a circle around the earth at 1 degree south latitude would contain most of the southern hemisphere. It is not possible to construct a polygon which contains more than half the earth.
  • Polygons must follow all the restrictions described by S2. They may not intersect themselves, contain duplicate points, or be otherwise degenerate.
  • All segments connecting points are geodesics. A geodesic is the shortest path over the surface of a sphere, which is not a straight line in latitude/longitude space.
  • The earth is modeled as a sphere, not an ellipsoid. Because of this, results may be distorted by up to 0.56%.
  • Geographies are not stored internally as latitude/longitude pairs, but rather in S2's internal three-dimensional representation. Therefore, importing and exporting geographies may lead to small rounding errors due to floating point precision.
  • Do not construct geographies vulnerable to numerical precision issues. For example, edges with nearly antipodal endpoints may form a geodesic in a different direction than you expect.
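The point about geodesics can be checked with a little spherical geometry. The Python sketch below is independent of Rockset and S2. It converts latitude/longitude pairs to three-dimensional unit vectors (a representation chosen for this sketch) and computes the midpoint of the geodesic between two points that both lie at 45 degrees north. The function names are hypothetical.

```python
import math


def to_unit_vector(lat_deg, lng_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lng = math.radians(lat_deg), math.radians(lng_deg)
    return (math.cos(lat) * math.cos(lng), math.cos(lat) * math.sin(lng), math.sin(lat))


def to_lat_lng(vec):
    """Convert a 3D vector (of any non-zero length) to latitude/longitude in degrees."""
    x, y, z = vec
    return (math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x)))


def geodesic_midpoint(a, b):
    """Midpoint of the great-circle segment between two (lat, lng) points.
    Not meaningful for antipodal points, where the vector sum is zero."""
    va, vb = to_unit_vector(*a), to_unit_vector(*b)
    return to_lat_lng(tuple(p + q for p, q in zip(va, vb)))


lat, lng = geodesic_midpoint((45.0, 0.0), (45.0, 90.0))
print(round(lat, 4), round(lng, 4))  # 54.7356 45.0
```

The midpoint lies well north of 45 degrees, whereas a straight line in latitude/longitude space would stay at 45 degrees north throughout.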

This is how you would create a point at 34N 12E:

SELECT
    ST_GEOGPOINT(12, 34)
{"__rockset_type":"GEOGRAPHY","value":{"type":"Point","coordinates":[34.00000000000001,12]}}

Here's another way to construct the same point from a string in well known text format.

SELECT
    ST_GEOGFROMTEXT('POINT(12 34)')
{"__rockset_type":"GEOGRAPHY","value":{"type":"Point","coordinates":[34.00000000000001,12]}}

This example creates a linestring from 34N 12E to 78N 56E to 42N 42E:

SELECT
    ST_GEOGFROMTEXT('LINESTRING(12 34, 56 78, 42 42)')
{"__rockset_type":"GEOGRAPHY","value":{"type":"LineString","coordinates":[[34.00000000000001,12],[78,56],[42,42]]}}

Here is an example of a roughly square polygon around the west coast of Africa:

SELECT
    ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 10))')
{"__rockset_type":"GEOGRAPHY","value":{"type":"Polygon","coordinates":[[[0,0],[0,10],[10.000000000000004,10],[10,0]]]}}

You can import geography types into Rockset by applying either ST_GEOGPOINT or ST_GEOGFROMTEXT in a field mapping. Refer also to the page on geographic functions for functions to construct and manipulate geographic values.
