FAQ

This page contains answers to common questions about Rockset. If your question is not answered here, please reach out to Rockset’s support team at support@rockset.com and we will be happy to help.

How do I get a Rockset account?

Please contact us at hello@rockset.com if you are interested in a Rockset account.

What is strong dynamic typing?

Rockset has strong dynamic typing: the data type is associated with each individual field value rather than with an entire column. Rockset does not perform any implicit type conversion at write time or at query time. The original type of every field value is preserved alongside that value and is respected during query processing.

If the same field contains values of different types, Rockset’s Smart Schema exposes all of those types and lets you run strongly typed queries on this dynamically typed data. Consider the following example:

$ rock create collection people
$ echo '{"age": "a lady never tells"}' | rock upload people -
$ echo '{"age": 42}' | rock upload people -
$ echo '{"age": 3.142}' | rock upload people -
$ rock sql 'DESCRIBE people'
+---------+--------+
| path    | type   |
|---------+--------|
| ['age'] | int    |
| ['age'] | float  |
| ['age'] | string |
+---------+--------+

Now, a query SELECT * FROM people WHERE age >= 18 looks for all documents in which the following three conditions are true:

  • field age is defined in the document
  • field age has an integer or float value
  • field age's numeric value is greater than or equal to 18

If some documents in the collection, whether due to errors or bugs, do not contain the field age, or contain it with a string or null value, those documents are not returned by the query.

Note that in this case the query considers both integer and float values because the query literal is numeric. In all other cases, the query considers only values whose type matches that of the query literal. For example, the query SELECT * FROM people WHERE "last_name" = 'Gray' matches only documents in which last_name is a string with the value 'Gray'.
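The matching rules above can be modeled in a few lines of Python. This is a hypothetical toy model, not Rockset's implementation or API. The names `where`, `types_compatible`, `is_numeric` and `MISSING` are invented for illustration, and documents are plain Python dicts. It mirrors the three conditions listed for `age >= 18` and the "numeric literal matches int and float, any other literal matches only its own type" rule.

```python
import operator
from typing import Any, Callable

# Hypothetical sentinel for a field that is absent from a document.
MISSING = object()

def is_numeric(value: Any) -> bool:
    # Python's bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def types_compatible(value: Any, literal: Any) -> bool:
    if is_numeric(literal):
        # A numeric literal considers both int and float values.
        return is_numeric(value)
    # Any other literal considers only values of exactly its own type.
    return type(value) is type(literal)

def where(docs: list, field: str, op: Callable[[Any, Any], bool], literal: Any) -> list:
    """Keep docs whose `field` is present, non-null, type-compatible and satisfies `op`."""
    matched = []
    for doc in docs:
        value = doc.get(field, MISSING)
        if value is MISSING or value is None:
            continue  # absent or null fields never satisfy the comparison
        if not types_compatible(value, literal):
            continue  # no implicit conversion: a string is never compared with 18
        if op(value, literal):
            matched.append(doc)
    return matched

people = [
    {"age": "a lady never tells"},
    {"age": 42},
    {"age": 3.142},
    {"age": None},
    {"last_name": "Gray"},
    {"last_name": 7},
]

print(where(people, "age", operator.ge, 18))            # [{'age': 42}]
print(where(people, "last_name", operator.eq, "Gray"))  # [{'last_name': 'Gray'}]
```

The string age, the null age and every document without an age are skipped before any comparison happens. This is the "no implicit conversion" behavior: the value is never coerced to fit the literal.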

How does a null field in a JSON document get parsed?

In Rockset, we differentiate between a field that is explicitly set to null and a field that is absent. If an absent field is selected, we return the special type undefined, which behaves like null in almost all situations. IS NULL returns true for both null and undefined. To distinguish between the two, use the special predicate IS UNDEFINED, which returns true for undefined and false for null.

Hence, the following query looks for documents where the field foo is set to the JSON value null:

> SELECT * FROM collection WHERE foo IS NULL AND foo IS NOT UNDEFINED

On the other hand, the following query looks for documents where the field foo does not exist:

> SELECT * FROM collection WHERE foo IS UNDEFINED
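The distinction between null and undefined can be sketched with a sentinel object. This is a hypothetical Python model of the semantics described above, not Rockset code. `UNDEFINED`, `get_field`, `is_null` and `is_undefined` are illustrative names. JSON `null` parses to Python `None`, while a missing key maps to the separate `UNDEFINED` sentinel.

```python
import json

# Hypothetical sentinel for "undefined": the field is absent from the document.
UNDEFINED = object()

def get_field(doc: dict, name: str):
    # json.loads turns JSON null into None; a missing key becomes UNDEFINED.
    return doc.get(name, UNDEFINED)

def is_null(value) -> bool:
    # Mirrors IS NULL: true for both null and undefined.
    return value is None or value is UNDEFINED

def is_undefined(value) -> bool:
    # Mirrors IS UNDEFINED: true only for absent fields.
    return value is UNDEFINED

docs = [json.loads(s) for s in ['{"foo": null}', '{"bar": 1}', '{"foo": 5}']]

# WHERE foo IS NULL AND foo IS NOT UNDEFINED
explicit_null = [d for d in docs
                 if is_null(get_field(d, "foo")) and not is_undefined(get_field(d, "foo"))]
# WHERE foo IS UNDEFINED
absent = [d for d in docs if is_undefined(get_field(d, "foo"))]

print(explicit_null)  # [{'foo': None}]
print(absent)         # [{'bar': 1}]
```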

What does ‘serverless’ mean in the context of Rockset?

Rockset is delivered as a managed service in the cloud. It abstracts away the infrastructure and operational considerations of a data management platform. It does not require you to provision machines or instances in the cloud in advance and can use cloud elasticity to handle scaling automatically for you.

What is a Smart Schema?

Field types are auto-inferred by Rockset at insert time and exposed as a Smart Schema, which enables relational SQL queries without upfront data modeling or schema design. Because a type is associated with every occurrence of a field value, Rockset supports both dynamic and strong typing. As a result, writes are never rejected because of mixed types, so there is no data loss even when a particular field holds values of several different types.
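Schema inference at insert time can be approximated by recording a (path, type) pair for every field value as each document arrives. The sketch below is a hypothetical model, not Rockset's implementation. `type_name` and `insert` are invented names. The type names `int`, `float` and `string` follow the DESCRIBE output shown earlier, while `null`, `bool`, `array` and `object` are assumptions. Nested objects are walked, but arrays are recorded only as `array`.

```python
import json

def type_name(value) -> str:
    # Map a parsed JSON value to a type name; bool must be checked before int.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"

def insert(schema: set, doc: dict) -> None:
    """Record the (path, type) of every field value in doc; no write is ever rejected."""
    def walk(value, path):
        schema.add((tuple(path), type_name(value)))
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, path + [key])
    for key, value in doc.items():
        walk(value, [key])

schema: set = set()
for line in ['{"age": "a lady never tells"}', '{"age": 42}', '{"age": 3.142}']:
    insert(schema, json.loads(line))

for path, typ in sorted(schema):
    print(list(path), typ)
```

Because types are tracked per value rather than per column, the three conflicting `age` values simply produce three schema entries instead of an error.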

Does Rockset support atomic writes?

Rockset provides atomic writes at the document level: you can update multiple fields within a single document atomically. Rockset does not support atomic updates that span multiple documents.
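Document-level atomicity can be illustrated with a toy store that builds each new document version separately and then publishes it in a single step. This is a hypothetical sketch of the guarantee, not how Rockset implements it. `DocumentStore` and `update_fields` are invented names, and a single store-wide lock stands in for whatever mechanism a real system uses.

```python
import threading

class DocumentStore:
    """Hypothetical in-memory store with atomic per-document updates."""

    def __init__(self) -> None:
        self._docs: dict = {}
        # One store-wide lock keeps the sketch simple; it guards every read and write.
        self._lock = threading.Lock()

    def get(self, doc_id: str):
        with self._lock:
            doc = self._docs.get(doc_id)
            # Return a copy so callers cannot mutate the stored version.
            return dict(doc) if doc is not None else None

    def update_fields(self, doc_id: str, changes: dict) -> None:
        with self._lock:
            # Build the new version off to the side, then publish it in one step,
            # so a reader sees either all of `changes` or none of them.
            new_doc = dict(self._docs.get(doc_id, {}))
            new_doc.update(changes)
            self._docs[doc_id] = new_doc

store = DocumentStore()
store.update_fields("ada", {"first": "Ada", "last": "Lovelace"})  # atomic within "ada"
store.update_fields("alan", {"first": "Alan", "last": "Turing"})  # separate, independent write
print(store.get("ada"))
```

Each call publishes exactly one document, and nothing ties the two calls together. A reader running between them can see "ada" updated and "alan" not yet written, which matches the statement that atomicity does not extend across documents.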

How does Rockset compare with Elasticsearch?

There are significant differences between Rockset and Elasticsearch (ES):

  • SQL (with joins). Rockset can be queried using standard SQL with filters, aggregations and joins. It also includes custom SQL extensions that make it easy to query nested documents and arrays. Elasticsearch requires the use of its Query DSL and does not support joins.

  • Scalability. Rockset has a cloud-scale architecture that can scale compute and storage independently of one another. On one hand, you can have a small data set served by a large number of compute nodes in parallel to make queries faster. On the other hand, you can have petabyte-size data sets served by a small number of compute nodes. And, of course, you can have anything in between. This is achieved by leveraging the elasticity of compute and storage available in the cloud. With ES, you have to configure the number of shards of an index when the index is created, and the only way to increase this number is to delete the entire index and recreate it. Rockset handles index growth smoothly without needing to delete and recreate the entire index.

  • Isolation. An ES cluster typically serves multiple indices, and ES provides only a few elementary controls for tuning the resources consumed to serve each index. A burst of activity on one index usually degrades service for the other indices in the same cluster. Rockset’s Tailer-Leaf-Aggregator architecture avoids this situation by having separate serving tiers for each index while sharing the storage needed to make the index durable.

  • Automatic placement of hot, warm and cold data. Rockset leverages RAM, SSD, spinning disks, and AWS S3 to store index data. The most frequently and recently accessed data remains in RAM, while rarely accessed data migrates to slower and cheaper storage such as AWS S3. This placement is driven by analyzing the access patterns of the workload. On the other hand, ES has a hot-cold feature but requires manual configuration to specify which portion of the data is hot and which is cold.

  • Two-level aggregation. Rockset has a two-level aggregator hierarchy to support queries that need to aggregate across large result sets. ES does not have an equivalent feature.

  • Availability. Rockset’s Tailer-Leaf-Aggregator architecture runs the term generators in separate processes called Tailers, and the index servers simply tail data from the Tailers. This means that the serving system is immune to hiccups in the term generators. ES data nodes, on the other hand, run the term generators on the same servers that serve incoming queries, so any hiccup in term generation invariably affects the availability of the ES serving system.

  • C++ vs Java. Rockset’s critical fast-path components are developed in modern C++1z, which leads to a more efficient system than the Java-based ES servers. Rockset uses RocksDB as its underlying storage engine, and RocksDB is optimized for fast storage such as RAM, NVMe and SSD hardware.
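The two-level aggregation point in the list above can be illustrated with a generic multi-level aggregation sketch. This is the standard partial-aggregate technique, not Rockset's actual aggregator code. The function names, the `fan_in` parameter and the row layout are all hypothetical. Leaves reduce their local rows to per-group (sum, count) partials. A first level of aggregators merges groups of leaf partials, and a root aggregator merges those results and computes the final average.

```python
from collections import defaultdict

def leaf_aggregate(rows, key, value):
    """Leaf level: reduce local rows to per-group (sum, count) partial aggregates."""
    partial = defaultdict(lambda: (0, 0))
    for row in rows:
        s, c = partial[row[key]]
        partial[row[key]] = (s + row[value], c + 1)
    return dict(partial)

def merge(partials):
    """Combine partial aggregates; (sum, count) pairs merge by element-wise addition."""
    merged = defaultdict(lambda: (0, 0))
    for p in partials:
        for group, (s, c) in p.items():
            ms, mc = merged[group]
            merged[group] = (ms + s, mc + c)
    return dict(merged)

def two_level_avg(leaf_shards, fan_in, key, value):
    leaf_partials = [leaf_aggregate(rows, key, value) for rows in leaf_shards]
    # Level 1: each intermediate aggregator merges up to `fan_in` leaf partials.
    level1 = [merge(leaf_partials[i:i + fan_in])
              for i in range(0, len(leaf_partials), fan_in)]
    # Level 2: the root aggregator merges level-1 results and only then divides.
    root = merge(level1)
    return {group: s / c for group, (s, c) in root.items()}

shards = [
    [{"k": "a", "v": 1}, {"k": "b", "v": 4}],
    [{"k": "a", "v": 3}],
    [{"k": "b", "v": 6}, {"k": "a", "v": 5}],
    [{"k": "c", "v": 10}],
]
print(two_level_avg(shards, fan_in=2, key="k", value="v"))  # {'a': 3.0, 'b': 5.0, 'c': 10.0}
```

The partials are (sum, count) pairs rather than averages because they merge correctly at any level. Averaging the two level-1 averages for group a (2.0 and 5.0) would give 3.5 instead of the correct 3.0.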