FAQ

This page contains answers to common questions about Rockset. If your question is not answered here, please reach out to Rockset’s support team at support@rockset.com and we will be happy to help.

How do I get a Rockset account?

Please contact us at hello@rockset.com if you are interested in a Rockset account.

What is strong dynamic typing?

Rockset has strong dynamic typing: the data type is associated with each individual field value rather than with an entire column. Rockset does not perform any implicit type conversion at query time or write time. The original type of every field is preserved along with its value and is respected at query processing time.

If the same field contains different types, then Rockset’s smart schema exposes it for you and lets you execute strongly typed queries on this dynamically typed data. Consider the following example:

$ rock create collection people
$ echo '{"age": "a lady never tells"}' | rock upload people -
$ echo '{"age": 42}' | rock upload people -
$ echo '{"age": 3.142}' | rock upload people -
$ rock sql 'DESCRIBE people'
+---------+--------+
| path    | type   |
|---------+--------|
| ['age'] | int    |
| ['age'] | float  |
| ['age'] | string |
+---------+--------+

Now, a query SELECT * FROM people WHERE age >= 18 looks for all documents in which the following three conditions are true:

  • field age is defined in the document
  • field age has an integer or float value
  • field age's numeric value is greater than or equal to 18

If, due to errors or bugs, some documents in the collection do not contain the field age, or the field age has a string or null value, none of those documents are returned by the query.

Note that in this case, the query considers both integer and float types because the query literal is numeric. In all other cases, the query considers only the type matching that of the query literal. For example, the query SELECT * FROM people WHERE "last_name" = 'Gray' matches the type string and value 'Gray'.
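The three conditions above can be modeled in a few lines of Python. This is only an illustrative sketch of the matching rule as described on this page, not Rockset's implementation; the MISSING sentinel and the helper names are hypothetical.

```python
# Illustrative model of "WHERE age >= 18" under strong dynamic typing.
# Not Rockset code: MISSING and the function names are made up for this sketch.
MISSING = object()  # stands in for "field is not present in the document"


def is_numeric(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches_age_at_least(doc, threshold):
    value = doc.get("age", MISSING)
    if value is MISSING:        # condition 1: field must be defined
        return False
    if not is_numeric(value):   # condition 2: only int or float values qualify
        return False
    return value >= threshold   # condition 3: numeric comparison


people = [
    {"age": "a lady never tells"},  # string: skipped, never converted
    {"age": 42},
    {"age": 3.142},
    {"age": None},                  # JSON null: skipped
    {"name": "no age field"},       # field absent: skipped
]

adults = [doc for doc in people if matches_age_at_least(doc, 18)]
print(adults)  # [{'age': 42}]
```

Note that the string value is never coerced to a number; it simply fails the type check, which mirrors the "no implicit type conversion" rule described above.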

How does a null field in a JSON document get parsed?

In SQL, the concept of NULL refers to the absence of a field. Because Rockset supports ingestion of JSON documents that may have fields with the JSON value of null, a new token NULL_VALUE is designated for this case.

Hence, the following query looks for documents where the field foo is set to the JSON value null:

> SELECT * FROM collection WHERE foo = NULL_VALUE

On the other hand, the following query looks for documents where the field foo does not exist:

> SELECT * FROM collection WHERE foo IS NULL
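The difference between the two predicates can be pictured with Python's standard json module, where a JSON null parses to None and an absent field is simply a missing key. This is a hedged analogy for the semantics described above, not how Rockset parses or evaluates queries; the sample documents and helper names are invented for illustration.

```python
import json

# Hypothetical sample documents, one JSON object per line.
raw_docs = [
    '{"id": 1, "foo": null}',   # foo is present with the JSON value null
    '{"id": 2}',                # foo is absent
    '{"id": 3, "foo": "bar"}',  # foo is present with a string value
]
docs = [json.loads(line) for line in raw_docs]


def foo_equals_null_value(doc):
    # Models: WHERE foo = NULL_VALUE (field exists and holds JSON null)
    return "foo" in doc and doc["foo"] is None


def foo_is_null(doc):
    # Models: WHERE foo IS NULL (field does not exist in the document)
    return "foo" not in doc


null_value_ids = [d["id"] for d in docs if foo_equals_null_value(d)]
is_null_ids = [d["id"] for d in docs if foo_is_null(d)]
print(null_value_ids)  # [1]
print(is_null_ids)     # [2]
```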

How does Rockset compare with Elasticsearch?

There are significant differences between Rockset and Elasticsearch (ES):

  • SQL (with joins). Rockset can be queried using standard SQL with filters, aggregations, and joins. It also includes custom extensions to the SQL interface that make it easy to query nested documents and arrays. Elasticsearch requires the use of Query DSL and does not support joins.

  • Scalability. Rockset has a cloud-scale architecture that can scale compute and storage independently of one another. On one hand, you can have a small data set served by a large number of compute nodes in parallel to make queries faster. On the other hand, you can have petabyte-size data sets served by a small number of compute nodes. And, of course, you can have anything in between these two scenarios. This is achieved by leveraging the elasticity of compute and storage that is available in the cloud. When using ES, you have to pre-configure the number of shards of the index at index-creation time. The only way to increase this number is to delete the entire index and recreate it. Rockset handles index growth smoothly without needing to delete and recreate the entire index.

  • Isolation. An ES cluster typically serves multiple indices. ES supports a few elementary controls to tune the resources consumed to serve each index. A burst of activity on one index usually causes service degradation for the other indices in the same cluster. Rockset’s Tailer-Leaf-Aggregator architecture avoids this situation by having separate serving tiers for each index while sharing the storage needed to make the index durable.

  • Automatic placement of hot, warm and cold data. Rockset leverages RAM, SSD, spinning disks, and AWS S3 to store the index data. The most frequently and recently accessed data remains in RAM, while rarely accessed data migrates to slower and cheaper storage like AWS S3. This is achieved by analyzing the access patterns of the workload. On the other hand, ES has a hot-cold feature but needs manual configuration and placement to specify which portion of the data is hot and which is cold.

  • Two-level aggregation. Rockset has a two-level aggregator hierarchy to support queries that need aggregation across large result sets. ES does not have an equivalent feature.

  • Availability. Rockset’s Tailer-Leaf-Aggregator architecture runs the term generators in separate processes called Tailers, and the index servers just tail data from the Tailers. This means that the serving system is immune to hiccups in the term generators. On the other hand, ES data nodes run the term generators on the same set of servers that serve incoming queries. Any hiccup in term generation invariably affects the availability of the ES serving system.

  • C++ vs Java. Rockset’s critical fast-path components are developed in modern C++ (C++1z), which leads to a more efficient system compared to Java-based ES servers. Rockset uses RocksDB as the underlying storage engine, and RocksDB is optimized for fast storage such as RAM, NVMe, and SSD hardware.
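To make the two-level aggregation point above more concrete, here is a generic sketch of the pattern: a first level computes partial aggregates over separate slices of the data, and a second level merges those partials into the final result. This page does not describe how Rockset's aggregator hierarchy works internally, so everything below (the function names, the shard layout, and the choice of AVG as the aggregate) is an assumption made purely for illustration.

```python
from collections import defaultdict


def partial_aggregate(rows):
    # First level: each slice of data produces (sum, count) per group key.
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def merge_partials(partials):
    # Second level: merge the partial (sum, count) pairs, then finish AVG.
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical data, already split into three slices.
shards = [
    [("us", 10.0), ("eu", 4.0)],
    [("us", 20.0)],
    [("eu", 6.0), ("eu", 8.0)],
]

averages = merge_partials(partial_aggregate(shard) for shard in shards)
print(averages)  # {'us': 15.0, 'eu': 6.0}
```

Carrying (sum, count) pairs rather than per-slice averages is what keeps the merged result exact: averaging the averages would give the wrong answer when slices hold different numbers of rows.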