This page contains answers to common questions about Rockset. If your question is not answered here, please reach out to Rockset's support team and we will be happy to help.
To get started with Rockset, navigate to the Rockset Console.
Rockset uses strong dynamic typing: the data type is associated with each individual field value rather than with an entire column. Rockset does not perform any implicit type conversion at write time or at query time. The original type of every field value is preserved alongside the value and is respected during query processing.
If the same field contains values of different types, Rockset's smart schema exposes this and still lets you run strongly typed queries on the dynamically typed data. For example, consider a collection people whose smart schema reports three different types for the field age:
+---------+--------+
| path    | type   |
|---------+--------|
| ['age'] | int    |
| ['age'] | float  |
| ['age'] | string |
+---------+--------+
Now, the query
> SELECT * FROM people WHERE age >= 18
looks for all documents in which all three of the following conditions are true:
age is defined in the document
age has an integer or float value
the numeric value of age is greater than or equal to 18
If some documents in the collection (for example, because of upstream errors or bugs) do not contain the field age, or contain age with a string or null value, none of those documents are returned by the query.
Note that in this case the query considers both integer and float values because the query literal is numeric. In all other cases, the query considers only values whose type matches that of the query literal. For example, the query
> SELECT * FROM people WHERE "last_name" = 'Gray'
matches only documents in which last_name has type string and value 'Gray'.
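The typing rules above can be modeled with a small Python sketch. This is an illustration under stated assumptions, not Rockset's implementation: documents are plain dicts, a missing key stands for an absent field, None stands for null, and the helper names type_compatible, where_ge and where_eq are hypothetical.

```python
# Toy model (not Rockset's implementation) of the comparison rules described above:
# a numeric literal matches int and float values; any other literal matches only its own type.
from typing import Any

NUMERIC = (int, float)


def type_compatible(value: Any, literal: Any) -> bool:
    """Return True if a stored value may be compared with the query literal."""
    # In Python, bool is a subclass of int, so treat booleans as their own type.
    if isinstance(value, bool) or isinstance(literal, bool):
        return type(value) is type(literal)
    if isinstance(literal, NUMERIC):
        return isinstance(value, NUMERIC)
    return type(value) is type(literal)


def where_ge(docs, field, literal):
    """Hypothetical sketch of: SELECT * FROM docs WHERE field >= literal."""
    return [d for d in docs
            if field in d and type_compatible(d[field], literal) and d[field] >= literal]


def where_eq(docs, field, literal):
    """Hypothetical sketch of: SELECT * FROM docs WHERE field = literal."""
    return [d for d in docs
            if field in d and type_compatible(d[field], literal) and d[field] == literal]


people = [
    {"name": "a", "age": 30},      # int
    {"name": "b", "age": 18.5},    # float
    {"name": "c", "age": "42"},    # string: never compared with a numeric literal
    {"name": "d", "age": None},    # null
    {"name": "e"},                 # age absent
    {"name": "f", "age": 12},      # int, but below 18
    {"name": "g", "last_name": "Gray"},
]

print([d["name"] for d in where_ge(people, "age", 18)])            # ['a', 'b']
print([d["name"] for d in where_eq(people, "last_name", "Gray")])  # ['g']
```

Because the type check runs before the comparison, documents c, d and e are skipped rather than raising an error or being coerced, which mirrors the "no implicit type conversion" rule.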
In Rockset, we differentiate between a field that is explicitly set to null and an absent field. If an absent field is selected, we return a special type, undefined, which behaves like null in almost all situations. The predicate IS NULL returns true for both null and undefined. To distinguish between the two, you can use the special predicate IS UNDEFINED, which returns true only for undefined.
Hence, the following query looks for documents where the field foo is set to the JSON value null:
> SELECT * FROM collection WHERE foo IS NULL AND foo IS NOT UNDEFINED
On the other hand, the following query looks for documents where the field foo does not exist:
> SELECT * FROM collection WHERE foo IS UNDEFINED
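The same distinction can be sketched in Python, assuming a missing key models an absent field and None models an explicit JSON null. The sentinel UNDEFINED and the helpers get_field, is_null and is_undefined are hypothetical names for illustration only.

```python
# Toy model (hypothetical helpers, not Rockset's API) of NULL vs UNDEFINED.
# A missing key models an absent field; None models an explicit JSON null.

UNDEFINED = object()  # sentinel returned when a field is absent


def get_field(doc: dict, field: str):
    """Return the field value, or the UNDEFINED sentinel if the field is absent."""
    return doc.get(field, UNDEFINED)


def is_null(value) -> bool:
    # IS NULL is true for both an explicit null and undefined.
    return value is None or value is UNDEFINED


def is_undefined(value) -> bool:
    # IS UNDEFINED is true only for an absent field.
    return value is UNDEFINED


collection = [
    {"id": 1, "foo": None},  # foo explicitly set to null
    {"id": 2},               # foo absent
    {"id": 3, "foo": 5},     # foo has a value
]

# WHERE foo IS NULL AND foo IS NOT UNDEFINED: explicit nulls only
explicit_null = [d["id"] for d in collection
                 if is_null(get_field(d, "foo")) and not is_undefined(get_field(d, "foo"))]

# WHERE foo IS UNDEFINED: field absent
absent = [d["id"] for d in collection if is_undefined(get_field(d, "foo"))]

print(explicit_null)  # [1]
print(absent)         # [2]
```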
Rockset is delivered as a managed service in the cloud. It abstracts away the infrastructure and operational considerations of a data management platform. It does not require you to provision machines or instances in the cloud in advance and can use cloud elasticity to handle scaling automatically for you.
Field types are auto-inferred by Rockset at the time of insert and exposed as a Smart Schema to enable relational SQL queries, without upfront data modeling or schema design. By associating a type with every occurrence of a field value, Rockset is able to support both dynamic and strong typing. This means writes are never rejected because of type mismatches, so there is no data loss even when a particular field has values of mixed types.
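A minimal sketch of per-value type inference, assuming documents are JSON-like Python values. The type names, the path format and the infer_schema helper are assumptions chosen to resemble the smart schema output for the people example; they are not Rockset's API. Each occurrence of a value contributes its own (path, type) pair, so a field with mixed types simply yields several rows instead of causing a write to be rejected.

```python
# Toy sketch of per-value type inference (hypothetical, not Rockset's API).
from collections import Counter


def json_type(value) -> str:
    """Map a JSON-like Python value to an assumed type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def infer_schema(docs):
    """Count (path, type) pairs over all documents; nothing is coerced or rejected."""
    counts = Counter()

    def walk(value, path):
        counts[(tuple(path), json_type(value))] += 1
        if isinstance(value, dict):  # recurse into nested objects (arrays are not expanded here)
            for key, child in value.items():
                walk(child, path + [key])

    for doc in docs:
        for key, value in doc.items():
            walk(value, [key])
    return counts


people = [{"age": 30}, {"age": 18.5}, {"age": "42"}, {"age": 41}]
for (path, typ), n in sorted(infer_schema(people).items()):
    print(list(path), typ, n)
# ['age'] float 1
# ['age'] int 2
# ['age'] string 1
```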
Rockset provides atomic writes at the document level: you can update multiple fields within a single document atomically. Rockset does not support atomic updates across multiple documents.
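A toy model of document-level atomicity, assuming an in-memory store with one lock per document (the DocumentStore class and its patch and get methods are hypothetical, not a Rockset API). A single patch call updates several fields of one document atomically; two patch calls on different documents are independent operations, so there is no atomicity across documents.

```python
import threading


class DocumentStore:
    """Hypothetical in-memory store with one lock per document."""

    def __init__(self):
        self._docs = {}
        self._locks = {}
        self._registry_lock = threading.Lock()  # guards creation of per-document locks

    def _lock_for(self, doc_id):
        with self._registry_lock:
            return self._locks.setdefault(doc_id, threading.Lock())

    def patch(self, doc_id, fields):
        """Apply all field updates to one document atomically."""
        with self._lock_for(doc_id):
            updated = dict(self._docs.get(doc_id, {}))
            updated.update(fields)
            # Swap in the new version: readers see the old or the new document, never a mix.
            self._docs[doc_id] = updated

    def get(self, doc_id):
        with self._lock_for(doc_id):
            return dict(self._docs.get(doc_id, {}))


store = DocumentStore()
# Each writer sets x and y to the same value in a single patch call.
threads = [threading.Thread(target=store.patch, args=("doc1", {"x": i, "y": i}))
           for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
doc = store.get("doc1")
print(doc["x"] == doc["y"])  # True: one writer's x is never paired with another writer's y

# Patches to two different documents are separate operations; nothing ties them together.
store.patch("a", {"balance": 90})
store.patch("b", {"balance": 110})
```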
There are significant differences between Rockset and Elasticsearch (ES):
SQL (with joins). Rockset can be queried using standard SQL with filters, aggregations and joins. It also includes custom SQL extensions for easily querying nested documents and arrays. Elasticsearch requires the use of its Query DSL and does not support joins.
Scalability. Rockset has a cloud-scale architecture that scales compute and storage independently of one another. On one hand, you can have a small data set served by a large number of compute nodes in parallel to make queries faster. On the other hand, you can have petabyte-scale data sets served by a small number of compute nodes. And, of course, you can have anything in between these two scenarios. This is achieved by leveraging the elasticity of compute and storage available in the cloud. With ES, you have to configure the number of shards of an index at index-creation time, and the only way to increase this number is to delete the entire index and recreate it. Rockset handles index growth smoothly without needing to delete and recreate the entire index.
Isolation. An ES cluster typically serves multiple indices, and ES offers only a few elementary controls for tuning the resources consumed in serving each index. A burst of activity on one index usually degrades service for the other indices in the same cluster. Rockset's Tailer-Leaf-Aggregator architecture avoids this by having separate serving tiers for each index while sharing the storage needed to make the index durable.
Automatic placement of hot, warm and cold data. Rockset uses RAM, SSDs, spinning disks and AWS S3 to store index data. The most frequently and recently accessed data remains in RAM, while rarely accessed data migrates to slower, cheaper storage such as AWS S3. This placement is driven by analyzing the access patterns of the workload. ES, on the other hand, has a hot-cold feature but requires manual configuration to specify which portion of the data is hot and which is cold.
Two-level aggregation. Rockset has a two-level aggregator hierarchy to support queries that need aggregation across large result sets. ES does not have an equivalent feature.
Availability. Rockset's Tailer-Leaf-Aggregator architecture runs the term generators in separate processes, called Tailers, and the index servers simply tail data from the Tailers. This means the serving system is immune to hiccups in the term generators. ES data nodes, on the other hand, run the term generators on the same servers that serve incoming queries, so any hiccup in term generation invariably affects the availability of the ES serving system.
C++ vs Java. Rockset's critical fast-path components are written in modern C++1z, which makes the system more efficient than the Java-based ES servers. Rockset uses RocksDB as its underlying storage engine, and RocksDB is optimized for fast storage such as RAM, NVMe and SSD hardware.
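The Scalability point about fixed shard counts follows from how documents are routed to shards. The sketch below assumes a simplified routing rule, shard = hash(id) mod number_of_shards (Elasticsearch's real routing formula has more parameters), and counts how many documents would land on a different shard if the shard count changed. The helpers shard_for and fraction_moved are hypothetical. With 5 shards growing to 6, about five out of six documents move, which is why the index has to be rebuilt.

```python
# Simplified model of fixed-shard routing: shard = hash(id) % num_shards.
# Real Elasticsearch routing is more involved; this only shows why changing
# the shard count forces most documents to move.


def shard_for(doc_id: int, num_shards: int) -> int:
    # Integer ids are used as their own hash so the result is deterministic.
    return doc_id % num_shards


def fraction_moved(num_docs: int, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(1 for d in range(num_docs)
                if shard_for(d, old_shards) != shard_for(d, new_shards))
    return moved / num_docs


print(f"{fraction_moved(10_000, 5, 6):.3f}")  # 0.833
```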
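For the hot, warm and cold data point, a purely recency-based policy is one simple way to let access patterns drive placement. This is a hypothetical sketch, not Rockset's actual algorithm: the TieredStore class keeps a bounded RAM tier and a bounded SSD tier as LRU lists, cascades evicted blocks down to an unbounded S3 tier, and promotes any accessed block back to RAM. Spinning disks are omitted for brevity.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical recency-based tiering: accessed blocks move to RAM,
    idle blocks drift down toward S3. Spinning disks are omitted."""

    def __init__(self, ram_slots: int, ssd_slots: int):
        # (tier name, capacity, blocks ordered from least to most recently used)
        self.levels = [
            ("ram", ram_slots, OrderedDict()),
            ("ssd", ssd_slots, OrderedDict()),
            ("s3", None, OrderedDict()),  # unbounded, slowest and cheapest
        ]

    def tier_of(self, key):
        """Return the name of the tier currently holding key, or None."""
        for name, _, blocks in self.levels:
            if key in blocks:
                return name
        return None

    def access(self, key, value=None):
        """Read a block (or write it if value is given) and promote it to RAM."""
        for _, _, blocks in self.levels:
            if key in blocks:
                old = blocks.pop(key)
                if value is None:
                    value = old
                break
        # Place the block at the most-recently-used end of the RAM tier.
        self.levels[0][2][key] = value
        # Cascade least-recently-used overflow down one tier at a time.
        for i, (_, capacity, blocks) in enumerate(self.levels[:-1]):
            while len(blocks) > capacity:
                lru_key, lru_value = blocks.popitem(last=False)
                self.levels[i + 1][2][lru_key] = lru_value
        return value


store = TieredStore(ram_slots=2, ssd_slots=2)
for k in "abcde":
    store.access(k, k.upper())
print(store.tier_of("a"))  # s3: least recently used, demoted twice
print(store.access("a"))   # A: reading the block promotes it
print(store.tier_of("a"))  # ram
```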
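For the two-level aggregation point, the general technique is to compute partial aggregates close to the data and merge them up a tree. The sketch below is a generic illustration of that technique for a GROUP BY query with COUNT and SUM, not Rockset's implementation; leaf_partial and merge are hypothetical helpers. Because COUNT and SUM merge by addition, the result is the same however the partials are grouped, and an average can be derived at the top as sum divided by count.

```python
# Generic two-level aggregation for: SELECT key, COUNT(*), SUM(v) ... GROUP BY key
from collections import defaultdict


def leaf_partial(rows):
    """Per-leaf partial aggregate: key -> (count, sum)."""
    acc = defaultdict(lambda: [0, 0])
    for key, v in rows:
        acc[key][0] += 1
        acc[key][1] += v
    return {k: tuple(c_s) for k, c_s in acc.items()}


def merge(partials):
    """Merge partial aggregates; COUNT and SUM combine by addition."""
    acc = defaultdict(lambda: [0, 0])
    for part in partials:
        for key, (c, s) in part.items():
            acc[key][0] += c
            acc[key][1] += s
    return {k: tuple(c_s) for k, c_s in acc.items()}


leaves = [
    [("x", 1), ("y", 2)],
    [("x", 3)],
    [("y", 4), ("y", 5)],
    [("x", 6), ("z", 7)],
]
partials = [leaf_partial(rows) for rows in leaves]
# Level 1: two intermediate aggregators, each merging the partials of two leaves.
level1 = [merge(partials[0:2]), merge(partials[2:4])]
# Level 2: the final aggregator merges the intermediate results.
final = merge(level1)
print(sorted(final.items()))  # [('x', (3, 10)), ('y', (3, 11)), ('z', (1, 7))]
```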