Data Storage in Rockset
Rockset uses RocksDB, an open source key-value store, to store your data. RocksDB is widely used in many storage systems that require high performance and low latency access. It has become the storage engine of choice for many database management systems, including MySQL, Apache Kafka and CockroachDB.
The size of your data compressed and indexed in RocksDB is the value used for usage tracking and billing purposes.
#Understanding Data Storage Size
#Storage for Converged Indexing
The total size of your data in Rockset is determined by the cumulative size of the indices Rockset builds on top of your data. Rockset’s Converged Indexing technology indexes each document and stores the indices as a set of key-value pairs inside a RocksDB storage engine. This format is optimized for fast query serving. The Converged Indexing process indexes every field of your document, including nested objects and array entries. Every field is indexed in several ways:
- an inverted index useful for point lookups
- a columnar index useful for aggregations
- a row index useful for data retrieval
- a range index useful for range scans that have low selectivity
You can also create specialized geo-indexes by configuring Rockset’s Field Mappings. The benefit of building these indices is that they make your queries fast, and you do not have to create and tune the indices manually.
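As a rough illustration of how one document can fan out into several sets of key-value pairs, the sketch below flattens a nested document and emits one entry per leaf value for a row-style, a column-style, and an inverted-style layout. The key layouts (the `R`, `C`, and `S` prefixes and the `tags[0]` path notation) are assumptions made for this example, not Rockset's actual storage format, and the range index is left out for brevity.

```python
def flatten(value, path=""):
    """Yield (field_path, leaf_value) pairs, descending into nested objects and arrays."""
    if isinstance(value, dict):
        for name, child in value.items():
            yield from flatten(child, f"{path}.{name}" if path else name)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from flatten(child, f"{path}[{i}]")
    else:
        yield path, value


def index_entries(doc_id, doc):
    """Return hypothetical key-value entries for three index layouts."""
    entries = []
    for path, value in flatten(doc):
        # Row layout: all fields of one document sort next to each other.
        entries.append((("R", doc_id, path), value))
        # Column layout: all values of one field sort next to each other.
        entries.append((("C", path, doc_id), value))
        # Inverted layout: the key itself answers "which documents have field == value?".
        entries.append((("S", path, value, doc_id), None))
    return entries


doc = {"name": "ann", "address": {"city": "SF"}, "tags": ["a", "b"]}
for key, value in index_entries("d1", doc):
    print(key, value)
```

Because every entry is an ordinary key-value pair, a sorted key-value store can serve each access pattern with a prefix scan over the matching key layout.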
Rockset stores a type with every individual value of a field, unlike other systems that typically associate a single type with each field. The Converged Indexing format indexes each field type of your dataset. The same field in different records can contain data of different types, and your queries can use the TYPEOF function to filter values of a certain type. The advantage of storing the type with each value is that you can avoid cleaning and filtering data before putting it into Rockset.
So, what is the downside of creating these indices and storing individual types? One common perception is that every new index you create will bloat your storage size. That perception no longer holds, especially when you store your data in RocksDB.
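A minimal sketch of the idea of tagging each value with its own type, assuming a hypothetical `type_of` helper that stands in for the role TYPEOF plays in a query. The type names returned here are illustrative and are not claimed to match the strings Rockset returns.

```python
def type_of(value):
    """Return an illustrative type name for a single value (hypothetical names)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "unknown"


# The same field holds an integer, a string, and a null in different records.
records = [{"zip": 94107}, {"zip": "94107-1234"}, {"zip": None}]

# Store the type next to each value instead of declaring one type for the field.
tagged = [(type_of(r["zip"]), r["zip"]) for r in records]

# Filter at query time rather than cleaning the data before ingestion.
ints_only = [value for type_name, value in tagged if type_name == "int"]
print(ints_only)
```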
#Data Compaction in RocksDB
Given that each document field value is stored in several indexes, you might expect to see a significant size amplification between your data natively and your data in Rockset. But this is often not the case. RocksDB takes a set of key-values in its BlockBasedTable format and compresses it before storing it on disk. RocksDB allows different portions of the data to be stored using different compression algorithms, e.g. frequently used data can be compressed using lz4 while less frequently used data can be compressed using zstd or gzip. Also, RocksDB delta-encodes keys, so that keys with common prefixes do not incur the cost of duplicating those prefixes on storage. Another way that RocksDB reduces storage bloat is by supporting Bloom filters on prefixes of keys rather than storing a Bloom filter entry for every key.
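To make the key delta-encoding concrete, here is a simplified sketch of prefix compression over sorted keys: each key is stored as the number of bytes it shares with the previous key plus the remaining suffix. The function names are made up for this example, and the real RocksDB block format has more to it (for instance, periodic restart points that store a full key), so treat this as an illustration of the principle only.

```python
def prefix_encode(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded = []
    prev = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"user.address.city", b"user.address.zip", b"user.name"]
encoded = prefix_encode(keys)
print(encoded)
print(sum(len(k) for k in keys), "key bytes before,",
      sum(len(suffix) for _, suffix in encoded), "suffix bytes after")
```

Index keys that start with the same field path or document identifier share long prefixes, which is why this kind of encoding pays off when one value is stored under several index layouts.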
The inverted index is organized as key-values of posting lists and Rockset uses prefix-compression and Elias-Fano encoding to reduce the size of these lists. The columnar index is chunked into groups and a group stores the values of a single field in sorted order. This columnar grouping reduces storage size by using delta encoding of values.
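The effect of delta encoding on sorted values can be sketched as follows. This example stores the gaps between consecutive non-negative integers as variable-length integers. It is a deliberately simpler technique than the Elias-Fano encoding mentioned above, and it is not Rockset's actual on-disk layout; the function names are hypothetical.

```python
def varint(n):
    """Encode a non-negative integer in 7-bit groups, least significant group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def delta_encode(sorted_values):
    """Store each value as the varint-encoded gap from the previous value."""
    out = bytearray()
    prev = 0
    for value in sorted_values:
        out += varint(value - prev)
        prev = value
    return bytes(out)


def delta_decode(data):
    """Invert delta_encode: rebuild the values by summing the decoded gaps."""
    values, current, acc, shift = [], 0, 0, 0
    for byte in data:
        acc |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += acc
            values.append(current)
            acc, shift = 0, 0
    return values


values = [1000000, 1000003, 1000010, 1000250]
encoded = delta_encode(values)
print(len(encoded), "bytes instead of", 4 * len(values), "for fixed 32-bit integers")
```

Sorted values have small gaps, so most gaps fit in one or two bytes even when the values themselves are large.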
#Data Storage FAQ
#Do datasets with sparse fields increase the size of my index?
Rockset is designed to support sparse fields without any storage size amplification. In other database systems, an index on a dataset with hundreds of sparse fields may inflate the total size of the database, because those indices have to record some metadata about each of those fields even if many of them do not exist on a specific record. Rockset’s data format is designed to not incur even a single byte of overhead for any of these sparse fields.
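One way to picture why a key-value layout handles sparse fields well is to count what each layout has to store. The sketch below compares a hypothetical fixed-schema layout, which reserves a slot for every known field in every record, with a key-value layout that only writes entries for fields that are present. Both functions are illustrative models, not measurements of any real system.

```python
def fixed_schema_slots(docs):
    """Slots needed if every record reserves space for every field seen in the dataset."""
    all_fields = set().union(*(doc.keys() for doc in docs))
    return len(docs) * len(all_fields)


def key_value_entries(docs):
    """Entries needed if only the fields present in each record are written."""
    return sum(len(doc) for doc in docs)


docs = [
    {"id": 1, "a": 1},
    {"id": 2, "b": 2},
    {"id": 3, "c": 3},
]
print(fixed_schema_slots(docs), "slots vs", key_value_entries(docs), "entries")
```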
#Why is my data in Rockset larger in size than storing it in Parquet?
The RocksDB data format is optimized for indexing, whereas Parquet and other warehousing technologies are optimized for scanning. If you want low-latency queries on a large dataset, the RocksDB format can find the relevant records very quickly because it uses indices that narrow down the search space. On the other hand, if you store data in Parquet, your query processing software has to scan large sections of your dataset to find the matching records. So, as your dataset grows in Parquet, you either have to use a lot of compute to scan the dataset in parallel for every query or you have to live with slower queries.
Parquet is a read-only data format, which means that records cannot be updated in place. This fits nicely in a warehousing use case, where most of your data is read-only and new data gets stored in new partitions on your storage. This allows Parquet to use a very compact data format. The RocksDB data format, by contrast, is mutable, which means that you can update, overwrite or delete individual fields in a record at will.
In short, if you want interactive queries on large datasets, then the RocksDB-based data format is your optimal choice.
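The trade-off between scanning and indexing can be sketched with a toy in-memory dataset. The example counts how many records each approach has to examine to answer the same equality filter. The helper names and the dataset are invented for illustration, and real systems are considerably more sophisticated on both sides.

```python
from collections import defaultdict


def scan(records, field, value):
    """Scan-style lookup: examine every record and keep the matches."""
    examined, matches = 0, []
    for doc_id, record in records.items():
        examined += 1
        if record.get(field) == value:
            matches.append(doc_id)
    return matches, examined


def build_inverted_index(records):
    """Map each (field, value) pair to the list of documents that contain it."""
    index = defaultdict(list)
    for doc_id, record in records.items():
        for field, value in record.items():
            index[(field, value)].append(doc_id)
    return index


def lookup(index, field, value):
    """Index-style lookup: touch only the posting list for the requested pair."""
    matches = index.get((field, value), [])
    return matches, len(matches)


records = {i: {"city": "SF" if i % 100 == 0 else "NYC"} for i in range(1000)}
index = build_inverted_index(records)
print("scan examined", scan(records, "city", "SF")[1], "records")
print("index examined", lookup(index, "city", "SF")[1], "entries")
```

The index costs extra storage up front, which is the size difference this question is about, and in exchange each selective query touches far less data.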
#Can I switch off any of the indices to reduce my Rockset data size?
Rockset builds indices for every field in your document. This lets you run search, aggregation and join queries and expect them to be fast, without having to create and manage appropriate indices manually. But if the RocksDB size is a concern for you, one way to reduce it is to eliminate the fields that your queries never use. You can exclude specific fields from indexing by configuring this in Field Mappings. Those dropped fields will not be indexed or stored in the RocksDB data. Both whitelisting and blacklisting of fields are supported via Field Mappings.
#Why is my data size in Rockset different from the size of my original data source?
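The RocksDB-based data format is optimized for low-latency queries while keeping data latencies low. The size you see in Rockset is therefore the compressed size of the indices built on your data, not the raw size of the data in its original source.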
#Can I reduce my Rockset bytes by doing Life Cycle Management of data?
If you are interested in reducing the size of your dataset, a good option is to set a retention duration for your data. Data that falls outside the retention window is automatically removed.
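Conceptually, a retention window works like the filter sketched below: anything older than the cutoff is dropped. The `event_time` field name and the `apply_retention` helper are assumptions made for this example; in Rockset the removal happens automatically once a retention duration is configured, so you would not write this code yourself.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(docs, retention, now):
    """Keep only documents whose (hypothetical) event_time is inside the retention window."""
    cutoff = now - retention
    return [doc for doc in docs if doc["event_time"] >= cutoff]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
docs = [
    {"id": "old", "event_time": now - timedelta(days=45)},
    {"id": "recent", "event_time": now - timedelta(days=3)},
]
kept = apply_retention(docs, retention=timedelta(days=30), now=now)
print([doc["id"] for doc in kept])
```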
#Why is my data size in Rockset fluctuating? Am I losing data?
Rockset is designed to minimize data latency. If you write data to Rockset, it is visible to queries within a few seconds. If these writes are updates to existing records in the storage, RocksDB’s background compaction algorithms merge the different versions of a field and retain only the most recent value of each field. The merging happens asynchronously in the background because the system is designed to make newly written data visible to queries even before it is merged with existing records. The upside of this approach is that it reduces the data latency of the system. A minor downside is that you may see your data size grow and shrink lazily if you have bursty update patterns. This is not a problem because RocksDB’s leveled compaction keeps the data size fluctuations within a margin of 10%.
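The grow-then-shrink pattern can be illustrated with a very small model of a log-structured store: each burst of writes lands in a new run, and a later compaction merges the runs and keeps only the newest version of each key. The `compact` function, the `TOMBSTONE` marker, and the key names are all hypothetical, and this is not RocksDB's leveled compaction algorithm, only the core idea behind it.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted key


def stored_entries(runs):
    """Total entries on disk before compaction, counting every stale version."""
    return sum(len(run) for run in runs)


def compact(runs):
    """Merge runs ordered oldest to newest, keeping only the latest version of each key."""
    merged = {}
    for run in runs:
        merged.update(run)  # newer runs overwrite older versions of the same key
    return {key: value for key, value in merged.items() if value is not TOMBSTONE}


runs = [
    {"d1.name": "ann", "d1.city": "SF"},        # initial write
    {"d1.city": "LA"},                           # update to one field
    {"d1.name": TOMBSTONE, "d2.name": "bob"},   # delete one field, add a new document
]
print(stored_entries(runs), "entries before compaction")
print(len(compact(runs)), "entries after compaction")
```

Queries can read the newest version as soon as it is written, and the stale versions only cost storage until the background merge catches up.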
#What is my data size when bulk-loading data?
When you create a collection from a data source that has a non-trivial amount of data, Rockset employs a bulk-load mechanism. The bulk-load mechanism is optimized to load the dataset in minimum time, and during this time the RocksDB Bytes could be much larger than your expected data size. Rest assured that you are not billed for storage while the collection is in BulkMode.