Compute Architecture
Compute-Compute Separation
Compute-compute separation is an architecture that isolates the compute used for ingestion from the compute used for queries, allowing you to scale each independently. Multiple virtual instances can access the same real-time datasets, and virtual instances can be quickly spun up or down to accommodate variable workloads.
This means you can efficiently scale Rockset by changing the sizes of your virtual instances or by spinning up new ones, while guaranteeing data freshness across all virtual instances.
Motivation
Compute-compute separation addresses the problem of compute contention in a real-time database system. Running data ingestion and queries on the same compute unit preserves the real-time nature of the system, because reads can easily reflect recent writes. The caveat is that a spike in writes degrades query performance, and vice versa. In addition, queries from different applications are not isolated from each other and can negatively impact one another. In effect, in an architecture without compute-compute separation, the read and write workloads must contend for the same set of compute resources.
Rockset’s compute-compute separated architecture completely isolates the compute resources used for data ingestion from the ones used for serving queries. A given virtual instance can be dedicated to data ingestion, dedicated to queries, or responsible for both workloads. Multiple virtual instances can be used to isolate query workloads for different applications.
Architecture
Rockset’s architecture is designed to maintain the real-time nature of data even when ingestion and queries are not running on the same compute unit.
Rockset maintains data freshness and consistency across multiple virtual instances. One virtual instance (the streaming ingest virtual instance) performs the CPU-intensive work associated with ingestion, and all other virtual instances (query virtual instances) keep their in-memory state up to date by tailing updates from the streaming ingest virtual instance. No on-disk state needs to be replicated because it is available to all virtual instances through the shared hot storage layer.
The CPU-intensive work associated with ingestion includes:
- Parsing input documents
- Performing Ingest Transformations
- Handling updates to existing documents
- Indexing
- Compaction
The design for compute-compute separation ensures that only the streaming ingest virtual instance handles the CPU-intensive work associated with ingestion.
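The division of labor described above can be illustrated with a small, self-contained Python model. This is only a toy sketch under stated assumptions, not Rockset's implementation or API: the class names (`IngestInstance`, `QueryInstance`), the `id:key=value` input format, and the in-memory list standing in for the update stream are all hypothetical. It shows the general idea that the expensive parsing happens once, on the ingest side, while query instances only apply already-processed updates that they tail.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    """One already-processed update, ready to apply without re-parsing."""
    seq: int
    doc_id: str
    fields: dict


@dataclass
class IngestInstance:
    """Hypothetical stand-in for the streaming ingest virtual instance."""
    log: list = field(default_factory=list)
    parse_count: int = 0  # counts the "expensive" work, done only here

    def ingest(self, raw: str) -> None:
        # Parse "id:key=value" and apply a trivial transformation
        # (lowercasing the key) before publishing the update.
        doc_id, pair = raw.split(":", 1)
        key, value = pair.split("=", 1)
        self.parse_count += 1
        self.log.append(LogEntry(len(self.log), doc_id, {key.lower(): value}))


@dataclass
class QueryInstance:
    """Hypothetical stand-in for a query virtual instance."""
    state: dict = field(default_factory=dict)
    next_seq: int = 0

    def tail(self, ingest: IngestInstance) -> int:
        # Apply only the entries not seen yet; no parsing happens here.
        new_entries = ingest.log[self.next_seq:]
        for entry in new_entries:
            self.state.setdefault(entry.doc_id, {}).update(entry.fields)
        self.next_seq += len(new_entries)
        return len(new_entries)


ingest = IngestInstance()
q1, q2 = QueryInstance(), QueryInstance()

ingest.ingest("user1:Name=Ada")
ingest.ingest("user1:City=London")  # update to an existing document
q1.tail(ingest)
ingest.ingest("user2:Name=Grace")
q1.tail(ingest)
q2.tail(ingest)  # an instance that starts tailing later catches up

print(q1.state == q2.state)  # both query instances converge on the same state
print(ingest.parse_count)    # parsing ran once per document, on the ingest side only
```

In this model, adding another `QueryInstance` adds no parsing work anywhere, which mirrors the claim that only the streaming ingest virtual instance carries the ingestion cost.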
A shared hot storage tier enables compute-storage separation in the system, which is a prerequisite for compute-compute separation. Compute-storage separation means that no data movement is required to spin up a new virtual instance, making the operation fast and lightweight.
By integrating both compute-storage separation and compute-compute separation, Rockset allows you to take full advantage of the elasticity of the cloud for your real-time analytics workloads.
FAQs
How do I decide if I should increase the size of my virtual instance or add an additional virtual instance?
Add a virtual instance when you want to isolate compute resources for different workloads. To meet the latency and throughput requirements for ingestion or queries of a single workload, increase the size of the virtual instance that serves it instead.
What is the data replication lag between the streaming ingest virtual instance and the query virtual instances?
To effectively power real-time data analytics, the data replication lag between the streaming ingest virtual instance and the query virtual instances is very low (on the order of tens of milliseconds).
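To make the notion of replication lag concrete, here is a minimal Python sketch of how such a lag could be computed from timestamps. The function name and the sample timestamps are hypothetical and chosen only for illustration; they are not Rockset metrics or measurements. The lag of an update is taken to be the time between when the ingest side applied it and when a query instance applied it.

```python
def replication_lags_ms(ingest_applied_ms: dict, query_applied_ms: dict) -> list:
    """Per-update lag in milliseconds, for updates both sides have applied.

    Both arguments map an update sequence number to the time (in integer
    milliseconds) at which that side applied the update.
    """
    return [
        query_applied_ms[seq] - ingest_applied_ms[seq]
        for seq in sorted(ingest_applied_ms)
        if seq in query_applied_ms
    ]


# Hypothetical timestamps, for illustration only.
ingest_times = {0: 1000, 1: 1010, 2: 1025, 3: 1030}
query_times = {0: 1012, 1: 1031, 2: 1040}  # update 3 has not been applied yet

lags = replication_lags_ms(ingest_times, query_times)
print(lags)       # per-update lag for updates 0, 1, and 2
print(max(lags))  # worst observed lag among applied updates
```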
How does a compute-compute separated architecture provide availability?
Compute-compute separation allows you to isolate your data ingestion workload from your query workloads. This improves reliability because different workloads are running on separate compute resources, preventing noisy neighbor issues. Furthermore, within a single virtual instance, Rockset maintains availability in the face of partial failures.
What are other resources for learning more about compute-compute separation?
We have some blog posts that discuss compute-compute separation and how you can get started with using it yourself: Introducing Compute-Compute Separation and Compute-Compute Separation Overview.
We also have a technical deep dive on our compute-storage and compute-compute separated architecture, which you can view here.