Micro-Batching

📘

Micro-Batching is in beta. The documentation is subject to change.

Micro-Batching is a configuration that can be enabled for your ingest Virtual Instance. When micro-batching is enabled, your Virtual Instance will automatically suspend when your ingest has “caught up” (i.e. when ingest latency is near zero) and resume on a specified interval. You may also have an active query Virtual Instance that allows you to serve a query workload on mounted collections, even while your ingest Virtual Instance is suspended.

Enabling micro-batching lets you lower your compute costs in exchange for higher ingest latency. Suspended Virtual Instances do not incur compute costs.

When micro-batching is enabled, your Virtual Instance will cyclically:

  • Suspend when the average document detection latency across your Virtual Instance (VI) over the last 5 minutes is below 60 seconds. This indicates that your Virtual Instance has “caught up” and is processing documents that were recently inserted into your source.
    • Note: Document detection latency is the time it takes for a document inserted or updated in the source to be detected by Rockset. This is different from ingest latency, which is the time it takes for a document inserted or updated in the source to be queryable. Ingest latency includes detection latency plus processing latency. You can see your ingest latency on the Metrics page of the Rockset console, as well as the detection and processing latency per collection in the "Metrics" tab on the collection details page.
  • Resume on an interval specified by you. For example, if you specify a resume interval of 30 minutes, your Virtual Instance will resume 30 minutes after it was suspended. You may also resume your Virtual Instance manually at any time.
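
The suspend and resume rules above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not Rockset's implementation: the function names and the `(timestamp, latency)` sample format are assumptions made for this example.

```python
from statistics import mean

# Thresholds taken from the description above.
DETECTION_WINDOW_SECONDS = 5 * 60    # averaging window: the last 5 minutes
DETECTION_THRESHOLD_SECONDS = 60.0   # suspend when the average is below this


def should_suspend(samples, now):
    """Return True if the average detection latency in the window is below the threshold.

    `samples` is a list of (timestamp_seconds, detection_latency_seconds) pairs.
    This sample format is hypothetical and used only for illustration.
    """
    recent = [
        latency
        for timestamp, latency in samples
        if now - DETECTION_WINDOW_SECONDS <= timestamp <= now
    ]
    if not recent:
        # No recent measurements: do not assume the VI has caught up.
        return False
    return mean(recent) < DETECTION_THRESHOLD_SECONDS


def next_resume_time(suspended_at, resume_interval_minutes):
    """Time (in seconds) at which the VI resumes, counted from the suspension time."""
    return suspended_at + resume_interval_minutes * 60
```

For instance, with a 30-minute resume interval, a VI suspended at `t = 1000` seconds would be scheduled to resume at `t = 2800` seconds.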

💡

Micro-Batching Tip

We recommend enabling micro-batching for your Virtual Instance if:

  • You would like to switch to a more cost-efficient Multi-VI architecture and you don’t mind having higher ingest latency for lower cost.
  • Your ingest is periodic or sporadic, and you don’t need your ingest Virtual Instance to immediately pick up your ingest workload.

Using Micro-Batching

Setting a Resume Interval

Choose a resume interval based on the maximum ingest latency you can tolerate, since the two are correlated. For example, if you select a resume interval of 60 minutes, your ingest latency should hover around 60 minutes while your VI is catching up on ingest. Resume intervals must be between 10 minutes and 2 hours.

Note that the time it takes for your VI to resume (i.e. the time spent in the RESUMING state) is not included in the resume interval.
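
As a rough illustration of these constraints, the sketch below validates a resume interval and adds the resume time on top of it. The helper names and the `resuming_minutes` parameter are hypothetical, and treating the 10-minute and 2-hour bounds as inclusive is an assumption.

```python
MIN_RESUME_INTERVAL_MINUTES = 10    # lower bound stated above
MAX_RESUME_INTERVAL_MINUTES = 120   # upper bound stated above (2 hours)


def validate_resume_interval(minutes):
    """Return `minutes` if it lies within the allowed range, else raise ValueError.

    Assumption: both bounds are treated as inclusive.
    """
    if not MIN_RESUME_INTERVAL_MINUTES <= minutes <= MAX_RESUME_INTERVAL_MINUTES:
        raise ValueError(
            f"resume interval must be between {MIN_RESUME_INTERVAL_MINUTES} "
            f"and {MAX_RESUME_INTERVAL_MINUTES} minutes, got {minutes}"
        )
    return minutes


def minutes_until_active(resume_interval_minutes, resuming_minutes):
    """Minutes from suspension until the VI is active again.

    The time spent in the RESUMING state (`resuming_minutes`, a hypothetical
    input) is added on top of the resume interval, since the interval does
    not include it.
    """
    return validate_resume_interval(resume_interval_minutes) + resuming_minutes
```

With a 30-minute interval and a hypothetical 2 minutes spent resuming, the VI would be active again 32 minutes after it was suspended.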

Handling Unexpectedly High or Increasing Ingest Latency

If your ingest latency is unexpectedly high (for example, you set a resume interval of 30 minutes but your ingest latency peaks at 1 hour), we may need to tune micro-batching to your ingest workload. Please reach out to Rockset Customer Support with details about your issue.

If your ingest latency is continuously increasing, it’s likely that your Virtual Instance is unable to keep up with the rate of ingest. In other words, your document detection latency is low, but your document processing latency is high, and you may need to scale up to a larger Virtual Instance. To avoid needing to manually scale up your Virtual Instance in these scenarios, we recommend setting an auto-scaling policy on your VI. Note that your VI will not suspend until the ingest backlog is cleared.
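
One way to tell the two situations apart is to compare the peak ingest latency observed in each suspend/resume cycle. The sketch below is only a heuristic under stated assumptions: the per-cycle input, the three-cycle minimum, and the `TOLERANCE` factor are all hypothetical choices for this example, not Rockset behavior.

```python
TOLERANCE = 1.5  # hypothetical slack factor above the resume interval


def diagnose(peak_latencies_minutes, resume_interval_minutes):
    """Classify per-cycle peak ingest latencies (oldest first).

    Returns "increasing" when every cycle peaks higher than the previous one,
    "higher_than_expected" when a peak clearly exceeds the resume interval,
    and "ok" otherwise.
    """
    pairs = zip(peak_latencies_minutes, peak_latencies_minutes[1:])
    if len(peak_latencies_minutes) >= 3 and all(later > earlier for earlier, later in pairs):
        # The VI falls further behind every cycle: consider scaling up.
        return "increasing"
    if max(peak_latencies_minutes, default=0) > resume_interval_minutes * TOLERANCE:
        # Stable but well above the resume interval: may need tuning.
        return "higher_than_expected"
    return "ok"
```

Under these assumptions, peaks of 58, 60, and 59 minutes with a 30-minute resume interval would be classified as `"higher_than_expected"`, while peaks of 40, 70, and 110 minutes would be classified as `"increasing"`.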

Multi-VI Architecture

Using micro-batching requires you to build a multi-VI architecture in Rockset. We recommend having at least one query Virtual Instance that has auto-suspend disabled when micro-batching is enabled for your ingest Virtual Instance.

🚧

You cannot enable micro-batching on a default query Virtual Instance, and queries cannot be routed to a micro-batching VI by default.

Live Mounts

Live mounts can be created while your ingest Virtual Instance is suspended. Both newly created and existing mounts will be queryable for data inserted up to the point of suspension (minus any ingest latency).

Miscellaneous

After resuming, a Virtual Instance must stay active for a minimum of 10 minutes before it is suspended again.

Restrictions

While your ingest VI is suspended, you will be unable to:

  • Create a collection
  • Ingest from IIS queries (queries will run to completion)
  • Send Write API requests
  • Create snapshots
  • Receive updates to _events, ingest logs, or query logs

Additionally:

  • While your ingest Virtual Instance is suspended, you cannot disable micro-batching. You can disable micro-batching after resuming your ingest Virtual Instance.
  • Your VI will not suspend until all collections and mounts are READY and any ongoing bulk ingests and IIS queries have run to completion.
  • While your ingest VI is suspended, you will not receive live ingest metrics in the console or from the metrics endpoint. Ingest metrics recorded while the ingest VI is suspended will be 0. However, metrics that were recorded while the VI was ACTIVE will still be available.
  • Indexes that are building will not progress until your ingest VI has resumed.

Examples of Micro-Batching Configurations

Acme Corp

  • Ingest VI: SMALL
  • Query VI(s): MEDIUM
  • Ingest Rate: 1 MiB/s
  • SMALL Peak Ingest Rate: 3 MiB/s
  • Micro-Batching Resume Interval: 15 minutes
  • Result: The ingest VI suspends for 15 minutes and resumes for about 5 minutes, processing a data backlog of 900 MiB at 3 MiB/s. This allows Acme Corp to save approximately 75% on ingest compute costs.

Stark Industries

  • Ingest VI: LARGE
  • Query VI(s): LARGE, LARGE
  • Ingest Rate: 15 MiB/s
  • LARGE Peak Ingest Rate: 18 MiB/s
  • Micro-Batching Resume Interval: 2 hours
  • Result: After a 2-hour suspension, there is a data backlog of 108,000 MiB (about 105 GiB). Once resumed, the VI ingests at 18 MiB/s, and the backlog is processed over 100 minutes. Overall, Stark Industries saves about 42% on ingest compute costs.

You can find the peak ingest rates for each VI size here.
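
The arithmetic behind these examples can be reproduced with a simplified model. The sketch below assumes that the backlog is whatever accumulates during the suspension, that the VI drains it at its peak ingest rate, and that the saving equals the fraction of each cycle spent suspended. It ignores data that arrives while the VI is catching up, the time spent resuming, and the 10-minute minimum active time. The function name is hypothetical.

```python
def micro_batch_cycle(ingest_rate_mib_s, peak_rate_mib_s, resume_interval_min):
    """Estimate one suspend/resume cycle under the simplified model above.

    Returns (backlog_mib, catch_up_min, suspended_fraction).
    """
    suspended_seconds = resume_interval_min * 60
    # Data that accumulates at the source while the VI is suspended.
    backlog_mib = ingest_rate_mib_s * suspended_seconds
    # Time to drain the backlog at the VI's peak ingest rate.
    catch_up_min = backlog_mib / peak_rate_mib_s / 60
    # Share of the cycle during which no compute cost is incurred.
    suspended_fraction = resume_interval_min / (resume_interval_min + catch_up_min)
    return backlog_mib, catch_up_min, suspended_fraction


acme = micro_batch_cycle(ingest_rate_mib_s=1, peak_rate_mib_s=3, resume_interval_min=15)
stark = micro_batch_cycle(ingest_rate_mib_s=15, peak_rate_mib_s=18, resume_interval_min=120)
```

For Acme Corp this model gives a 900 MiB backlog, a 5-minute catch-up, and a suspended fraction of 75%, matching the figures above. For Stark Industries it gives a 100-minute catch-up, matching the text, but a suspended fraction of about 55%, which differs from the quoted 42%; the quoted figure may account for factors this simple model leaves out.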