Load Testing

Load Testing aims to assess the system's behavior under both normal and peak conditions. It is crucial for evaluating key metrics such as Queries Per Second (QPS), concurrency, and query latency. Understanding these metrics is essential for sizing your Virtual Instances correctly, ensuring they can handle anticipated workloads, meet Service Level Agreements (SLAs), and deliver a smooth, uninterrupted user experience.

💡

Load Testing Tip

We recommend load testing on at least two Virtual Instances: one dedicated to ingestion (Ingest VI) and one dedicated to queries (Query VI). This will help with deciding on a single or multi-VI architecture.

Load Testing on Rockset

In order to conduct a proper load test, we need to simulate a heavy workload. Rockset provides a REST API for query execution, and external tools can interact seamlessly with its endpoints. Below, we've compiled a list of widely used load testing tools:

  • JMeter: A versatile and user-friendly tool with a GUI, ideal for various types of load testing, but can be resource-intensive.
  • k6: Optimized for high performance and cloud environments, using JavaScript for scripting, suitable for developers and CI/CD workflows.
  • Gatling: High-performance tool using Scala, best for complex, advanced scripting scenarios.
  • Locust: Python-based, offering simplicity and rapid script development, great for straightforward testing needs.

Whichever tool you use, be sure to read through its documentation and understand how it works and how it measures latencies/response times. Try not to mix and match tools in your testing; sticking with one tool will give you reproducible, trustworthy results that you can share with your team or stakeholders.

For this guide, we will focus on load testing Rockset with Locust.

📘

Useful Resources

Here are some useful resources for JMeter, Gatling and k6. The process is very similar to what we’re doing with Locust.

Step 1: Identify the Query to Load Test

First, we will need to identify a sample SQL query that we want to load test. In our example, we'll use a query that finds the most popular product on our webshop for a particular day. This is what our SQL query looks like (note that :date is a parameter which we can supply when executing the query):

SELECT
    s.Date,
    MAX_BY(p.ProductName, s.Count) AS ProductName,
    MAX(s.Count) AS NumberOfClicks
FROM
    "Demo-Ecommerce".ProductStatsAlias s
    INNER JOIN "Demo-Ecommerce".ProductsAlias p ON s.ProductID = CAST(p._id AS INT)
WHERE
    s.Date = :date
GROUP BY
    1
ORDER BY
    1 DESC;
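
Before going further, it can be worth executing the query once against Rockset's Query endpoint to confirm that the :date parameter behaves as expected. Here is a minimal Python sketch, assuming the requests library, the usw2a1 region, a ROCKSET_APIKEY environment variable, and 2023-11-24 as a sample date:

import os
import requests

ROCKSET_APIKEY = os.environ["ROCKSET_APIKEY"]

SQL = """
SELECT s.Date, MAX_BY(p.ProductName, s.Count) AS ProductName, MAX(s.Count) AS NumberOfClicks
FROM "Demo-Ecommerce".ProductStatsAlias s
INNER JOIN "Demo-Ecommerce".ProductsAlias p ON s.ProductID = CAST(p._id AS INT)
WHERE s.Date = :date
GROUP BY 1
ORDER BY 1 DESC
"""

# Execute the raw SQL once via the Query endpoint to verify it works.
response = requests.post(
    "https://api.usw2a1.rockset.com/v1/orgs/self/queries",
    headers={"Authorization": f"ApiKey {ROCKSET_APIKEY}"},
    json={"sql": {"query": SQL, "parameters": [
        {"name": "date", "type": "date", "value": "2023-11-24"}  # sample date, an assumption
    ]}},
)
response.raise_for_status()
print(response.json()["results"])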

💡

Load Testing Tip

Enable Query Logs in Rockset to keep statistics of all executed queries.

Step 2: Save Query as a Query Lambda

Next, we'll save the above query as a Query Lambda, which makes it easy to execute the SQL query as a REST endpoint. Query Lambdas can be parameterized, and their SQL can be versioned, allowing for easier query management.

We'll save this Query Lambda as LoadTestQueryLambda in the sandbox workspace; it can then be accessed using the Execute Query Lambda By Tag endpoint:

https://api.usw2a1.rockset.com/v1/orgs/self/ws/{workspace}/lambdas/{queryLambda}/tags/{tag}
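
If you'd rather create the Query Lambda programmatically than through the console, a hedged sketch against the Create Query Lambda endpoint might look like the following (the region, workspace, and default parameter value are assumptions; check the API reference for the full request schema):

import os
import requests

ROCKSET_APIKEY = os.environ["ROCKSET_APIKEY"]

SQL = """
SELECT s.Date, MAX_BY(p.ProductName, s.Count) AS ProductName, MAX(s.Count) AS NumberOfClicks
FROM "Demo-Ecommerce".ProductStatsAlias s
INNER JOIN "Demo-Ecommerce".ProductsAlias p ON s.ProductID = CAST(p._id AS INT)
WHERE s.Date = :date
GROUP BY 1
ORDER BY 1 DESC
"""

# Create the Query Lambda in the sandbox workspace.
response = requests.post(
    "https://api.usw2a1.rockset.com/v1/orgs/self/ws/sandbox/lambdas",
    headers={"Authorization": f"ApiKey {ROCKSET_APIKEY}"},
    json={
        "name": "LoadTestQueryLambda",
        "sql": {
            "query": SQL,
            "default_parameters": [
                {"name": "date", "type": "date", "value": "2023-11-24"}  # assumed default
            ],
        },
    },
)
response.raise_for_status()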

A sample cURL request to execute the Query Lambda looks something like this:

curl --request POST \
  --url https://api.usw2a1.rockset.com/v1/orgs/self/ws/sandbox/lambdas/LoadTestQueryLambda/tags/latest \
  -H "Authorization: ApiKey $ROCKSET_APIKEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "parameters": [
      {
        "name": "date",
        "type": "date",
        "value": "2023-11-24"
      }
    ],
    "virtual_instance_id": "<your virtual instance ID>"
  }'

Step 3: Generate an API key

Now we need to generate an API key, which our load testing script will use to authenticate to Rockset and run the test. You can create one easily by navigating to the "API Keys" tab in the Rockset console or by using the Create API Key endpoint.
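
For the API route, here's a minimal Python sketch against the Create API Key endpoint; the region and the key name load-test-key are illustrative assumptions, and you need an existing key to authenticate the request:

import os
import requests

# An existing API key (e.g., created in the console) authenticates this request.
EXISTING_APIKEY = os.environ["ROCKSET_APIKEY"]

response = requests.post(
    "https://api.usw2a1.rockset.com/v1/orgs/self/users/self/apikeys",
    headers={"Authorization": f"ApiKey {EXISTING_APIKEY}"},
    json={"name": "load-test-key"},  # hypothetical key name
)
response.raise_for_status()
print(response.json())  # the response body includes the newly created key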

Step 4: Create a VI for Load Testing

Next, we need the ID of the Virtual Instance we want to load test. In our example, we want to run the load test against a Virtual Instance that's dedicated solely to querying (a Query VI). We spin up an additional Medium Virtual Instance for this by navigating to the Virtual Instances tab in the Rockset console or by using the Create Virtual Instance endpoint. More information on spinning up additional Query VIs can be found in our Multiple Virtual Instances documentation. Once the Query VI is created, we can copy its ID from the console.
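
Alternatively, a short Python sketch against the List Virtual Instances endpoint can print each VI's name and ID (the usw2a1 region and the requests library are assumptions; adapt to your setup):

import os
import requests

ROCKSET_APIKEY = os.environ["ROCKSET_APIKEY"]

response = requests.get(
    "https://api.usw2a1.rockset.com/v1/orgs/self/virtualinstances",
    headers={"Authorization": f"ApiKey {ROCKSET_APIKEY}"},
)
response.raise_for_status()

# Print each VI's name and ID so we can pick the Query VI for the load test.
for vi in response.json()["data"]:
    print(vi["name"], vi["id"])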

Step 5: Install an External Load Testing Tool

Next, we’ll install and set up any of the external load testing tools listed earlier. In this example, we'll use Locust. You can install Locust on your local machine or a dedicated instance (think EC2 in AWS).

$ pip install locust

Step 6: Create a Load Test Script

Once that's done, we'll create a Python script for the Locust load test, using the script below as a template. Note that you will need to update the following variables:

  • ROCKSET_APIKEY with your API key from step 3
  • self.host with your region's API URL (check the Rockset API documentation for your region's base URL)
  • self.vi_id with your Query Virtual Instance ID from step 4
  • target_service with your updated endpoint from step 2

import os
from locust import HttpUser, task, tag

class query_runner(HttpUser):
    ROCKSET_APIKEY = os.getenv('ROCKSET_APIKEY') # API key is an environment variable

    def on_start(self):
        self.headers = {
            "Authorization": "ApiKey " + self.ROCKSET_APIKEY,
            "Content-Type": "application/json"
        }
        self.client.headers = self.headers
        self.host = 'https://api.usw2a1.rockset.com/v1/orgs/self' # replace this with your region's URL
        self.client.base_url = self.host
        self.vi_id = '<your virtual instance ID>' # replace this with your VI ID

    @tag('LoadTestQueryLambda')
    @task(1)
    def LoadTestQueryLambda(self):
        # using default params for now
        data = {
            "virtual_instance_id": self.vi_id
        }
        target_service = '/ws/sandbox/lambdas/LoadTestQueryLambda/tags/latest' # replace this with your query lambda
        result = self.client.post(
            target_service,
            json=data
        )

Step 7: Run the Load Test

Once we set the API key environment variable, we can start Locust:

export ROCKSET_APIKEY=<your api key>
locust -f my_locust_load_test.py --host https://api.usw2a1.rockset.com/v1/orgs/self

Then navigate to http://localhost:8089, where we can start our Locust load test.

Let’s explore what happens once we hit the Start swarming button:

  1. Initialization of simulated users: Locust starts creating virtual users (up to the number you specified) at the rate you defined (the spawn rate). These users are instances of the user class defined in your Locust script. In our case, we’re starting with a single user but we will then manually increase it to 5 and 10 users, and then go down to 5 and 1 again.
  2. Task execution: Each virtual user starts executing the tasks defined in the script. In Locust, tasks are typically HTTP requests, but they can be any Python code. The tasks are picked randomly or based on the weights assigned to them (if any). We have just one query that we’re executing (our LoadTestQueryLambda).
  3. Performance metrics collection: As the virtual users perform tasks, Locust collects and calculates performance metrics. These metrics include the number of requests made, the number of requests per second, response times, and the number of failures.
  4. Real-time statistics update: The Locust web interface updates in real-time, showing these statistics. This includes the number of users currently swarming, the request rate, failure rate, and response times.
  5. Test scalability: Locust will continue to spawn users until it reaches the total number specified. It increases the load gradually at the specified spawn rate, allowing you to observe how system performance changes as the load grows. You can see this in the graph below, where the number of users grows to 5 and then 10 before coming down again.
  6. User behavior simulation: Virtual users wait for a random time between tasks, as defined by wait_time in the script, which simulates more realistic user behavior. We didn't do this in our case, but you can, and Locust supports more advanced options such as custom load shapes (see the sketch after this list).
  7. Continuous test execution: The test will continue running until you decide to stop it, or until it reaches a predefined duration if you've set one.
  8. Resource utilization: During this process, Locust utilizes your machine's resources to simulate the users and make requests. It's important to note that the performance of the Locust test can also depend on the resources of the machine it's running on.
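
To make the last two ideas concrete, here is a hedged Locust sketch (not part of our original test; class names, step durations, and spawn rate are illustrative assumptions): wait_time adds think time between requests, and a LoadTestShape subclass automates the 1-to-5-to-10-and-back user ramp that we drove manually above.

import os
from locust import HttpUser, LoadTestShape, task, between

class PacedUser(HttpUser):
    wait_time = between(1, 5)  # wait 1-5 seconds between tasks to mimic real users
    host = 'https://api.usw2a1.rockset.com/v1/orgs/self'  # replace with your region's URL

    def on_start(self):
        self.client.headers = {
            "Authorization": "ApiKey " + os.environ['ROCKSET_APIKEY'],
            "Content-Type": "application/json"
        }

    @task
    def LoadTestQueryLambda(self):
        self.client.post(
            '/ws/sandbox/lambdas/LoadTestQueryLambda/tags/latest',
            json={"virtual_instance_id": "<your virtual instance ID>"}
        )

class StepLoadShape(LoadTestShape):
    # (seconds elapsed, target users): ramp 1 -> 5 -> 10, then back down
    steps = [(60, 1), (120, 5), (180, 10), (240, 5), (300, 1)]

    def tick(self):
        run_time = self.get_run_time()
        for time_limit, user_count in self.steps:
            if run_time < time_limit:
                return (user_count, 1)  # (total users, spawn rate per second)
        return None  # returning None stops the test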

Step 8: Analyze the Load Test results

Interpreting results from a Locust run involves understanding key metrics and what they indicate about the performance of the system under test. Here are some of the main metrics provided by Locust and how to interpret them:

  • Number of users: The total number of simulated users at any given point in the test. This helps you understand the load level on your system. You can correlate system performance with the number of users to determine at what point performance degrades.
  • Requests per second (RPS): The number of requests (queries) made to your system per second. A higher RPS indicates a higher load. Compare this with response times and error rates to assess if the system can handle concurrency and high traffic smoothly.
  • Response time: Usually displayed as average, median, and percentile (e.g., 90th and 99th percentile) response times. You will most likely look at the median and the 90th/99th percentiles, since these reflect the experience of most users; only 10 or 1 percent of requests will be slower (see the sketch after this list).
  • Failure rate: The percentage or number of requests that resulted in an error. A high failure rate indicates problems with the system under test. It's crucial to analyze the nature of these errors.
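
Locust can also persist these metrics: running it with the --csv flag (for example, locust -f my_locust_load_test.py --headless -u 10 -r 1 --run-time 5m --csv results) writes files such as results_stats.csv. Below is a small pandas sketch for summarizing that file; the column names match recent Locust versions but may vary, so treat this as a starting point:

import pandas as pd

# Aggregated per-endpoint statistics written by Locust's --csv flag.
stats = pd.read_csv("results_stats.csv")

for _, row in stats.iterrows():
    print(
        f"{row['Name']}: {row['Request Count']} requests, "
        f"median={row['Median Response Time']} ms, "
        f"p90={row['90%']} ms, p99={row['99%']} ms, "
        f"failures={row['Failure Count']}"
    )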

📘

For more information and detailed diagrams, check out our in-depth blog on How to Do Load Testing with Rockset and Locust.

Apart from viewing these metrics in the Rockset console or through our metrics endpoint, you can also analyze the actual SQL queries that ran: their individual performance, queue time, and so on. To do this, we must first enable Query Logs; then we can run queries like the one below to find our median run and queue times:

SELECT
    query_sql,
    COUNT(*) AS count,
    ARRAY_SORT(ARRAY_AGG(runtime_ms))[(COUNT(*) + 1) / 2] AS median_runtime,
    ARRAY_SORT(ARRAY_AGG(queued_time_ms))[(COUNT(*) + 1) / 2] AS median_queue_time
FROM
    commons."QueryLogs"
WHERE
    vi_id = '<your query virtual instance ID>'
    AND _event_time > TIMESTAMP '2023-11-24 09:40:00'
GROUP BY
    query_sql

We can repeat this load test on the Ingest VI as well, to see how the system handles ingestion and runs queries under load. The process is the same; we would just use the Ingest VI's ID in our Locust script in Step 6.

Strategies for Better Performance

If you need better performance, there are several strategies for achieving it. Choosing an appropriate Query Virtual Instance size depends on your query complexity, the size of your dataset, the selectivity of your queries, the number of queries expected to run concurrently, and your target query latency. For your Ingest VI, you should also factor in the resources needed to handle ingestion and indexing. Luckily, we offer two features that can help with this:

  • Auto-Scaling: with this feature, Rockset automatically scales the VI up and down depending on the current load. This is important if your load is variable and/or you use your Ingest VI for both ingestion and querying.
  • Compute-Compute Separation: this lets you create Query VIs dedicated solely to running queries, ensuring that all available resources go toward executing those queries efficiently. You can isolate queries from ingestion, or isolate different apps on separate Query VIs, to ensure scalability and performance.

📘

Check out our docs on Query Performance and Ingest Performance for more performance tuning tips!