Measuring throughput and latency is a critical step in load testing software and in ensuring application performance and stability. In this article, you’ll learn the key considerations and follow a step-by-step guide to using production traffic replication in Kubernetes to determine maximum throughput when performance testing your software.


What is throughput in performance testing?

Throughput measures how many requests your application can handle over a period of time and is often expressed in transactions per second (TPS). A proper load testing and stress testing plan typically tracks throughput along with other performance metrics such as average response time, concurrent users, and error rate. Understanding and testing throughput gives you a better idea of how much capacity, including network bandwidth, you need to accommodate your users, and ensures your web application can handle expected or peak load.
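As a rough illustration of the definition, TPS over a window is the number of requests completed in that window divided by the window's length in seconds. The sketch below assumes you have a list of request completion timestamps (for example, exported from a load-testing tool); the function names are hypothetical. Bucketing per second, rather than averaging over the whole test, keeps dips and spikes visible.

```python
from collections import Counter
from typing import Sequence


def throughput(completion_times: Sequence[float], start: float, end: float) -> float:
    """Completed requests per second within the window [start, end)."""
    if end <= start:
        raise ValueError("window end must be after window start")
    completed = sum(1 for t in completion_times if start <= t < end)
    return completed / (end - start)


def tps_per_second(completion_times: Sequence[float], start: float) -> dict[int, int]:
    """Count completions in each one-second bucket measured from `start`."""
    buckets = Counter(int(t - start) for t in completion_times if t >= start)
    return dict(sorted(buckets.items()))


# Hypothetical completion timestamps, in seconds since the test began.
times = [0.1, 0.4, 0.9, 1.2, 1.3, 1.8, 2.5]
print(throughput(times, 0.0, 3.0))  # 7 requests over 3 seconds
print(tps_per_second(times, 0.0))   # {0: 3, 1: 3, 2: 1}
```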


The importance of measuring max TPS

Understanding your software’s maximum TPS helps engineering managers gain confidence that the system can avoid unexpected downtime and excessive latency.

Determining max TPS also helps engineers and engineering managers decide which optimizations to prioritize and what changes are needed to accommodate future growth or events. Running max TPS tests reveals whether the infrastructure and applications can handle the expected load and which parts may require optimization. For example, an expected 20% growth in traffic might lead a team to modify autoscaling rules to spin up 20% more instances, without verifying whether those new instances can spin up fast enough.
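To see why scaling the instance count by 20% is not enough on its own, here is a small per-second simulation. The model and every number in it are hypothetical assumptions: demand grows 20% over one minute, each instance handles a fixed TPS, and a newly requested instance only serves traffic after a startup delay. Even with a correctly sized scaling rule, a slow startup leaves requests unserved.

```python
import math


def unserved_requests(base_tps: float, growth: float, ramp_s: int,
                      per_instance_tps: float, startup_s: int, horizon_s: int) -> float:
    """Requests arriving while ready capacity is below demand (per-second simulation).

    Hypothetical model: demand rises linearly from base_tps to base_tps * (1 + growth)
    over ramp_s seconds; the autoscaler requests new instances as soon as demand exceeds
    requested capacity, but each one only serves traffic startup_s seconds later.
    """
    ready = requested = math.ceil(base_tps / per_instance_tps)
    ready_at = []  # second at which each pending instance becomes ready
    unserved = 0.0
    for t in range(horizon_s):
        demand = base_tps * (1 + growth * min(t / ramp_s, 1.0))
        needed = math.ceil(demand / per_instance_tps)
        if needed > requested:
            ready_at.extend([t + startup_s] * (needed - requested))
            requested = needed
        # Promote pending instances whose startup delay has elapsed.
        ready += sum(1 for r in ready_at if r <= t)
        ready_at = [r for r in ready_at if r > t]
        unserved += max(0.0, demand - ready * per_instance_tps)
    return unserved


# Same scaling rule; instances ready instantly vs. after 90 seconds.
print(unserved_requests(1000, 0.2, 60, 100, startup_s=0, horizon_s=180))   # 0.0
print(unserved_requests(1000, 0.2, 60, 100, startup_s=90, horizon_s=180))  # positive
```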

Because many factors influence scale-up time and readiness, thorough testing is needed to simulate what will happen in production. This testing also builds an understanding of application behavior, removing the fear of spikes and allowing you to size pods with appropriate headroom without overprovisioning, which reduces costs.
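Once a max TPS test tells you what a single pod can sustain, sizing with headroom becomes simple arithmetic. The sketch assumes a hypothetical rule of thumb: each pod should run at no more than (1 - headroom) of its measured max TPS. The function name and parameters are illustrative, not part of any tool.

```python
import math


def replicas_needed(peak_tps: float, max_tps_per_pod: float, headroom: float) -> int:
    """Replica count so each pod runs at most (1 - headroom) of its measured max TPS."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_tps_per_pod = max_tps_per_pod * (1 - headroom)
    return math.ceil(peak_tps / usable_tps_per_pod)


# Hypothetical: a pod tops out at 250 TPS, expected peak is 1,900 TPS, keep 20% headroom.
print(replicas_needed(1900, 250, 0.2))  # 10
```

A larger headroom buys tolerance for scale-up delays at the cost of more idle capacity, so the measured max TPS is what keeps this trade-off grounded in data.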


Key considerations for determining max throughput

When executing load tests and determining your max TPS, it’s vital to consider factors such as ramp patterns, sustained TPS, and spike TPS.

Ramp patterns

An improper ramp time can invalidate any TPS test, because application behavior can vary wildly depending on how traffic ramps up or tails down, and different failures show up under different load patterns.
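A ramp pattern can be described as a function of time. This sketch assumes a linear ramp-up, a hold at peak, and a linear ramp-down (a trapezoid); real tools may offer other shapes, and the parameter names are hypothetical. Comparing two ramps with the same peak shows how differently the application is loaded during the first seconds of a test.

```python
def target_tps(t: float, peak_tps: float, ramp_up_s: float, hold_s: float,
               ramp_down_s: float) -> float:
    """Target rate at time t (seconds): linear ramp-up, hold at peak, linear ramp-down."""
    if t < 0:
        return 0.0
    if t < ramp_up_s:
        return peak_tps * t / ramp_up_s
    if t < ramp_up_s + hold_s:
        return peak_tps
    if t < ramp_up_s + hold_s + ramp_down_s:
        return peak_tps * (1 - (t - ramp_up_s - hold_s) / ramp_down_s)
    return 0.0


# Same peak, different ramps: five seconds in, the load differs by an order of magnitude.
print(target_tps(5, 100.0, ramp_up_s=5, hold_s=240, ramp_down_s=30))   # 100.0
print(target_tps(5, 100.0, ramp_up_s=60, hold_s=240, ramp_down_s=30))  # roughly 8.3
```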

Consider Ticketmaster putting new tickets on sale. Validating only the sale of tickets for a popular artist can lead to overly sensitive autoscaling rules and, in turn, overprovisioning. Conversely, testing only for lesser-known artists may lead to insufficient autoscaling.

Testing different ramp patterns can help determine whether different underlying resources need to change for different scenarios. For instance, it might make sense to use more expensive and resilient nodes for certain artists.

Sustained TPS

Testing with sustained TPS increases the likelihood of catching certain types of errors, such as memory leaks, caching issues, and transient faults. For instance, deadlocks happen when two or more processes wait for each other to release resources, and race conditions happen when multiple processes access the same resource at the same time without proper synchronization. Both can occur immediately but are more likely to surface under sustained load, especially in distributed systems. Testing with sustained TPS can also reveal resource exhaustion, such as reaching the maximum number of concurrent database connections because connections are not closed after use.
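The connection-exhaustion case is easy to reproduce in miniature. This sketch uses a hypothetical toy pool in place of a real database driver: a handler that never releases its connection passes a short test, and only fails once the number of requests exceeds the pool size, which is exactly what sustained load exposes.

```python
class ConnectionPool:
    """Toy pool with a hard cap, standing in for a database's max-connections limit."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.in_use = 0

    def acquire(self) -> None:
        if self.in_use >= self.max_connections:
            raise RuntimeError("connection pool exhausted")
        self.in_use += 1

    def release(self) -> None:
        self.in_use -= 1


def leaky_handler(pool: ConnectionPool) -> None:
    pool.acquire()
    # Bug: returns without releasing the connection.


def fixed_handler(pool: ConnectionPool) -> None:
    pool.acquire()
    try:
        pass  # the real query would run here
    finally:
        pool.release()  # always hand the connection back


def requests_until_failure(handler, pool_size: int, total: int):
    """Send `total` sequential requests; return the index of the first failure, or None."""
    pool = ConnectionPool(pool_size)
    for i in range(total):
        try:
            handler(pool)
        except RuntimeError:
            return i
    return None


print(requests_until_failure(leaky_handler, pool_size=10, total=5))     # None: short test passes
print(requests_until_failure(leaky_handler, pool_size=10, total=1000))  # 10: pool exhausted
print(requests_until_failure(fixed_handler, pool_size=10, total=1000))  # None
```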

Spike TPS

Evaluating sudden surges in traffic can lead to useful insights. In real life, these can happen due to new product launches, successful marketing campaigns, or breaking news coverage.

Simulating spike TPS during testing helps you prepare for such events and builds a more complete understanding of your application. When management asks how response times change under various load conditions, when the infrastructure returns to a normal state, or how long it takes to recover from crashes caused by high throughput, you’ll have accurate answers.
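One way to turn "how long does it take to recover?" into a number is to scan a per-second latency series after the spike ends. This sketch assumes a hypothetical definition of recovery: a few consecutive samples at or below the pre-spike baseline plus a tolerance. The thresholds and names are illustrative; pick definitions that match your own service-level objectives.

```python
from typing import Optional, Sequence


def recovery_time(latencies_ms: Sequence[float], spike_end: int, baseline_ms: float,
                  tolerance: float = 0.1, stable_samples: int = 3) -> Optional[int]:
    """Seconds after spike_end until latency stays near baseline, or None if it never does.

    latencies_ms holds one sample per second (for example, per-second p99 latency).
    Recovery means `stable_samples` consecutive samples <= baseline_ms * (1 + tolerance).
    """
    limit = baseline_ms * (1 + tolerance)
    streak = 0
    for i in range(spike_end, len(latencies_ms)):
        if latencies_ms[i] <= limit:
            streak += 1
            if streak == stable_samples:
                # Report when the stable run started, relative to the end of the spike.
                return i - stable_samples + 1 - spike_end
        else:
            streak = 0
    return None


# Hypothetical samples: baseline near 50 ms, spike at seconds 2-4, spike ends at index 5.
samples = [50, 52, 400, 420, 380, 200, 90, 54, 53, 51, 52]
print(recovery_time(samples, spike_end=5, baseline_ms=50))  # 2
```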

Spike TPS may also prove useful in continuous integration. Engineering teams may find it inefficient to run full hour-long tests on every pull request and instead opt for short spike TPS tests as quick validation, reserving realistic ramp patterns and sustained TPS for preparing new releases.


Correlating TPS with other key metrics

Monitoring TPS alone during testing can be useful, but the most powerful insights come from correlating it with resource usage. Take a look at the graphs below from Speedscale’s dashboard:

Speedscale dashboard showing throughput, latency, memory, and CPU

Here, throughput decreases and latency increases over time, while memory and CPU usage steadily increase, and the 99th-percentile latency (green line) spikes near the end of the test. Drawing firm conclusions is impossible without a full understanding of the application being tested; however, this data clearly indicates that engineers may want to investigate resource utilization.
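The 99th percentile matters because averages hide tail latency. This sketch uses the nearest-rank definition of a percentile; many monitoring tools interpolate instead, so their numbers can differ slightly. The sample data and the `p99_per_window` helper are hypothetical.

```python
import math
import statistics
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def p99_per_window(samples: Sequence[float], window: int) -> list:
    """p99 for each consecutive block of `window` samples, like a p99 line on a graph."""
    return [percentile(samples[i:i + window], 99) for i in range(0, len(samples), window)]


# 98 fast requests and 2 very slow ones: the mean looks fine, the p99 does not.
latencies = [20.0] * 98 + [900.0] * 2
print(statistics.mean(latencies))  # 37.6
print(percentile(latencies, 99))   # 900.0
```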

In other words, monitoring TPS is useful for understanding how different load conditions affect the user experience, but finding and fixing bugs requires correlating it with application and infrastructure metrics.


Configure the right tests by understanding your goals

Understanding your testing goals (and the reasoning behind them) helps you configure the right test plan for the best results. For example, a media website may want to identify its breaking point to prepare for upcoming breaking news coverage. This would likely mean testing a very sudden influx of traffic with spike TPS and a very quick ramp-up pattern. Additionally, the site may want to test how it performs when breaking news hits during peak operating hours, which would require ‘warming up’ the service with sustained load first.
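The media-site scenario maps naturally onto a sequence of load stages. This sketch is a hypothetical data model, not Speedscale's test config format: each stage moves the rate linearly from a start TPS to an end TPS over its duration, and the numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One hypothetical load stage: the rate moves linearly from start_tps to end_tps."""
    duration_s: float
    start_tps: float
    end_tps: float


def tps_at(stages: list, t: float) -> float:
    """Target TPS at time t for stages run back to back; 0 outside the plan."""
    elapsed = 0.0
    for stage in stages:
        if elapsed <= t < elapsed + stage.duration_s:
            fraction = (t - elapsed) / stage.duration_s
            return stage.start_tps + fraction * (stage.end_tps - stage.start_tps)
        elapsed += stage.duration_s
    return 0.0


breaking_news_at_peak = [
    Stage(600, 200, 200),    # warm-up: sustained peak-hour load for 10 minutes
    Stage(10, 200, 2000),    # breaking news: very quick ramp to 10x
    Stage(120, 2000, 2000),  # hold the spike for two minutes
]
print(tps_at(breaking_news_at_peak, 605))  # 1100.0, halfway through the ramp
```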


Using production traffic replication for more accurate load tests

The better you understand your users, the more accurate your tests will be, and the more reliable your software. Today, more software teams are turning to production traffic replication to simulate their production environment and understand real-world usage of their software.

Step-by-step guide for determining max throughput with production traffic replication

This section shows how to use production traffic replication, with the features found in Speedscale, to accurately determine the max TPS of your application in Kubernetes.

Prerequisites: Set up and install Speedscale (start a free 30-day trial), or explore the interactive demo.

Step 1: Create your own test config

Tests in Speedscale are always run by executing a traffic replay. Every traffic replay needs a test config that defines how load is generated; you can create one from scratch or clone and modify an existing one. For this tutorial, copy the “standard” configuration and name it “sustained-tps”.

Speedscale user interface showing Copy test config

Now, go to the “Load Pattern” tab and edit the existing stage by clicking the pencil icon.

Set “Duration” to a fixed time of 300 seconds (5 minutes) and the ramp time to 30 seconds. In a real test, make sure these options accurately reflect production behavior.
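To sanity-check a report against this stage, it can help to estimate how many requests it should send. This sketch assumes the 30-second ramp counts toward the 300-second duration, rises linearly from zero, and that the load generator never backs off; auto-tuning back-off would lower the real count. The 100 TPS used in the example is the target set in the next step, and the function name is hypothetical.

```python
def expected_requests(duration_s: float, ramp_s: float, target_tps: float) -> float:
    """Requests sent if the rate rises linearly over ramp_s, then holds until duration_s."""
    if not 0 <= ramp_s <= duration_s:
        raise ValueError("ramp must fit inside the duration")
    ramp_requests = 0.5 * target_tps * ramp_s           # area of the ramp triangle
    hold_requests = target_tps * (duration_s - ramp_s)  # area of the flat section
    return ramp_requests + hold_requests


print(expected_requests(300, 30, 100))  # 28500.0
```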

Set TPS to 100. The load generation algorithm is auto-tuning and backs off if the application cannot keep up, so when performing stress tests you can in theory set the TPS to any value, such as 10,000, as long as it’s higher than what you expect your application to handle. Now, click “Ok” to save the stage.
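Speedscale's actual auto-tuning algorithm isn't described here. As an illustration of the general idea of backing off, this sketch uses additive-increase/multiplicative-decrease (AIMD), a common congestion-control technique, with a hypothetical overload signal (the rate exceeding a fixed capacity) standing in for real signals such as rising errors or latency. It shows why a configured TPS far above capacity still produces load that hovers near what the service can handle.

```python
def next_rate(current: float, target: float, overloaded: bool,
              step: float = 50.0, backoff: float = 0.5, floor: float = 1.0) -> float:
    """One AIMD step: add `step` TPS while healthy, multiply by `backoff` when overloaded."""
    if overloaded:
        return max(floor, current * backoff)
    return min(target, current + step)


def simulate(capacity_tps: float, target: float, steps: int, start: float = 50.0) -> list:
    """Drive a hypothetical service that counts as overloaded above capacity_tps."""
    rates = [start]
    for _ in range(steps):
        current = rates[-1]
        rates.append(next_rate(current, target, overloaded=current > capacity_tps))
    return rates


# Configured target far above capacity: the rate climbs, backs off, and oscillates near capacity.
rates = simulate(capacity_tps=300, target=10_000, steps=40)
print(max(rates))  # 350.0
```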

Speedscale user interface showing duration and TPS

For simplicity, this guide doesn’t modify other options such as “Goals” and “Assertions”, but you’ll want to set them for real tests. For now, you can view them in your own installation or in the demo. Note that “Goals” are concerned with metrics, while “Assertions” are concerned with individual requests.
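To make the distinction concrete, here is a sketch in which an assertion checks each replayed request while a goal checks aggregate metrics for the whole test. This is not Speedscale's configuration format; the `Exchange` type, thresholds, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One replayed request: status recorded in production vs. status from the test."""
    path: str
    recorded_status: int
    replayed_status: int
    latency_ms: float


def assertion_passes(ex: Exchange) -> bool:
    """Assertion (per request): the replayed status code matches the recorded one."""
    return ex.replayed_status == ex.recorded_status


def goals_met(exchanges: list, max_avg_latency_ms: float, min_success_rate: float) -> bool:
    """Goals (per test): aggregate metrics stay within thresholds."""
    avg_latency = sum(ex.latency_ms for ex in exchanges) / len(exchanges)
    success_rate = sum(assertion_passes(ex) for ex in exchanges) / len(exchanges)
    return avg_latency <= max_avg_latency_ms and success_rate >= min_success_rate


exchanges = [
    Exchange("/cart", 200, 200, 40.0),
    Exchange("/cart", 200, 200, 60.0),
    Exchange("/pay", 200, 200, 50.0),
    Exchange("/pay", 200, 500, 250.0),  # fails its assertion
]
print(goals_met(exchanges, max_avg_latency_ms=120, min_success_rate=0.7))  # True
```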

Step 2: Execute a traffic replay

Assuming you’ve already created a snapshot, you can now replay it as a test from the snapshot overview in your installation.

Speedscale user interface showing Replay Snapshot as test

You can also execute a traffic replay with the speedctl CLI tool:

$ speedctl infra replay \
  --test-config-id sustained-tps \
  --snapshot-id <snapshot-id> \
  --cluster <cluster-name> <workload-name>

Whichever approach you use, make sure to select the “sustained-tps” test config. As soon as the replay starts, the accompanying report appears in the web UI, although no data will show until the replay finishes.

Note that the same snapshot can be replayed many times with different configs, allowing you to validate different load patterns based on the same traffic.

Step 3: Analyze the report

Speedscale user interface showing the Performance Report

Once the replay is done and the results have been analyzed, you’ll be presented with an overview of the latency, throughput, memory usage, and CPU usage, with an additional latency summary by endpoint below.

This overview alone should provide a high-level understanding of how your application performs and how that performance correlates with resource usage. The “Success rate” tab shows how well the responses from the service under test align with what was recorded in production. Here you can dig into the actual resiliency of the application under various load conditions.

Lastly, the “Mock responses” tab shows how many outgoing requests were sent to the mock server.


Increase efficiency with automatic mocks

By default, all test configs in Speedscale use automatic mocks to isolate the service under test (SUT). In practice, a new Pod is spun up as a mock server, and the sidecar proxy redirects the SUT’s outgoing traffic to that Pod. This removes reliance on third-party dependencies that may be rate-limited and ensures realistic input from both users and dependencies during testing, while retaining the realism of inter-Pod communication.
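The core idea of a mock server, answering outgoing calls with recorded responses instead of reaching the real dependency, fits in a few lines. This is not how Speedscale implements its mocks, and the sidecar redirection is not shown; it's a minimal sketch using Python's standard `http.server`, with a hypothetical `RECORDED` table keyed by method and path.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical recorded responses, keyed by (method, path).
RECORDED = {
    ("GET", "/v1/rates"): (200, {"usd_eur": 0.92}),
}


class MockHandler(BaseHTTPRequestHandler):
    """Answers the SUT's outgoing calls with recorded responses."""

    def do_GET(self) -> None:
        status, body = RECORDED.get(("GET", self.path), (404, {"error": "no recording"}))
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args) -> None:
        pass  # keep output quiet


def start_mock(port: int = 0) -> ThreadingHTTPServer:
    """Start the mock in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MockHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server: ThreadingHTTPServer, path: str):
    """Call the mock the way a service under test would; returns (status, JSON body)."""
    url = f"http://127.0.0.1:{server.server_address[1]}{path}"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, json.load(err)
```

Usage: call `server = start_mock()`, point the code under test at `server.server_address`, and call `server.shutdown()` when the test ends.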

This automatic approach to mock servers is essential in creating realistic tests, as well as in other use cases like preview environments.


Get started with Speedscale

By now, it should be clear why max TPS is such a valuable metric to determine, and why it’s not as simple as ramping up traffic until your application breaks. From here, you can explore the different possibilities with test configs and start validating your own applications. One option is to use multiple stages: maintain a sustained TPS for a set amount of time, then induce spike TPS in another stage to see how the application behaves when traffic surges during normal operations, such as during sudden news coverage.

Once you’ve determined your testing methodology, you’ll likely want it implemented in a continuous manner, so you can catch errors early in your development process.
