Testing in production is one of the most effective—and risky—ways of testing.
Running tests under real-world conditions makes the results more reliable, because a misconfigured test environment can’t introduce bugs that real users would never see. However, using the same environment as your users also has an obvious downside: any bugs triggered by testing can immediately affect users.
Production traffic replication is one way to reap the benefits of testing in "production," as it allows you to record real user traffic. But of course, recording traffic is of no use by itself; the benefit comes from replaying it in another environment.
In this post, you’ll get a quick overview of what production traffic replication is, why a good traffic replay implementation is important, and what it might look like.
Understanding Production Traffic Replication
Production traffic replication is a technique for recording traffic in one environment, typically production, and replaying it in another. The most common use case is to include traffic replay as part of a testing pipeline, ensuring that the application is able to handle realistic scenarios.
Apps are incredibly complex nowadays, with so many microservices, containers, and connections that it’s difficult to think of all the ways a customer can use an app. By extension, it’s tough to manually script test cases for each scenario.
Traffic replay has some unique advantages over producing traffic yourself:
- Produces more realistic tests
- Reduces the cognitive load required to create tests
- Identifies bugs that may not be discovered in a controlled environment
- Utilizes varied data
That said, there are some considerations you have to take into account when implementing traffic replay. First of all, you won’t always be able to replay traffic 1:1. At times, the recorded traffic will include personal information that needs to be sanitized.
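As a rough illustration of what sanitizing might involve, here is a minimal Python sketch. The shape of a recorded request (a dict with "headers" and "body"), the list of sensitive headers, and the email pattern are all assumptions made for this example; a real tool will have its own format and far more thorough rules.

```python
import copy
import re

# Assumed shape of a recorded request: {"headers": {...}, "body": "..."}.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_HEADERS = {"authorization", "cookie"}  # illustrative, not exhaustive


def sanitize(request: dict) -> dict:
    """Return a copy of the request with credentials and email addresses masked."""
    clean = copy.deepcopy(request)
    clean["headers"] = {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in clean.get("headers", {}).items()
    }
    clean["body"] = EMAIL_RE.sub("user@example.com", clean.get("body", ""))
    return clean


recorded = {
    "headers": {"Authorization": "Bearer abc123", "Content-Type": "application/json"},
    "body": '{"email": "jane.doe@corp.io", "plan": "pro"}',
}
sanitized = sanitize(recorded)
```

The function works on a copy, so the original recording stays untouched and can be sanitized again later with stricter rules.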
It’s also important to consider whether you need to mock any of the underlying services of the application you’re testing, as you otherwise risk putting load on unwanted parts of your infrastructure as part of testing.
The Importance of a Good Traffic Replay Implementation
As hinted at in the previous section, purely replaying traffic is unlikely to be an optimal solution. It’s important to use a traffic replay tool that can ensure:
- Reliability – Replay traffic consistently
- Security – Keep the traffic inside your own infrastructure
- Scalability – Multiply traffic when needed, e.g., for load testing
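To make the scalability point concrete, the sketch below shows one simple way to multiply recorded traffic. It assumes requests are plain dicts and adds a hypothetical "replay_round" field so that copies can be told apart; a real load test would also need to control timing and concurrency.

```python
def multiply(requests: list[dict], factor: int) -> list[dict]:
    """Repeat the recorded sequence `factor` times, tagging each copy."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [
        {**request, "replay_round": round_index}
        for round_index in range(factor)
        for request in requests
    ]


recorded = [
    {"method": "GET", "path": "/cart"},
    {"method": "POST", "path": "/checkout"},
]
load = multiply(recorded, 3)  # three passes over the same two requests
```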
You also need to consider whether you’ll ever explore more advanced use cases. For example, using traffic replay in CI/CD pipelines excludes any tool that needs to be run from a local client. In general, you’re likely better off looking for a tool that can utilize modern infrastructure, such as Kubernetes.
As with any other tool, ensuring a good choice from the beginning is likely to reduce headaches in the future.
Characteristics of a Good Traffic Replay Implementation
So, what makes for a good choice when choosing between different traffic replay implementations? In the following sections, you’ll get an overview of different characteristics to consider when comparing solutions.
Picking a tool with all of these characteristics is very likely to set you up for success.
Data Transformation
There are two primary reasons it’s important for a traffic replay tool to be able to transform data:
- Allowing developers to test software with different combinations of user input
- Ensuring that the application accepts the traffic
You might think that transforming data to test various user inputs defeats the purpose of using recorded traffic. After all, it’s no longer "real" traffic. While technically true, it’s not the whole truth. The benefit of recorded data goes beyond whether a user is inputting "1234" or "123y" in a number field.
It’s also about all the metadata that comes with that request, e.g., user-agent, referrer, content-type, etc. Because of this, it does make sense for developers to modify some of the recorded traffic to test different cases.
Transforms are also needed at times to ensure that the application will even accept the recorded traffic. It’s very common to include either session-specific data, like a timestamp, or authentication headers.
Timestamps may not always need to be changed, but it’s important to make sure that auth headers are modified in a way that the application accepts them; otherwise, it’s impossible to test anything locked behind an authentication gateway.
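A transform of this kind could look roughly like the following Python sketch. It assumes a bearer-token scheme and a hypothetical "X-Request-Timestamp" header; both are stand-ins for whatever your application actually uses. Note that metadata such as the user-agent is left exactly as recorded.

```python
import copy
from datetime import datetime, timezone


def transform(request: dict, token: str, now: datetime) -> dict:
    """Return a copy of the request with fresh credentials and timestamp."""
    updated = copy.deepcopy(request)
    headers = updated.setdefault("headers", {})
    headers["Authorization"] = f"Bearer {token}"
    # "X-Request-Timestamp" is a hypothetical header name used for illustration.
    if "X-Request-Timestamp" in headers:
        headers["X-Request-Timestamp"] = now.isoformat()
    return updated


recorded = {
    "method": "GET",
    "path": "/orders",
    "headers": {
        "Authorization": "Bearer expired-token",
        "X-Request-Timestamp": "2023-05-01T12:00:00+00:00",
        "User-Agent": "Mozilla/5.0",
    },
}
replayable = transform(
    recorded, "test-env-token", datetime(2024, 1, 1, tzinfo=timezone.utc)
)
```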
Traffic Filtering
Even though you may have recorded a lot of traffic, you won’t always want every single request to be replayed. Most commonly, you might want to filter out specific traffic types, like monitoring heartbeats.
Or maybe you’ve modified a feature in your application slightly, and now you only want to verify that single part is still working as expected. In this case, it’s useful to only use a subset of the recorded traffic.
In general, while production traffic replication is about creating realistic tests, you’re likely to run into cases where you want to fire off a quick test. Being able to filter data is necessary to make this work.
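Both kinds of filtering, dropping heartbeats and narrowing to one feature, can be sketched in a few lines of Python. The heartbeat paths and the dict-based request format are assumptions for the example.

```python
HEARTBEAT_PATHS = {"/healthz", "/ping"}  # assumed monitoring endpoints


def select(requests: list[dict], path_prefix: str = "") -> list[dict]:
    """Drop heartbeat requests and keep only paths starting with `path_prefix`."""
    return [
        request
        for request in requests
        if request["path"] not in HEARTBEAT_PATHS
        and request["path"].startswith(path_prefix)
    ]


recorded = [
    {"method": "GET", "path": "/healthz"},
    {"method": "GET", "path": "/cart/items"},
    {"method": "POST", "path": "/checkout"},
    {"method": "GET", "path": "/ping"},
]
without_heartbeats = select(recorded)
checkout_only = select(recorded, path_prefix="/checkout")
```

With an empty prefix, only the heartbeats are removed; passing a prefix gives the quick, focused test described above.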
Automatic Mocks
As has been mentioned a few times already, it’s important to consider implementing mocks when you’re replaying traffic. However, mocks aren’t always easy to create, as you need to know what requests your application will make, as well as the responses it expects.
Rather than creating mocks manually, a good traffic replay tool should be able to utilize the recorded traffic to automatically create a mock for your services.
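The idea can be illustrated with a deliberately simplified Python sketch: recorded calls to a downstream service become a lookup table keyed by method and path. The record format is an assumption, and matching on method and path alone is much cruder than what a real tool would do (query strings, bodies, and call ordering all matter in practice).

```python
def build_mock(recorded_calls: list[dict]):
    """Build a responder that answers with previously recorded responses."""
    table = {}
    for call in recorded_calls:
        key = (call["request"]["method"], call["request"]["path"])
        table.setdefault(key, call["response"])  # keep the first response seen

    def respond(method: str, path: str) -> dict:
        # Unknown requests get a 404 so gaps in the recording are visible.
        return table.get((method, path), {"status": 404, "body": ""})

    return respond


recorded_calls = [
    {
        "request": {"method": "GET", "path": "/inventory/42"},
        "response": {"status": 200, "body": '{"stock": 7}'},
    },
    {
        "request": {"method": "POST", "path": "/payments"},
        "response": {"status": 201, "body": '{"id": "pay_1"}'},
    },
]
respond = build_mock(recorded_calls)
known = respond("GET", "/inventory/42")
unknown = respond("GET", "/inventory/99")
```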
Easily Manageable Test Configurations
The most important aspect of a traffic replay tool may be how it produces the traffic, but a close second is how it manages test configurations. Some ways of creating easily manageable test configurations include but are not limited to:
- Exporting and importing test configurations
- Creating and saving templates for common use cases
- Making it easy to modify existing configurations
- Making configurations easily shareable
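As one possible shape for such configurations, the Python sketch below models a test configuration as a dataclass that can be exported to JSON, imported again, and derived from a template. The class and field names are hypothetical and do not reflect any particular tool’s format.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass
class ReplayConfig:
    """Hypothetical test configuration; field names are illustrative only."""
    name: str
    target_url: str
    traffic_multiplier: int = 1
    exclude_paths: list[str] = field(default_factory=list)


def export_config(config: ReplayConfig) -> str:
    """Serialize a configuration to JSON so it can be shared or versioned."""
    return json.dumps(asdict(config), indent=2, sort_keys=True)


def import_config(text: str) -> ReplayConfig:
    """Rebuild a configuration from its JSON form."""
    return ReplayConfig(**json.loads(text))


# A saved template, then a variation for one specific test run.
template = ReplayConfig(
    name="template",
    target_url="http://staging.internal",
    exclude_paths=["/healthz"],
)
load_test = replace(template, name="checkout-load", traffic_multiplier=10)

exported = export_config(load_test)
restored = import_config(exported)
```

Keeping the configuration as plain JSON means it can live in version control next to the application and be reviewed like any other change.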
Ubiquitous Traffic Captures
It’s not unreasonable to think that traffic capture is going to be commoditized in the near future. In this case, it’s important to use a tool that isn’t limited to using traffic generated by the same provider. For example, Speedscale can use Postman Collections to generate traffic.
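To give a sense of what consuming traffic from another source involves, here is a small Python sketch that flattens a Postman Collection (v2.1 JSON layout, with nested folders) into a list of replayable requests. It is not a description of how Speedscale does this; the output format is made up for the example, and the sketch assumes every request entry is an object rather than a bare URL string.

```python
def iter_requests(items: list[dict]):
    """Yield simplified requests from Postman collection items, folders included."""
    for item in items:
        if "item" in item:  # a folder: recurse into its children
            yield from iter_requests(item["item"])
        elif "request" in item:
            request = item["request"]
            url = request["url"]  # either a plain string or an object with "raw"
            yield {
                "name": item.get("name", ""),
                "method": request.get("method", "GET"),
                "url": url if isinstance(url, str) else url.get("raw", ""),
                "headers": {h["key"]: h["value"] for h in request.get("header", [])},
            }


collection = {
    "info": {"name": "Shop API"},
    "item": [
        {
            "name": "Cart",
            "item": [
                {
                    "name": "List cart",
                    "request": {
                        "method": "GET",
                        "header": [{"key": "Accept", "value": "application/json"}],
                        "url": {"raw": "https://shop.example.com/cart"},
                    },
                }
            ],
        },
        {
            "name": "Ping",
            "request": {
                "method": "GET",
                "header": [],
                "url": "https://shop.example.com/ping",
            },
        },
    ],
}
requests_to_send = list(iter_requests(collection["item"]))
```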
Ubiquitous traffic capture also seems likely with the increasing prevalence and accessibility of tools that enable traffic capture. Technologies like service meshes have grown steadily in popularity over the past several years, and given the nature of how service meshes work, enabling traffic capture isn’t an unreasonable task.
Additionally, the advancement of cloud computing and other technologies may also contribute to the commoditization of traffic capture. As the concept grows in popularity, there’s a good chance major cloud providers will develop services to enable traffic capture in your cloud directly.
This then has a chance of leading to higher demand, lower cost, and, subsequently, increased availability. However, although the means to capture traffic may become easier and more prevalent, the capture is only one part of the equation.
The ability to parse, anonymize, and parameterize traffic for use in load generators and mocks is still a difficult problem to solve, which is why Speedscale focuses heavily on this area.
This focus has produced a tool designed to get the most out of captured traffic in as little time as possible.
Start Using Production Traffic Replication Today
Traffic replay is the key to unlocking the benefits of production traffic replication. As you’ve seen in this post, it’s not enough to only have quality recorded traffic—you need to use a quality traffic replay tool as well.
With traffic replication, it’s possible to reap the benefits of testing in production without exposing your users to the risk. And, should you happen to be running your applications in Kubernetes, you might be happy to know there’s a specialized tool for that.