Production Traffic Replication for Software Testing

Testing in production is one of the most effective, yet risky, ways of testing. Running against real-world conditions makes your results more trustworthy, since they can’t be skewed by a test environment that is configured differently from production. However, using the same environment as your users has an obvious downside: any bugs discovered by testing in production will also immediately impact your users.

Production traffic replication is one way to reap the benefits of testing in “production,” as it allows you to record real user traffic. But of course, recording traffic is of no use by itself; the benefit comes from replaying that traffic in another environment.

This article explains everything you need to know about production traffic replication and replay for software testing. We’ll cover:

What is production traffic replication and replay?
The main benefits of using traffic replay for testing software
The core use cases for this testing method
The various implementation and sophistication levels
The “must haves” when it comes to evaluating different solutions

What is production traffic replication & replay?

Production traffic replication and replay is a technique used to capture network traffic in one environment (typically production) and then replay that traffic in another environment. Network requests are, by design, ephemeral. Once they leave the wire, the listening application must handle them appropriately or else they are lost. For a running application, network requests have specific meaning and are handled according to their protocol and content. Traffic capture and replay means storing all requests, regardless of their protocol or content, with the intent of playing them back at a later time or in a different environment.

In order to visualize this, let’s think about production traffic capture and replay as a DVR (the device that recorded your favorite TV shows before streaming services existed).

Captured traffic is copied in a format that can be replayed later, or even modified for use in different environments. To be clear, traffic replay is different from session replay.
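To make the record-then-replay idea concrete, here is a minimal sketch in Python. It is not how any particular tool works: the RecordedRequest fields, the JSON Lines storage format, and the staging_app stand-in are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RecordedRequest:
    """One captured request. The field names are illustrative assumptions."""
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def capture(requests):
    """Serialize requests to JSON Lines so they outlive the connection."""
    return "\n".join(json.dumps(asdict(r)) for r in requests)


def replay(recording, send):
    """Feed every stored request to `send`, in capture order."""
    return [send(RecordedRequest(**json.loads(line)))
            for line in recording.splitlines()]


def staging_app(request):
    # Hypothetical stand-in for the environment that receives the replay.
    return f"{request.method} {request.path} -> 200"


recording = capture([
    RecordedRequest("GET", "/cart"),
    RecordedRequest("POST", "/checkout",
                    {"Content-Type": "application/json"}, '{"sku": 1}'),
])
results = replay(recording, staging_app)
```

Because the recording is just stored data, the same capture can be replayed as often as needed, or edited before it is replayed.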

Benefits of production traffic replication & replay

With the large number of microservices, containers, and connections, software applications today are so incredibly complex that it’s difficult to think of all the ways that customers use them. By extension, it’s difficult to manually script test cases for every possible scenario.

By comparison, traffic replay has some major advantages over manually scripting and generating traffic yourself:

Generates more realistic tests
Reduces the cognitive load required to create tests
Identifies bugs that may not be discovered in a controlled environment
Leverages varied data

That said, before implementing traffic replay, there are some considerations you should take into account. First, you won’t always be able to replay traffic 1:1. Sometimes the recorded traffic will include sensitive data or personal information that needs to be sanitized.

Second, you’ll want to consider whether you need to mock any of the underlying services of the application you’re testing, as you’ll otherwise risk putting load on unwanted parts of your infrastructure during testing.
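For the first consideration, a sanitizing pass might look roughly like the sketch below. The list of sensitive headers, the email pattern, and the request shape (a dict with "headers" and "body") are assumptions; real traffic usually needs rules tailored to the application.

```python
import re

# Assumed list of headers that carry credentials; extend for your application.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}
# Deliberately simple email pattern; it will not catch every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sanitize(request):
    """Return a copy of the request with credentials and emails masked."""
    headers = {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in request["headers"].items()
    }
    body = EMAIL_PATTERN.sub("user@example.com", request["body"])
    return {**request, "headers": headers, "body": body}


request = {
    "headers": {"Authorization": "Bearer abc123", "Accept": "application/json"},
    "body": '{"email": "jane.doe@corp.example"}',
}
clean = sanitize(request)
```

The function returns a new dict instead of mutating its input, so the raw capture stays available to whoever is allowed to see it.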

Use cases for production traffic replication & replay

The most common use case is to include traffic replay as part of a testing pipeline. Using real traffic ensures that your application is able to handle realistic production scenarios.

When you look around, you may notice that recorded traffic is actually used in many places. Wireshark has been the packet capture standard for decades and Tcpreplay is just one of many tools which can replay pcap files. Netflix created Polly.JS and Facebook records a fraction of production traffic to create better testing environments. And Uber created a capture and replay load testing framework for internal use.

Contract testing

Recording traffic for one version of a service and replaying it for another version can help validate the new version. If the same request on the new version does not receive the same response, the new version has either introduced a bug or changed its contract, and either way the difference deserves a look. This is particularly useful for API integration testing when implemented in CI/CD.
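The comparison itself can be sketched in a few lines. In this hypothetical example, recorded_pairs holds request/response pairs captured from the old version, and candidate_version is a stand-in for calling the new one; both names and the dict shapes are assumptions.

```python
def find_regressions(recorded_pairs, new_version):
    """Replay each recorded request and collect responses that differ."""
    mismatches = []
    for request, expected in recorded_pairs:
        actual = new_version(request)
        if actual != expected:
            mismatches.append(
                {"request": request, "expected": expected, "actual": actual}
            )
    return mismatches


# Request/response pairs recorded from the current version (illustrative data).
recorded_pairs = [
    ({"method": "GET", "path": "/items/1"}, {"status": 200, "body": '{"id": 1}'}),
    ({"method": "GET", "path": "/items/999"}, {"status": 404, "body": ""}),
]


def candidate_version(request):
    # Hypothetical new version that mishandles unknown items.
    if request["path"] == "/items/1":
        return {"status": 200, "body": '{"id": 1}'}
    return {"status": 500, "body": ""}


mismatches = find_regressions(recorded_pairs, candidate_version)
```

In practice an exact comparison is often too strict, since fields such as timestamps or generated IDs legitimately change between runs, so a real check would normalize or ignore them first.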

Load testing

Manually scripted load tests have the same issues as manually written unit and integration tests. While useful, they are based on the assumptions of the developer. Load testing with multiplied real-life application traffic creates a more realistic view of production scenarios.
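One simple way to multiply recorded traffic is to send each request several times, replay the timeline faster, or both. The sketch below only builds the send schedule. The recording format (pairs of offset in seconds and request) and the parameter names copies and speedup are assumptions.

```python
def scale_schedule(recording, copies=1, speedup=1.0):
    """Build a send schedule from (offset_seconds, request) pairs.

    `copies` duplicates every request; `speedup` compresses the timeline,
    so copies=2 with speedup=2.0 yields four times the request rate.
    """
    schedule = []
    for offset, request in recording:
        for _ in range(copies):
            schedule.append((offset / speedup, request))
    return schedule


recording = [(0.0, "GET /a"), (2.0, "GET /b")]
schedule = scale_schedule(recording, copies=2, speedup=2.0)
```

The order and relative spacing of real user behavior are preserved, which is the part a hand-written load script tends to get wrong.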

Service mocks

Service virtualization, or service mocks, help validate application behavior while keeping all other attributes constant. Typically, mocking external dependencies is resource-intensive and expensive to run. Also, some resources are difficult to use for development (like a payment API or a product store API) because requests may change the internal state of a critical system. However, using traffic that is captured and replayed can be a much more cost-effective way to create a mock service (API, database, queue, etc.) for testing or development.

Traffic replay levels of sophistication

Implementations of traffic replay vary based on goals and execution, and there are several levels based on capabilities. Capture is as important as replay because it determines what is possible during the replay phase.

Level 1: Exact reproduction

Level 1 capture generally operates at network layer 4. Bytes are captured and the same bytes are replayed, so replayed traffic matches captured traffic exactly. There is no need to understand protocol or composition because replay is one-to-one, but introspection is limited to network packets and raw data. This is the traffic DVR in its simplest form: record, then play back.

Level 1 is the 80/20 of traffic replay. Most of the value is achieved with very little effort.
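As a rough illustration of level 1, the sketch below writes captured bytes to a socket without looking at them. A local socket pair stands in for the real connection, and the captured payload is made up.

```python
import socket

# Bytes as they might have been captured on the wire (illustrative payload).
captured = b"GET /health HTTP/1.1\r\nHost: prod.example.com\r\n\r\n"


def replay_bytes(payload, sock):
    """Level 1 replay: send the captured bytes unchanged, with no parsing."""
    sock.sendall(payload)


# A connected socket pair stands in for the link to the target environment.
sender, receiver = socket.socketpair()
with sender, receiver:
    replay_bytes(captured, sender)
    sender.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
    received = b""
    while chunk := receiver.recv(4096):
        received += chunk
```

Note that the replayed request still carries the production Host header. That is the trade-off of level 1: nothing is interpreted, so nothing can be adapted.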

Level 2: Metadata and context

Level 2 and above operate at network layer 7. Captured traffic contains metadata about the protocol and contextual request information, which is often necessary to fully inspect the traffic before replay. This level may provide introspection into HTTP request URL, headers, and body. Captured traffic between a client and SQL database may provide a way to view queries outside of replay. The traffic DVR now knows about the captured content.

Basic replay is much the same as level 1, but level 2 capture is necessary for replay at higher levels. It allows deep inspection of application traffic and behavior that is difficult to get with other methods.
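The step from raw bytes to layer 7 metadata can be sketched with Python's standard library. This example assumes the capture is a single, complete HTTP/1.1 request; the sample bytes and the shape of the returned dict are illustrative.

```python
import io
from http.client import parse_headers
from urllib.parse import parse_qs, urlsplit


def inspect_http(raw):
    """Turn one captured HTTP/1.1 request into inspectable metadata."""
    stream = io.BytesIO(raw)
    method, target, version = stream.readline().decode("ascii").split()
    headers = parse_headers(stream)  # consumes lines up to the blank line
    url = urlsplit(target)
    return {
        "method": method,
        "path": url.path,
        "query": parse_qs(url.query),
        "version": version,
        "headers": dict(headers.items()),
        "body": stream.read(),
    }


raw = (
    b"POST /orders?id=42 HTTP/1.1\r\n"
    b"Host: prod.example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"qty": 2}'
)
info = inspect_http(raw)
```

Once the request is structured like this, individual parts can be searched, filtered, or changed, which is what the higher levels build on.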

Level 3: Manual rewriting

Level 3 provides manual request rewriting. Request details may be manually modified. HTTP URLs can be overwritten. Headers can be changed, added, or removed. The traffic DVR supports modification during replay with the help of a provided script or configuration.

Level 3 enables traffic replays which are very close to, if not as good as, production scenarios. The primary drawback of level 3 is the manual effort required to create a valid replay for a running application outside of production.
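A manual rewrite configuration could look something like the following sketch. The rule names (replace_in_url, set_headers, remove_headers) and the request shape are invented for illustration and do not correspond to any specific tool.

```python
def rewrite(request, rules):
    """Apply hand-written rewrite rules to one recorded request."""
    removed = set(rules.get("remove_headers", []))
    headers = {k: v for k, v in request["headers"].items() if k not in removed}
    headers.update(rules.get("set_headers", {}))

    url = request["url"]
    for old, new in rules.get("replace_in_url", {}).items():
        url = url.replace(old, new)

    return {**request, "url": url, "headers": headers}


# Hypothetical rules pointing production traffic at a local test instance.
rules = {
    "replace_in_url": {"https://api.example.com": "http://localhost:8080"},
    "set_headers": {"Authorization": "Bearer test-token"},
    "remove_headers": ["X-Request-Id"],
}
recorded = {
    "method": "GET",
    "url": "https://api.example.com/v1/orders",
    "headers": {"Authorization": "Bearer prod-token", "X-Request-Id": "abc"},
}
rewritten = rewrite(recorded, rules)
```

Every rule here had to be written by someone who knows what the target environment expects, which is exactly the manual effort described above.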

Level 4: Automatic rewriting

Level 4 is differentiated by an intelligence layer on top of the traffic to automatically detect and rewrite traffic, or create automated suggestions, without manual intervention. Relationships between IDs and tokens are detected automatically. A matching value from a JWT and query parameter can be updated and the JWT re-signed. Timestamps which were 5 minutes old during capture can be recognized and updated to be 5 minutes old during replay. The traffic DVR supports automatic, intelligent modification.

Level 4 is the most difficult to achieve but provides the most value to applications during replay. Computers are unforgiving, and reliably recreating production scenarios quickly falls apart when the minor details an application expects are incorrect.
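The timestamp case can be sketched on its own. This example covers only one narrow slice of level 4: it assumes timestamps appear as ISO 8601 UTC strings ending in "Z" and shifts each one by the gap between capture time and replay time. Detecting relationships between IDs and tokens is a much harder problem and is not attempted here.

```python
import re
from datetime import datetime, timezone

# Assumption: timestamps look like 2023-01-01T11:55:00Z.
ISO_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")
FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def shift_timestamps(body, captured_at, replayed_at):
    """Move every recognized timestamp forward by the capture-to-replay gap."""
    delta = replayed_at - captured_at

    def shift(match):
        original = datetime.strptime(match.group(), FORMAT)
        original = original.replace(tzinfo=timezone.utc)
        return (original + delta).strftime(FORMAT)

    return ISO_UTC.sub(shift, body)


captured_at = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
replayed_at = datetime(2023, 6, 1, 9, 30, 0, tzinfo=timezone.utc)
body = '{"created": "2023-01-01T11:55:00Z"}'
shifted = shift_timestamps(body, captured_at, replayed_at)
```

The recorded value was 5 minutes older than the capture time, and after the shift it is 5 minutes older than the replay time, so freshness checks in the application behave as they did in production.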

5 “must have” features when choosing a traffic replay tool

When evaluating traffic replay solutions, it’s important to consider:

Reliability – Replay traffic consistently
Security – Keep the traffic inside your own infrastructure
Scalability – Multiply traffic when needed, e.g., for load testing

You also need to consider whether you’ll ever explore more advanced use cases. For example, using traffic replay in CI/CD pipelines excludes any tool that needs to be run from a local client. In general, you’re likely better off looking for a tool that can utilize modern infrastructure, such as Kubernetes.

As with any other tool, ensuring a good choice from the beginning is likely to reduce headaches in the future.

So, what makes for a good choice when choosing between different traffic replay implementations? In the following sections, you’ll get an overview of different characteristics to consider when comparing solutions.

Picking a tool with all of these characteristics is very likely to set you up for success.

Transforming data

There are two primary reasons it’s important for a traffic replay tool to be able to transform data:

Allowing developers to test software with different combinations of user input
Ensuring that the application accepts the traffic

You might think that transforming data to test various user inputs defeats the purpose of using recorded traffic. I mean, it’s no longer “real” traffic, then. While technically true, it’s not the whole truth. The benefit of recorded data goes beyond whether a user is inputting “1234” or “123y” in a number field.

It’s also about all the metadata that comes with that request (user-agent, referrer, content-type, etc.). Because of this, it does make sense for developers to modify some of the recorded traffic to test different cases.

Transforms are also needed at times to ensure that the application will even accept the recorded traffic. It’s very common to include either session-specific data, like a timestamp, or authentication headers.

Timestamps may not always need to be changed, but it’s important to make sure that auth headers are modified in a way that the application accepts; otherwise, it’s impossible to test anything locked behind an authentication gateway.
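As one example of an auth transform, the sketch below takes a recorded JWT, keeps its claims, moves the expiry, and signs the result with a secret the test environment would accept. It assumes HS256-signed tokens and a shared test secret, and it uses only the standard library; the function names are illustrative.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims, secret):
    """Build a compact HS256 JWT from a claims dict."""
    header = b64url_encode(
        json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode()
    )
    payload = b64url_encode(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def refresh_token(recorded_token, test_secret, new_expiry):
    """Keep the recorded claims, move `exp`, re-sign with the test secret."""
    _, payload, _ = recorded_token.split(".")
    claims = json.loads(b64url_decode(payload))
    claims["exp"] = new_expiry
    return sign_hs256(claims, test_secret)


# Illustrative values: a token as captured, then refreshed for the test run.
recorded = sign_hs256({"sub": "user-1", "exp": 1000}, b"prod-secret")
fresh = refresh_token(recorded, b"test-secret", new_expiry=2000)
```

The recorded signature is never verified, because the production secret should not be available in a test environment in the first place.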

Filtering data

Even though you may have recorded a lot of traffic, you won’t always want every single request to be replayed. Most commonly, you might want to filter out specific traffic types, like monitoring heartbeats.

Or maybe you’ve modified a feature in your application slightly, and now you only want to verify that single part is still working as expected. In this case, it’s useful to only use a subset of the recorded traffic.

In general, while production traffic replication is about creating realistic tests, you’re likely to run into cases where you want to fire off a quick test. Being able to filter data is necessary to make this work.
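Both kinds of filtering, dropping noise and narrowing to one feature, can be sketched as a single function. The request shape and the parameter names exclude_paths and only_prefix are assumptions.

```python
def filter_traffic(recording, exclude_paths=(), only_prefix=None):
    """Drop unwanted requests and optionally keep a single feature area."""
    kept = []
    for request in recording:
        if request["path"] in exclude_paths:
            continue  # e.g. monitoring heartbeats
        if only_prefix is not None and not request["path"].startswith(only_prefix):
            continue  # outside the feature under test
        kept.append(request)
    return kept


recording = [
    {"path": "/healthz"},
    {"path": "/checkout/cart"},
    {"path": "/search"},
    {"path": "/checkout/pay"},
]
without_heartbeats = filter_traffic(recording, exclude_paths={"/healthz"})
checkout_only = filter_traffic(
    recording, exclude_paths={"/healthz"}, only_prefix="/checkout"
)
```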

Automatic mocks

As has been mentioned a few times already, it’s important to consider implementing mocks when you’re replaying traffic. However, mocks aren’t always easy to create, as you need to know what requests your application will make, as well as the responses it expects.

Rather than creating mocks manually, a good traffic replay tool should be able to utilize the recorded traffic to automatically create a mock for your services.
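In its simplest form, deriving a mock from a recording means indexing the recorded responses by the request that produced them. The sketch below matches on method and path only, which is an assumption; real dependencies often need matching on body or headers as well.

```python
def build_mock(recorded_exchanges):
    """Create a mock that answers with whatever the real service answered."""
    table = {
        (exchange["method"], exchange["path"]): exchange["response"]
        for exchange in recorded_exchanges
    }

    def mock(method, path):
        # Unknown requests get an explicit miss instead of a guessed answer.
        fallback = {"status": 404, "body": "no recorded response"}
        return table.get((method, path), fallback)

    return mock


# Exchanges as they might be recorded between the app and a payment API.
payment_api = build_mock([
    {"method": "POST", "path": "/charges",
     "response": {"status": 201, "body": '{"id": "ch_1"}'}},
    {"method": "GET", "path": "/charges/ch_1",
     "response": {"status": 200, "body": '{"id": "ch_1", "paid": true}'}},
])
created = payment_api("POST", "/charges")
missing = payment_api("DELETE", "/charges/ch_1")
```

Nobody had to know in advance which requests the application makes or what it expects back, because the recording already contains both.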

Easily manageable test configurations

The most important aspect of a traffic replay tool may be how it produces the traffic, but a close second is how it manages test configurations. Some ways of creating easily manageable test configurations include but are not limited to:

Exporting and importing test configurations
Creating and saving templates for common use cases
Making it easy to modify existing configurations
Making configurations easily shareable
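As an illustration of exportable, template-friendly configuration, the sketch below round-trips a hypothetical ReplayConfig through JSON. The fields are invented for this example; every tool defines its own schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReplayConfig:
    """Hypothetical test configuration; real tools define their own schema."""
    name: str
    target_url: str
    traffic_multiplier: int = 1
    excluded_paths: list = field(default_factory=list)


def export_config(config):
    """Serialize to JSON so the config can be versioned and shared."""
    return json.dumps(asdict(config), indent=2, sort_keys=True)


def import_config(text):
    return ReplayConfig(**json.loads(text))


# Use a saved template as the starting point for a load test variant.
template = ReplayConfig(
    "checkout-smoke", "http://staging.example.internal",
    excluded_paths=["/healthz"],
)
load_variant = import_config(export_config(template))
load_variant.name = "checkout-load"
load_variant.traffic_multiplier = 10
```

A plain-text format like this can live in version control next to the application, which covers sharing and review with tools a team already uses.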

Ubiquitous traffic captures

It’s not unreasonable to think that traffic capture is going to be commoditized in the near future. In this case, it’s important to use a tool that isn’t limited to using traffic generated by the same provider. For example, Speedscale can use Postman Collections to generate traffic.

Ubiquitous traffic capture also seems likely with the increasing prevalence and accessibility of tools that enable traffic capture. Technologies like service meshes have grown steadily in popularity over the past several years, and given the nature of how service meshes work, enabling traffic capture isn’t an unreasonable task.

Additionally, the advancement of cloud computing and other technologies may also contribute to the commoditization of traffic capture. As the concept grows in popularity, there’s a good chance major cloud providers will develop services to enable traffic capture in your cloud directly.

This then has a chance of leading to higher demand, lower cost, and, subsequently, increased availability. However, although the means to capture traffic may become easier and more prevalent, the capture is only one part of the equation.

The ability to parse, de-anonymize and parameterize traffic for use in load generators and mocks is still a difficult problem to solve, which has led Speedscale to have a major focus on this area.

This focus has produced a tool that can maximize the uses of captured traffic, in the shortest amount of time.

Get started with production traffic replication today

With the approach of traffic replication and replay, it’s finally possible to reap the benefits of testing in production without the inherent risk. If you’re running your applications in Kubernetes, give Speedscale a try.
