Optimize Kubernetes Performance Part 3: Streamlining Development

In part 1 of this series, you got a general overview of what traffic replay is and why it’s a useful technology to consider for Kubernetes. You then saw examples of how configurations within Kubernetes clusters themselves can be optimized.

In part 2, you saw that optimizations don’t come only from modifying your cluster, but also from comparing different tools and technologies to make sure you’re using the best one for your needs.

In this third and final part, you’ll see a few examples that aren’t strictly about configuration changes or new tools, but rather about how traffic replay can help streamline the development process for teams deploying to Kubernetes.

Streamlining with Traffic Replay

It may not seem like it at first glance, but the speed of your development process is an important factor to consider when trying to optimize anything in software engineering. A slow development process slows down everything else. Conversely, a fast development process lets you experiment with different optimizations more quickly.

Verify Application Optimizations

As mentioned in the introduction, and throughout this series, some optimizations won’t come from Kubernetes itself. Any team looking to optimize a Kubernetes cluster will very likely also look at optimizing the code of the services running within that cluster.

One way to test these optimizations is to deploy them to production, then collect and analyze metrics. While this approach gives you clear indications of whether the optimizations are working, the process can be long and tedious.

Ultimately, time constraints may mean that some possible optimizations are never discovered.

On the other hand, simply spinning up the new code in a development environment may not give you realistic insight into how the optimized code performs. You need to see how the application reacts to real user traffic, which is where traffic replay comes in.

Because traffic replay can be used in any Kubernetes cluster, developers working on application optimizations can spin up a local cluster (for example, via minikube) and verify whether their code changes are working, even before opening a pull request! If your traffic load is too heavy to run locally, you can combine traffic replay with a tool like Skaffold to work against a development cluster running in the cloud.
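Once a replay has run against both the current build and the optimized build, you still have to decide whether the change actually helped. The sketch below assumes you already have per-request latencies in milliseconds from each run, exported by whatever replay tool you use. The names `percentile` and `compare_runs`, the choice of p95, and the 5% noise threshold are all hypothetical illustrations, not part of Speedscale’s API.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentages stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def compare_runs(
    baseline_ms: list[float],
    candidate_ms: list[float],
    pct: float = 95.0,
    threshold: float = 0.05,
) -> dict:
    """Compare one latency percentile between a baseline and a candidate replay.

    The threshold (hypothetical default: 5%) keeps ordinary run-to-run noise
    from being reported as a real improvement or regression.
    """
    base = percentile(baseline_ms, pct)
    cand = percentile(candidate_ms, pct)
    if base <= 0:
        raise ValueError("baseline percentile must be positive")
    change = (cand - base) / base
    if change <= -threshold:
        verdict = "improved"
    elif change >= threshold:
        verdict = "regressed"
    else:
        verdict = "unchanged"
    return {"baseline": base, "candidate": cand, "change": change, "verdict": verdict}


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    baseline = [float(ms) for ms in range(1, 101)]
    candidate = [ms * 0.8 for ms in baseline]
    print(compare_runs(baseline, candidate)["verdict"])  # improved
```

Comparing a high percentile rather than the mean is a deliberate choice here: an optimization that helps typical requests but worsens the slowest ones would otherwise look like a win.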

Not only does this make the development team more efficient, it also increases confidence in the code before it’s deployed, and in most cases before it’s even reviewed.

Identify (and Resolve) Bottlenecks

The most direct way to locate a bottleneck in a system is by elimination: go through your services one by one until you find the culprit slowing your infrastructure down. This is easier said than done, but not impossible. Imagine a request that passes through three different APIs, where the last API in the chain interacts with a database.

You notice a latency spike in a request path, and you’re determined to figure out whether this is caused by one of the APIs or by the database. One way to do this is to spin up the three APIs and a database replica in a development cluster. From here, you can start sending requests through the APIs, modifying one service at a time in order to isolate the service causing the latency spike.
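If you can capture how long each hop took for one slow request (from traces, logs, or replay reports), some simple arithmetic can narrow the search before you start swapping services. The sketch below assumes a strictly synchronous chain, where each service waits for its downstream call to finish, so a service’s own contribution is its total duration minus the time spent in its children. The service names and timings are hypothetical. For parallel fan-out calls, subtracting the sum of child durations would undercount, so the arithmetic does not apply as-is.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """One hop in a request path; duration includes its downstream calls."""
    service: str
    duration_ms: float
    children: list[Span] = field(default_factory=list)


def self_times(span: Span, acc: dict[str, float] | None = None) -> dict[str, float]:
    """Time spent inside each service itself, excluding sequential child calls."""
    if acc is None:
        acc = {}
    child_total = sum(child.duration_ms for child in span.children)
    acc[span.service] = acc.get(span.service, 0.0) + span.duration_ms - child_total
    for child in span.children:
        self_times(child, acc)
    return acc


def slowest_hop(root: Span) -> str:
    """Service with the largest self time along the request path."""
    times = self_times(root)
    return max(times, key=times.__getitem__)


if __name__ == "__main__":
    # Hypothetical request: api-a -> api-b -> api-c -> database.
    request = Span("api-a", 120.0, [
        Span("api-b", 110.0, [
            Span("api-c", 100.0, [
                Span("database", 20.0),
            ]),
        ]),
    ])
    print(self_times(request))   # {'api-a': 10.0, 'api-b': 10.0, 'api-c': 80.0, 'database': 20.0}
    print(slowest_hop(request))  # api-c
```

In this made-up example, `api-c` spends 80 ms of its own time, which makes it the first candidate to spin up and investigate in isolation.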

It’s highly likely that you’ll find the cause of the latency at some point. But the question is, how long will it take? Not only will you have to spin up the new services in your development cluster but you’ll also have to seed the database, ensure communication between the services, collect metrics, generate load, etc.

Simply put, it’s not an efficient process. At this point in the series, you probably already have an idea of how traffic replay can solve this: Speedscale can generate the load, collect the metrics, and analyze them for you. But there’s one feature specific to Speedscale that hasn’t been mentioned yet: automatic mocks.

The sidecar proxy that Speedscale adds to your Pods captures all traffic by default, both inbound and outbound (this can be changed through configuration). Because outbound traffic is captured, Speedscale can automatically mock those outgoing requests when replaying traffic.
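Conceptually, an automatic mock is a lookup table built from captured outbound traffic: when the service under test makes a call it also made during recording, the mock answers with the recorded response instead of forwarding the call to the real dependency. The in-memory sketch below illustrates that idea only. It is not how Speedscale implements its mocks, and the `Request`, `Response`, and `RecordedMock` names are hypothetical. Matching on method, path, and body is the simplest possible rule; in practice, matching often has to be looser, for example to tolerate timestamps or generated IDs.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    method: str
    path: str
    body: str = ""


@dataclass(frozen=True)
class Response:
    status: int
    body: str


class RecordedMock:
    """Answers outbound calls from recorded traffic instead of a live dependency."""

    def __init__(self) -> None:
        # The same request may have received different answers over time,
        # so keep the recorded responses in capture order.
        self._recorded: defaultdict[Request, deque[Response]] = defaultdict(deque)

    def record(self, request: Request, response: Response) -> None:
        self._recorded[request].append(response)

    def handle(self, request: Request) -> Response:
        queue = self._recorded.get(request)
        if not queue:
            return Response(status=404, body="no recorded response")
        # Replay responses in capture order, then keep repeating the last one.
        return queue.popleft() if len(queue) > 1 else queue[0]


if __name__ == "__main__":
    mock = RecordedMock()
    lookup = Request("GET", "/inventory/42")
    mock.record(lookup, Response(200, '{"stock": 3}'))
    mock.record(lookup, Response(200, '{"stock": 2}'))
    print(mock.handle(lookup).body)  # {"stock": 3}
    print(mock.handle(lookup).body)  # {"stock": 2}
    print(mock.handle(lookup).body)  # {"stock": 2}
    print(mock.handle(Request("GET", "/unknown")).status)  # 404
```

Returning responses in capture order matters for stateful dependencies: a service that reads the same record twice during one recorded request may legitimately have seen two different values.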

Automatic mocks greatly reduce the time it takes to isolate the service causing the latency spike. Because replays can be started by deploying your manifest files with a few annotation changes, as described in part 2, you can spin up each service individually in your development cluster and investigate it without worrying about its dependencies.

Granted, you may already have an APM system that can tell you which service is the bottleneck, removing the need to spin up versions of your application in development.

However, that won’t help you resolve the bottleneck.

Once the root of the bottleneck has been identified, you can start modifying either Kubernetes configurations or parts of the codebase. Tying into the previous section, you can then use traffic replay and the automatic mocks to test these changes either locally or in a development cluster.

Lastly, this isn’t only about identifying and resolving bottlenecks faster. Because traffic replays can be started without worrying about Kubernetes configurations, seeding data, or any other infrastructure task, any developer can perform this entire process locally and without outside help. Whereas spinning up development instances traditionally requires an infrastructure engineer, traffic replay lets developers work independently.

Experiment with Architectures

It’s not uncommon for engineering teams to consider new architectures from time to time. Whether it’s a new way of spreading load across nodes, implementing a service mesh, using service discovery, etc., there are plenty of reasons why an organization may choose to implement a new architecture.

For most teams this is a big undertaking, as changing the underlying architecture of anything is never simple. Traffic replay won’t instantly make the process simple either: a team still has many considerations to work through, and something like implementing a service mesh will still take time. Still, Google has demonstrated the value of tools like Speedscale by choosing it as the tool for the case study on Tau VMs.

In this case, traffic replay helps by reducing the number of surprises. As the saying goes, there are two certainties in life: death and taxes. In software, the one certainty is bugs. No matter how much time you spend coding something, there will always be bugs.

By using realistic data, i.e., recorded user traffic, you can be far more confident that the development version of your infrastructure will behave the same way it does in production, assuming all other things are equal, such as your database server.
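One way to check that claim for your own services is to compare each response returned during a replay with the response captured in production. The sketch below walks two JSON bodies and reports the paths whose values differ, skipping fields that legitimately change between runs. The function name `diff_responses` and the example field names are hypothetical, and a fuller comparison would also cover status codes and headers.

```python
from __future__ import annotations

import json
from typing import Any


def diff_responses(recorded: str, replayed: str, ignore: frozenset[str] = frozenset()) -> list[str]:
    """Return JSON paths where the replayed body differs from the recorded one.

    Keys listed in `ignore` (e.g. timestamps) are skipped at any depth.
    """
    diffs: list[str] = []

    def walk(a: Any, b: Any, path: str) -> None:
        if isinstance(a, dict) and isinstance(b, dict):
            for key in sorted(set(a) | set(b)):
                if key in ignore:
                    continue
                child = f"{path}.{key}"
                if key not in a or key not in b:
                    diffs.append(child)  # field added or removed
                else:
                    walk(a[key], b[key], child)
        elif isinstance(a, list) and isinstance(b, list):
            if len(a) != len(b):
                diffs.append(path)  # different number of elements
            else:
                for i, (x, y) in enumerate(zip(a, b)):
                    walk(x, y, f"{path}[{i}]")
        elif a != b:
            diffs.append(path)  # scalar mismatch or type mismatch

    walk(json.loads(recorded), json.loads(replayed), "$")
    return diffs


if __name__ == "__main__":
    # Hypothetical bodies for illustration.
    recorded = '{"id": 7, "created_at": "2024-01-01T00:00:00Z", "items": [{"sku": "a"}]}'
    replayed = '{"id": 7, "created_at": "2025-03-01T12:00:00Z", "items": [{"sku": "b"}]}'
    print(diff_responses(recorded, replayed, ignore=frozenset({"created_at"})))  # ['$.items[0].sku']
```

Ignoring volatile fields is what keeps this kind of check useful: without it, every replay would "fail" on timestamps alone, hiding the differences that actually indicate changed behavior.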

You’ve seen quite a few examples so far of why this is useful. But when you’re working with complex systems like service meshes or load balancers, realistic data is crucial, as these systems are meant to handle large volumes of varied traffic.

Final Thoughts

That concludes the third and final part of this series, where you’ve seen quite a few examples of how to optimize the performance of your Kubernetes clusters. Hopefully you’ve either found these examples to be useful and plan on implementing them yourself, or perhaps you’ve been inspired and have come up with new ways to reap the benefits of this exciting technology.

The intention of this series has been to provide a high-level overview of how traffic replay can be useful, with only a few code examples. If you’d like a deeper dive into how traffic replay is implemented in an actual cluster, check out the tutorial on how to load test in Kubernetes.

© 2025 Speedscale
All Rights Reserved | Privacy Policy
