Optimize Kubernetes Performance Part 1: Cluster Configurations

Kubernetes is a powerful platform that comes with many features to help engineers run their applications more efficiently. However, as you gain more experience and deploy more workloads, you’ll inevitably start looking for ways to optimize Kubernetes performance.

There are many ways to approach optimization. On one hand, you can work exclusively with the tools and configurations Kubernetes itself provides; on the other, you can take advantage of third-party tools. Both approaches work, and you may choose to combine them.

In this series, you’ll get an overview of how traffic replay can help you discover and implement various kinds of optimizations. By the end, I hope you’ll be inspired to come up with optimizations of your own.

This first part focuses on how traffic replay can help you discover and validate cluster configurations, like modifying resource limits.

Implementing Traffic Replay
Optimizing Kubernetes Cluster Configurations
How Will You Optimize Your Cluster?

Implementing Traffic Replay

If you’re not familiar with traffic replay, the concept is to record traffic from your production environment, then replay it in another.

While there are many tools to choose from to implement this functionality, one of the few that focuses solely on Kubernetes is Speedscale.

Implementation strategies differ from tool to tool. Speedscale’s approach is to install an Operator in your cluster; with the Operator installed, capturing traffic is as simple as adding annotations to your Services.

Once you’re ready to replay the traffic, you can do so via the speedctl CLI tool, via the web UI, or by adding annotations to the Services in your development or staging cluster.

This is all the knowledge you’ll need to understand the examples provided in this series, but if you’d like to learn more, you’re welcome to read the in-depth post on traffic replay.

Optimizing Kubernetes Cluster Configurations

Figure: the three optimizations covered below (resource configurations, scaling rules, and network configurations)

Now that you know how traffic replay can be implemented at a high level, it’s time to see how useful it can be when performing standard optimization tasks.

Note that these are hand-picked examples, not an exhaustive list of ways to optimize your cluster.

This section aims to showcase the principles of traffic replay, which you can then use to create your own implementations.

Testing Resource Configurations

Setting proper resource requests and limits is the first step for many engineers who want to optimize a cluster.

Requesting the right amount of resources helps Kubernetes schedule your Pods efficiently, since the scheduler only places a Pod on a node with enough unreserved capacity to satisfy its requests. Well-sized requests help ensure your application has enough resources when traffic peaks, while also reducing wasted resources on individual nodes.

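Similarly, resource limits help contain the impact of sudden spikes in resource usage, such as those caused by memory leaks. A container that exceeds its memory limit is terminated (OOM-killed), while one that reaches its CPU limit is throttled.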
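To make the two settings concrete, here is a minimal sketch of a container’s `resources` stanza, written as a Python dict that mirrors the Kubernetes YAML. The values are hypothetical placeholders, and `parse_quantity` and `limits_cover_requests` are illustrative helpers, not part of any Kubernetes client library. The check reflects a rule the Kubernetes API server enforces when validating a Pod: a limit may not be lower than its request.

```python
# Hypothetical values for illustration; real numbers should come from measurement.
RESOURCES = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}

# Only a small subset of Kubernetes quantity suffixes is handled in this sketch.
_SUFFIXES = {"m": 0.001, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(value: str) -> float:
    """Convert a quantity such as '250m' (CPU) or '256Mi' (memory) to a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # no suffix, e.g. "1" CPU or a byte count


def limits_cover_requests(resources: dict) -> bool:
    """Return True if every limit is at least the matching request."""
    requests = resources.get("requests", {})
    for name, limit in resources.get("limits", {}).items():
        if name in requests and parse_quantity(limit) < parse_quantity(requests[name]):
            return False
    return True


print(limits_cover_requests(RESOURCES))  # True
```

The hard part, of course, is not the syntax but choosing the numbers, which is where the rest of this section comes in.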

So how does traffic replay help in setting these values correctly? Let’s start with resource requests.

Resource Requests

Figuring out how many resources an application requires is rarely as simple as booting up an instance of it. Requirements vary widely with the nature of the service: something like the OpenAI API will require far more resources than a simple CRUD API.

Generating load against your application is crucial to establishing the baseline amount of resources it will need. You could argue that this baseline can be found by looking at metrics in production, but how will you prepare for 2x, 3x, 4x, or even greater growth in traffic?

It’s true that looking at current metrics may be enough to determine the correct resource request now, but in certain situations—like if you’re rolling out new changes, or a new app—it’s important to know what sort of resource usage you can expect in the future.

It’s not as simple as seeing that a service uses 50MB of RAM with 100 users, then assuming 200 users will use 100MB of RAM.

One simple reason is that any application uses some amount of RAM just to run, before it receives a single request. Beyond that fixed baseline, there are many other reasons RAM usage won’t scale linearly with the number of users.
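Even a simple model with a fixed baseline shows why doubling users doesn’t double memory. The sketch below assumes, purely for illustration, that the service idles at 30MB; combined with the 50MB-at-100-users figure above, that leaves 0.2MB per user, so 200 users come out at 70MB rather than 100MB. Real services are rarely even this linear (caches, connection pools, and garbage collection all bend the curve), which is exactly why measuring under realistic load matters.

```python
def per_user_cost(idle_mb: float, measured_mb: float, users: int) -> float:
    """Memory attributable to each user after subtracting the idle baseline."""
    return (measured_mb - idle_mb) / users


def estimate_mb(idle_mb: float, per_user_mb: float, users: int) -> float:
    """Affine estimate: a fixed baseline plus a per-user cost."""
    return idle_mb + per_user_mb * users


IDLE_MB = 30.0  # hypothetical idle footprint, not a measured value
cost = per_user_cost(IDLE_MB, measured_mb=50.0, users=100)  # 0.2 MB per user

naive = 50.0 * 200 / 100                  # proportional guess
affine = estimate_mb(IDLE_MB, cost, 200)  # baseline-aware guess
print(f"naive={naive:.1f} MB, affine={affine:.1f} MB")  # naive=100.0 MB, affine=70.0 MB
```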

Traffic replay isn’t limited to replaying traffic 1:1. It can also replay multiple copies of the captured traffic at once, giving you a much clearer idea of what to expect in the future.
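The idea of replaying multiple copies can be sketched in a tool-agnostic way. This is not how Speedscale implements it; it is a hypothetical illustration in which each captured request is an `(offset_seconds, path)` tuple, and load is multiplied by emitting `factor` copies of every request, each copy shifted by a small `stagger` so the copies interleave instead of arriving at exactly the same instant.

```python
from typing import List, Tuple

# Hypothetical record format: (seconds since start of capture, request path).
Request = Tuple[float, str]


def multiply_traffic(captured: List[Request], factor: int, stagger: float = 0.001) -> List[Request]:
    """Return `factor` copies of the captured traffic merged into one timeline.

    Copy k is shifted by k * stagger seconds, so the overall duration stays
    roughly the same while the request rate is multiplied by `factor`.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    replay = [
        (offset + k * stagger, path)
        for k in range(factor)
        for offset, path in captured
    ]
    return sorted(replay)


captured = [(0.0, "/login"), (0.5, "/cart"), (1.0, "/checkout")]
replay = multiply_traffic(captured, factor=3)
print(len(replay))  # 9
```

Because the copies come from real captured traffic, the mix of endpoints and payloads stays realistic as the volume goes up.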

This is just one way of using traffic replay to determine resource requests, focusing on optimizing the deployment currently running in production.

Traffic replay can play a big role during development, as well.

If you’re looking into optimizing Kubernetes performance, you’re likely also considering optimizations in the applications themselves, which may reduce their resource usage. Rather than deploying the new code to production, which can be a long and tedious process in some organizations, you can spin up a preview environment with the new code, replay traffic against it, and quickly either confirm the existing resource request or determine that it needs to change.
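Once a replay has run against the preview environment, checking the existing request can be as simple as comparing observed usage against it. The sketch below is hypothetical: it assumes you have exported memory samples (in MiB) from your monitoring stack, and it uses an arbitrary rule of thumb that 95th-percentile usage should land between 50% and 100% of the request. Tune those thresholds to your own risk tolerance.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def review_request(samples_mib, request_mib, low=0.5, high=1.0):
    """Compare p95 usage with the request using a hypothetical rule of thumb."""
    ratio = percentile(samples_mib, 95) / request_mib
    if ratio > high:
        return "raise"  # usage regularly exceeds what the scheduler reserved
    if ratio < low:
        return "lower"  # most of the reserved capacity goes unused
    return "keep"


# Hypothetical memory samples (MiB) collected while replaying traffic in a preview environment.
samples = [180, 190, 200, 210, 220, 205, 195, 230, 240, 215]
print(review_request(samples, request_mib=256))  # keep (p95 is 240 MiB)
print(review_request(samples, request_mib=512))  # lower
```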

But of course, resource requests are only one part of the equation.

Resource Limits

Setting the right resource requests is often about optimization, and less about avoiding failures. Setting resource limits tends to be the exact opposite.

An uncontrolled memory leak can have catastrophic results, killing not only the application with the bug but other Pods in your cluster as well. This is a great example of why resource limits should always be set: they can significantly reduce the impact of spikes in resource usage.

Setting the right limit, however, can be very tricky. You might be inclined to just look at the metrics for the past week or month, then derive the limit from that. You might add an extra 10% just to be safe.
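That heuristic is easy to express in code. The sketch below applies it to hypothetical daily peak values; `limit_from_history` is an illustrative name, and the sketch is here to make the rule concrete, not to endorse it. As the next paragraph explains, a number derived this way tells you nothing about how the service behaves once it actually reaches the limit.

```python
import math

# Hypothetical daily peak memory usage (MiB) over the past week.
daily_peaks_mib = [410, 432, 398, 455, 473, 441, 420]


def limit_from_history(peaks, headroom=0.10):
    """Historical maximum plus a safety margin, rounded up to a whole MiB."""
    return math.ceil(max(peaks) * (1 + headroom))


print(f"{limit_from_history(daily_peaks_mib)}Mi")  # 521Mi
```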

That approach does limit the impact of failures in a given service, but it introduces risks of its own. Most importantly, you won’t know how the service will react when it hits the limit. It could:

Shut down
Introduce latency
Drop requests
Respond with errors
Lose data

Any of these outcomes, along with many others, is possible, and you won’t know which until you test it. Some scenarios, like whether the service shuts down, can be tested without traffic replay. Others will be impossible to test.

Without generating a realistic load, you can’t verify for sure that all request paths will work as expected.

Being able to test what happens when resource limits are reached can greatly influence which value you choose. You can also go a step further and test what happens when limits are hit in different ways.

Sometimes a service will gradually build up resource usage, reaching 99% and not hitting 100% until 5 minutes later. Other times, a service will spike from 60% to 100% within seconds. For some types of services, these two ways of hitting the limit have different consequences.
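Assuming resource usage roughly follows load, one way to reproduce both shapes is to vary how many copies of the captured traffic are replayed over time. The sketch below is hypothetical and tool-agnostic: `ramp_schedule` and `spike_schedule` produce a per-second traffic multiplier, which you would feed to whatever mechanism your replay tool offers for scaling load.

```python
def ramp_schedule(start: float, end: float, seconds: int) -> list:
    """Per-second traffic multiplier rising linearly from start to end."""
    if seconds < 2:
        return [end] * seconds
    return [start + (end - start) * i / (seconds - 1) for i in range(seconds)]


def spike_schedule(base: float, peak: float, seconds: int, spike_at: int) -> list:
    """Per-second multiplier that holds at base, then jumps to peak at `spike_at`."""
    return [base if t < spike_at else peak for t in range(seconds)]


# A five-minute climb from 1x to 4x, versus a jump from 1x to 4x after one minute.
gradual = ramp_schedule(1.0, 4.0, seconds=300)
sudden = spike_schedule(1.0, 4.0, seconds=300, spike_at=60)
```

Running the same captured traffic through both schedules, with the limit unchanged, isolates the effect of how the limit is reached.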

All in all, using traffic replay to test resource configurations is just as much about finding the optimal values here and now as it is about preparing for the future and streamlining the development process.

Testing Scaling Rules

Implementing scaling rules in Kubernetes is fairly easy, but implementing them optimally can be tricky.
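For reference, the core calculation behind the HorizontalPodAutoscaler, as described in the Kubernetes documentation, is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with scaling skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that formula; the real controller also applies min/max replica bounds, stabilization windows, and special handling for unready Pods, all omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Core HPA formula: scale replicas in proportion to metric / target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    return math.ceil(current_replicas * ratio)


# 4 replicas averaging 90% CPU against a 60% target -> ceil(4 * 1.5) = 6
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

Since the output depends directly on the observed metric, the quality of the load that produces that metric determines how much you can trust a test of your scaling rules.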

Whatever approach you use to test your scaling rules, the principle stays the same: generate load against the service. So what determines whether one approach is better than another is how that load is generated.

There are many options on the market, from simple command-line tools like curl for generating requests to full-featured load-testing tools like JMeter.

However, most options like these require you to configure the load manually. That load will still increase your application’s resource usage and trigger your scaling rules, but it is arguably a suboptimal approach.

The main disadvantage of manually created traffic is that it’s time-consuming to build and will never reflect the real world accurately. You’ll miss specific metadata, variations of requests, or entire request paths.

Traffic replay doesn’t just improve the quality of your load generation. A good traffic replay tool will also capture a variety of metrics during the test, allowing you to validate factors like:

Latency goals
Dropped requests
How long it takes to spin up new instances

These factors are important to track as well. For example, adding more instances to your cluster is no good if you find that many requests are dropped during the scaling process.
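Validating the first two of those factors can be as simple as checking a replay’s results against explicit goals. In the hypothetical sketch below, each result is a latency in milliseconds, or `None` if the request was dropped; the `evaluate` helper and its thresholds are placeholders, and the report your replay tool produces will look different.

```python
import math
from typing import List, Optional


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(0.95 * len(ordered))) - 1]


def evaluate(latencies_ms: List[Optional[float]], p95_goal_ms: float, max_drop_rate: float) -> dict:
    """Check a replay's latencies (None = dropped request) against latency and drop goals."""
    completed = [v for v in latencies_ms if v is not None]
    drop_rate = 1 - len(completed) / len(latencies_ms)
    observed_p95 = p95(completed) if completed else float("inf")
    return {
        "p95_ms": observed_p95,
        "drop_rate": drop_rate,
        "passed": observed_p95 <= p95_goal_ms and drop_rate <= max_drop_rate,
    }


# Hypothetical results from a replay that ran while the service was scaling out.
results = [42.0, 38.5, None, 51.2, 47.9, 120.4, 44.1, 39.8, 40.3, 45.6]
print(evaluate(results, p95_goal_ms=100.0, max_drop_rate=0.05))
```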

In any case, traffic that wasn’t captured in your production environment can give you a good indication, but it can never provide truly realistic insights.

Another benefit of traffic replay using captured traffic is that you can easily test different traffic patterns, e.g., to test how scaling rules act on the traffic you usually receive at night, compared to traffic you receive during the day.
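Selecting a traffic pattern from a capture can be sketched as filtering recorded requests by time of day. The `(timestamp, path)` record format, the `select` helper, and the hour ranges are hypothetical; replay tools typically let you choose a capture window directly.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical record format: (capture timestamp, request path).
Record = Tuple[datetime, str]


def in_window(ts: datetime, start_hour: int, end_hour: int) -> bool:
    """True if ts falls in [start_hour, end_hour), including windows that wrap past midnight."""
    if start_hour <= end_hour:
        return start_hour <= ts.hour < end_hour
    return ts.hour >= start_hour or ts.hour < end_hour


def select(records: List[Record], start_hour: int, end_hour: int) -> List[Record]:
    """Keep only the records captured inside the given hour window."""
    return [r for r in records if in_window(r[0], start_hour, end_hour)]


records = [
    (datetime(2024, 5, 1, 2, 15), "/sync"),
    (datetime(2024, 5, 1, 14, 30), "/search"),
    (datetime(2024, 5, 1, 23, 5), "/sync"),
]
night = select(records, 22, 6)  # 22:00 to 06:00, wraps midnight
day = select(records, 8, 20)    # 08:00 to 20:00
```

Replaying `night` and `day` separately against the same scaling rules shows whether those rules suit both patterns.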

Test Different Network Configurations

This example is arguably only relevant for a minority of Kubernetes engineers, but it’s still an interesting use case.

For most organizations, implementing network policies has more to do with security and separation than with performance. Studies have shown that even as many as 200 network policies have very little performance impact.

However, that isn’t to say that no organization needs to care about it.

Imagine you’re a company offering a Consent Management Platform (CMP), with a cookie banner as your main offering. Every customer needs to embed a small script, less than 50kB in size, that takes less than 50ms to load.

In this case, saving even 5ms is a big deal.

So for some companies, efficiently testing network policies and their performance is a real need, and that requires a way to run those tests reliably.

Given that this happens at the network layer, and that we’re talking about milliseconds, it’s important that the requests you send through your network controller match real life.

Even a few missing headers can have an impact on the processing time of network policies. While you can create requests manually, there are so many header variations created from using different browsers, operating systems, and perhaps even browser extensions, that manually creating a test suite that reflects real life could be very time-consuming, if not impossible.

This is why traffic replay is exceptionally well suited to this example: you get a 1:1 replica of the requests you’d see in a production environment.
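Comparing two replay runs, one without the additional network policies and one with them, might look like the sketch below. The latency samples are hypothetical and `median_overhead_ms` is an illustrative helper. The median is used because it is less sensitive to a few outliers than the mean; with differences of only a few milliseconds, you would want many samples and repeated runs before drawing conclusions.

```python
from statistics import median

# Hypothetical per-request latencies (ms) from replaying the same captured traffic twice.
baseline_ms = [31.8, 32.4, 30.9, 33.1, 32.0, 31.5, 32.7]
with_policies_ms = [35.2, 36.0, 34.8, 36.9, 35.5, 35.1, 36.3]


def median_overhead_ms(baseline, candidate):
    """Difference between the median latencies of two runs."""
    return median(candidate) - median(baseline)


overhead = median_overhead_ms(baseline_ms, with_policies_ms)
print(f"median overhead: {overhead:.1f} ms")  # median overhead: 3.5 ms
```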

How Will You Optimize Your Cluster?

Many engineers view traffic replay as only good for load testing, but as you’ve seen in this post, that is far from the truth.

Whether you need to verify simple configuration changes or spot small opportunities for performance optimization, traffic replay is likely to be helpful.

Now the question is: how will you optimize your cluster? Are you going to try out traffic replay, or are you still on the fence about it?

If it’s the latter, look out for part two, where you’ll see how traffic replay can help create powerful comparisons. Or, look at how traffic replay helped scale a SaaS demo.