Optimizing eCommerce sites is a huge endeavor. With these 5 steps, you’ll be able to rapidly identify and resolve your scalability issues:
- Establish a performance baseline
- Prioritize pages with the highest business impact
- Create a load test plan with traffic replication (a faster alternative to traffic mirroring)
- Build a fast feedback loop
- Analyze reports
5 Steps to Optimizing eCommerce Site Performance with Kubernetes Load Testing
If your site deals with a lot of traffic during the holidays, you need confidence that it can handle high volumes long before Black Friday and end-of-year sales roll around. But improving overall application performance is a challenging and time-consuming undertaking. In this article, we outline 5 key steps to optimize performance of your retail site when time is of the essence.
The 4 Golden Signals
To understand how your application is performing, site reliability engineers (SREs) measure four key metrics, known as the Golden Signals: latency, traffic, errors, and saturation. Whenever possible, these metrics should be measured from the customer’s perspective. This was such a large focus area for us at Speedscale that we designed our product to highlight the Golden Signals directly in our reports.
Step 1: Establish a performance baseline
Before you start designing and running load tests, performance tests, and labs, you need to understand where you are today. By using an observability suite, you can leverage data from your own system to understand the current state of Kubernetes performance and, later, to measure how much you have improved. In addition to the usual suspects in the observability ecosystem, learn more about tools to monitor your cluster in our blog, Kubernetes Monitoring: Top 5 Tools & Methodologies.
Example Dashboard from play.grafana.com
If you can, take into account what the Golden Signals looked like during peak load scenarios, like during a holiday sales rush or big promotion. Having this data can help indicate performance bottlenecks in one of these 3 areas:
- At the client level: Low CDN usage can lead to long page load times. These issues typically show up in a browser or mobile app as long waterfall charts.
- At the application level: A report of latency by endpoint can help you understand which URLs in your backend are performing poorly.
- At the data level: This includes everything from third-party API calls to databases or other downstream systems that are involved in the business transaction. Pay particular attention to how much time is spent in the application tier vs. the data tier.
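As a rough illustration of what an application-level baseline might contain, the sketch below computes three of the Golden Signals (latency, traffic, and errors) per endpoint from a list of request records. The `Request` fields, the sample values, and the nearest-rank percentile method are assumptions made for this example; in practice the records would come from your observability suite. Saturation is left out because it needs resource metrics such as CPU and memory rather than request logs.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    endpoint: str       # URL path that was called
    latency_ms: float   # time to serve the request
    status: int         # HTTP status code


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def baseline(requests, window_seconds):
    """Summarize latency, traffic, and errors for each endpoint."""
    by_endpoint = defaultdict(list)
    for req in requests:
        by_endpoint[req.endpoint].append(req)

    report = {}
    for endpoint, reqs in by_endpoint.items():
        latencies = [r.latency_ms for r in reqs]
        server_errors = sum(1 for r in reqs if r.status >= 500)
        report[endpoint] = {
            "p95_latency_ms": percentile(latencies, 95),
            "requests_per_second": len(reqs) / window_seconds,
            "error_rate": server_errors / len(reqs),
        }
    return report


# Made-up sample records covering a 2-second window.
sample = [
    Request("/cart", 120, 200),
    Request("/cart", 180, 200),
    Request("/checkout", 450, 200),
    Request("/checkout", 900, 502),
]
baseline_report = baseline(sample, window_seconds=2)
for endpoint, signals in baseline_report.items():
    print(endpoint, signals)
```

Keeping a report like this from a peak period gives you something concrete to compare against after each later test run.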
Step 2: Prioritize pages with high business impact
While overall site performance is important, certain pages have a higher business impact (i.e., they drive more revenue) than others. It may be tempting to optimize the slowest pages first, but a more strategic approach is to prioritize the pages that most directly impact revenue. A typical eCommerce website will see the highest amount of traffic on the homepage and promotion pages, lower traffic on product listing and detail pages, and the lowest amount of traffic on a shopping cart or checkout page. Picture this as a funnel.
Despite having the lowest amount of traffic, the shopping cart and checkout pages carry much higher value. Our recommendation is to prioritize this funnel in reverse order, starting with the highest-value pages. The thinking behind this is that you want to ensure that Kubernetes performance is optimized and all problems are addressed for the customers who are actually buying products from your store.
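One way to make the reverse-funnel ordering concrete is to rank pages by revenue per visit instead of by raw traffic. The sketch below assumes you can attribute revenue to each page; the page paths, visit counts, and revenue figures are invented purely for illustration.

```python
# Hypothetical funnel data: traffic falls and value per visit rises
# as customers move from the homepage toward checkout.
pages = [
    {"page": "/", "monthly_visits": 500_000, "attributed_revenue": 50_000},
    {"page": "/products", "monthly_visits": 200_000, "attributed_revenue": 60_000},
    {"page": "/cart", "monthly_visits": 40_000, "attributed_revenue": 80_000},
    {"page": "/checkout", "monthly_visits": 20_000, "attributed_revenue": 100_000},
]


def revenue_per_visit(page):
    """Value of a single visit to this page."""
    return page["attributed_revenue"] / page["monthly_visits"]


# Highest value per visit first, which walks the funnel in reverse.
priority_order = [
    p["page"] for p in sorted(pages, key=revenue_per_visit, reverse=True)
]
print(priority_order)
```

With these sample numbers the checkout page comes first and the homepage last, even though the homepage receives 25 times more visits.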
Step 3: Create a load test automation plan with traffic replication
Starting with the highest business impact pages, focus on the subset of endpoints related to the business transaction and the downstream dependencies. If you are not sure which endpoints are called and what components are downstream, now is a good time to dig into traffic replication. This approach leverages traffic data from your site to understand how it’s actually used by customers in order to pave the way for continuous performance testing.
Gathering this data will create a snapshot which includes:
- The endpoints that are part of a business transaction (e.g. cart, checkout, Thank You page, etc.)
- The downstream systems that are part of the transaction (e.g. payment processing, shipping, user validation, etc.)
- The data that’s required in order to run accurate tests (e.g. SKUs, prices, customer records, etc.)
Example Traffic Viewer from Speedscale
Traffic replication leverages traffic data from your site to understand how it’s actually used by customers in order to inform your load test plan.
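The three parts of a snapshot listed above can be pictured as a summary over recorded calls. The sketch below is a simplified, hypothetical model and not Speedscale's snapshot format: the `RecordedCall` type, the direction labels, and the sample hosts and payload fields are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedCall:
    direction: str  # "inbound" (customer to app) or "outbound" (app to dependency)
    target: str     # endpoint path for inbound calls, host for outbound calls
    payload: dict = field(default_factory=dict)


def summarize(calls):
    """Group recorded traffic into endpoints, downstream systems, and data fields."""
    return {
        "endpoints": sorted({c.target for c in calls if c.direction == "inbound"}),
        "downstream": sorted({c.target for c in calls if c.direction == "outbound"}),
        "data_fields": sorted({key for c in calls for key in c.payload}),
    }


# Made-up recording of a single checkout transaction.
recorded = [
    RecordedCall("inbound", "/cart", {"sku": "A-100", "quantity": 2}),
    RecordedCall("inbound", "/checkout", {"customer_id": 7, "price": 19.99}),
    RecordedCall("outbound", "payments.example.com", {"price": 19.99}),
    RecordedCall("outbound", "shipping.example.com", {"customer_id": 7}),
]
snapshot_summary = summarize(recorded)
print(snapshot_summary)
```

A summary like this answers the three planning questions directly: what to call, what to mock, and what data the tests need.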
If you’re using a traffic replication platform like Speedscale, simply save this data into a snapshot, and you’re ready to start running tests. If you’re not using traffic replication yet, make sure to give yourself plenty of time to write scripts using your favorite Kubernetes load testing tool. If you depend on a large number of downstream systems, definitely allocate some time to write service mocks so you can isolate your app from systems that are outside of your control.
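If you end up writing service mocks by hand, a minimal one can be a small HTTP server that returns a canned response. The sketch below uses only the Python standard library and stands in for a hypothetical payment processor; the `/charge` path, the response body, and the handler name are assumptions for illustration, not a real payment API.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockPaymentHandler(BaseHTTPRequestHandler):
    """Answers every POST with the same canned 'approved' response."""

    def do_POST(self):
        # Read and discard the request body so the client can finish sending.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = json.dumps(
            {"status": "approved", "transaction_id": "mock-123"}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep test output quiet.
        pass


def start_mock():
    """Start the mock on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MockPaymentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Point the code under test at the mock instead of the real dependency.
server = start_mock()
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request(
    "POST",
    "/charge",
    body=json.dumps({"amount": 42}),
    headers={"Content-Type": "application/json"},
)
mock_reply = json.loads(conn.getresponse().read())
conn.close()
server.shutdown()
server.server_close()
print(mock_reply)
```

A fixed response keeps the downstream system's latency and failures out of your measurements, which is the point of isolating the app under test.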
Step 4: Build a fast feedback loop
Once you have the traffic to replicate, you need to create the fastest feedback loop possible. This is a typical engineering approach that requires four steps done in rapid succession:
- Run a test
- Measure the result
- Decide what you want to change
- Run the test again
If this entire process takes your organization days or weeks, you will be severely limited in how many performance improvements you can make.
Source: Wikipedia [Feedback Article](https://en.wikipedia.org/wiki/Feedback)
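The four steps can be sketched as a loop that tries one change at a time and keeps it only if the measured result improves. In this sketch `simulated_p95` is a stand-in for a real load test run, and the configuration keys and latency numbers are invented so the example stays deterministic.

```python
def simulated_p95(config):
    """Stand-in for a real load test; returns a made-up p95 latency in ms."""
    latency = 800.0
    if config["cache_enabled"]:
        latency -= 300
    if config["db_pool_size"] < 20:
        latency += 100
    if config["compression_level"] > 6:
        latency += 50
    return latency


def feedback_loop(run_test, candidate_changes, config):
    """Run, measure, change one thing, and run again; keep only improvements."""
    best = run_test(config)
    history = [("baseline", best)]
    for name, apply_change in candidate_changes:
        trial = apply_change(config)      # decide what to change
        result = run_test(trial)          # run the test again
        history.append((name, result))    # measure the result
        if result < best:                 # keep the change only if it helped
            config, best = trial, result
    return config, history


start = {"cache_enabled": False, "db_pool_size": 10, "compression_level": 1}
changes = [
    ("enable cache", lambda c: {**c, "cache_enabled": True}),
    ("larger db pool", lambda c: {**c, "db_pool_size": 50}),
    ("max compression", lambda c: {**c, "compression_level": 9}),
]
final_config, history = feedback_loop(simulated_p95, changes, start)
for name, p95 in history:
    print(f"{name}: {p95} ms")
```

The number of iterations you can afford depends almost entirely on how long a single call to `run_test` takes, which is why the speed of the loop matters so much.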
By using continuous performance testing, you can cut the feedback loop down to minutes. A tool like Speedscale accelerates the testing process with our unique approach:
- Performance test automation: Speedscale generates tests automatically with traffic so you no longer have to write test scripts.
- Environment: Speedscale eliminates the need to build elaborate end-to-end environments by reducing the scope to a subset that you can easily control and using service mocks for the rest.
- Data: Quit building out new data each time. Keep the test data and environment data automatically synced so you can refresh the environment in seconds.
Step 5: Analyze reports
After each test is run, review the Golden Signal data and identify the lowest-performing component. If you have a fast feedback loop, then you only need to make a single change and then run the test again. This allows you to quickly whittle down the numerous performance problems one at a time. If you make multiple changes before you run again, you won’t know which change caused the performance improvement (or degradation if you’re unlucky). Again, having a quick way to run these performance tests is critical.
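A small sketch of this analysis step, assuming each test run produces a report keyed by component name: pick the component with the worst p95 latency, make one change, and compare the next run against the previous one. The component names and latency values below are made up for illustration.

```python
def slowest_component(report):
    """Name of the component with the highest p95 latency."""
    return max(report, key=lambda name: report[name]["p95_latency_ms"])


def compare_runs(before, after):
    """Change in p95 latency per component; negative means faster."""
    return {
        name: after[name]["p95_latency_ms"] - before[name]["p95_latency_ms"]
        for name in before
        if name in after
    }


# Hypothetical reports from two consecutive runs with one change in between.
run_1 = {
    "checkout-api": {"p95_latency_ms": 900},
    "payment-mock": {"p95_latency_ms": 120},
    "inventory-db": {"p95_latency_ms": 450},
}
run_2 = {
    "checkout-api": {"p95_latency_ms": 610},
    "payment-mock": {"p95_latency_ms": 120},
    "inventory-db": {"p95_latency_ms": 455},
}

target = slowest_component(run_1)
deltas = compare_runs(run_1, run_2)
print(target, deltas)
```

Because only one change was made between the two runs, the improvement in `checkout-api` can be attributed to that change with reasonable confidence.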
Optimizing an entire eCommerce site is a huge endeavor. By starting with a clear plan, you’ll be able to rapidly identify and resolve your scalability issues.
- Golden Signals and Kubernetes Performance Baseline – these are the most important metrics to prioritize as part of your optimization project. You want to know the metric values from last year, in addition to a recent high-scale event like a promotion.
- Prioritize by Business Impact – it may feel good to work on the slowest page or the homepage which gets called a lot, but start with the pages most likely to make you money and move up the funnel from there.
- Create a Plan with Traffic Replication – before jumping in writing and running tests, outline what needs to be called and what kind of data is required for accuracy. This sets the stage for continuous performance testing.
- Feedback Loop – having a quick way to run tests, make a change, and run tests again is important. Since you’ll run tests frequently as part of this project, it’s worth investing in this kind of system.
If you ever want to talk shop or share notes on how your project is going, join us in the Speedscale Slack Community and say hello!