The below is a repost of his blog.
The challenge with any testing arises when we do not have an exact duplicate of production in a lower environment to test on. Often this is because we are not using the same data, tests, volume, and scenarios that production would see.
In my experience, lower environments are never an exact replica of production, for the following reasons:
We aren’t just testing code anymore. The systems we test are complex. They have unpredictable interactions, sometimes out-of-order events or messages, and other properties that make them hard to test outside of production.
Think about it: every time we deploy to production, we are testing a unique, never before seen or replicated combination of artifact, environment, infrastructure, time of day, etc.
Our applications are being tested every day in production by our customers. We just need to find a way to use all of the data customers are already generating.
More production data makes it easier to design load tests that accurately reflect actual server load. Testing is about reducing uncertainty. It is all about risk management, and there are many categories of uncertainty that can only ever be truly tested in prod, such as behavioral testing, A/B testing, and realistic load testing.
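As a rough illustration of using production data to shape a load test, the sketch below derives a request mix from sampled access-log lines and then draws a synthetic workload with the same proportions. The log format (`METHOD PATH ...` per line) and the function names `build_request_mix` and `sample_requests` are assumptions made for this example, not something taken from a specific tool.

```python
import random
from collections import Counter


def build_request_mix(log_lines):
    """Return the share of traffic per (method, path) pair.

    Assumes each log line starts with "METHOD PATH"; this format is
    hypothetical and should be adapted to your real access logs.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines
        counts[(parts[0], parts[1])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


def sample_requests(mix, n, seed=None):
    """Draw n (method, path) pairs weighted by their observed share."""
    rng = random.Random(seed)
    keys = list(mix)
    weights = [mix[k] for k in keys]
    return rng.choices(keys, weights=weights, k=n)
```

The sampled pairs can then be fed to whatever load generator you already use, so the test hits endpoints in roughly the same proportions production sees.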
I am a big believer in blue/green and canary deployments. With the proper monitoring, you can limit the risk and quickly switch back if needed. Here are some other tools and techniques that will allow you to test more safely in production:
I am impressed with what Speedscale has built. They allow you to quickly replay past traffic and simulate responses from third-party APIs based on real traffic, in seconds. It is a traffic replay framework for API testing in Kubernetes. I think this, combined with progressive deployments, will be the future of testing.
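To make the idea of traffic replay concrete, here is a minimal, tool-agnostic sketch. It is not Speedscale's API. It replays recorded calls through a caller-supplied `send` function and reports any response that differs from what was recorded. The `RecordedCall` type, the `replay` function, and the `send` signature are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class RecordedCall:
    """One request captured from real traffic, with its observed response."""
    method: str
    path: str
    expected_status: int
    expected_body: str


def replay(
    calls: List[RecordedCall],
    send: Callable[[str, str], Tuple[int, str]],
) -> List[Tuple[RecordedCall, int, str]]:
    """Replay each recorded call and collect the ones whose response changed.

    send(method, path) must return (status, body); in practice it would
    wrap an HTTP client pointed at the version under test.
    """
    mismatches = []
    for call in calls:
        status, body = send(call.method, call.path)
        if status != call.expected_status or body != call.expected_body:
            mismatches.append((call, status, body))
    return mismatches
```

An empty result means the new version answered the recorded traffic the same way the old one did. Real responses often contain timestamps or IDs, so a practical comparison would likely need to ignore such fields.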
This notion of testing in production isn’t just for applications, but for network changes as well. With closed-loop test automation we can gain more confidence and trust. This methodology has three stages:
The tools below will help with testing these network changes:
Give the ideas and tools above a try to make your changes robust and error-free.
I am not saying that everyone can test in production today. It is scary typing it, and many may not want to whisper these words out loud, but over time that should be the goal of any savvy senior engineer.
In my experience, the lower environments are never replicas of production for one reason or another, and one question almost always has to be answered during our post-mortems: was this tested in lower environments?
To test in production successfully and safely, your automation must be solid and your fail-over to a previous state must be instantly available.
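One way to picture that automation is a simple gate that compares the canary's error rate to the baseline and decides whether to keep going or fall back. The sketch below is only an illustration under assumed thresholds. The function name `canary_decision` and the parameters `max_ratio`, `min_requests`, and `floor` are hypothetical and would need tuning for a real service.

```python
def canary_decision(
    baseline_errors: int,
    baseline_total: int,
    canary_errors: int,
    canary_total: int,
    max_ratio: float = 2.0,    # assumed: canary may err at most 2x baseline
    min_requests: int = 100,   # assumed: minimum sample before deciding
    floor: float = 0.001,      # assumed: tolerated rate when baseline is ~0
) -> str:
    """Return "wait", "promote", or "rollback" for a canary release."""
    if canary_total < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total
    # The floor keeps a near-zero baseline from forcing a rollback
    # on a single canary error.
    allowed = max(baseline_rate * max_ratio, floor)
    return "rollback" if canary_rate > allowed else "promote"
```

A deployment pipeline could call this on every monitoring interval and switch traffic back to the previous version as soon as it returns `"rollback"`.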
Over time, you will gain the trust and support of your team and leadership, and one day you will test in production 😊