Speedscale ‘SpeedChat’ Episode 2: Shift-Left vs. Shift-Right Throwdown featuring Nate Lee (Founder, Speedscale), Ken Ahrens (Founder, Speedscale) and Jason English (Principal Analyst, Intellyx).
Listen or download the podcast on your favorite player here:
Watch the YouTube version here:
Jason: All right. You’re listening to SpeedChat, and we’re talking today about shift-left versus shift-right. We’ve seen a lot of movement in reinventing the software development life cycle and how it’s changed due to agile development and continuous delivery into clouds.
And now we have this concept of shift-right entering the picture. So I guess we can do this as a point-counterpoint. I have with me Nate Lee and Ken Ahrens from Speedscale, and we’re going to talk about how to contrast the two styles and approaches and what it means.
So let’s start with shift-left. We’ve been with shift-left forever, so why has it become so popular, and is it still relevant today?
Ken: Nate, I think you should go first, because I think you’re the expert in shift-left.
Nate: Well, yeah, it’s no secret we came from the world of service virtualization, and that was primarily a shift-left play, right? From the ITKO days. I think it’s firmly rooted in conventional waterfall software delivery, when spinning up environments was still done through VMs and bare metal.
It was always reserved for the end of the SDLC. That’s where you did all of your testing, right? So anytime you had an opportunity to take that end-of-the-SDLC process and sprinkle it earlier in the life cycle, that’s what shift-left meant to us. And the principle we would always espouse is the defect escape ratio, or the leakage. At least that’s what they taught us at Georgia Tech, and I think it was real; we definitely saw the effects of it at the Fortune 500s. Every single time a defect escapes to the next phase of the life cycle, the cost of it increases 10X. So, in theory, if a developer catches it, it’s $10.

If it gets into unit testing, maybe it’s a hundred dollars, and so on and so forth. That’s what shift-left means to me. And I think things got turned on their head when people started adopting Agile and there wasn’t a clearly defined test phase anymore.
Ken: Yeah. I want to kind of add on to what you said there, Nate.
I think shift-left is great: if you’re finding defects way too late in the cycle, what about finding things earlier? A lot of what we actually discovered was that in a waterfall process, when you got to that testing phase, if things weren’t really buttoned down, you’d have a lot of testers sitting around twiddling their thumbs, waiting for the environment to be ready.
So anything you could do to surface those kinds of problems early was great value, because you wouldn’t be paying for people to sit around. Shift-left is great as its own concept. I just think a lot of companies aren’t developing software that way anymore.
Jason: So what do you think is leading to this? Is there some kind of heresy around talking about shift-right? Are we somehow discouraging people from shifting left by talking about shift-right? How did it come about that we would consider this an acceptable alternative to shift-left practices?
Ken: Like I said, I like shift-left, and I like shift-right too. It sort of depends on your organization and who you are. I think what’s happened is that as the big cloud giants have started to open up about their software development practices, people were surprised to find they didn’t have a long, defined testing phase.
They tried to reduce, as much as possible, the time between the code being written by a developer and it being in production in front of customers. And when you remove that phase, the idea of “left” doesn’t make sense. You can’t go any further left.
So they started saying, let’s shift-right: we still need this quality phase, but we’ll do it in the production environment. It’s just an alternative when you’re trying to move really fast. You still want to know if the code is any good, so let’s at least try to discover it in production.
And I think that’s becoming more popular nowadays because people want to move really fast and they value agility over quality.
Nate: It’s in line with what you’re saying, Ken, but I also like to examine the human-nature side of things.
There was a customer I once visited, a big retailer in North Carolina, and they said, “We just do all of our testing in production.” This was at a time when all we did was shift-left testing, right?
So it was shocking to hear that all they did was test in production. They did point out a couple of key things, like, “Hey, we have to stop right before the payment checkout, because we can’t actually invoke any payments; that would wreck our records and financial P&Ls and all that.”
So there were some pretty big gaps in what they were doing. But looking back at them now, I wonder: were they thought leaders, or were they just doing what they had to do, and then the industry caught up and named it something?

I think there’s a little bit of both going on. And how we got here is important to examine. I heard one of the fintechs here in Georgia put it this way: it’s like we adopted Agile and forgot to bring quality along with us.
Right. Agile had no clearly defined test phase. Developers now wear multiple hats; there’s a you-build-it-you-own-it mentality. So developers need to do all of these things, and when you’re crunched for time on a two-week sprint, there aren’t really methodologies or tools that let you do all of that within the sprint.
So I think in some ways we just defaulted to, well, we’re going to push it out the door, but as a fail-safe we’ll apply tons and tons of monitoring, which is great. Ken, did you want to say something?
Ken: Well, yeah, I think shift-right’s better than nothing.
Nate: Absolutely, yeah.
Ken: Doing nothing’s a bad idea: we’re just going to write code, check it in, and go to lunch while it heads to production, right? Obviously that’s not a great idea. Shift-right at least says, let’s do it in a safe way. Remember that article we shared from GitHub, where they talked about putting a release out to 2% of users and collecting metrics to understand how it’s going.
One of their findings was that, even though GitHub is one of the biggest sites in the world, 2% of traffic was not enough. They would still get a good feeling sometimes, push it to a hundred percent, and find out they had a problem. So they added a stage in between: first do 2%, then 20%, and make sure everything’s still working great before pushing it out to everyone.

I think that’s an example of the problems with shift-right: if you depend completely on the production environment to supply all your quality signals, deploying becomes really hard and cumbersome.
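The staged rollout Ken describes (2% to 20% to 100%) can be sketched as a simple promotion gate: send a small fraction of traffic to the new version, compare its error rate against the stable baseline, and only widen the rollout when the canary looks healthy. This is a minimal illustration of the idea, not GitHub’s actual tooling; the stage values and the 1.5x tolerance are assumptions.

```python
import random

STAGES = [0.02, 0.20, 1.00]  # 2% -> 20% -> 100%, per the GitHub example


def route(stage_fraction: float) -> str:
    """Randomly send a request to the canary or the stable version."""
    return "canary" if random.random() < stage_fraction else "stable"


def healthy(canary_errors: int, canary_total: int,
            baseline_error_rate: float, tolerance: float = 1.5) -> bool:
    """Promote only if the canary error rate stays within tolerance of baseline."""
    if canary_total == 0:
        return False  # no signal yet; 2% of traffic may simply be too little
    return (canary_errors / canary_total) <= baseline_error_rate * tolerance


def next_stage(current: float, canary_errors: int, canary_total: int,
               baseline_error_rate: float) -> float:
    """Advance to the next rollout stage, or roll back to zero on a bad signal."""
    if not healthy(canary_errors, canary_total, baseline_error_rate):
        return 0.0  # roll back
    idx = STAGES.index(current)
    return STAGES[min(idx + 1, len(STAGES) - 1)]
```

Note that the `healthy` check returning `False` on zero canary traffic mirrors GitHub’s finding: a tiny slice of traffic can simply fail to produce a signal, which is why the intermediate 20% stage exists.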
Jason: Yeah, it seems like there’s a lot of synergy between shift-left and shift-right.
In the sense that you kind of need both if you’re going to get there, because you have to accept that some problems will always escape into production, especially when you have so many interdependencies that you could never possibly test all the scenarios that could occur in the real world.
What kind of scenarios would you say are not a fit for shift-right? It seems like if I were an Airbnb or a YouTube, the world’s not going to end if somebody can’t see their video or book a room that day, but there are other scenarios where it may not be a great fit.
So what are your thoughts there?
Ken: Actually, I really like this question, Jason, because a lot of these best practices are coming from companies like Netflix and Google that have a different kind of business model than others. Even Airbnb, actually: a lot of people are browsing on Airbnb versus how many are transacting.
Compare that to a company we’re working with that has an online ordering system. When someone’s on your mobile app trying to do a transaction, you don’t want to use them as a test case, because if you’re wrong about the version of the code they’re running, the order will fail.
And the blowback... anyway, you have a direct revenue impact there. Google and Netflix have a different model. Netflix especially has a subscription model, so they make the same from a customer whether their video gets served or not. So certainly there are different kinds of industries.
And also, let’s not use shift-right as a crutch and say, “Okay, testing’s so impossible we don’t need to try.” You should make some attempt to see if the code is any good before you get it into customers’ hands.
Nate: Yeah, just to your last point, Ken, I think the right answer, or the right balance, is always somewhere in between, right?

You’ve got to do some shift-left to mitigate risk, and some shift-right because there are true benefits: you’re capturing all the use cases customers actually exercise, you can handle a lot more volume, and there are just some things you can’t replicate until you test in production.
Ken: One of the things I like about shift-right versus a traditional approach to testing: in a traditional approach, you have to think of all the use cases and try to code them all, asking what happens if a user does this or that. A lot of times it ends up a little too clean, a little too pristine, because you have to tear down the environment and start over.
When you get to production, it’s messy. The good thing about shift-right is you see the messiness: all the traffic coming in, all the different calls the app has to handle. When we did an install with a customer recently, we noticed how much of their traffic was coming from the monitoring systems themselves.

It was actually about half the traffic. That happens for real in production: if you forget that half your traffic is monitoring, you could have the system sized wrong. That realism is one of the things we’re trying to account for with Speedscale.
We’re not guessing, not trying to say, how do you think users use it? It’s what really happens in the system.
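Ken’s point about monitoring traffic can be made concrete: before using captured traffic to size a service, classify requests so health checks and scraper probes aren’t mistaken for user load. This is an illustrative sketch; the user-agent prefixes and the `/healthz` path are common conventions, not anything specific to Speedscale or to the customer mentioned.

```python
# Probe identifiers commonly seen in the wild (illustrative, varies by stack).
MONITORING_AGENTS = ("kube-probe", "Prometheus", "Datadog", "ELB-HealthChecker")


def is_monitoring(request: dict) -> bool:
    """Heuristically flag a captured request as monitoring/health-check traffic."""
    agent = request.get("user_agent", "")
    return request.get("path") == "/healthz" or agent.startswith(MONITORING_AGENTS)


def user_traffic_fraction(requests: list) -> float:
    """Fraction of captured traffic that represents real user load.

    If half your captured traffic is monitoring, sizing the service
    off the raw request count would roughly double your estimate.
    """
    if not requests:
        return 0.0
    users = sum(1 for r in requests if not is_monitoring(r))
    return users / len(requests)
```

For the install Ken describes, a capture like this would report roughly 0.5, signaling that raw request volume overstates user demand by about 2x.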
Jason: Yeah, those unhappy paths are impossible to perceive until you see them, right? And that’s just going to be the reality going forward. Part of the new reality is that, with the advent of cloud, we’re able to store and use so much more data in our release process than we ever could before.
And at the same time we have things like containerization and APIs as sort of a reference model. So you have these two forces combining to make shift-right possible now where it wasn’t before. I don’t think it would have been possible in an old three-tier system, because we just didn’t have the bandwidth or capacity to keep all that data.
So what do you think about how this came about, or how the capability caught up with the practice itself?
Ken: I think what containerization and APIs have done is make the application a lot more modular. I don’t have to take down the entire system to make a change, and you’ll notice people typically don’t have maintenance windows anymore, with rare exceptions.

So I can replace just the one component. That’s actually one of the drivers of the need for shift-right: it’s too hard to build a pre-production environment that looks the same. On the Speedscale side, we’re taking advantage of cloud data warehouse technology.
We’re actually pulling in all of the API calls and building, not a model, but the actual calls that flowed through the system during a period of time. Then we let you replay that against the next version. It’s totally different from trying to guess what users are doing.

We also lock in all the backend dependencies: how did they respond during that time? Did they send errors? People are always asking, “Well, how come that sent an error?” Because if your system normally gets an error, you need to reproduce that.
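The capture-and-replay idea Ken outlines can be sketched in a few lines: record each inbound call together with the backend responses observed at the time, then replay the recording against the new version with the backends mocked to answer exactly as they did during capture, errors included. This is a minimal illustration of the technique, not Speedscale’s actual implementation; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedCall:
    """One inbound API call plus the backend responses observed at capture time."""
    method: str
    path: str
    backend_responses: dict  # backend name -> (status, body) seen during capture


@dataclass
class Replay:
    """Replay recorded traffic against a new version, with backends mocked
    to answer exactly as they did during capture (errors included)."""
    recording: list
    failures: list = field(default_factory=list)

    def run(self, handler) -> int:
        """Drive each recorded call through `handler`; count server errors."""
        for call in self.recording:
            # The mock returns the captured response even if it was an error,
            # because the new version must handle what really happened.
            backend = lambda name, c=call: c.backend_responses[name]
            status = handler(call.method, call.path, backend)
            if status >= 500:
                self.failures.append((call.method, call.path, status))
        return len(self.failures)
```

A handler under test receives the mocked `backend` callable in place of real network calls, so a recording that includes a backend 500 exercises the new version’s error path deterministically.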
Nate: Well, one thing you called out, Jason, that I thought was interesting is what people are really pursuing with shift-right testing. Some of it is experimentation and A/B testing to find out what customer preferences are, but there’s also an aspect of exploratory testing and tuning that they’re trying to do.

With chaos engineering, it’s figuring out what’s going to happen so they can harden systems. One of the things I think Speedscale has done a good job of is providing those shift-right benefits, that iterative process of exploratory testing, tuning, changing things and retesting.

But we’ve managed to shift that left, where there isn’t nearly the risk of breaking something people are actually using. So capturing that intent and accomplishing the goal in a different way has been a strong suit of Speedscale’s.
Jason: I guess we’re shifting shift-right left?
Ken: That’s exactly what I was thinking. Jason, we took your shift-right and shifted it left for you.
Nate: So at what point do you call it continuous testing? I don’t know; some people would say that. But regardless, the end game is that your system doesn’t ever go down, especially when you’re a banking company and someone clicks “Complete deposit.”
And then it says something went wrong. Like, is your money still there? Did it disappear into the ether? Did a Bitcoin server eat it? Like what happened?
Jason: Well, cool. I think this was a pretty good discussion. It’s fun to talk about; we should revisit it with more of our insights from the field. Thanks for joining me.
Ken: Oh, thanks a lot, Jason.