Many engineering organizations have recently begun adopting the practice of platform engineering as a way to increase the velocity of new features being released while managing costs. This new approach to the software development process emerged as a result of the increasing complexity of modern architectures. Its primary focus is to modernize software engineering to better match cloud-native applications.

At first glance, it sounds similar to a familiar concept: DevOps. While the two share similar goals, we’ve identified three main principles that set them apart.

 

What is platform engineering?

The term Platform Engineering (PE) continues to evolve, but fundamentally it encompasses the practice of building and maintaining the foundational infrastructure and systems that support business-critical applications.

That usually means pursuing excellence in continuous integration and continuous delivery, performance engineering, security, infrastructure, tooling and, most importantly for this discussion, developer experience.

Platform engineering teams are responsible for building and maintaining an internal developer platform (IDP). These internal platforms are designed to improve developer self-service, reduce cognitive load, and automate certain repetitive tasks, boosting developer productivity and accelerating the software development lifecycle.

In many ways, this is an evolution of the discipline of Site Reliability Engineering, DevOps, and even systems administration.


While PE may include disciplines like DevOps, it is a much more expansive domain with a different approach. The differences seem subtle but lead to very different outcomes. In practice, DevOps engineers mainly focus on applications once they hit production. That means focusing mainly on API observability and security with little time left over for developer experience. At its root, API observability makes applications more… observable. That means improving the quality of application telemetry like logs or metrics. That’s a great start, but my observation is that PE extends the definition of “observable” into experiments designed to expose the behavior of the system. DevOps processes focus on passive automation, while PE is about active experimentation.


 

Principle 1: Feedback loops vs automation

The existence of strict human-driven software testing processes should be viewed as an admission of failure. Abide by this maxim and many things in the cloud native world will get easier. Stated more broadly, platform engineering focuses on feedback loops connecting the human to the machine (or machine intelligence) rather than just automation. In the early days of DevOps, simply automating deployment with Jenkins was a major efficiency improvement. PE takes this practice further than automated deployments by creating a system that developers can experiment with directly. The difference is subtle but the results can be enormous.

For example, let’s consider three feedback loops and their consequences:

1) Test in prod with fast rollback, feature flags, and limited blast radius

Testing in production environments is great for having a fast feedback loop but it can come at a great cost: your customers end up as your crash test dummies. In some situations it can be passable, like for a content feed that you can simply refresh. It becomes less okay when it’s something more oriented around guaranteed delivery, like a bill pay service.

The other big disadvantage of this approach is that it requires highly-skilled software developers to make intricate decisions about infrastructure design and feature scope to limit blast radius.
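
One common way to limit blast radius is a percentage-based feature flag. The following is a minimal sketch of that idea, not a reference to any particular feature flag product: the function names, the flag name `new-bill-pay`, and the choice of SHA-256 bucketing are all assumptions made for illustration.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage for a flag."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the flag and user together so each flag selects a different subset
    # of users, and so the same user always lands in the same bucket.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent


def pay_bill(user_id: str, rollout_percent: int) -> str:
    """Hypothetical handler that routes a small share of users to new code."""
    if in_rollout("new-bill-pay", user_id, rollout_percent):
        return "new code path"
    return "stable code path"
```

Because the bucket is derived from a stable hash, raising the percentage only adds users to the rollout, and setting it to 0 acts as the fast rollback: every user returns to the stable path without a redeploy.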

2) Microservices architectures

When microservices became popular, they solved a key problem by reducing the scope of what each engineering team needed to know. The smaller the immediate codebase, the faster the feedback loop. However, the complexity didn’t go away; it simply moved. Issues now manifest only in the staging environment because the problems have moved to the interactions between components. Said differently, debugging involves tracing a large system instead of tracing code in a monolith.

3) Traditional performance testing and regression testing

Having a formalized software testing process prevents many kinds of errors from escaping to production. That’s pretty much the only upside. The downsides are the expense, the fragility of the process and the tax on development velocity. Most organizations are abandoning manual testing.

The focus of Platform Engineering should be reducing the cost of experimentation while managing tradeoffs. It’s not enough to just automate.

 

Principle 2: Platform includes the developer desktop

Every engineer knows the value of running the entire system on their laptop. The ability to tinker, rewire and refactor without breaking a real system is invaluable for increasing velocity. That may not be possible for many applications but the concept of giving developers their own sandbox is still crucial. For this reason, platform engineering expands the definition of “platform” to include the dev environment and tooling.

DevOps focuses mainly on automating the delivery of the production application and its infrastructure, while PE focuses on automating the delivery of the developer test environments as well. Most organizations implement one of the following patterns:

  • Packaging the application including test and mock data so that it can be simulated on a laptop
  • On-demand preview environments, typically built with each merge request and managed by tools like Argo and Flux
  • Ephemeral service-isolation test environments, either local or in the cloud, using a tool like Speedscale
  • Realistic centralized test environments with traffic rerouting, using a tool like Telepresence

Generally these systems are managed by the platform engineering team as a service to the broader engineering team. They need to be managed and designed along with the application.


 

Principle 3: Everything is ephemeral

Most DevOps practitioners are familiar with the idea of Infrastructure as Code (IaC), where an entire application environment can be reproduced with the press of a button. This is an excellent start, but Kubernetes takes it further by introducing the idea that everything is short-lived. Instead of carefully crafting virtual machines and software-defined networking rules, best practices now say we design systems around short-lived containers with elastic scaling.

This shift accelerates development processes in a variety of ways, from testing to rollbacks. Here are a few specific applications of this idea and its advantages:

1) Data portability

If your data is stored as JSON files in the cloud, it can be moved and repurposed easily for different use cases. Backups are simple because it’s just copying files. Analytics are easy because you can just ask Athena, BigQuery, Snowflake, etc. to traverse it. Machine learning training becomes easier because the dataset can be segmented and passed around. Compare this with the relational databases of yore with their proprietary formats and backup systems.
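
To make the "segmented and passed around" point concrete, here is a minimal sketch that splits newline-delimited JSON records into groups using only the standard library. The record fields (`region`, `amount`) and the helper name `segment` are invented for illustration, and newline-delimited JSON is an assumed layout rather than something the approach requires.

```python
import json

# Hypothetical sample data: one JSON object per line.
RECORDS = (
    '{"region": "us", "amount": 10}\n'
    '{"region": "eu", "amount": 7}\n'
    '{"region": "us", "amount": 3}\n'
)


def segment(json_lines: str, key: str) -> dict[str, list[dict]]:
    """Group newline-delimited JSON records by the value of one field."""
    segments: dict[str, list[dict]] = {}
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Records that lack the field are collected under "unknown".
        segments.setdefault(str(record.get(key, "unknown")), []).append(record)
    return segments
```

Each resulting segment is a plain list of dictionaries, so it can be written back out as its own file and handed to a backup job, an analytics query, or a training pipeline independently.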

2) Testing

How can stable tests be written when the APIs and applications are constantly changing? Stop trying and utilize ephemeral traffic replay instead. The tests and mocks are always refreshed from real user behavior.
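
The core idea of traffic replay can be sketched in a few lines: feed previously recorded requests to the new version of a service and report any response that differs from what was recorded. This is a simplified illustration only. It is not Speedscale's API, and the `RecordedCall` type, the `replay` function, and the sample handler are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecordedCall:
    """One request observed in real traffic, with the response it received."""
    method: str
    path: str
    status: int
    body: str


# A handler takes (method, path) and returns (status, body).
Handler = Callable[[str, str], tuple[int, str]]


def replay(recording: list[RecordedCall], handler: Handler) -> list[str]:
    """Replay each recorded request and describe any response that differs."""
    mismatches = []
    for call in recording:
        status, body = handler(call.method, call.path)
        if (status, body) != (call.status, call.body):
            mismatches.append(
                f"{call.method} {call.path}: expected {call.status} "
                f"{call.body!r}, got {status} {body!r}"
            )
    return mismatches


def handler_v2(method: str, path: str) -> tuple[int, str]:
    """Hypothetical new build that accidentally dropped the /bills route."""
    if path == "/health":
        return 200, "ok"
    return 404, "not found"


RECORDING = [
    RecordedCall("GET", "/health", 200, "ok"),
    RecordedCall("GET", "/bills/42", 200, '{"id": 42}'),
]
```

Calling `replay(RECORDING, handler_v2)` returns a single mismatch for `/bills/42`. Nobody wrote that test case by hand; refreshing the recording from current traffic refreshes the test suite with it.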

3) Engineering velocity

Some organizations stand up complete application preview environments for every merge request. These environments may live for an hour or less but they let reviewers interact with the running application and see the code in action. When they’re done, the environment disappears.
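
The lifecycle described here, create on merge request and delete after a short lifetime, can be modeled as a small registry with a time-to-live. The sketch below only tracks bookkeeping in memory; the class name, the `preview-mr-<id>` naming scheme, and the one-hour default are assumptions, and a real implementation would create and delete actual infrastructure at the marked points.

```python
from dataclasses import dataclass, field


@dataclass
class PreviewRegistry:
    """Tracks preview environments and expires them after ttl_seconds."""
    ttl_seconds: float = 3600.0
    _created: dict[int, float] = field(default_factory=dict)

    def create(self, merge_request_id: int, now: float) -> str:
        """Register an environment for a merge request and return its name."""
        # A real platform would provision the environment here.
        self._created[merge_request_id] = now
        return f"preview-mr-{merge_request_id}"

    def reap(self, now: float) -> list[int]:
        """Remove every environment older than the TTL and return their IDs."""
        expired = sorted(
            mr for mr, created_at in self._created.items()
            if now - created_at >= self.ttl_seconds
        )
        for mr in expired:
            # A real platform would tear the environment down here.
            del self._created[mr]
        return expired

    def active(self) -> list[int]:
        """Return the merge request IDs that still have an environment."""
        return sorted(self._created)
```

Passing `now` explicitly instead of reading the system clock keeps the expiry logic easy to test, which matters because a reaper that silently fails is how "ephemeral" environments become permanent ones.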

4) Security

It’s harder to get taken in by ransomware if you can press a button and rebuild your systems from code and restore your data.

The easiest way to get started with this concept is to define your infrastructure as code with a tool like Terraform or CloudFormation. As you progress, it becomes necessary to shift to a modern container management system like Kubernetes. Some organizations also invest in portals for full deployment automation.


Key takeaways

🚫 DON’T treat PE as a rebranding of DevOps

DO learn more about these three key principles and how they apply to your technology stack

🚫 DON’T leave the developer experience as an afterthought

DO design your application platform so it can be scaled up and down

🚫 DON’T treat development and deployment as a linear process

DO identify feedback loops and reduce experimentation effort

🚫 DON’T create static processes and artifacts like virtual machines or testing plans

DO create always-up-to-date feedback loops

Build your Internal Developer Platform (IDP) with Speedscale

By recording production traffic and replaying it in test environments, Speedscale helps platform engineers build their IDP with realistic data and mocks. Learn more about production traffic replication in our Definitive Guide to Production Traffic Replication and Replay or sign up below to try us out for free.

Ensure performance of your Kubernetes apps at scale

Auto generate load tests, environments, and data with sanitized user traffic—and reduce manual effort by 80%
Start your free 30-day trial today