At Speedscale, we’re always trying to find ways to iterate faster and reduce developer toil. In line with that mission, we slant our engineering decisions towards cutting-edge tech, both because it usually lets us move faster and because it lets us help our customers later on when they upgrade their own tech stacks. Recently, we had the opportunity to upgrade the communication channel between our api-gateway and React front end. This journey provided some unexpected benefits.
During our MVP phase, we built our user interface using the react-admin prototyping framework, which communicates using GraphQL via apollo-server. The architecture is straightforward and common:
Early on, we needed something fast that was easy to work with, and GraphQL turned out to be a fantastic decision. Our observations from using GraphQL for over a year net out as:
What we love:
- easy developer on-ramp
- fine grained data access facilitates progressive disclosure
- easy to debug in production
What we don’t love:
- large bandwidth requirements
- hard to debug on the developer desktop
- schema drift between backend and front end systems
- lack of native support for hash maps
However, as we migrated off react-admin to pure React, we noticed we were building things more slowly and becoming very grumpy every time we had to make certain kinds of changes. Specifically, we started seeing increased blocking between the front-end (React) and API (Node.js) engineers whenever the data model changed. It was taking us longer to add and debug a single data field in the Data Provider and API than to add it to the other 12 microservices feeding the API. Ultimately, we traced the problem to the protocol equivalent of schema drift between our back-end microservices, which use protobuf, and our front end, which uses a completely separate schema in GraphQL. Whenever we changed the protobuf, we had to make the same changes in multiple components, often including custom translation logic (for instance, to convert a map to an array). This process was manual, error-prone, and generally painful.
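The map-to-array translation mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production code: the `StringMap` and `KeyValue` types and both function names are hypothetical, and it assumes a protobuf `map<string, string>` field that has to be exposed through GraphQL, which has no native map type, as a list of key/value objects.

```typescript
// Hypothetical shape: a protobuf map<string, string> field as it arrives
// from a back-end service (for example, tags or headers).
type StringMap = Record<string, string>;

// Hypothetical GraphQL-friendly shape: each map entry becomes a list item.
interface KeyValue {
  key: string;
  value: string;
}

// Convert a map into a list of entries, sorted by key for stable output.
function mapToEntries(map: StringMap): KeyValue[] {
  return Object.keys(map)
    .sort()
    .map((key) => ({ key, value: map[key] as string }));
}

// Convert the list back into a map on the client side.
function entriesToMap(entries: KeyValue[]): StringMap {
  const map: StringMap = {};
  for (const { key, value } of entries) {
    map[key] = value;
  }
  return map;
}

const entries = mapToEntries({ region: "us-east-1", env: "prod" });
console.log(entries);
console.log(entriesToMap(entries));
```

Every field that needs this kind of shim is one more place where the two schemas can quietly disagree.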
Before diving into re-architecting a big chunk of our front end we considered a few alternatives:
- Improve automation to enforce consistency and find errors
- Move to a tried and true REST API
- Move to Google’s grpc-web
Ultimately, we decided to implement end-to-end gRPC from React all the way to our data collectors using grpc-web. After overcoming some initial pain, the benefits turned out to be enormous.
gRPC Is Fast AND Easy to Use
What we love:
- gRPC requires 50-90% less processing than GraphQL in the web client. We aren’t ready to release comparative load test results, but our results are in line with what others have posted. Our users’ initial reaction is usually “Whoa, what did you do to this thing?”
- gRPC reduces network utilization drastically. We experienced a roughly 70% reduction in data volume on average, with even larger gains for messages with a high metadata-to-payload ratio. For example, the left JSON blob enjoys a much higher compression ratio than the right:
- Schema drift is found when code is written instead of when it runs. This turns out to be a substantial improvement in developer experience.
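To make the bandwidth point concrete, the sketch below hand-encodes one small record using the protobuf wire format (a tag byte per field, base-128 varints for integers, length-prefixed strings) and compares its size with the JSON form. The `Sample` record, its field numbers, and the helper names are assumptions made for illustration. Real code would use generated protobuf classes, and the comparison ignores HTTP framing and compression.

```typescript
// Hypothetical record; the fields and their numbers (1, 2, 3) are assumptions.
interface Sample {
  service: string;    // field 1, string
  latencyMs: number;  // field 2, unsigned integer
  statusCode: number; // field 3, unsigned integer
}

// Encode a small non-negative integer (below 2^31) as a base-128 varint.
function varint(n: number): number[] {
  const out: number[] = [];
  while (n > 0x7f) {
    out.push((n & 0x7f) | 0x80); // low 7 bits with the continuation bit set
    n >>>= 7;
  }
  out.push(n);
  return out;
}

// Assumes ASCII-only input, so one character is one byte.
function asciiBytes(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i));
  }
  return out;
}

// Tag byte = (field number << 3) | wire type; 0 = varint, 2 = length-delimited.
function encodeSample(s: Sample): number[] {
  const name = asciiBytes(s.service);
  return [
    (1 << 3) | 2, ...varint(name.length), ...name,
    (2 << 3) | 0, ...varint(s.latencyMs),
    (3 << 3) | 0, ...varint(s.statusCode),
  ];
}

const sample: Sample = { service: "api", latencyMs: 12, statusCode: 200 };
const jsonSize = JSON.stringify(sample).length; // field names travel in every message
const binarySize = encodeSample(sample).length; // field names live only in the schema

console.log(`JSON: ${jsonSize} bytes, protobuf: ${binarySize} bytes`);
```

For this toy record the JSON form is 49 bytes and the binary form is 10. Most of the gap is the field names, which JSON repeats in every message. This is an illustration of the mechanism, not a benchmark.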
What we don’t love:
- HTTP/2 support, which gRPC relies upon, is somewhat immature as of May 2021. Plan for a small but ongoing DevOps tax. We found and overcame a variety of sharp edges up and down our stack, including:
  - nginx requires all traffic to be HTTP/2 when using grpc_pass to a TLS-enabled endpoint, even though we had a mix of HTTP and gRPC
  - AWS ALB HTTP/2 support has arrived but is not yet feature rich
  - we were forced into maintaining certificates for every component that egresses data through our K8s load balancer
  - our synthetic monitoring tool currently doesn’t support HTTP/2
- gRPC tooling is immature, and debugging in production requires a new skillset. You can’t pop open the network tab in Chrome and inspect request information. Fortunately, tools like gRPC-Web Developer Tools, grpcurl, wombat and BloomRPC are starting to emerge.
- Onboarding new engineers takes a little longer because of a lack of familiarity with grpc-web. However, our React developers refuse to go back to GraphQL now that they’re over the learning curve.
As we grow, the tight coupling between our back-end and front-end data definitions may become an issue, but for now it’s been a huge boost. We also haven’t done things like optimize for mobile, which may change the equation. So far, though, the migration to gRPC-Web has been unexpectedly smooth, and developer productivity has returned to normal.