Run Speedscale in your cloud. Keep every byte of traffic in your VPC.
Speedscale BYOC installs the full capture-and-repair stack inside your AWS, GCP, or Azure account. Captured traffic, inference, and audit logs all stay on infrastructure you own. The vendor ships software updates. That's it.
Why BYOC, not multi-tenant SaaS
Reproducing a real production bug means you need the actual payload: headers, body, every quirk of that specific request. That data is too sensitive for most regulated teams to send outside the firewall. BYOC removes the trade-off.
Data sovereignty
PCI card data, PHI, trading positions: none of it leaves your VPC. No shared tenancy means no cross-customer data risk and no third-party SOC 2 scope to inherit.
Bring your own LLM
Point the agents at whatever LLM you're already running: Anthropic, OpenAI, self-hosted Llama, or your own model. Prompts, completions, and inference costs stay under your existing AI governance.
Your cloud, your economics
Reserved instances, spot, PrivateLink, committed-use discounts: all of it applies. No egress fees on capture. No vendor markup on storage. No surprise overages when traffic spikes.
How the BYOC stack runs
One Helm install, three components. Speedscale ships the software; you run it in your Kubernetes cluster with the access controls you already have.
Capture
An eBPF agent and Kubernetes operator capture full request and response payloads in production, without SDK changes or code instrumentation. Payloads land in object storage inside your account.
Orchestrate
The agent factory runs the closed loop: discover failing requests, draft a repro spec, triage root cause, validate the fix against the same captured traffic, and open the PR. All compute stays in your cluster.
Reason
The agents call your LLM endpoint: Anthropic in your VPC, OpenAI with your DPA, self-hosted Llama on your GPUs, or an internal model. Prompts and completions log to your existing AI audit trail.
Why your observability vendor can't do this
eBPF observability vendors and APM tools were built to display spans, not to reproduce bugs. The difference shows up in the one place that matters: the request body.
eBPF observability (Pixie, Groundcover, Metoro, Coroot)
Truncate payload bodies because monitoring doesn't need full fidelity. That's fine for dashboards. But a truncated body can't replay a failed request.
APM and error tracking (Datadog, Dynatrace, Sentry)
Tell you a request failed. Don't tell you how to reproduce it. There's no path from alert to merged fix without building it yourself.
Capture-replay point tools (GoReplay, WireMock, Hoverfly)
Cover one step of the loop (usually replay) and leave the rest to you. Discovery, triage, validation, and the PR are all manual.
AI coding agents (Cursor, Claude Code, Copilot)
Write code. Can't reproduce a real customer's failed request without production data they don't have. BYOC feeds them that context.
None of these categories fit. Speedscale closes the loop from failure to fix: capture the real request, reproduce it, validate the patch against the same traffic, open the PR. BYOC is what makes that loop viable in regulated environments.
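To make the truncation point concrete, here is a minimal shell sketch. The payload, the 32-byte cut, and the `is_replayable` helper are all made up for illustration and do not describe any vendor's actual limits. It relies only on the fact that an HTTP/1.1 server reads exactly `Content-Length` bytes, so a body cut short no longer matches the length the original request declared.

```shell
#!/bin/sh
# Illustrative only: hypothetical payload and a hypothetical 32-byte cap.
body='{"order_id":"A-1001","items":[{"sku":"X1","qty":2}],"currency":"USD"}'

# The Content-Length the original client would have sent for this body.
declared_length=$(printf '%s' "$body" | wc -c | tr -d ' ')

# Succeeds only when the captured body still has the declared byte length.
# $1 = declared Content-Length, $2 = captured body
is_replayable() {
  actual=$(printf '%s' "$2" | wc -c | tr -d ' ')
  [ "$actual" -eq "$1" ]
}

# Simulate a monitoring agent that keeps only the first 32 characters.
truncated=$(printf '%s' "$body" | cut -c1-32)

if is_replayable "$declared_length" "$truncated"; then
  echo "replayable"
else
  echo "length mismatch: cannot replay as captured"
fi
```

Rewriting the header to match the shorter body does not help either: the JSON is now cut mid-object, so the service under test would reject it before reaching the code path that failed in production.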
Who runs Speedscale in BYOC mode
Financial services
Banks, card networks, and trading platforms where captured payloads contain account numbers, trade data, and other regulated content. BYOC keeps all of it inside the existing audit perimeter, including the AI reasoning that touches it.
Healthcare
HIPAA-covered systems where PHI can't cross a vendor boundary under any BAA. BYOC removes that conversation: data, reproduction, and AI inference all run inside your HIPAA-eligible cloud account.
Retail and payments (PCI)
PCI-scoped systems where card data appears in real request bodies. Running capture inside your own VPC means no PCI scope expansion and no shared-tenancy ambiguity.
Travel and hospitality
High-volume booking platforms that need realistic traffic to catch problems before they become 3am incidents. FLYR runs Speedscale in BYOC to validate releases against live production patterns before each deploy.
You don't need to be AI-native on day one
The data lake is live in minutes. Most regulated teams layer in the AI loop once they've seen replay working, but that's weeks, not years. BYOC gives you the full stack; you decide the pace.
Day 1: Data lake live
Helm deploys in under 10 minutes. Production traffic starts flowing to your storage backend immediately. No code changes, no SDK, no restarts.
Week 1: First replay
Engineers replay real production requests against builds and catch the first set of regressions. Bugs that were invisible in staging are now reproducible in seconds.
Month 1-2: Closed loop
On lower-risk surfaces, the AI loop runs end to end: discover, reproduce, validate, PR. Your LLM, your policies, your audit trail. Humans keep the gate on anything sensitive.
Install in minutes via Helm
Two steps. Pick your backend. Everything stays in your cluster.
Step 1: Add the Helm repo
helm repo add speedscale-byoc https://speedscale.github.io/speedscale-byoc/
helm repo update
Step 2: Pick your backend
| Backend | Install command |
|---|---|
| Grafana + Loki | helm install byoc-grafana speedscale-byoc/grafana -n byoc-grafana --create-namespace |
| Elasticsearch + Kibana | helm install byoc-es speedscale-byoc/elasticsearch -n byoc-elasticsearch --create-namespace |
| Fluent Bit → GCS | helm install byoc-gcs speedscale-byoc/fluentbit-gcs -n byoc-fluentbit-gcs --create-namespace |
| Fluent Bit → S3 | helm install byoc-s3 speedscale-byoc/fluentbit-s3 -n byoc-fluentbit-s3 --create-namespace |
Prerequisites, operator-values.yaml examples, verify steps, and troubleshooting are in the full install guide.
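If you want to review the commands before anything touches the cluster, a small wrapper like the one below can help. This is a sketch, not part of Speedscale: the `byoc_commands` function and its short backend keys (`grafana`, `elasticsearch`, `gcs`, `s3`) are hypothetical names chosen here. The release, chart, and namespace values come from the table above, and the two follow-up checks are generic `helm status` and `kubectl get pods` calls rather than the official verify steps.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the install command for one
# backend, followed by two generic checks for the resulting release.
byoc_commands() {
  case "$1" in
    grafana)       release=byoc-grafana; chart=grafana;       ns=byoc-grafana ;;
    elasticsearch) release=byoc-es;      chart=elasticsearch; ns=byoc-elasticsearch ;;
    gcs)           release=byoc-gcs;     chart=fluentbit-gcs; ns=byoc-fluentbit-gcs ;;
    s3)            release=byoc-s3;      chart=fluentbit-s3;  ns=byoc-fluentbit-s3 ;;
    *) echo "unknown backend: $1" >&2; return 1 ;;
  esac

  # Install command, exactly as listed in the backend table.
  echo "helm install $release speedscale-byoc/$chart -n $ns --create-namespace"
  # Generic post-install checks (assumed, not taken from the install guide).
  echo "helm status $release -n $ns"
  echo "kubectl get pods -n $ns"
}
```

Running `byoc_commands s3` prints the three commands for the S3 backend. Once they look right, `byoc_commands s3 | sh` executes them in order.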
Ready to deploy Speedscale inside your cloud?
A typical BYOC rollout takes four to six weeks from kickoff to first reproduced bug. We do the install with your platform team. Security reviews the components on their normal schedule; AI governance signs off on the LLM endpoint before it's wired in.