The PII Testing Dilemma


Why Software Is So Hard to Test (and Why AI Makes It Worse)

Target Audience

Software engineering leaders, platform leaders, and AI/ML leaders responsible for release velocity, quality, and risk management.


Executive Summary

Modern software teams know the uncomfortable truth: production data is the best test data. Unfortunately, production data is also full of PII, secrets, and sensitive context that cannot legally or ethically be exposed to developers or test environments.

This tension creates a persistent quality and deployment speed gap. Teams test with incomplete, synthetic, or outdated data and are then surprised when production behaves differently. AI agents amplify this problem by depending on realistic data distributions, sequences, and edge cases that are nearly impossible to recreate safely.

This post explains why PII is one of the core blockers to good testing, why it’s more hidden than most teams realize, and why traditional approaches fall short.


Table of Contents

  1. Why realistic test data matters
  2. The hidden nature of PII in modern systems
  3. Every technology leaks PII differently
  4. Why developers can’t observe real production behavior
  5. Traditional test data workarounds (and why they fail)
  6. What AI coding agents expose: The real problem we need to solve
  7. Coming next: why traditional Test Data Management falls apart

1. Why realistic test data matters

High-quality testing depends on production-grade behavior:

  • Real request and response shapes
  • Real payload sizes and value distributions
  • Real timing, ordering, and error conditions
  • Real edge cases that no one thought to simulate

Synthetic data and hand-crafted fixtures tend to validate schemas, not systems. They confirm that software works in theory, not that it survives contact with reality.

For AI coding agents, this problem is magnified because AI systems are inherently stochastic: they produce non-deterministic outputs even with identical inputs. Introduce fake or synthetic test data into an already non-deterministic system and you compound the uncertainty. The agent must now navigate two layers of unpredictability: its own stochastic behavior and artificial data patterns that don’t match real-world distributions. Real production data provides the grounding a stochastic system needs to produce consistent, reliable results.

Relying on incomplete or artificial data sets means missing critical test cases, which reduces overall test coverage and lets defects reach production undetected. High-quality, representative data ensures that applications are exercised against realistic scenarios before release, not after.
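To make this concrete, here is a minimal Python sketch (with made-up values) of a function whose tests pass against a tidy, hand-crafted fixture but which silently produces garbage on the kinds of values real traffic contains:

```python
import re

def normalize_phone(raw: str) -> str:
    # Naive normalizer written against tidy, hand-crafted fixtures.
    digits = re.sub(r"\D", "", raw)
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

# The fixture the tests were written for: passes.
assert normalize_phone("555-010-0123") == "(555) 010-0123"

# Values pulled from real traffic: no exception is raised,
# the output is just silently wrong.
for raw in ["+44 20 7946 0958", "", "ext. 204"]:
    print(repr(raw), "->", repr(normalize_phone(raw)))
```

The schema is satisfied in every case, so a fixture-driven test suite stays green; only realistic inputs reveal that the system mangles international numbers and empty fields.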

2. The hidden nature of PII in modern systems

PII is not confined to obvious database columns like email or phone_number.

In modern distributed systems, PII is often hidden, including:

  • Base64-encoded fields inside API payloads that look opaque until decoded
  • JWTs that contain confidential claims, identifiers, roles, or business context
  • Nested JSON objects, headers, and metadata fields
  • Binary formats like gRPC and Protobuf that require decoding just to inspect
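As an illustration of the first two bullets, a JWT payload is just base64url-encoded JSON: it looks opaque in a log line or database column, yet a few lines of Python recover every claim. The token below is constructed locally for the sketch, not a real credential:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    # A JWT is three base64url segments: header.payload.signature.
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build an illustrative token with PII-bearing claims.
claims = {"sub": "u-9138", "email": "jane.doe@example.com", "role": "admin"}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
token = f"eyJhbGciOiJIUzI1NiJ9.{payload}.sig"

# The "opaque" middle segment decodes straight back to PII.
print(jwt_claims(token))
```

Anything that stores or logs such tokens is storing the PII inside them, whether or not anyone has decoded it yet.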

This means teams often don’t know where PII exists until it’s already leaked. A quick audit of your own staging environment will usually turn up an example.

“Just mask the data” assumes you can see the data first, which is increasingly untrue.

3. Every technology leaks PII differently

Each technology layer hides sensitive values in different places and formats, so each requires technology-aware inspection and transformation: what masks a database column does nothing for a Base64-encoded API payload or a Protobuf-framed gRPC stream. Treating every system the same leads to blind spots and incomplete testing.
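A minimal sketch shows why naive masking falls short. The regex-based masker below (an assumption for illustration, not any particular tool) catches an email address in plaintext JSON but lets the identical value pass untouched once it is base64-encoded:

```python
import base64
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask(text: str) -> str:
    # Pattern-based masking: only works on PII it can literally see.
    return EMAIL.sub("***", text)

plain = '{"email": "jane.doe@example.com"}'
encoded = base64.b64encode(plain.encode()).decode()

print(mask(plain))    # the email is masked
print(mask(encoded))  # the same PII passes through untouched
```

The base64 alphabet contains no "@", so the pattern never fires; without decoding each layer first, the masker cannot even detect what it is supposed to hide.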

4. Why developers can’t observe real production behavior

Because PII is everywhere — and often invisible — organizations restrict access to production data entirely.

The result:

  • Logs are redacted or truncated
  • Payloads are dropped
  • Observability tools show metrics and traces without context

Developers can see that something failed, but not why. This lack of observability doesn’t just slow debugging — it permanently lowers software quality by preventing teams from learning from real production behavior. You can learn more about safe and deep visibility in our observability video series.
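The cost of blanket redaction can be sketched in a few lines. The field names and event shape below are hypothetical, but the pattern is common: redacting the payload wholesale discards the very value that would explain the failure:

```python
# Hypothetical log redactor that drops whole fields by name.
SENSITIVE = {"email", "ssn", "payload", "body"}

def redact(event: dict) -> dict:
    return {k: ("[REDACTED]" if k in SENSITIVE else v)
            for k, v in event.items()}

event = {
    "level": "ERROR",
    "msg": "validation failed",
    # The negative amount is the actual clue, but it lives next to PII.
    "payload": {"email": "jane@example.com", "amount": -12.50},
}

print(redact(event))
# Developers see that validation failed, but the offending value is gone.
```

This is the trade-off in miniature: the PII is protected, and so is the bug.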

5. Traditional test data workarounds (and why they fail)

Common approaches include:

  • Masking or anonymizing copies of production databases
  • Generating synthetic data from schemas or models
  • Subsetting production data into smaller, refreshable snapshots

These approaches assume:

  • PII locations are known and static
  • Data is batch-oriented
  • Systems change slowly and data has predictable locations

Modern systems violate all three assumptions. As architectures become more distributed and event-driven, these methods struggle to keep up — especially when traffic shape and sequence matter more than individual records. Tooling that merely reduces redundant data copies lowers storage costs, but it does not close the fidelity gap.

6. What AI coding agents expose: The real problem we need to solve

AI coding agents are exposing a fundamental problem that has been hiding in plain sight. Like human engineers, these agents depend on signals that only realistic data provides:

  • Rare edge cases
  • Real user behavior patterns
  • Long-tail distributions
  • Sequential decision-making context

Sanitized or synthetic data often removes the very signals AI coding agents rely on, leading to:

  • Overconfidence in test results
  • Surprising failures in production
  • Slower iteration due to fear-driven release processes

But here’s what makes AI coding agents different: their stochastic nature amplifies the consequences. When you combine non-deterministic AI behavior with synthetic or sanitized test data, you’re compounding uncertainty in ways that make failures more frequent and harder to predict. The AI’s inability to produce reliable code when working with unrealistic data surfaces a truth that human engineers have learned to work around: we’ve been testing with inadequate data all along.

This is not just a compliance problem, and it’s not just an AI problem. The real challenge that AI coding agents are forcing us to confront is:

How do we safely observe and reuse real production behavior to improve software quality?

Until teams can answer that question, testing will remain slower, riskier, and less representative than production demands — whether the code is written by humans or AI.

7. Coming Next: Why Traditional Test Data Management Falls Apart

In Part 2, we’ll examine why classic Test Data Management was built for a different era. We no longer live in a world dominated by batch processing and monolithic databases. Modern systems require streaming, real-time approaches instead.

Get started for free

ProxyMock desktop or Speedscale Cloud — choose your path.