PII scanning in production: what actually blocks the call

Post-hoc vs pre-call

Most PII detection in LLM applications is post-hoc: the request goes to the provider, the response comes back, then a scanner reviews the stored content and flags what should have been redacted. This is fine if your only goal is to know what happened. It is not fine if your goal is to prevent the leak from happening.

The difference is the difference between an incident report and a control.

SchneeAI scans before the upstream call. If the policy says block, the request never reaches the provider. The user sees a structured error; the operations team sees the audit event; the raw prompt never leaves the gateway.

This sounds simple. It isn’t. Doing it at request latency, with low false-positive rates, across tenants with different policies, is the harder design problem.

What the scanner sees

A prompt is not just text. By the time it reaches the scanner, it’s:

A system prompt (often tenant-configured)
A user message (free text, possibly structured)
Tool call parameters (JSON, often with embedded strings)
File content (extracted text from PDFs, images via OCR)
Conversation history (prior turns that may include redacted content from earlier scans)

A scanner that only looks at the user message misses the system prompt. A scanner that looks at the whole request as one blob can’t tell you where the finding is. The scanner needs to operate per-field, with the field path captured in the finding.

17 categories, not “PII yes/no”

SchneeAI ships with 17 detection categories:

Category	Severity	Verify step
credit_card_number	critical	Luhn checksum
us_ssn	critical	Area/group/serial rules
us_passport	critical	Format + checksum
us_drivers_license	warning	Format only
us_bank_account	critical	ABA routing check
iban	critical	IBAN mod-97
swift_bic	info	Format only
japan_mynumber	critical	Mod-11 check digit
japan_banking_account	warning	Format + bank code lookup
email_address	info	RFC 5322 simplified
phone_number	info	E.164 normalization
ip_address	info	IPv4/IPv6 format
mac_address	info	Format only
api_key_generic	critical	Entropy + prefix match
aws_access_key_id	critical	Format (AKIA…)
github_pat	critical	Format (ghp_…)
private_key	critical	PEM header + structure

Why this list? It’s the union of “things regulators care about” (credit cards, SSN, mynumber, IBAN), “things attackers can use immediately” (API keys, private keys, PATs), and “things that are personally identifying but routine” (email, phone, IP).

The third group is in the list because some tenants want it flagged. Others don’t. Severity and action are policy, not detection.

Severity is a hint, not a verdict

Each category has a default severity, but severity alone doesn’t decide what happens. The tenant’s policy does:

critical → default block, can be downgraded to mask
warning → default mask, can be downgraded to flag or upgraded to block
info → default flag, can be upgraded to mask for strict tenants

A hospital running SchneeAI might set email_address to mask. A consumer chatbot might leave it at flag. The scanner emits findings with severity; the policy engine decides actions.

The reason for this split: detection is mostly objective (a string is or isn’t a valid credit card number). Action is contextual (whether that finding should block depends on the tenant, the prompt, and sometimes the user’s consent state).

The verify step

Detection without verification produces false positives. The pattern 4111-1111-1111-1111 is a credit-card-shaped string. Whether it’s a real card number depends on the Luhn checksum. The pattern 123-45-6789 is an SSN-shaped string. Whether it’s a real SSN depends on area/group validity rules.

For categories with a deterministic verify step, the scanner runs it before emitting the finding:

detect regex match → verify checksum → emit finding
                    ↓
                    verify fails → suppress finding

This cuts false positives by an order of magnitude without weakening detection. The categories that have verify steps are the ones where verification is cheap and reliable: credit_card_number (Luhn), iban (mod-97), japan_mynumber (mod-11), us_ssn (area/group), aws_access_key_id (format).

For categories without a verify step — email_address, phone_number, api_key_generic — the scanner accepts the format match and relies on policy to handle the noise. Most tenants set these to flag rather than block.

Overlap resolution

A string can match multiple categories. 4111 1111 1111 1111 is both credit-card-shaped and could match a phone-number regex if it’s loose. ghp_xxxx... is a GitHub PAT but might also match a generic api_key pattern.

The scanner resolves overlaps in priority order:

Critical with verify (e.g., credit_card_number with passing Luhn) wins over phone_number.
Specific (github_pat) wins over generic (api_key_generic).
Higher severity wins on ties.

The output is a single finding per match range, with the resolved category. Lower-priority matches are recorded as alternates in the finding metadata, not emitted as separate findings.

What to do when the scanner fires

The policy engine maps severity to one of three actions:

flag

The finding is recorded in the audit log and the interaction metadata, but the request proceeds unchanged. Use case: low-severity detections where the operations team wants visibility without disruption.

mask

The detected substring is replaced with a category-labeled token before the upstream call. 4111 1111 1111 1111 becomes [CREDIT_CARD_NUMBER]. The model sees the mask; the original is never sent. Use case: medium-severity detections where the prompt is still useful without the PII.

The mask is reversible only via the Vault: the original is stored encrypted with the interaction, linked by the finding ID. Operations can reconstruct the original prompt if needed for incident investigation.

block

The request is refused before the upstream call. The caller receives a structured error with the finding category, severity, and the field path. The audit log captures the blocked interaction. Use case: critical detections where sending the prompt is worse than refusing.

Block is the strongest action. It’s also the most visible to end users — a blocked request is a failed request. Tenants who set critical → block need a plan for legitimate prompts that trip the scanner (false positives on critical categories are rare but real).

The Vault connection

Every finding — flag, mask, or block — is written to the audit log with:

The interaction ID
The category, severity, and action taken
The field path where the finding was detected
For mask: the mask token and the Vault pointer to the original
For block: the refusal reason returned to the caller

The raw matched substring is never written to the audit log. For mask actions, the original is stored in the Vault under the same retention class as the interaction. For block actions, the original is stored in the Vault with a short retention window (default 30 days) for incident investigation, then purged.

This separation is the same pattern as the Usage Ledger / Vault split: metadata that’s broadly useful for analytics goes in one store, raw content that’s consequential to access goes in another.

Latency

The scanner runs in the request path. It has to be fast.

Detection is a layered set of regex matchers with cheap verify steps. On the reference deployment, full-request scan latency is single-digit milliseconds for typical prompts (under 4KB total content), and under 20ms for large requests (over 50KB). This is well under the round-trip time to any LLM provider.

The expensive cases are file content — OCR’d PDFs, transcribed audio — where the extracted text can be tens of thousands of characters. For these, the scanner runs in parallel with the file extraction, so the marginal cost is usually hidden behind extraction time.

When the scanner is wrong

False positives happen. Two cases matter.

Verified false positive. A category with a verify step that nonetheless fires on a non-PII string. Example: a 16-digit order number that happens to pass Luhn. Rare but real. Resolution: the tenant adds an allow-rule scoped to the specific field path (e.g., “order numbers in tool call lookup_order are not credit card numbers”). The scanner skips detection on matched fields.

Policy disagreement. The detection is correct but the action is too aggressive. Example: a tenant’s customer support transcripts contain credit card numbers that customers typed in, and the tenant wants these masked but not blocked. Resolution: the tenant downgrades credit_card_number from block to mask. The scanner behaves identically; only the policy changes.

The scanner is not opinionated about policy. It detects and verifies. The tenant decides what to do.

What to take from this

If you’re building pre-call PII scanning:

Separate detection from policy. The scanner emits findings; the policy engine decides actions. Mixing them makes both harder to reason about.
Verify when you can. Cheap checksums (Luhn, mod-97, mod-11) cut false positives dramatically. Skip verify only for categories where no cheap check exists.
Scan per-field, not per-request. Knowing where the finding is matters as much as knowing what it is.
Block is a policy decision, not a scanner decision. A scanner that blocks is a scanner you can’t downgrade.
Audit every finding, never log the raw match. The finding metadata is enough; the original goes in the Vault if it needs to go anywhere.

PII scanning is one of those features that’s easy to demo and hard to operate. The work isn’t in the regex. It’s in the policy model, the verify steps, the false-positive handling, and the audit trail.

The scanner is one of SchneeAI’s pre-call controls. Read the product overview to see how it fits with the gateway, Vault, and policy engine, or start a conversation about your PII handling requirements.