Inside the Vault: where raw prompts live

Two databases, on purpose

Most LLM applications write one record per request: timestamp, model, tokens, cost, prompt, response, finish reason — all in one row. It’s the obvious thing to do. It’s also wrong for any system that needs to take governance seriously.

SchneeAI splits the data into two stores from the start:

Usage Ledger — operational metadata. Who called what model, when, how many tokens, what it cost. Fast to query, indexed for dashboards, retained for years.
Vault — raw prompt and output content. Encrypted at the application layer. Separate access controls. Separate retention.

The same interaction has a row in each store, linked by an opaque identifier. The Usage Ledger tells you “an interaction happened.” The Vault tells you “this is what was said.” They are not the same concern.
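As a rough illustration of the split, the sketch below uses two in-memory SQLite databases as stand-ins for the two stores. The table names, column names, and the helper functions `record_interaction` and `tokens_by_tenant` are assumptions made for this example, not SchneeAI's actual schema.

```python
import sqlite3
import uuid

ledger = sqlite3.connect(":memory:")   # stand-in for the Usage Ledger
vault = sqlite3.connect(":memory:")    # stand-in for the Vault metadata table

# Hypothetical schemas: the ledger has no content column at all.
ledger.execute(
    """CREATE TABLE usage_event (
           interaction_id TEXT PRIMARY KEY,
           tenant_id      TEXT NOT NULL,
           model          TEXT NOT NULL,
           total_tokens   INTEGER NOT NULL,
           cost_usd       REAL NOT NULL,
           created_at     TEXT NOT NULL)"""
)
vault.execute(
    """CREATE TABLE vault_item (
           interaction_id  TEXT PRIMARY KEY,
           tenant_id       TEXT NOT NULL,
           storage_key     TEXT NOT NULL,
           key_ref         TEXT NOT NULL,
           retention_class TEXT NOT NULL)"""
)


def record_interaction(tenant_id, model, total_tokens, cost_usd, created_at,
                       storage_key, key_ref, retention_class):
    """Write one row to each store and return the identifier that links them."""
    interaction_id = uuid.uuid4().hex  # opaque: encodes neither tenant nor content
    with ledger:
        ledger.execute(
            "INSERT INTO usage_event VALUES (?, ?, ?, ?, ?, ?)",
            (interaction_id, tenant_id, model, total_tokens, cost_usd, created_at),
        )
    with vault:
        vault.execute(
            "INSERT INTO vault_item VALUES (?, ?, ?, ?, ?)",
            (interaction_id, tenant_id, storage_key, key_ref, retention_class),
        )
    return interaction_id


def tokens_by_tenant():
    """A typical dashboard query, answered from the ledger alone."""
    rows = ledger.execute(
        "SELECT tenant_id, SUM(total_tokens) FROM usage_event "
        "GROUP BY tenant_id ORDER BY tenant_id"
    )
    return rows.fetchall()
```

Because the ledger connection cannot see the Vault at all, a dashboard query like `tokens_by_tenant` has no way to pull in raw content by accident.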

Why split them

Three reasons.

1. Different audiences, different access

Operational metadata is read often — by dashboards, billing, alerting, on-call engineers debugging latency. It needs to be fast, indexed, and broadly accessible inside the operations team.

Raw content is read rarely — for incident investigation, customer support escalations, or dataset building. When it is read, the access is consequential: you’re looking at user content. That should be deliberate, scoped, logged, and reviewed.

If both live in the same table, the table gets joined into everything. The casual latency query pulls in raw prompts because the column is right there. The debug script dumps rows to a file because it’s easier than selecting columns. Within a year, raw content is everywhere.

The split makes the safe path the default path. The Usage Ledger answers most questions. Touching the Vault is a separate decision.

2. Different retention

Operational metadata has a long useful life — you might aggregate it for years for trend analysis. A 5-year retention window is reasonable.

Raw content has a different calculus. Every day you keep it is a day it could be exposed in a breach, subpoenaed, or used in ways your users didn’t expect. Retention should be as short as the use case allows — weeks or months for most interactions, longer only where compliance or value demands it.

If both live in the same table, retention becomes one number — and that number is whatever satisfies the longer-lived use case. The raw content gets over-retained because the metadata needs to be there.

The split lets retention be configured per data class. The Usage Ledger can keep aggregated metrics for years. The Vault can keep raw content for months, with auto-purge after a configured window.

3. Different threat models

Operational metadata is sensitive, since it can reveal usage patterns, but it is far less revealing than raw content. A leaked Usage Ledger row tells you “this tenant called GPT-4 for 2,000 tokens at 3pm.” A leaked Vault entry tells you what the user actually said.

The two deserve different defenses. Metadata gets strong access control and audit logging. Content gets application-layer encryption on top of provider-managed encryption at rest, with key management tied to tenant scope.

If they share a table, the defenses get averaged. Splitting them lets each get what it needs.

How the Vault is built

The Vault is two pieces:

Object storage (S3-compatible, R2 in our reference deployment) — holds the encrypted content blob.
PostgreSQL metadata table — holds the storage pointer, encryption key reference, tenant scope, retention class, access audit.

Writes happen inside the interaction’s second transaction, after the upstream LLM call returns:

Tx2:
  update usage event with cost/tokens
  settle credit reservation
  settle budget counters
  write raw content to Vault (encrypted)
  write audit event
  enqueue outbox messages
  commit
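The steps above can be sketched as a single database transaction. This is a minimal illustration using SQLite and a dictionary as a stand-in for the object store; the table names, the `settle_interaction` function, and the event names are all hypothetical, and the credit and budget settlement steps are left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE usage_event (interaction_id TEXT PRIMARY KEY, total_tokens INTEGER, cost_usd REAL);
    CREATE TABLE vault_item  (interaction_id TEXT PRIMARY KEY, storage_key TEXT NOT NULL);
    CREATE TABLE audit_event (interaction_id TEXT, action TEXT);
    CREATE TABLE outbox      (interaction_id TEXT, topic TEXT);
""")
object_store = {}  # stand-in for an S3-compatible bucket


def settle_interaction(interaction_id, total_tokens, cost_usd, encrypted_blob):
    """Run the Tx2 steps; the database changes commit together or not at all."""
    storage_key = f"vault/{interaction_id}"
    # `with db:` commits if the block succeeds and rolls back if it raises.
    with db:
        db.execute(
            "UPDATE usage_event SET total_tokens = ?, cost_usd = ? WHERE interaction_id = ?",
            (total_tokens, cost_usd, interaction_id),
        )
        # Credit and budget settlement would happen here (omitted in this sketch).
        # The object store is not covered by the database transaction: if a later
        # step fails, the blob stays behind and needs separate cleanup.
        object_store[storage_key] = encrypted_blob
        db.execute("INSERT INTO vault_item VALUES (?, ?)", (interaction_id, storage_key))
        db.execute("INSERT INTO audit_event VALUES (?, 'vault.write')", (interaction_id,))
        db.execute("INSERT INTO outbox VALUES (?, 'interaction.settled')", (interaction_id,))
    return storage_key
```

One design point worth noting: only the relational writes are atomic here. How a real deployment reconciles blobs whose metadata row was rolled back is outside what this sketch shows.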

Encryption is application-layer: a per-tenant data encryption key (DEK) encrypts the content before it ever leaves the gateway. The DEK itself is wrapped by a key encryption key (KEK) stored in KMS. The object in storage is opaque without both keys.
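A minimal sketch of this two-layer scheme follows. It assumes the third-party `cryptography` package and AES-256-GCM, neither of which the design above prescribes, and it uses a locally generated key in place of a KEK held in KMS. The function names are hypothetical.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def new_wrapped_dek(kek: bytes, tenant_id: str) -> tuple[bytes, bytes]:
    """Create a DEK for a tenant and return it in wrapped form: (nonce, wrapped_dek)."""
    dek = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    # The tenant id is bound as associated data, so a wrapped key
    # cannot be unwrapped under a different tenant's scope.
    return nonce, AESGCM(kek).encrypt(nonce, dek, tenant_id.encode())


def unwrap_dek(kek: bytes, tenant_id: str, nonce: bytes, wrapped_dek: bytes) -> bytes:
    """Recover the DEK; in a real deployment this call would go to KMS."""
    return AESGCM(kek).decrypt(nonce, wrapped_dek, tenant_id.encode())


def encrypt_content(dek: bytes, tenant_id: str, content: bytes) -> tuple[bytes, bytes]:
    """Encrypt raw content under the tenant's DEK: (nonce, ciphertext)."""
    nonce = os.urandom(12)  # a nonce must never repeat under the same DEK
    return nonce, AESGCM(dek).encrypt(nonce, content, tenant_id.encode())


def decrypt_content(dek: bytes, tenant_id: str, nonce: bytes, ciphertext: bytes) -> bytes:
    """Decrypt content; raises if the ciphertext or tenant binding was altered."""
    return AESGCM(dek).decrypt(nonce, ciphertext, tenant_id.encode())
```

Only the wrapped DEK and the ciphertext need to be stored. Reading content back requires first unwrapping the DEK with the KEK, then decrypting with the DEK, which is the sense in which both layers are needed.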

Reads go through a separate API surface. The caller’s identity and tenant scope are checked; the access reason is recorded; the audit event is written before the content is decrypted. Most production paths never read from the Vault — they read from the Usage Ledger.

Retention as a first-class concern

Vault retention is configurable per data class. The defaults we ship with:

Data class	Indicative retention	Why
Active customer interactions	2–3 years	Investigation via Dataset Builder only
Investigated/redacted dataset	3–5 years	Analytics and evaluation
Training-eligible dataset	3–5 years	Only after consent, redaction, review
Model lineage	5+ years	Audit purposes

A background retention worker walks the Vault on a schedule. When an item’s window closes, the worker deletes the object storage blob and the metadata row in the same transaction. There is no “soft delete then forget” pattern — once the window closes, the content is gone.

Customers can configure shorter windows. The worker respects the shorter setting.
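The worker's core loop might look like the sketch below. It is an illustration under assumptions: the `VaultItem` shape, the `purge_expired` function, and the use of in-memory collections for the metadata table and object store are all hypothetical, and the per-class default windows are passed in rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class VaultItem:
    storage_key: str
    retention_class: str
    created_at: datetime
    customer_days: Optional[int] = None  # tenant-configured window, if any


def effective_days(default_days: int, customer_days: Optional[int]) -> int:
    """A customer setting can shorten the window but never extend it."""
    if customer_days is None:
        return default_days
    return min(default_days, customer_days)


def purge_expired(items, object_store, defaults, now):
    """Hard-delete every expired item and return the storage keys removed."""
    purged = []
    for item in list(items):  # iterate over a copy because items is mutated
        days = effective_days(defaults[item.retention_class], item.customer_days)
        if now >= item.created_at + timedelta(days=days):
            object_store.pop(item.storage_key, None)  # delete the blob
            items.remove(item)                        # delete the metadata row
            purged.append(item.storage_key)
    return purged
```

Treating the customer setting as a cap via `min` is one reading of "the worker respects the shorter setting"; a deployment that also allows longer customer windows would need a different rule.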

Access audit

Every Vault access is an audit event. Not just “who read what” — also “for what stated reason.” The reason is part of the API contract: callers must supply a purpose code (incident-investigation, support-escalation, dataset-build, compliance-review) and the audit event captures it.

These events are themselves retained for the audit window (5+ years). They’re reviewable: governance teams can see who accessed what, when, and why. Because every read is logged, anomalous access patterns can be detected as well.

This isn’t a feature. It’s the only way the Vault design makes sense. If you build a Vault and don’t audit access, you’ve built a target.
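To make the read contract concrete, here is a small sketch of a Vault read that validates the purpose code, checks tenant scope, and writes the audit event before any decryption happens. The `Purpose` enum mirrors the four codes listed above; `AuditEvent`, `read_content`, and the injected `decrypt` callable are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    INCIDENT_INVESTIGATION = "incident-investigation"
    SUPPORT_ESCALATION = "support-escalation"
    DATASET_BUILD = "dataset-build"
    COMPLIANCE_REVIEW = "compliance-review"


@dataclass(frozen=True)
class AuditEvent:
    actor: str
    tenant_id: str
    interaction_id: str
    purpose: str


def read_content(actor, allowed_tenants, tenant_id, interaction_id,
                 purpose_code, audit_log, decrypt):
    """Return decrypted content, but only after the access has been audited."""
    purpose = Purpose(purpose_code)  # raises ValueError for an unknown code
    if tenant_id not in allowed_tenants:
        raise PermissionError(f"{actor} has no scope for tenant {tenant_id}")
    # The audit event is recorded first; decryption never runs without it.
    audit_log.append(AuditEvent(actor, tenant_id, interaction_id, purpose.value))
    return decrypt(interaction_id)
```

A request with a missing or invented purpose code fails before any content is touched. Whether denied attempts should also be written to the audit log is a policy choice this sketch leaves open.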

When the Vault is wrong

The Vault design has a cost. Every interaction is two writes instead of one. There’s a metadata table and an object store to operate. Key management is its own concern.

For a prototype, this is overkill. For a small team’s internal tool, this is overkill. For a system that handles one tenant’s traffic and never faces a compliance review, this is overkill.

But the path from “internal tool” to “multi-tenant platform with compliance obligations” is shorter than people think. The point at which you wish you’d separated content from metadata is usually six months after the point at which doing so was cheap.

The Vault is what we would have ended up building anyway. We just built it sooner.

What to take from this

If you’re building an LLM application today:

Split content from metadata now. Even if content and metadata share one database, separate the tables and the access paths.
Encrypt content with a per-tenant key. Not because you don’t trust your cloud provider — because you don’t trust every future path the data could take.
Make retention explicit. Pick a number. Make it shorter than you think you need. Lengthen it when a use case demands, not by default.
Audit every content read. If reading content is casual in your system, you have a governance problem, not a security problem.

None of this is hard. All of it is easier to do up front than later.

The Vault is one of the core primitives SchneeAI is built around. Read the product overview to see how it fits with the AI Gateway and PromptOps, or start a conversation about your raw-content retention needs.