DLP · AI GATEWAY

Your data should not leave the company in an AI prompt.

Generative AI tools turned every employee browser into an exfiltration channel: source code, customer PII, API keys, contracts and patient records get pasted into ChatGPT, Claude, Gemini and Copilot every day. The Zedmos AI Gateway is an inline TLS-bumping engine that sees the prompt before the provider does, runs Data Loss Prevention on it, and either lets it through, redacts it, or blocks it — all on the same policy plane that powers the firewall.

inline DLP · AI Gateway · ChatGPT · Claude · Gemini · Copilot · regex + LLM verdict · sub-ms decision

WHY THIS MATTERS

One pasted chat can leak more than a stolen laptop.

AI assistants are deeply useful — which is exactly why every team is using them, and exactly why DLP is now an AI-first problem. The question is no longer whether your people use ChatGPT or Claude; it is what they put into the prompt, and whether you can prove what left your network.

Source code → public corpus

Developers paste proprietary algorithms, security logic, and unfinished features into chat assistants. Providers may use that prompt for evaluation, training, or debugging logs.

Customer PII → third-party logs

Sales and support copy CRM rows into the chat to ‘help draft a reply’. SSNs, IBANs, addresses, health markers and contract IDs land in a third-party model in seconds.

Secrets → indexable artifacts

API keys, JWTs, database URIs and cloud credentials end up in shared chats. Anyone who later gets that conversation, scrapes a screenshot, or exports the history has them.

Prompt injection → DLP bypass

Attackers wrap PII inside ‘ignore previous instructions’ payloads and base64 blobs to trick naïve filters. A real gateway has to understand intent, not just substrings.

HOW IT PROTECTS

Every AI prompt walks the same six-stage rail.

From the moment a browser opens a tab to chatgpt.com or claude.ai, the request rides through a deterministic pipeline before it ever reaches the provider. There is no agent on the laptop, no SDK to integrate, no per-app plumbing — the engine sits on the wire.

DEFENCE IN DEPTH

Two detection layers. One decision.

Zedmos combines deterministic pattern matching with a small local LLM verdict. The fast path catches the textbook leaks; the slow path catches everything attackers re-shape to dodge regex. The policy plane decides which combination applies per provider, per user, per path.

Layer 1 — deterministic matchers

Hyperscan regex engine: SSN, IBAN, credit card (Luhn), phone, passport, Personnummer, NPI, Aadhaar, ICD-10
Format-aware: separator-form cards, century-form Personnummer, base64 / hex blobs, zero-width-unicode obfuscation
Secret presets: API keys, JWTs, AWS / GCP / Azure credentials, private key blocks
Custom catalogs: drop a new pattern into policies.json — atomic hot-reload, no restart
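
As a rough illustration of what a format-aware matcher does, the sketch below strips zero-width characters, finds separator-form card numbers, and confirms them with a Luhn checksum. It uses Python's standard `re` module as a stand-in for Hyperscan, and the names `ZERO_WIDTH`, `CARD_RE`, `luhn_ok` and `find_cards` are hypothetical, not part of the Zedmos engine.

```python
import re

# Zero-width characters that can be used to split a number so a plain regex misses it.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# 13 to 19 digits, each optionally preceded by a single space or hyphen.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return Luhn-valid card numbers found in text, with separators removed."""
    cleaned = text.translate(ZERO_WIDTH)   # undo zero-width obfuscation first
    hits = []
    for match in CARD_RE.finditer(cleaned):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits
```

The checksum step is what keeps an arbitrary 16-digit order number from being reported as a card.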

Layer 2 — local LLM verdict

qwen2.5:7b on Ollama (GPU-accelerated) — runs on the same appliance, no data leaves the box
Catches semantic PII the regex misses: ‘my customer at 123 Main Street with SSN ending 4321’
Llama-Guard-style classification: S1–S14 categories + prompt-injection / jailbreak detection
Refusal fail-safe: if the verdict is unclear, the engine defers to the strictest rule
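
The refusal fail-safe can be pictured as follows. This is a minimal sketch under stated assumptions: the model is assumed to answer in a Llama-Guard-like shape (`safe`, or `unsafe` followed by a category line), and the `STRICTNESS` ranking, `parse_verdict` and `decide` are hypothetical names invented for illustration. The source names the actions but does not publish how they are ranked.

```python
from typing import Optional

# Hypothetical strictness ranking, lowest to highest (assumption, not from the product).
STRICTNESS = {"allow": 0, "log": 1, "redact": 2, "redirect": 3, "drop": 4}


def parse_verdict(raw: str) -> Optional[str]:
    """Return 'safe' or 'unsafe', or None when the model output is unclear."""
    first = raw.strip().lower().split()[:1]
    if first == ["safe"]:
        return "safe"
    if first == ["unsafe"]:
        return "unsafe"
    return None   # refusal, empty output, or free-form text


def decide(raw_verdict: str, rule_actions: list[str], on_unsafe: str) -> str:
    """Pick an action; an unclear verdict falls back to the strictest applicable rule."""
    verdict = parse_verdict(raw_verdict)
    if verdict is None:
        return max(rule_actions, key=STRICTNESS.__getitem__)
    if verdict == "unsafe":
        return on_unsafe
    return "allow"
```

The design point is that an unparseable answer is never treated as a clean one: doubt resolves toward the stricter outcome.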

Decision plane

policies.json::ai_gateway.rules[] — provider, path, method, scope, action
14 enforcement actions: allow · log · drop · reset · redirect · redact · escalate · …
Per-user, per-group, per-device, per-schedule scopes
Atomic generation swap — change a rule, never drop a packet, never restart the engine
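
One way to picture rule matching and the atomic generation swap is the sketch below. The field names (provider, path, method, scope, action) come from the rule description above; everything else is an assumption made for illustration, including the `Rule` and `PolicyPlane` classes, glob-style paths, the `"*"` wildcard scope, and the default of `allow` when nothing matches.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    provider: str   # host or host suffix, e.g. "api.anthropic.com"
    path: str       # glob pattern (assumed format)
    method: str     # HTTP method
    scope: str      # group name, or "*" for everyone (assumed format)
    action: str     # one of the enforcement actions


class PolicyPlane:
    def __init__(self, rules):
        self._generation = tuple(rules)   # immutable snapshot of the rule set

    def swap(self, rules):
        # Build the new generation completely, then publish it with one
        # reference assignment. Lookups already in flight keep the old tuple.
        self._generation = tuple(rules)

    def decide(self, host, path, method, group):
        rules = self._generation          # take one snapshot per decision
        for r in rules:
            host_ok = host == r.provider or host.endswith("." + r.provider)
            if (host_ok and fnmatch.fnmatchcase(path, r.path)
                    and method == r.method and r.scope in ("*", group)):
                return r.action
        return "allow"                    # assumed default when nothing matches


plane = PolicyPlane([Rule("api.anthropic.com", "/v1/messages", "POST", "*", "log")])
before = plane.decide("api.anthropic.com", "/v1/messages", "POST", "sales")
plane.swap([Rule("api.anthropic.com", "/v1/messages", "POST", "*", "redact")])
after = plane.decide("api.anthropic.com", "/v1/messages", "POST", "sales")
```

Because each decision reads one snapshot, a request is evaluated against either the old generation or the new one, never a mix of both.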

COVERAGE

Every AI surface, one gateway.

Coverage is data-driven, not code-driven. Adding a new AI service is a catalog entry in policies.json — no engine recompile, no agent push, no SDK to wire.

ChatGPT

chatgpt.com · /backend-api/conversation · /api/append_message

Claude

claude.ai · api.anthropic.com · /v1/messages

Gemini

gemini.google.com · ai.google.dev

Copilot

github.com/copilot · copilot.microsoft.com

Perplexity

perplexity.ai

Mistral · Le Chat

chat.mistral.ai · api.mistral.ai

On-prem LLM

private Ollama · vLLM · TGI endpoints

Custom

any HTTPS endpoint — declarative ai_gateway rule

WHAT HAPPENS ON A HIT

From silent drop to coached redirect.

Most enterprise DLP gives you two buttons: allow or block. Zedmos gives you fourteen. The same match can shape, redact, redirect, escalate, or just log — whatever fits the team’s risk posture for that data class, that user, that AI service.

BLOCK · 403

Inject an HTTP 403 with a coaching page. The browser shows the user why their prompt was refused and points to an approved tool.

BLOCK · RST

TCP RST to both sides: the quietest possible refusal. Used for high-confidence credential or PII leaks where the user should not retry the same prompt.

REDACT

Strip the matched span (SSN, key, token), forward the rest. The AI provider only sees the sanitised prompt; the user still gets a useful answer.
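
A minimal sketch of span redaction, assuming two illustrative patterns (a US SSN in dashed form and an AWS access key ID). The `PATTERNS` table, the placeholder format and the `redact` function are hypothetical and only show the idea of replacing the matched span while forwarding the rest.

```python
import re

# Illustrative patterns only; a real catalog would be far larger.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace each matched span with a typed placeholder; return text and labels hit."""
    found = []
    for label, pattern in PATTERNS.items():
        prompt, count = pattern.subn(f"[REDACTED:{label}]", prompt)
        found.extend([label] * count)
    return prompt, found
```

Returning the labels alongside the sanitised text lets the caller log which data classes were removed without storing the values themselves.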

REDIRECT

Send the user to an internal, sanctioned LLM (on-prem Ollama, Bedrock, Azure OpenAI tenant). Same UX, controlled provider.

ESCALATE

Page on-call / SIEM / SOAR webhook. Used for source-code leaks or evident exfiltration; the security team sees it within a second.

ALLOW + LOG

Pass through, but record the full match context to the audit plane. The default mode while a team is onboarding.

PROVEN ON THE WIRE

Built and verified against the real ChatGPT and Claude apps.

Modern AI front-ends are hostile to DLP by accident: HTTP/2 multiplexing, brotli streams, websocket fallbacks, anti-bot pinning, encoded prompts. Zedmos was engineered against the actual production behaviour of these apps — not a synthetic test harness.

H2 enterprise inspection: per-stream BUFFER on HEADERS, full response header mirror, gRPC trailer handling, 1xx Early-Hints guard.
TLS bump with ALPN mirror and SNI suffix-trie — no certificate-pinning trip for general traffic, fail-safe to forward.
Live evidence on the lab fleet: 28 / 28 Claude DLP cases pass — 14 BLOCK, 14 ALLOW — with HTTP/2 transport intact.
H1 pipelining boundary walker for cached keep-alive TCPs — the path Chrome uses to dodge naïve inline inspectors.
Authoritative dlpd verdict overlay — the same daemon scores every flow, the engine cannot silently disagree.
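
For readers unfamiliar with the term, an SNI suffix-trie stores hostnames label by label from the right, so one lookup finds the longest registered suffix of the name a client asks for. The sketch below is a generic version of that data structure, not the engine's own code; `SniSuffixTrie` and the provider tags are hypothetical.

```python
class _Node:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}
        self.value = None


class SniSuffixTrie:
    def __init__(self):
        self._root = _Node()

    @staticmethod
    def _labels(name):
        # "api.anthropic.com" -> ["com", "anthropic", "api"]
        return reversed(name.lower().strip(".").split("."))

    def add(self, suffix, value):
        node = self._root
        for label in self._labels(suffix):
            node = node.children.setdefault(label, _Node())
        node.value = value

    def lookup(self, sni):
        """Return the value of the longest registered suffix, or None."""
        node, best = self._root, None
        for label in self._labels(sni):
            node = node.children.get(label)
            if node is None:
                break
            if node.value is not None:
                best = node.value
        return best


trie = SniSuffixTrie()
trie.add("claude.ai", "claude")
trie.add("anthropic.com", "claude-api")
```

Matching whole labels rather than raw string suffixes means a lookalike such as `notclaude.ai` does not match the `claude.ai` entry.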

OPERATING POSTURE

Three modes. Same engine. No reconfig.

Roll out the AI Gateway the way the team is ready to absorb it — observe first, redact next, block last. The pipeline is the same; only the action verb changes.

Audit

mode · monitor

See, don’t block

Inline taps every AI request, runs the full DLP layer, writes to the audit plane.
Action = ALLOW + LOG by default — zero user impact.
Surfaces the top five offenders, top five data classes, and top five providers in week one.
Compliance teams get evidence; security teams get a baseline.

Know what you don’t know.

Coach

mode · coach

Redact and redirect

PII and secrets are stripped from the prompt before it reaches the provider.
High-risk prompts are redirected to an on-prem or tenant-controlled LLM.
The user sees a short coaching message — no support ticket needed.
Audit plane keeps the full match context for the security team.

Productivity intact, exfil neutralised.

Enforce

mode · enforce

Block the leak, page the team

High-confidence matches (source code, credentials, regulated PII) get HTTP 403 or TCP RST.
Escalate webhook fires to SIEM / SOAR / on-call in under a second.
Per-user step-up: a repeat offender drops to ‘coach’ until reviewed.
Same policies.json: only the action verb flips from log to block.

Stop the exfil before the response loads.

FAQ

What people ask the second they see this.

›Does Zedmos send our prompts to a third party?

No. Detection happens on the same appliance: Hyperscan regex in-process, qwen2.5:7b LLM verdict via local Ollama on the box’s GPU. The prompt is only forwarded to the AI provider if the policy allows it.

›What about HTTPS — can you even see the prompt?

Yes. The engine TLS-bumps the connection with a CA the company controls. The browser sees a valid Zedmos-signed certificate; the engine sees the cleartext prompt; the AI provider receives the permitted prompt over its own, separately encrypted TLS session. No data leaves the box that the policy did not permit.

›Will ChatGPT or Claude break?

No. Zedmos runs full HTTP/2 with header mirroring, ALPN mirror, gRPC trailers, and 1xx Early-Hints handling. The lab fleet uses the real apps every day: 28 / 28 Claude DLP cases pass with the chat UI fully functional.

›How much latency does this add for the user?

Sub-millisecond on the deterministic layer (Hyperscan measures ~0.74 ms at p50 on the test fleet). The LLM layer is opt-in per rule and adds 200–400 ms once the model is warm; we recommend it on ‘paste-shaped’ POST bodies only, not on every keystroke.

›Can we add an AI service we built in-house?

Yes. policies.json::ai_gateway.rules[] is the public API. Add the SNI, paths, methods, scope and action. The engine reloads atomically — no recompile, no restart, no agent push.

›Does the user know they were blocked?

Your choice. BLOCK · 403 returns a branded coaching page. BLOCK · RST is silent. REDACT is transparent — the user sees the answer, just without the SSN. ESCALATE is invisible to the user but loud to the SOC.