Your data should not leave the company in an AI prompt.
Generative AI tools have turned every employee browser into an exfiltration channel: source code, customer PII, API keys, contracts and patient records get pasted into ChatGPT, Claude, Gemini and Copilot every day. The Zedmos AI Gateway is an inline TLS-bumping engine that sees the prompt before the provider does, runs Data Loss Prevention on it, and then lets it through, redacts it, or blocks it, all on the same policy plane that powers the firewall.
One pasted chat can leak more than a stolen laptop.
AI assistants are deeply useful — which is exactly why every team is using them, and exactly why DLP is now an AI-first problem. The question is no longer whether your people use ChatGPT or Claude; it is what they put into the prompt, and whether you can prove what left your network.
Every AI prompt walks the same six-stage rail.
From the moment a browser opens a tab to chatgpt.com or claude.ai, the request rides through a deterministic pipeline before it ever reaches the provider. There is no agent on the laptop, no SDK to integrate, no per-app plumbing — the engine sits on the wire.
Two detection layers. One decision.
Zedmos combines deterministic pattern matching with a small local LLM verdict. The fast path catches the textbook leaks; the slow path catches everything attackers re-shape to dodge regex. The policy plane decides which combination applies per provider, per user, per path.
- Hyperscan regex engine: SSN, IBAN, credit card (Luhn), phone, passport, Personnummer, NPI, Aadhaar, ICD-10
- Format-aware: separator-form cards, century-form Personnummer, base64 / hex blobs, zero-width-unicode obfuscation
- Secret presets: API keys, JWTs, AWS / GCP / Azure credentials, private key blocks
- Custom catalogs: drop a new pattern into policies.json — atomic hot-reload, no restart
- qwen2.5:7b on Ollama (GPU-accelerated) — runs on the same appliance, no data leaves the box
- Catches semantic PII the regex misses: ‘my customer at 123 Main Street with SSN ending 4321’
- Llama-Guard-style classification: S1–S14 categories + prompt-injection / jailbreak detection
- Refusal fail-safe: if the verdict is unclear, the engine defers to the strictest rule
- policies.json::ai_gateway.rules[] — provider, path, method, scope, action
- 14 enforcement actions: allow · log · drop · reset · redirect · redact · escalate · …
- Per-user, per-group, per-device, per-schedule scopes
- Atomic generation swap — change a rule, never drop a packet, never restart the engine
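To make the "format-aware" idea concrete, here is a minimal Python sketch of how a separator-form card number hidden behind zero-width characters could still be caught. It is an illustration only: the engine itself uses Hyperscan, and the names `find_cards`, `luhn_valid`, `ZERO_WIDTH` and `CARD_CANDIDATE` are hypothetical, not part of the product.

```python
import re

# Zero-width code points sometimes inserted to break up digit runs
# (assumed list for this sketch).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return normalized card numbers that pass the Luhn check."""
    cleaned = text.translate(ZERO_WIDTH)  # drop zero-width obfuscation
    hits = []
    for match in CARD_CANDIDATE.finditer(cleaned):
        digits = re.sub(r"[ -]", "", match.group())  # strip separators
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The Luhn step is what keeps the false-positive rate down: a random 16-digit order number matches the pattern but usually fails the checksum.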
Every AI surface, one gateway.
Coverage is data-driven, not code-driven. Adding a new AI service is a catalog entry in policies.json — no engine recompile, no agent push, no SDK to wire.
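As a rough sketch of what "a catalog entry" could look like, the Python below parses a rule list and picks the first rule that matches a request. Only the field names (provider, path, method, scope, action) come from this page; the concrete values, the glob-style path matching, the `"*"` wildcard scope and the `first_match` helper are assumptions made for illustration.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical catalog content; real paths and scopes will differ.
POLICIES = json.loads("""
{
  "ai_gateway": {
    "rules": [
      {"provider": "claude.ai", "path": "/api/*", "method": "POST",
       "scope": "group:engineering", "action": "redact"},
      {"provider": "llm.internal.example", "path": "/v1/chat/*", "method": "POST",
       "scope": "*", "action": "log"}
    ]
  }
}
""")


def first_match(rules, provider, path, method, scopes):
    """Return the first rule matching the request, or None."""
    for rule in rules:
        if (rule["provider"] == provider
                and fnmatchcase(path, rule["path"])
                and rule["method"] == method
                and (rule["scope"] == "*" or rule["scope"] in scopes)):
            return rule
    return None


rules = POLICIES["ai_gateway"]["rules"]
hit = first_match(rules, "claude.ai", "/api/chat", "POST", {"group:engineering"})
print(hit["action"] if hit else "no rule")
```

Under these assumptions, covering a new service means appending one more object to `rules`; nothing in the matching code changes.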
From silent drop to coached redirect.
Most enterprise DLP gives you two buttons: allow or block. Zedmos gives you fourteen. The same match can shape, redact, redirect, escalate, or just log — whatever fits the team’s risk posture for that data class, that user, that AI service.
Built and verified against the real ChatGPT and Claude apps.
Modern AI front-ends are hostile to DLP by accident: HTTP/2 multiplexing, brotli streams, websocket fallbacks, anti-bot pinning, encoded prompts. Zedmos was engineered against the actual production behaviour of these apps — not a synthetic test harness.
- H2 enterprise inspection: per-stream BUFFER on HEADERS, full response header mirror, gRPC trailer handling, 1xx Early-Hints guard.
- TLS bump with ALPN mirror and SNI suffix-trie — no certificate-pinning trip for general traffic, fail-safe to forward.
- Live evidence on the lab fleet: 28 / 28 Claude DLP cases pass — 14 BLOCK, 14 ALLOW — with HTTP/2 transport intact.
- H1 pipelining boundary walker for cached keep-alive TCP connections, the path Chrome uses to dodge naïve inline inspectors.
- Authoritative dlpd verdict overlay — the same daemon scores every flow, the engine cannot silently disagree.
Three modes. Same engine. No reconfig.
Roll out the AI Gateway the way the team is ready to absorb it — observe first, redact next, block last. The pipeline is the same; only the action verb changes.
- Inline taps every AI request, runs the full DLP layer, writes to the audit plane.
- Action = ALLOW + LOG by default — zero user impact.
- Surfaces the top five offenders, top five data classes, and top five providers in week one.
- Compliance teams get evidence; security teams get a baseline.
- PII and secrets are stripped from the prompt before it reaches the provider.
- High-risk prompts are redirected to an on-prem or tenant-controlled LLM.
- The user sees a short coaching message — no support ticket needed.
- Audit plane keeps the full match context for the security team.
- High-confidence matches (source code, credentials, regulated PII) get HTTP 403 or TCP RST.
- Escalate webhook fires to SIEM / SOAR / on-call in under a second.
- Per-user step-up: a repeat offender drops to ‘coach’ until reviewed.
- Same policies.json: only the action verb flips from log to block.
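The three rollout modes can be pictured as one detection step followed by a different action verb. The sketch below uses a single simplified SSN pattern and a hypothetical `enforce` function; the action names `log`, `redact` and `block` come from this page, everything else is an assumption.

```python
import re

# Simplified US SSN shape, used here only as an example data class.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

ACTIONS = {"log", "redact", "block"}


def enforce(prompt: str, action: str):
    """Return (prompt_to_forward_or_None, audit_record) for one action verb."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    match_count = len(SSN.findall(prompt))
    audit = {"action": action, "match_count": match_count}
    if match_count == 0 or action == "log":
        return prompt, audit              # observe: forward unchanged, keep evidence
    if action == "redact":
        return SSN.sub("[REDACTED-SSN]", prompt), audit
    return None, audit                    # block: nothing is forwarded


for verb in ("log", "redact", "block"):
    forwarded, record = enforce("Customer SSN is 123-45-6789", verb)
    print(verb, "->", forwarded, record)
```

Detection and the audit record are identical in all three cases; only what gets forwarded differs, which is why moving from observe to block needs no new pipeline.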
What people ask the second they see this.
›Does Zedmos send our prompts to a third party?
No. Detection happens on the same appliance: Hyperscan regex in-process, qwen2.5:7b LLM verdict via local Ollama on the box’s GPU. The prompt is only forwarded to the AI provider if the policy allows it.
›What about HTTPS — can you even see the prompt?
Yes. The engine TLS-bumps the connection with a CA the company controls. The browser sees a valid Zedmos-signed certificate; the engine sees the cleartext prompt; the AI provider sees the same prompt re-encrypted with their own TLS. Nothing leaves the box unless the policy permits it.
›Will ChatGPT or Claude break?
No. Zedmos runs full HTTP/2 with header mirroring, ALPN mirror, gRPC trailers, and 1xx Early-Hints handling. The lab fleet uses the real apps every day: 28 / 28 Claude DLP cases pass with the chat UI fully functional.
›How much latency does this add for the user?
Sub-millisecond on the deterministic layer (Hyperscan p50 is ~0.74 ms on the test fleet). The LLM layer is opt-in per rule and adds 200–400 ms warm; we recommend it on ‘paste-shaped’ POST bodies only, not on every keystroke.
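One way to picture the fast-path / slow-path split is the Python sketch below: the deterministic scan always runs, the LLM is consulted only when the rule opts in, and an unclear verdict falls back to the strictest applicable action. The `use_llm` flag, the verdict strings, the `STRICTNESS` ordering and the `decide` function are all assumptions for illustration, not the engine's actual interface.

```python
from typing import Callable, Optional

# Assumed ordering from least to most strict.
STRICTNESS = ["allow", "log", "redact", "block"]


def strictest(actions) -> str:
    return max(actions, key=STRICTNESS.index)


def decide(body: str,
           rules: list[dict],
           regex_scan: Callable[[str], bool],
           llm_classify: Optional[Callable[[str], str]] = None) -> str:
    """rules: rules already matched for this request, highest priority first."""
    if not rules:
        return "allow"
    primary = rules[0]
    if regex_scan(body):                      # fast path: deterministic hit
        return primary["action"]
    if primary.get("use_llm") and llm_classify is not None:
        verdict = llm_classify(body)          # slow path, only when opted in
        if verdict == "sensitive":
            return primary["action"]
        if verdict == "clean":
            return "allow"
        # Unclear verdict: fail safe to the strictest applicable rule.
        return strictest(r["action"] for r in rules)
    return "allow"
```

Because the LLM call sits behind both the regex miss and the per-rule flag, the 200–400 ms cost is paid only on the requests a rule explicitly selects.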
›Can we add an AI service we built in-house?
Yes. policies.json::ai_gateway.rules[] is the public API. Add the SNI, paths, methods, scope and action. The engine reloads atomically — no recompile, no restart, no agent push.
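For readers wondering what an atomic reload might involve, here is a small Python sketch of the general pattern: parse and validate the new file completely, then swap a single reference so in-flight requests keep the generation they started with. `PolicyStore` and its methods are hypothetical; the required field names are the ones listed above, and the in-house hostname is invented.

```python
import json
import threading

REQUIRED = ("provider", "path", "method", "scope", "action")


class PolicyStore:
    """Holds an immutable (generation, rules) pair and swaps it as a whole."""

    def __init__(self, text: str):
        self._lock = threading.Lock()
        self._current = (1, self._parse(text))

    @staticmethod
    def _parse(text: str) -> tuple:
        rules = json.loads(text)["ai_gateway"]["rules"]
        for rule in rules:
            missing = [key for key in REQUIRED if key not in rule]
            if missing:
                raise ValueError(f"rule missing fields: {missing}")
        return tuple(rules)

    def reload(self, text: str) -> int:
        rules = self._parse(text)          # may raise; nothing swapped yet
        with self._lock:                   # serialize concurrent reloads
            generation = self._current[0] + 1
            self._current = (generation, rules)
        return generation

    def snapshot(self) -> tuple:
        # Readers take one reference and use it for the whole request.
        return self._current


base = ('{"ai_gateway": {"rules": [{"provider": "claude.ai", "path": "/api/*", '
        '"method": "POST", "scope": "*", "action": "log"}]}}')
store = PolicyStore(base)
in_house = json.loads(base)
in_house["ai_gateway"]["rules"].append(
    {"provider": "llm.internal.example", "path": "/v1/*",
     "method": "POST", "scope": "*", "action": "redact"})
print(store.reload(json.dumps(in_house)), len(store.snapshot()[1]))
```

A malformed file fails during parsing, before the swap, so the previous generation stays in force.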
›Does the user know they were blocked?
Your choice. BLOCK · 403 returns a branded coaching page. BLOCK · RST is silent. REDACT is transparent — the user sees the answer, just without the SSN. ESCALATE is invisible to the user but loud to the SOC.