Zedmos
DLP · AI GATEWAY

Your data should not leave the company in an AI prompt.

Generative AI tools have turned every employee's browser into an exfiltration channel: source code, customer PII, API keys, contracts and patient records get pasted into ChatGPT, Claude, Gemini and Copilot every day. The Zedmos AI Gateway is an inline TLS-bumping engine that sees the prompt before the provider does, runs Data Loss Prevention on it, and then lets it through, redacts it, or blocks it, all on the same policy plane that powers the firewall.

inline DLP · AI Gateway · ChatGPT · Claude · Gemini · Copilot · regex + LLM verdict · sub-ms decision
WHY THIS MATTERS

One pasted chat can leak more than a stolen laptop.

AI assistants are deeply useful — which is exactly why every team is using them, and exactly why DLP is now an AI-first problem. The question is no longer whether your people use ChatGPT or Claude; it is what they put into the prompt, and whether you can prove what left your network.

Source code → public corpus
Developers paste proprietary algorithms, security logic, and unfinished features into chat assistants. Providers may retain that prompt in debugging logs or use it for evaluation and training.
Customer PII → third-party logs
Sales and support teams copy CRM rows into the chat to ‘help draft a reply’. SSNs, IBANs, addresses, health markers and contract IDs land in a third-party model in seconds.
Secrets → indexable artifacts
API keys, JWTs, database URIs and cloud credentials end up in shared chats. Anyone who later gains access to that conversation, grabs a screenshot, or exports the history has them.
Prompt injection → DLP bypass
Attackers wrap PII inside ‘ignore previous instructions’ payloads and base64 blobs to trick naïve filters. A real gateway has to understand intent, not just substrings.
HOW IT PROTECTS

Every AI prompt walks the same six-stage rail.

From the moment a browser opens a tab to chatgpt.com or claude.ai, the request rides through a deterministic pipeline before it ever reaches the provider. There is no agent on the laptop, no SDK to integrate, no per-app plumbing — the engine sits on the wire.

Enforcement rail: employee (browser · app) → TLS bump (MITM · SNI route) → DLP scan (regex · LLM verdict) → AI Gateway (policy match) → decide (allow · block) → AI provider (ChatGPT · Claude · …). ALLOW forwards the prompt to the provider; BLOCK returns 403 / RST; the scrubbed response goes back to the user.
What the engine looks for: PII (SSN · IBAN · phone) · secrets (API keys · tokens · passwords) · source code (internal repos · diffs) · prompt injection (jailbreak · DAN · DLP-bypass).
DEFENCE IN DEPTH

Two detection layers. One decision.

Zedmos combines deterministic pattern matching with a small local LLM verdict. The fast path catches the textbook leaks; the slow path catches everything attackers re-shape to dodge regex. The policy plane decides which combination applies per provider, per user, per path.

Layer 1 — deterministic matchers
  • Hyperscan regex engine: SSN, IBAN, credit card (Luhn), phone, passport, Personnummer, NPI, Aadhaar, ICD-10
  • Format-aware: separator-form cards, century-form Personnummer, base64 / hex blobs, zero-width-unicode obfuscation
  • Secret presets: API keys, JWTs, AWS / GCP / Azure credentials, private key blocks
  • Custom catalogs: drop a new pattern into policies.json — atomic hot-reload, no restart
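To make the Layer 1 bullets concrete, here is a minimal Python sketch of a deterministic matcher that strips zero-width characters, decodes base64 blobs, and validates card candidates with the Luhn checksum. It uses the standard `re` module as a stand-in for Hyperscan, and the patterns, the `scan` function and the data-class names are hypothetical illustrations, not Zedmos's actual presets.

```python
import base64
import binascii
import re

# Zero-width characters used to split a number so that a plain regex misses it.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# Hypothetical patterns: 13-19 digit card candidates with optional space/dash
# separators, US-style SSNs, and base64 runs long enough to hide a payload.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def views(text: str) -> list:
    """The prompt with zero-width characters removed, plus any base64 blob that decodes to UTF-8."""
    clean = text.translate(ZERO_WIDTH)
    out = [clean]
    for blob in B64_RE.findall(clean):
        try:
            out.append(base64.b64decode(blob, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text: ignore this blob
    return out


def scan(text: str) -> set:
    """Return the data classes found in any normalised view of the prompt."""
    hits = set()
    for view in views(text):
        for m in CARD_RE.finditer(view):
            if luhn_ok(re.sub(r"\D", "", m.group())):
                hits.add("credit_card")
        if SSN_RE.search(view):
            hits.add("ssn")
    return hits
```

Validating with Luhn after the regex keeps random 16-digit strings (order numbers, timestamps) from being reported as cards, which is the main reason a checksum belongs in the fast path.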
Layer 2 — local LLM verdict
  • qwen2.5:7b on Ollama (GPU-accelerated) — runs on the same appliance, no data leaves the box
  • Catches semantic PII the regex misses: ‘my customer at 123 Main Street with SSN ending 4321’
  • Llama-Guard-style classification: S1–S14 categories + prompt-injection / jailbreak detection
  • Refusal fail-safe: if the verdict is unclear, the engine defers to the strictest rule
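The refusal fail-safe can be sketched as a parser that only trusts well-formed answers. The sketch assumes the local model is prompted to reply in Llama Guard's format (`safe`, or `unsafe` followed by a line of category codes such as `S7`); the `Action` ordering, the `decide` function and the choice of BLOCK as the strictest action are assumptions for illustration.

```python
import re
from enum import IntEnum
from typing import List, Optional, Tuple


class Action(IntEnum):
    # Ordered from least to most strict, so max() returns the strictest action.
    ALLOW = 0
    LOG = 1
    REDACT = 2
    BLOCK = 3


CATEGORY_RE = re.compile(r"^S(?:[1-9]|1[0-4])$")  # S1 .. S14


def parse_verdict(raw: str) -> Optional[Tuple[bool, List[str]]]:
    """Parse a Llama-Guard-style reply; return None when the reply is not well formed."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if lines == ["safe"]:
        return False, []
    if len(lines) == 2 and lines[0] == "unsafe":
        cats = [c.strip() for c in lines[1].split(",")]
        if all(CATEGORY_RE.match(c) for c in cats):
            return True, cats
    return None


def decide(regex_action: Action, llm_raw: Optional[str],
           strictest: Action = Action.BLOCK) -> Action:
    """Combine the Layer 1 action with the Layer 2 verdict; the LLM can only tighten."""
    if llm_raw is None:  # LLM layer not enabled for this rule
        return regex_action
    verdict = parse_verdict(llm_raw)
    if verdict is None:  # refusal fail-safe: unclear verdict -> strictest action
        return strictest
    is_unsafe, _categories = verdict
    return max(regex_action, Action.BLOCK if is_unsafe else Action.ALLOW)
```

Treating a model refusal or free-form answer as "unclear" means a jailbreak that confuses the classifier ends in the strictest action rather than a silent allow.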
Decision plane
  • policies.json::ai_gateway.rules[] — provider, path, method, scope, action
  • 14 enforcement actions: allow · log · drop · reset · redirect · redact · escalate · …
  • Per-user, per-group, per-device, per-schedule scopes
  • Atomic generation swap — change a rule, never drop a packet, never restart the engine
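One way to get an atomic generation swap is to compile the whole rule set into an immutable object and publish it with a single reference assignment, so each request pins exactly one generation. The JSON shape below reuses the field names from policies.json::ai_gateway.rules[] (provider, path, method, scope, action), but the nesting, the treatment of `scope` as a list of groups, first-match semantics and the `PolicyStore` class are all hypothetical.

```python
import json
import threading
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Only the seven actions named on this page; the full list of fourteen is not spelled out there.
KNOWN_ACTIONS = {"allow", "log", "drop", "reset", "redirect", "redact", "escalate"}


@dataclass(frozen=True)
class Rule:
    provider: str
    path: str              # treated as a path prefix in this sketch
    method: str
    scope: FrozenSet[str]  # hypothetical: groups the rule applies to; empty = everyone
    action: str


@dataclass(frozen=True)
class Generation:
    number: int
    rules: Tuple[Rule, ...]


def compile_rules(text: str, number: int) -> Generation:
    """Parse and validate a whole rule file; raise instead of returning a partial set."""
    rules = []
    for r in json.loads(text)["ai_gateway"]["rules"]:
        if r["action"] not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action {r['action']!r}")
        rules.append(Rule(r["provider"], r["path"], r["method"].upper(),
                          frozenset(r.get("scope", [])), r["action"]))
    return Generation(number, tuple(rules))


class PolicyStore:
    def __init__(self, text: str):
        self._writer = threading.Lock()  # serialises reloads; readers never take it
        self._gen = compile_rules(text, 1)

    def reload(self, text: str) -> None:
        with self._writer:
            new_gen = compile_rules(text, self._gen.number + 1)  # may raise: old generation stays live
            self._gen = new_gen  # one reference assignment: readers see old or new, never a mix

    def match(self, provider: str, path: str, method: str,
              groups: FrozenSet[str]) -> Optional[str]:
        gen = self._gen  # pin a single generation for the whole lookup
        for rule in gen.rules:  # first matching rule wins (assumption)
            if (rule.provider == provider and path.startswith(rule.path)
                    and rule.method == method.upper()
                    and (not rule.scope or rule.scope & groups)):
                return rule.action
        return None


POLICY = """{"ai_gateway": {"rules": [
  {"provider": "claude", "path": "/v1/messages", "method": "POST",
   "scope": ["engineering"], "action": "redact"},
  {"provider": "chatgpt", "path": "/backend-api/conversation", "method": "POST",
   "action": "log"}
]}}"""

store = PolicyStore(POLICY)
action = store.match("claude", "/v1/messages", "post", frozenset({"engineering"}))  # "redact"
```

A rule file that fails validation raises before the swap, so the previous generation keeps serving, and lookups already in flight finish on whichever generation they pinned.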
COVERAGE

Every AI surface, one gateway.

Coverage is data-driven, not code-driven. Adding a new AI service is a catalog entry in policies.json — no engine recompile, no agent push, no SDK to wire.

ChatGPT
chatgpt.com · /backend-api/conversation · /api/append_message
Claude
claude.ai · api.anthropic.com · /v1/messages
Gemini
gemini.google.com · ai.google.dev
Copilot
github.com/copilot · copilot.microsoft.com
Perplexity
perplexity.ai
Mistral · Le Chat
chat.mistral.ai · api.mistral.ai
On-prem LLM
private Ollama · vLLM · TGI endpoints
Custom
any HTTPS endpoint — declarative ai_gateway rule
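Host matching for a catalog like this is commonly done with a suffix trie keyed on DNS labels from right to left, so `chatgpt.com` also covers its subdomains without matching look-alikes such as `notchatgpt.com`. The sketch below is a hypothetical illustration built from the hosts listed above. Path-scoped entries such as github.com/copilot would additionally need a path-prefix check, which is omitted here.

```python
from typing import Optional

# Hypothetical catalog built from the hosts listed above; a real entry would also carry paths and rules.
CATALOG = {
    "chatgpt.com": "ChatGPT",
    "claude.ai": "Claude",
    "api.anthropic.com": "Claude",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "perplexity.ai": "Perplexity",
    "chat.mistral.ai": "Mistral",
    "api.mistral.ai": "Mistral",
}


class SuffixTrie:
    """Matches a hostname against domain suffixes, label by label from the right."""

    def __init__(self) -> None:
        self.root: dict = {}

    def add(self, domain: str, value: str) -> None:
        node = self.root
        for label in reversed(domain.lower().rstrip(".").split(".")):
            node = node.setdefault(label, {})
        node["$"] = value  # "$" is not a valid DNS label, so it is safe as an end marker

    def lookup(self, host: str) -> Optional[str]:
        node, best = self.root, None
        for label in reversed(host.lower().rstrip(".").split(".")):
            node = node.get(label)
            if node is None:
                break
            best = node.get("$", best)  # keep the longest suffix matched so far
        return best


trie = SuffixTrie()
for domain, service in CATALOG.items():
    trie.add(domain, service)
```

Because a new service is just another `add` call on data, the lookup code never changes when the catalog grows, which is the property the coverage section describes.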
WHAT HAPPENS ON A HIT

From silent drop to coached redirect.

Most enterprise DLP gives you two buttons: allow or block. Zedmos gives you fourteen. The same match can shape, redact, redirect, escalate, or just log — whatever fits the team’s risk posture for that data class, that user, that AI service.

BLOCK · 403
Inject an HTTP 403 with a coaching page. The browser shows the user why their prompt was refused and points to an approved tool.
BLOCK · RST
Reset the TCP connection on both sides: the quietest possible refusal. Used for high-confidence credential or PII leaks where the user should not retry the same prompt.
REDACT
Strip the matched span (SSN, key, token), forward the rest. The AI provider only sees the sanitised prompt; the user still gets a useful answer.
REDIRECT
Send the user to an internal, sanctioned LLM (on-prem Ollama, Bedrock, Azure OpenAI tenant). Same UX, controlled provider.
ESCALATE
Page on-call or fire a SIEM / SOAR webhook. Used for source-code leaks or obvious exfiltration: the security team sees it within a second.
ALLOW + LOG
Pass through, but record the full match context to the audit plane. The default mode while a team is onboarding.
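REDACT comes down to replacing each matched span with a placeholder and forwarding everything else. A minimal sketch, assuming two hypothetical detectors (a US SSN pattern and the `AKIA` prefix used by AWS access key IDs) and a `[REDACTED:LABEL]` placeholder format chosen for illustration:

```python
import re

# Hypothetical detectors; in the product these would be the Layer 1 matchers.
DETECTORS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_spans(text):
    """Collect (start, end, label) for every detector hit, sorted by position."""
    spans = []
    for label, pattern in DETECTORS.items():
        spans.extend((m.start(), m.end(), label) for m in pattern.finditer(text))
    return sorted(spans)


def merge(spans):
    """Merge overlapping spans so each character is redacted at most once."""
    merged = []
    for start, end, label in spans:
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def redact(text):
    """Replace every matched span with a placeholder and keep the rest verbatim."""
    out, cursor = [], 0
    for start, end, label in merge(find_spans(text)):
        out.append(text[cursor:start])
        out.append(f"[REDACTED:{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Overlapping matches are merged first so the output never contains a half-redacted value. Because the replacement changes the body length, a proxy applying this to a real request also has to re-serialise the JSON the prompt sits in and recompute Content-Length (or re-frame the HTTP/2 DATA frames).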
PROVEN ON THE WIRE

Built and verified against the real ChatGPT and Claude apps.

Modern AI front-ends are hostile to DLP by accident: HTTP/2 multiplexing, brotli streams, websocket fallbacks, anti-bot pinning, encoded prompts. Zedmos was engineered against the actual production behaviour of these apps — not a synthetic test harness.

  • H2 enterprise inspection: per-stream BUFFER on HEADERS, full response header mirror, gRPC trailer handling, 1xx Early-Hints guard.
  • TLS bump with ALPN mirror and SNI suffix-trie — no certificate-pinning trip for general traffic, fail-safe to forward.
  • Live evidence on the lab fleet: 28 / 28 Claude DLP cases pass — 14 BLOCK, 14 ALLOW — with HTTP/2 transport intact.
  • H1 pipelining boundary walker for reused keep-alive TCP connections, the path on which Chrome's follow-up requests slip past naïve inline inspectors.
  • Authoritative dlpd verdict overlay: the same daemon scores every flow, so the engine cannot silently disagree.
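The H1 boundary walker is about finding where one request ends and the next begins on a reused connection, so that a second prompt sent over the same keep-alive socket is scanned like the first. A minimal sketch, assuming Content-Length framing only (chunked transfer encoding is deliberately rejected) and a hypothetical `walk_requests` helper:

```python
from typing import Dict, Iterator, Tuple


def walk_requests(buf: bytes) -> Iterator[Tuple[str, Dict[str, str], bytes]]:
    """Yield (request line, headers, body) for each complete HTTP/1.1 request in buf.

    Sketch only: bodies are framed by Content-Length; chunked encoding is rejected.
    """
    pos = 0
    while pos < len(buf):
        head_end = buf.find(b"\r\n\r\n", pos)
        if head_end == -1:
            return  # incomplete header block: wait for more bytes
        request_line, *header_lines = buf[pos:head_end].decode("iso-8859-1").split("\r\n")
        headers = {}
        for line in header_lines:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        if "chunked" in headers.get("transfer-encoding", "").lower():
            raise NotImplementedError("chunked bodies are out of scope for this sketch")
        length = int(headers.get("content-length", "0"))
        body_start = head_end + 4
        if body_start + length > len(buf):
            return  # incomplete body: wait for more bytes
        yield request_line, headers, buf[body_start:body_start + length]
        pos = body_start + length  # the next request starts right after this body
```

Stopping at an incomplete header block or body, instead of guessing, keeps the walker aligned on request boundaries; a fuller version would also report the offset of the last complete boundary so the caller can buffer more bytes and resume.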
OPERATING POSTURE

Three modes. Same engine. No reconfig.

Roll out the AI Gateway the way the team is ready to absorb it — observe first, redact next, block last. The pipeline is the same; only the action verb changes.

Audit
mode · monitor
See, don’t block
  • Inline taps every AI request, runs the full DLP layer, writes to the audit plane.
  • Action = ALLOW + LOG by default — zero user impact.
  • Surfaces the top five offenders, top five data classes, and top five providers in week one.
  • Compliance teams get evidence; security teams get a baseline.
Know what you don’t know.
Coach
mode · coach
Redact and redirect
  • PII and secrets are stripped from the prompt before it reaches the provider.
  • High-risk prompts are redirected to an on-prem or tenant-controlled LLM.
  • The user sees a short coaching message — no support ticket needed.
  • Audit plane keeps the full match context for the security team.
Productivity intact, exfil neutralised.
Enforce
mode · enforce
Block the leak, page the team
  • High-confidence matches (source code, credentials, regulated PII) get HTTP 403 or TCP RST.
  • Escalate webhook fires to SIEM / SOAR / on-call in under a second.
  • Per-user step-up: a repeat offender drops to ‘coach’ until reviewed.
  • Same policies.json: only the action verb flips from log to block.
Stop the exfil before the response loads.
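The claim that only the action verb changes between modes can be pictured as a lookup table over (mode, data class), with the strictest action winning when a prompt carries several classes. The three coarse data classes, the table contents (for example, routing source code to REDIRECT in coach mode) and the strictness order are assumptions for illustration, not Zedmos defaults.

```python
from enum import Enum


class Mode(Enum):
    AUDIT = "monitor"
    COACH = "coach"
    ENFORCE = "enforce"


# Hypothetical posture table: the same detections, a different verb per mode.
POSTURE = {
    Mode.AUDIT: {"pii": "allow+log", "secret": "allow+log", "source_code": "allow+log"},
    Mode.COACH: {"pii": "redact", "secret": "redact", "source_code": "redirect"},
    Mode.ENFORCE: {"pii": "block", "secret": "block", "source_code": "block"},
}

# Least to most strict; when a prompt carries several data classes the strictest verb wins.
STRICTNESS = ["allow", "allow+log", "redact", "redirect", "block"]


def action_for(mode: Mode, data_classes: set) -> str:
    """Pick the verb for a prompt given the current rollout mode and detected classes."""
    if not data_classes:
        return "allow"
    return max((POSTURE[mode][c] for c in data_classes), key=STRICTNESS.index)
```

Moving a team from audit to enforce then means changing one mode value, not the detectors; the ESCALATE webhook that accompanies blocks in enforce mode is left out of this sketch.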
FAQ

What people ask the second they see this.

Does Zedmos send our prompts to a third party?

No. Detection happens on the same appliance: Hyperscan regex in-process, qwen2.5:7b LLM verdict via local Ollama on the box’s GPU. The prompt is only forwarded to the AI provider if the policy allows it.

What about HTTPS — can you even see the prompt?

Yes. The engine TLS-bumps the connection with a CA the company controls. The browser sees a valid Zedmos-signed certificate; the engine sees the cleartext prompt; the AI provider receives the permitted (or redacted) prompt, re-encrypted over the engine's own TLS session to the provider. No data leaves the box that the policy did not permit.

Will ChatGPT or Claude break?

No. Zedmos runs full HTTP/2 with header mirroring, ALPN mirroring, gRPC trailer handling and 1xx Early-Hints handling. The lab fleet uses the real apps every day: 28 / 28 Claude DLP cases pass with the chat UI fully functional.

How much latency does this add for the user?

Sub-millisecond on the deterministic layer (Hyperscan measures ~0.74 ms p50 on the test fleet). The LLM layer is opt-in per rule and adds 200–400 ms once the model is warm; we recommend it on ‘paste-shaped’ POST bodies only, not on every keystroke.
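A "paste-shaped" gate can be as simple as checking the method and the size or line count of the prompt before paying for the LLM verdict. The thresholds, the `Request` type and `needs_llm_verdict` are hypothetical; they show the shape of the check, not tuned values.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune per deployment.
MIN_PASTE_CHARS = 200
MIN_PASTE_LINES = 3


@dataclass
class Request:
    method: str
    prompt: str


def needs_llm_verdict(req: Request) -> bool:
    """Run the slower LLM layer only on POST bodies that look like a paste."""
    if req.method.upper() != "POST":
        return False
    text = req.prompt
    line_count = text.count("\n") + 1
    return len(text) >= MIN_PASTE_CHARS or line_count >= MIN_PASTE_LINES
```

Short typed questions stay on the sub-millisecond path, while long or multi-line pastes, which is where code and CRM rows tend to arrive, pay the extra 200–400 ms.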

Can we add an AI service we built in-house?

Yes. policies.json::ai_gateway.rules[] is the public API. Add the SNI, paths, methods, scope and action. The engine reloads atomically — no recompile, no restart, no agent push.

Does the user know they were blocked?

Your choice. BLOCK · 403 returns a branded coaching page. BLOCK · RST is silent. REDACT is transparent — the user sees the answer, just without the SSN. ESCALATE is invisible to the user but loud to the SOC.