Email content never leaves your network. By default. Always. By design.
Sovereign AI in Email Triage
Sovereign AI is implemented at three levels of the stack — each strictly stronger than the last.
Default Ollama backend on your own GPU host. Qwen / Llama / Mistral families supported. Email content never traverses a third-party API. No SOC 2 boundary to argue about. No data-processing addendum to negotiate. The classifier is yours.
Operators with BAA coverage (OpenAI, Gemini, OpenAI-compatible) can configure cloud classification. The BAA gate is enforced in code, not policy. HIPAA-flagged messages skip cloud routing until the operator records BAA acknowledgment in the audit log. The compliance officer's question — "what stops PHI from going to OpenAI?" — has a code-level answer.
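A minimal sketch of what a code-level BAA gate could look like. The names `AuditLog`, `record_baa_ack`, and `choose_backend` are hypothetical, and falling back to the local backend when the gate blocks a message is an assumption, not the project's confirmed behavior.

```python
from dataclasses import dataclass, field

LOCAL_BACKEND = "ollama"


@dataclass
class AuditLog:
    """Hypothetical stand-in for the audit log: tracks BAA acknowledgments per provider."""
    baa_acknowledged: set = field(default_factory=set)

    def record_baa_ack(self, provider: str) -> None:
        self.baa_acknowledged.add(provider)


def choose_backend(hipaa_flagged: bool, configured: str, audit: AuditLog) -> str:
    """Return the backend a message may be classified on."""
    if configured == LOCAL_BACKEND:
        return LOCAL_BACKEND
    # Cloud routing was requested. HIPAA-flagged mail stays local until the
    # operator has recorded a BAA acknowledgment for this provider.
    if hipaa_flagged and configured not in audit.baa_acknowledged:
        return LOCAL_BACKEND
    return configured
```

Under these assumptions, a HIPAA-flagged message configured for `"openai"` resolves to `"ollama"` until `record_baa_ack("openai")` has been called.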
The RAG path (sent-mail context for drafted replies) is hard-coded to local-only backends: Ollama, in-process sentence-transformers, and a fallback composite. A static privacy-invariant test fails the build if anyone tries to add a non-local backend here. Drafted-reply context exposes PHI at higher volume than per-message classification, so the allowlist forecloses cloud embedding entirely.
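The local-only allowlist can be illustrated with a small invariant check. This is a sketch only: `LOCAL_ONLY_BACKENDS`, `check_rag_backends`, and the backend identifier strings are hypothetical, and the real test may inspect the source differently.

```python
# Hypothetical identifiers for the three local backends named above.
LOCAL_ONLY_BACKENDS = frozenset({"ollama", "sentence-transformers", "composite"})


def check_rag_backends(registered):
    """Return the registered backends that are not on the local-only allowlist."""
    return sorted(set(registered) - LOCAL_ONLY_BACKENDS)


def test_rag_backends_are_local_only(
    registered=("ollama", "sentence-transformers", "composite"),
):
    """Fails (and so fails the build) if a non-local backend is registered."""
    offenders = check_rag_backends(registered)
    assert not offenders, f"non-local RAG backends registered: {offenders}"
```

Keeping the allowlist as a frozen constant means adding a cloud backend requires editing the same file the test guards, which makes the change visible in review.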
HIPAA Mode
Sender, subject, body, and classification reasoning are redacted from system logs. The [redacted] token replaces these fields in every log line.
A test suite runs on every build that greps the production source for forbidden field references in log calls. Any new code that would log a sensitive field fails the test before merge.
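One way such a static check could work is sketched below. It walks the syntax tree with Python's `ast` module rather than running a literal grep, and the field list and the function name `find_sensitive_log_calls` are assumptions for illustration.

```python
import ast

# Assumed names of the sensitive fields; the real list may differ.
FORBIDDEN_FIELDS = {"sender", "subject", "body", "reasoning"}
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def find_sensitive_log_calls(source: str) -> list:
    """Return (line, field) pairs where a logging call references a forbidden field."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_log_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in LOG_METHODS
        )
        if not is_log_call:
            continue
        # Inspect positional and keyword arguments, including f-string parts.
        for arg in [*node.args, *(kw.value for kw in node.keywords)]:
            for sub in ast.walk(arg):
                name = None
                if isinstance(sub, ast.Name):
                    name = sub.id
                elif isinstance(sub, ast.Attribute):
                    name = sub.attr
                if name in FORBIDDEN_FIELDS:
                    hits.append((node.lineno, name))
    return hits
```

A build-time test would run this over every production source file and assert that the combined result is empty.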
SMS / push notifications include category and timestamp only. Never the sender, subject, or body content.
The daily-digest feature refuses to send if the configured to-address doesn't match the account owner. This protects against accidentally sending PHI to the wrong stakeholder.
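The digest guard amounts to a recipient check before delivery. In this sketch, `send_digest`, `DigestRecipientMismatch`, and the `deliver` callback are hypothetical names, and the case-insensitive comparison is an assumption about how addresses are matched.

```python
class DigestRecipientMismatch(Exception):
    """Raised when the digest to-address is not the account owner's address."""


def digest_recipient_allowed(to_address: str, owner_address: str) -> bool:
    # Assumption: addresses are compared case-insensitively after trimming.
    return to_address.strip().casefold() == owner_address.strip().casefold()


def send_digest(to_address: str, owner_address: str, deliver) -> None:
    """Call deliver(to_address) only when the recipient is the account owner."""
    if not digest_recipient_allowed(to_address, owner_address):
        # Keep addresses out of the error text so they do not leak into logs.
        raise DigestRecipientMismatch("digest recipient does not match account owner")
    deliver(to_address)
```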
The classification cache is disabled by default for HIPAA-flagged accounts. The cost (re-classifying repeats) buys the audit posture (no PHI in a side-cache).
Every login, account view, and credential use is recorded in an append-only access log with a SHA-256 hash chain. Any alteration of a past row breaks the chain — detectable in seconds via the email-triage audit verify CLI command.
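The hash-chain idea can be sketched in a few lines. The row layout, the all-zero genesis value, and the function names here are assumptions for illustration; the actual schema and the verifier behind the CLI command may differ.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed starting value for the first row


def row_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous row's hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append a row that commits to everything before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": row_hash(prev, event)})


def verify_chain(log: list) -> Optional[int]:
    """Return the index of the first broken row, or None if the chain is intact."""
    prev = GENESIS
    for index, row in enumerate(log):
        if row["prev"] != prev or row["hash"] != row_hash(prev, row["event"]):
            return index
        prev = row["hash"]
    return None
```

Because each row's hash covers the previous hash, editing any past event changes the value the next row expects, so verification stops at the altered row.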
/compliance — HIPAA mode, audit-chain status, BAA acknowledgments, TLS certificate lifecycle
Supply Chain Security
Sovereign AI ends at the model boundary if the runtime itself isn't trustworthy. Email Triage publishes a verifiable supply chain so compliance officers can answer "where did this binary come from?" with a cryptographic chain instead of a vendor email.
Every published container image is signed with cosign using keyless OIDC against the GitHub Actions identity. No long-lived signing key to lose or rotate. Operators verify with cosign verify before pull.
Each image carries a SLSA-3 build provenance attestation: who built it, when, from which source revision, in which builder. Proves the image came out of the public CI pipeline running against the public source — not a one-off developer laptop build.
A separate human-validated review event is signed against the same digest (predicate operator-approval/v1). It distinguishes "the CI passed" from "an operator reviewed and approved this release."
HIPAA-flagged installs verify both attestations on the same image digest before allowing pull. A poisoned CI run with no operator attestation can't reach a HIPAA host. Verification recipe in the public repo's docs/install.md.
Air-gap installs use scripts/download-embedding-bits.sh on a connected machine to produce a hash-pinned tarball + SHA-256 sidecar. Sideload through the admin UI runs the same hash verification as the auto-download path — operator-staged bytes are not trusted.
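Hash verification against a sidecar file can be sketched as follows. The sidecar layout is assumed to follow the common `sha256sum` format (hex digest first, then the filename), and `verify_against_sidecar` is a hypothetical name rather than the project's own function.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large tarballs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_sidecar(tarball: Path, sidecar: Path) -> bool:
    """Return True only if the tarball's digest matches the one in the sidecar."""
    parts = sidecar.read_text().split()
    if not parts:
        return False  # an empty sidecar never verifies
    expected = parts[0].lower()
    return hmac.compare_digest(sha256_file(tarball), expected)
```

Running the same function for both the auto-download and the sideload path is what keeps operator-staged bytes from being trusted implicitly.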
Customers pin to immutable vX.Y.Z tags (the same digest is published under both the vX.Y.Z and X.Y.Z forms). Floating tags X.Y, X, and :latest give development-friendly upgrades, and :edge tracks every push to main.
Source: github.com/Unlimited-Data-Works-LLC/Email-Triage · Image: ghcr.io/unlimited-data-works-llc/email-triage · License: Apache 2.0
Encryption at Rest
Provider passwords, OAuth refresh tokens, and other secrets are encrypted with Fernet (AES-128-CBC + HMAC-SHA256) using a master key held outside the SQLite database. Storage options for the master key:
Backing up the database without also backing up the master key leaves the secrets unrecoverable — a safety property for off-site backup storage.
Multi-Tenancy & Delegation
The user model supports three roles with distinct audit semantics:
Delegate actions are stamped with both the actor and the account-owner in the audit log. The HIPAA §164.312(b) audit gate distinguishes owner self-access from delegate access — the former is a §164.502(a) self-disclosure carve-out and isn't audited as PHI access; the latter is and writes an audit row every time.
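The owner-versus-delegate distinction can be expressed as a small gate. This is a sketch under assumptions: `AuditRow` and `audit_phi_access` are hypothetical names, and the real audit row carries more fields than shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRow:
    actor: str          # who performed the action
    account_owner: str  # whose account was touched
    action: str


def audit_phi_access(actor: str, account_owner: str, action: str) -> Optional[AuditRow]:
    """Return a PHI-access audit row for delegate access, or None for owner self-access."""
    if actor == account_owner:
        # Owner reading their own account: not recorded as PHI access.
        return None
    # Delegate access: stamp both the actor and the account owner.
    return AuditRow(actor=actor, account_owner=account_owner, action=action)
```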
Standards Honored
| Standard | Application |
|---|---|
| RFC 5322 | Internet Message Format. Parsing inbound mail, writing outbound headers with proper In-Reply-To threading. |
| RFC 6154 | IMAP SPECIAL-USE flag. Drafts / Sent folder auto-discovery via \Drafts / \Sent markers. |
| RFC 5545 | iCalendar. Parsing .ics payloads on incoming meeting invites. |
| RFC 5546 | iMIP. Invite-acceptance / decline / tentative drafted as METHOD=REPLY with proper threading. |
| HIPAA §164.312(b) | Technical Safeguards — Audit Controls. Hash-chained audit log; CLI verifier. |
| HIPAA §164.312(e)(1) | Transmission Security. TLS posture, certificate lifecycle. |
| HIPAA §164.502(a) | Self-Disclosure carve-out. Audit gate avoids spurious "PHI access" rows on owner self-access. |
| NERC CIP-007-R4 | Security Event Monitoring. Audit-log shape informed by CIP-007. |
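As an illustration of the RFC 5322 row above, the threading headers on a reply can be set with Python's standard `email` package. `build_reply` is a hypothetical helper, not the project's own code, and it assumes the original message carries a From header.

```python
from email.message import EmailMessage


def build_reply(original: EmailMessage, body: str, from_addr: str) -> EmailMessage:
    """Draft a reply whose headers thread it under the original message."""
    reply = EmailMessage()
    reply["From"] = from_addr
    reply["To"] = str(original.get("Reply-To", original["From"]))
    subject = str(original.get("Subject", ""))
    reply["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    msg_id = original["Message-ID"]
    if msg_id:
        # In-Reply-To names the parent; References carries the whole thread.
        reply["In-Reply-To"] = str(msg_id)
        refs = str(original.get("References", ""))
        reply["References"] = f"{refs} {msg_id}".strip()
    reply.set_content(body)
    return reply
```

Mail clients group conversations using these two headers, so a drafted reply that omits them tends to appear as a new thread.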
Beyond Email
Email Triage is one application of sovereign AI patterns. The same approach — local-by-default LLM, code-enforced compliance gates, audit-ready architecture — applies to any AI initiative in a regulated environment. I help organizations design and implement sovereign AI strategy across their stack.