Privacy, Sovereign AI & HIPAA

Sovereign AI in Email Triage

Three concentric privacy guarantees.

Sovereign AI is implemented at three levels of the stack — each strictly stronger than the last.

1. Local-First Classifier

Default Ollama backend on your own GPU host. Qwen / Llama / Mistral families supported. Email content never traverses a third-party API. No SOC 2 boundary to argue about. No data-processing addendum to negotiate. The classifier is yours.
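A minimal sketch of a local classification call, assuming Ollama's standard `/api/generate` endpoint on its default port 11434. The category list, prompt wording, model tag, and function names are hypothetical and are not Email Triage's actual classifier. The point is that the only network hop is to the host you configure.

```python
import json
import urllib.request

# Assumption: Ollama's default REST endpoint; point this at your own GPU host.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical category set, for illustration only.
CATEGORIES = ("urgent", "action", "fyi", "newsletter")


def build_classify_request(subject: str, body: str, model: str = "qwen2.5:7b",
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request."""
    prompt = (
        f"Classify this email as exactly one of: {', '.join(CATEGORIES)}.\n"
        f"Subject: {subject}\n\n{body}\n\n"
        "Reply with the category name only."
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(subject: str, body: str, **kwargs) -> str:
    """Send the request to the local Ollama server and normalize the answer."""
    req = build_classify_request(subject, body, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        answer = json.loads(resp.read())["response"].strip().lower()
    # Fall back to a neutral category if the model answers off-list.
    return answer if answer in CATEGORIES else "fyi"
```

With `"stream": false`, Ollama returns a single JSON object whose `response` field holds the generated text, which keeps the parsing to one line.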

2. BAA-Gated Cloud Backend

Operators who hold a BAA with their provider (OpenAI, Gemini, or any OpenAI-compatible endpoint) can configure cloud classification. The BAA gate is enforced in code, not policy: HIPAA-flagged messages skip cloud routing until the operator records a BAA acknowledgment in the audit log. The compliance officer's question, "what stops PHI from going to OpenAI?", has a code-level answer.
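A sketch of what a code-level gate can look like. It assumes that a HIPAA-flagged message without a recorded acknowledgment falls back to the local backend rather than failing; `AccountPolicy`, `route_classifier`, and the backend names are hypothetical. Anything not explicitly known to be local is treated as cloud, so an unrecognized backend name fails closed.

```python
from dataclasses import dataclass, field

LOCAL_BACKENDS = frozenset({"ollama"})
FALLBACK_BACKEND = "ollama"


@dataclass(frozen=True)
class AccountPolicy:
    hipaa: bool
    # Providers with a BAA acknowledgment recorded in the audit log.
    baa_acknowledged: frozenset = field(default_factory=frozenset)


def route_classifier(policy: AccountPolicy, configured: str) -> str:
    """Return the backend this account's mail may actually be sent to."""
    if configured in LOCAL_BACKENDS:
        return configured
    # Anything not known to be local is treated as cloud (fail closed).
    if policy.hipaa and configured not in policy.baa_acknowledged:
        return FALLBACK_BACKEND  # assumption: fall back to local, not error
    return configured
```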

3. Hard-Locked Embedding Allowlist

The RAG path (sent-mail context for drafted replies) is hard-coded to local-only — ollama, in-process sentence-transformers, and a fallback composite. A static privacy-invariant test fails the build if anyone tries to add a non-local backend here. Drafted-reply context is higher-volume PHI exposure than per-message classification; the allowlist forecloses cloud embedding entirely.
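One way to express the privacy invariant as a build-breaking test, sketched under the assumption that embedding backends are registered in a name-keyed dict. The registry contents and names are hypothetical; only the three allowed backends come from the text.

```python
# Allowed set from the text: ollama, in-process sentence-transformers,
# and a fallback composite. The key strings themselves are hypothetical.
LOCAL_EMBEDDING_ALLOWLIST = frozenset({"ollama", "sentence-transformers", "composite"})

# Hypothetical registry: backend name -> implementing class name.
EMBEDDING_BACKENDS = {
    "ollama": "OllamaEmbedder",
    "sentence-transformers": "SentenceTransformerEmbedder",
    "composite": "CompositeEmbedder",
}


def non_local_backends(registry: dict) -> list:
    """Registry names that are not on the hard-coded local allowlist."""
    return sorted(set(registry) - LOCAL_EMBEDDING_ALLOWLIST)


def test_embedding_backends_are_local_only():
    # Registering e.g. "openai" here makes this assertion fail the build.
    assert non_local_backends(EMBEDDING_BACKENDS) == []
```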

HIPAA Mode

A single toggle. Cascading enforcement across the codebase.
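A sketch of the cascade, assuming each control described under this heading is derived from the one toggle. The field names, and the idea that an operator can explicitly override the cache default, are assumptions (the text says only that the cache is disabled "by default").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EffectiveSettings:
    redact_logs: bool
    redact_notifications: bool
    verify_digest_recipient: bool
    classification_cache: bool


def derive_settings(hipaa_mode: bool,
                    cache_override: Optional[bool] = None) -> EffectiveSettings:
    """Derive every dependent control from the single HIPAA toggle."""
    return EffectiveSettings(
        redact_logs=hipaa_mode,
        redact_notifications=hipaa_mode,
        verify_digest_recipient=hipaa_mode,
        # Off by default under HIPAA; an explicit operator choice wins.
        classification_cache=(not hipaa_mode) if cache_override is None else cache_override,
    )
```

Deriving everything in one place means no feature reads the raw toggle on its own, so a new feature cannot quietly ignore it.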

PHI-Scrubbed Logs

Sender, subject, body, and classification reasoning are redacted from system logs. The [redacted] token replaces these fields in every log line.
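A minimal sketch using the standard `logging.Filter` hook, assuming the sensitive values reach the logger as structured attributes passed via `extra=`. The attribute names are hypothetical. A filter attached to a handler runs before formatting, so the formatter only ever sees the `[redacted]` token.

```python
import io
import logging

# Hypothetical attribute names passed via `extra=`.
SENSITIVE_FIELDS = ("sender", "subject", "body", "reasoning")


class PHIRedactionFilter(logging.Filter):
    """Overwrite sensitive record attributes before any formatter sees them."""

    def filter(self, record: logging.LogRecord) -> bool:
        for name in SENSITIVE_FIELDS:
            if hasattr(record, name):
                setattr(record, name, "[redacted]")
        return True  # never drop the record, only scrub it


if __name__ == "__main__":
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s sender=%(sender)s subject=%(subject)s"))
    handler.addFilter(PHIRedactionFilter())
    log = logging.getLogger("triage")
    log.addHandler(handler)
    log.propagate = False
    log.warning("classified category=urgent",
                extra={"sender": "pat@example.com", "subject": "Lab results"})
    print(stream.getvalue().strip())
    # -> classified category=urgent sender=[redacted] subject=[redacted]
```

The filter only scrubs structured fields; a value interpolated directly into the message text is outside its reach, which is the gap the static privacy scan covers.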

Static Privacy Scan

A test suite that runs on every build greps the production source for forbidden field references in log calls. Any new code that would log a sensitive field fails the test before merge.
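The text describes a grep; the sketch below swaps in Python's `ast` module instead, which also sees names inside f-strings and ignores matches in comments. The forbidden names and method list are hypothetical. Values passed under a redacted key in `extra={...}` are exempt, on the assumption that a runtime redaction filter overwrites those fields.

```python
import ast

# Logger method names to inspect. "log" may over-report (e.g. math.log),
# which errs on the side of failing the build.
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical", "log"}
# Hypothetical forbidden identifiers, mirroring the redacted log fields.
FORBIDDEN = {"sender", "subject", "body", "reasoning"}


def _names(expr: ast.AST):
    """Yield every variable or attribute name referenced inside an expression."""
    for sub in ast.walk(expr):
        if isinstance(sub, ast.Name):
            yield sub.id
        elif isinstance(sub, ast.Attribute):
            yield sub.attr


def _checked_exprs(call: ast.Call) -> list:
    """Arguments to inspect; extra={...} values under a redacted key are exempt."""
    exprs = list(call.args)
    for kw in call.keywords:
        if kw.arg == "extra" and isinstance(kw.value, ast.Dict):
            for key, value in zip(kw.value.keys, kw.value.values):
                if isinstance(key, ast.Constant) and key.value in FORBIDDEN:
                    continue  # the runtime filter overwrites this field
                exprs.append(value)
        else:
            exprs.append(kw.value)
    return exprs


def find_log_violations(source: str, filename: str = "<src>") -> list:
    """Return sorted (filename, line, name) hits for log calls touching forbidden fields."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in LOG_METHODS):
            for expr in _checked_exprs(node):
                hits.extend((filename, node.lineno, n) for n in _names(expr) if n in FORBIDDEN)
    return sorted(hits)
```

In CI, one would run `find_log_violations` over every production `.py` file and fail the build if any file returns a non-empty list.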

Redacted Notifications

SMS / push notifications include category and timestamp only. Never the sender, subject, or body content.
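A small sketch of the idea; the function name and message format are hypothetical. The control is the signature itself: the composer never receives the sender, subject, or body, so it cannot leak them.

```python
from datetime import datetime, timezone


def notification_text(category: str, received_at: datetime) -> str:
    """Compose an SMS/push body from category and timestamp only.

    Sender, subject, and body are never passed in, so they cannot
    end up in the notification.
    """
    ts = received_at.astimezone(timezone.utc)
    return f"[{category}] new message at {ts:%Y-%m-%d %H:%M} UTC"
```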

Recipient Verification

The daily-digest feature refuses to send if the configured to-address doesn't match the account owner. Protects against accidentally cc'ing PHI to the wrong stakeholder.
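A sketch of the check using the standard library's `email.utils.getaddresses`. Treating any additional recipient as a mismatch is an assumption that follows from the cc'ing concern; the function and exception names are hypothetical. The error message deliberately omits the addresses themselves.

```python
from email.utils import getaddresses


class RecipientMismatch(Exception):
    pass


def verify_digest_recipients(to_header: str, owner_address: str) -> str:
    """Allow the digest only if its sole recipient is the account owner."""
    addresses = [addr for _, addr in getaddresses([to_header]) if addr]
    # Case-insensitive comparison is a simplification: RFC 5321 permits
    # case-sensitive local parts, though few servers treat them that way.
    if len(addresses) != 1 or addresses[0].casefold() != owner_address.casefold():
        raise RecipientMismatch(
            f"digest refused: {len(addresses)} recipient(s), owner is not the sole match")
    return addresses[0]
```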

PHI-Aware Caching

The classification cache is disabled by default for HIPAA-flagged accounts. The cost (re-classifying repeats) buys the audit posture (no PHI in a side-cache).

Tamper-Evident Audit Log

Every login, account view, and credential use is recorded in an append-only access log with a SHA-256 hash chain. Any alteration of a past row breaks the chain, which the `email-triage audit verify` CLI command detects in seconds.
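A minimal sketch of how a SHA-256 hash chain makes edits detectable. The row layout, the all-zero genesis value, and the function names are hypothetical; this is not the format `email-triage audit verify` actually reads. Each row's hash covers the previous row's hash, so changing any past row invalidates its own hash and every link after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical anchor for the first row


def _row_hash(prev_hash: str, row: dict) -> str:
    # Canonical JSON so the same row always hashes the same way.
    payload = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_row(chain: list, row: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"row": row, "prev": prev, "hash": _row_hash(prev, row)})


def first_broken_row(chain: list):
    """Index of the first row whose link or hash does not verify, else None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != _row_hash(prev, entry["row"]):
            return i
        prev = entry["hash"]
    return None
```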

The /compliance page shows HIPAA mode, audit-chain status, BAA acknowledgments, and the TLS certificate lifecycle.

Supply Chain Security

You verify the image before you trust it.

Sovereign AI ends at the model boundary if the runtime itself isn't trustworthy. Email Triage publishes a verifiable supply chain so compliance officers can answer "where did this binary come from?" with a cryptographic chain instead of a vendor email.

Cosign-Signed Images

Every published container image is signed with cosign using keyless OIDC against the GitHub Actions identity. No long-lived signing key to lose or rotate. Operators verify the signature with `cosign verify` before pulling.
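A sketch that wraps the keyless check in Python. The `--certificate-oidc-issuer` and `--certificate-identity-regexp` flags are standard cosign options; the identity regexp is an assumption about where the signing workflow lives, and the exact identity to pin should come from the project's docs/install.md.

```python
import subprocess

IMAGE = "ghcr.io/unlimited-data-works-llc/email-triage"
OIDC_ISSUER = "https://token.actions.githubusercontent.com"
# Assumption: the signing workflow lives in this repository.
IDENTITY_REGEXP = r"^https://github\.com/Unlimited-Data-Works-LLC/Email-Triage/"


def cosign_verify_cmd(ref: str) -> list:
    """argv for a keyless signature check of an image reference."""
    return [
        "cosign", "verify",
        "--certificate-oidc-issuer", OIDC_ISSUER,
        "--certificate-identity-regexp", IDENTITY_REGEXP,
        ref,
    ]


def verify_before_pull(ref: str) -> None:
    # check=True raises CalledProcessError if cosign rejects the signature.
    subprocess.run(cosign_verify_cmd(ref), check=True)
```

Verifying and then pulling by digest (`IMAGE@sha256:...`) rather than by tag avoids a tag being moved between the check and the pull.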

SLSA-3 Build Provenance

Each image carries a SLSA-3 build provenance attestation: who built it, when, from which source revision, in which builder. Proves the image came out of the public CI pipeline running against the public source — not a one-off developer laptop build.

Operator Approval Attestation

A separate human-validated review event signed against the same digest (predicate operator-approval/v1). Distinguishes "the CI passed" from "an operator reviewed and approved this release."
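A sketch of the "same digest" check on a decoded attestation. It assumes the attestation arrives as a DSSE envelope whose base64 `payload` is an in-toto statement with `subject[].digest.sha256` and `predicateType` fields; the function names are hypothetical. It does not verify the signature, which is cosign's job; it only confirms that a statement is about this image and carries the expected predicate.

```python
import base64
import json

# As named in the text; the published predicateType string may be a longer URI.
OPERATOR_APPROVAL = "operator-approval/v1"


def statement_from_envelope(envelope_json: str) -> dict:
    """Decode the in-toto statement carried in a DSSE envelope's payload."""
    envelope = json.loads(envelope_json)
    return json.loads(base64.b64decode(envelope["payload"]))


def covers_digest(statement: dict, predicate_type: str, digest_hex: str) -> bool:
    """True if the statement has the expected predicate type and names this digest."""
    if statement.get("predicateType") != predicate_type:
        return False
    return any(s.get("digest", {}).get("sha256") == digest_hex
               for s in statement.get("subject", []))
```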

HIPAA Install Gate

HIPAA-flagged installs verify both attestations on the same image digest before allowing pull. A poisoned CI run with no operator attestation can't reach a HIPAA host. Verification recipe in the public repo's docs/install.md.
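A hypothetical sketch of the gate's shape, not the recipe in docs/install.md. It assumes both attestations are attached to the image in a form `cosign verify-attestation --type <predicateType>` can check, that the SLSA predicate URI is `https://slsa.dev/provenance/v1`, and that both verify against the same keyless identity (the operator-approval signer may well differ). The gate refuses tags outright and only returns a digest-pinned reference once every attestation has verified.

```python
import re
import subprocess

IMAGE = "ghcr.io/unlimited-data-works-llc/email-triage"
OIDC_ISSUER = "https://token.actions.githubusercontent.com"
IDENTITY_REGEXP = r"^https://github\.com/Unlimited-Data-Works-LLC/Email-Triage/"  # assumption
REQUIRED_PREDICATES = (
    "https://slsa.dev/provenance/v1",  # assumption: SLSA v1 predicate URI
    "operator-approval/v1",            # as named in the text
)
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def attestation_cmd(digest: str, predicate_type: str) -> list:
    """argv to verify one attestation type against a digest-pinned reference."""
    return [
        "cosign", "verify-attestation",
        "--type", predicate_type,
        "--certificate-oidc-issuer", OIDC_ISSUER,
        "--certificate-identity-regexp", IDENTITY_REGEXP,
        f"{IMAGE}@{digest}",
    ]


def hipaa_install_gate(digest: str) -> str:
    """Return a digest-pinned reference only if every attestation verifies."""
    if not DIGEST_RE.match(digest):
        raise ValueError("HIPAA installs must pin a sha256 digest, not a tag")
    for predicate in REQUIRED_PREDICATES:
        # Any failure raises CalledProcessError and the pull never happens.
        subprocess.run(attestation_cmd(digest, predicate), check=True,
                       capture_output=True)
    return f"{IMAGE}@{digest}"
```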

Air-Gap Verifiable

Air-gap installs run scripts/download-embedding-bits.sh on a connected machine to produce a hash-pinned tarball plus a SHA-256 sidecar. Sideloading through the admin UI runs the same hash verification as the auto-download path, so operator-staged bytes are not trusted.
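A sketch of a sideload check, assuming a `sha256sum`-style sidecar (`<hex>  <filename>`) named `<tarball>.sha256` and a pinned hash shipped with the application. The file layout and function names are hypothetical. The tarball is streamed in chunks so large model files are never read into memory at once, and it must match both the pin and the sidecar.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def verify_sideload(tarball: Path, pinned_sha256: str) -> None:
    """Check a staged tarball against the pinned hash and its sidecar."""
    sidecar = tarball.with_name(tarball.name + ".sha256")
    sidecar_hash = sidecar.read_text().split()[0].lower()
    actual = sha256_file(tarball)
    # The pinned value is authoritative; the sidecar must agree with it too.
    for expected in (pinned_sha256.lower(), sidecar_hash):
        if not hmac.compare_digest(actual, expected):
            raise ValueError(f"{tarball.name}: SHA-256 mismatch")
```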

Pinned Tags, Floating Aliases

Customers pin to immutable vX.Y.Z tags (the same digest is published under both the vX.Y.Z and X.Y.Z forms). Float to X.Y, X, or :latest for development-friendly upgrades; :edge tracks every push to main.
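A sketch of the tag set one release digest would carry under this scheme. The function name is hypothetical, and so is the rule that `latest` moves only for the newest release; `:edge` is published from pushes to main, not from releases, so it does not appear here.

```python
import re

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def tags_for_release(version: str, latest: bool = True) -> list:
    """Tags one release digest is published under.

    vX.Y.Z and X.Y.Z are the immutable pins; X.Y, X and latest float.
    Assumption: `latest` moves only when this is the newest release.
    """
    m = SEMVER.match(version)
    if not m:
        raise ValueError(f"not a release version: {version!r}")
    major, minor, patch = m.groups()
    tags = [f"v{major}.{minor}.{patch}", f"{major}.{minor}.{patch}",
            f"{major}.{minor}", major]
    if latest:
        tags.append("latest")
    return tags
```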

Source: github.com/Unlimited-Data-Works-LLC/Email-Triage · Image: ghcr.io/unlimited-data-works-llc/email-triage · License: Apache 2.0

Encryption at Rest

Fernet-encrypted secrets. Master key options operator-chosen.

Provider passwords, OAuth refresh tokens, and other secrets are encrypted with Fernet (AES-128-CBC + HMAC-SHA256) using a master key held outside the SQLite database. Storage options for the master key:

Container secret (Podman / Docker secret store) — recommended for production
Keyfile on the host filesystem (mode 0400, owned by the service user)
OS keyring (GNOME Keyring, Windows Credential Manager)
Environment variable (development only)

Backing up the database without also backing up the master key leaves the secrets unrecoverable — a safety property for off-site backup storage.
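A sketch of master-key resolution that tries the storage options in the order listed above. The paths, secret name, keyring service/user names, and environment variable are all hypothetical; `/run/secrets/<name>` is where Podman and Docker mount secrets by default. The optional `keyring` package is imported lazily, and a keyfile readable by group or others is rejected. The returned bytes are what you would pass to `cryptography.fernet.Fernet(key)`; `Fernet.generate_key()` produces a suitable key.

```python
import os
import stat
from pathlib import Path

# Hypothetical locations and names, in the order of the options above.
SECRET_PATH = Path("/run/secrets/email_triage_master_key")  # Podman/Docker secret mount
KEYFILE_PATH = Path("/etc/email-triage/master.key")
ENV_VAR = "EMAIL_TRIAGE_MASTER_KEY"


def _read_keyfile(path: Path) -> bytes:
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:  # any group/other permission bit set
        raise PermissionError(f"{path} has mode {mode:o}; expected owner-only such as 0400")
    return path.read_bytes().strip()


def load_master_key(secret_path: Path = SECRET_PATH, keyfile_path: Path = KEYFILE_PATH,
                    use_keyring: bool = True, allow_env: bool = False) -> bytes:
    """Resolve the Fernet master key from outside the SQLite database."""
    if secret_path.exists():
        return secret_path.read_bytes().strip()
    if keyfile_path.exists():
        return _read_keyfile(keyfile_path)
    if use_keyring:
        try:
            import keyring  # optional third-party package
            value = keyring.get_password("email-triage", "master-key")
            if value:
                return value.encode("ascii")
        except Exception:
            pass  # no keyring package or backend; fall through
    if allow_env and os.environ.get(ENV_VAR):  # development only
        return os.environ[ENV_VAR].encode("ascii")
    raise RuntimeError("no master key available")
```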

Multi-Tenancy & Delegation

Three-role audit model.

The user model supports three roles with distinct audit semantics:

Admin — install-wide configuration, all-account access, audit access.
User — owns their own accounts; configures own routes / rules / digests.
Delegate — granted view / triage / draft permission on another user's account.

Delegate actions are stamped with both the actor and the account owner in the audit log. The HIPAA §164.312(b) audit gate distinguishes owner self-access from delegate access: owner self-access falls under the §164.502(a) self-disclosure carve-out and is not audited as PHI access, while delegate access is, and writes an audit row every time.
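A sketch of the gate's decision, assuming it compares the acting user with the account owner. `AuditRow` and `phi_access_row` are hypothetical names, and treating an admin who opens someone else's account the same way as a delegate is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRow:
    actor_id: str
    owner_id: str
    action: str
    at: str


def phi_access_row(actor_id: str, owner_id: str, action: str) -> Optional[AuditRow]:
    """Return the PHI-access row to write, or None for owner self-access."""
    if actor_id == owner_id:
        return None  # self-access carve-out: no PHI-access row
    # Delegate (or admin) access: stamp both the actor and the account owner.
    return AuditRow(actor_id, owner_id, action, datetime.now(timezone.utc).isoformat())
```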

Standards Honored

RFCs and regulations the system aligns to.

| Standard | Application |
|---|---|
| RFC 5322 | Internet Message Format. Parsing inbound mail; writing outbound headers with proper `In-Reply-To` threading. |
| RFC 6154 | IMAP SPECIAL-USE mailbox attributes. Drafts / Sent folder auto-discovery via the `\Drafts` / `\Sent` attributes. |
| RFC 5545 | iCalendar. Parsing `.ics` payloads on incoming meeting invites. |
| RFC 5546 | iTIP. Invite accept / decline / tentative responses drafted as `METHOD=REPLY` with proper threading. |
| HIPAA §164.312(b) | Technical Safeguards: Audit Controls. Hash-chained audit log; CLI verifier. |
| HIPAA §164.312(e)(1) | Transmission Security. TLS posture, certificate lifecycle. |
| HIPAA §164.502(a) | Self-disclosure carve-out. The audit gate avoids spurious "PHI access" rows on owner self-access. |
| NERC CIP-007 R4 | Security Event Monitoring. Audit-log shape informed by CIP-007. |
