
How to Run a HIPAA-Compliant LLM: The Four Architectures


There is no HIPAA-compliant LLM. There are HIPAA-compliant LLM deployments: a model served through an endpoint covered by a business associate agreement, or on hardware you operate, wrapped in the access controls, audit logging, and retention guarantees the Security Rule expects. No model earns that label on its own, and since no HIPAA certification exists for any software, any vendor selling you a "certified" one is selling vocabulary.

The good news: building a compliant deployment is a solved problem in 2026, and there are exactly four architectures that work. This is the technical guide to all four, what each one demands operationally, and how to pick.

(If you landed here asking whether a specific chatbot is allowed, that is a different page: "Is ChatGPT HIPAA compliant? (and Claude, Gemini, Copilot)". This one is for the team building something.)

The ground rules, whatever you pick

Three constraints apply to every architecture below.

The BAA chain has to be unbroken. Every party that creates, receives, maintains, or transmits PHI on your behalf needs a signed BAA: the model provider if one exists, the cloud provider if PHI touches its infrastructure, and any vendor in between. One uncovered hop makes every request through it an impermissible disclosure.
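
Because one missing BAA breaks the whole chain, it can help to keep the chain as data and fail closed. A minimal sketch, assuming you maintain an inventory of every party on the request path; the `Hop` record and the example parties are hypothetical, not a standard format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One party on the PHI request path (hypothetical inventory entry)."""
    name: str
    touches_phi: bool  # creates, receives, maintains, or transmits PHI
    baa_signed: bool


def uncovered_hops(path: list[Hop]) -> list[str]:
    """Return the parties that handle PHI without a signed BAA."""
    return [hop.name for hop in path if hop.touches_phi and not hop.baa_signed]


# Hypothetical inventory: an observability vendor slipped into the path unreviewed.
path = [
    Hop("cloud provider", touches_phi=True, baa_signed=True),
    Hop("observability SaaS", touches_phi=True, baa_signed=False),
    Hop("model vendor", touches_phi=True, baa_signed=True),
]

gaps = uncovered_hops(path)
if gaps:
    print("Blocked: no BAA with " + ", ".join(gaps))  # Blocked: no BAA with observability SaaS
```

Run against the real inventory in CI, a missing BAA becomes a failed build instead of a disclosure.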

The Security Rule applies to the whole path: encryption in transit and at rest, role-based access control, and audit controls that record who prompted what, when, and what came back. Your inference logs are now compliance artifacts. So are your retention settings for them.
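
What an audit record needs to capture follows directly from that sentence. A sketch of one possible schema, written as JSON lines; the field names, model ID, and user ID are illustrative assumptions, not a regulatory format. The record itself contains PHI, so it belongs in an encrypted, access-controlled store with its own retention policy:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InferenceAuditRecord:
    """Who prompted what, when, and what came back (hypothetical schema)."""
    user_id: str    # the human caller, not the service account
    model_id: str
    timestamp: str  # ISO 8601, UTC
    prompt: str
    response: str


def make_record(user_id: str, model_id: str, prompt: str, response: str) -> InferenceAuditRecord:
    return InferenceAuditRecord(
        user_id=user_id,
        model_id=model_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        prompt=prompt,
        response=response,
    )


def to_json_line(record: InferenceAuditRecord) -> str:
    """Serialize one record as a JSON line for an append-only, access-controlled store."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical values for illustration only.
record = make_record("dr.lee", "example-model-v1", "Summarize today's visit notes", "Summary text")
print(to_json_line(record))
```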

Compliance stays shared. A BAA covers the vendor's side. Risk analysis, workforce training, minimum necessary policies, and configuration discipline stay on yours. AWS, Microsoft, and Google all state this explicitly; believe them.

Architecture 1: a cloud model endpoint under your cloud BAA

What it is. Frontier models served as a managed service inside your cloud account: Amazon Bedrock on AWS, Azure OpenAI under Microsoft's BAA, and Google's covered Gemini services under the Google Cloud BAA. Bedrock is on AWS's HIPAA-eligible list; Azure OpenAI is in scope through the Online Services DPA; Google publishes a covered-services list (check it by name; Google renames its AI products often).

Who holds the BAA. Your cloud provider, under the agreement you almost certainly already have. No new business associate enters your risk register.

What it requires operationally. Region pinning for data residency. Logging at the endpoint (CloudTrail records the API calls; capturing prompt and response content takes the service's own invocation logging, such as Bedrock's model invocation logging) wired into your audit story. Confirming the no-training defaults in your cloud agreement. IAM scoping so only the PHI-approved workload can invoke the endpoint. The honest caveat: prompts are processed on the cloud provider's managed service, so this is "inside your cloud boundary," not "on your silicon."
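
For Bedrock, the IAM scoping and region pinning can live in one policy attached only to the PHI workload's role. A sketch that builds such a policy document in Python. `bedrock:InvokeModel`, `bedrock:InvokeModelWithResponseStream`, and the `aws:RequestedRegion` condition key are real IAM names, but the model ARN is a placeholder, and cross-region inference profiles need additional resources that this sketch does not cover (check the Bedrock IAM documentation):

```python
import json


def bedrock_invoke_policy(region: str, model_arns: list[str]) -> dict:
    """Build an IAM policy allowing invocation of only the listed models, in one region."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeApprovedModelsOnly",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
                "Resource": model_arns,
                # Region pinning: requests made to any other region are not allowed.
                "Condition": {"StringEquals": {"aws:RequestedRegion": region}},
            }
        ],
    }


# Placeholder ARN (hypothetical): substitute the model your review approved.
approved = ["arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE-MODEL-ID"]
print(json.dumps(bedrock_invoke_policy("us-east-1", approved), indent=2))
```

Because IAM denies by default, attaching this only to the PHI workload's role is what keeps other principals off the endpoint; broad `bedrock:*` grants elsewhere in the account would undo it.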

Choose it if you want frontier-quality models with the least new paperwork. For most healthcare teams this is the default answer.

Architecture 2: dedicated capacity or a vendor API with a BAA

What it is. Going to the model vendor directly: OpenAI's zero-retention API under its Healthcare Addendum, or Anthropic's HIPAA-ready first-party API. For throughput or isolation requirements, AWS and Azure also sell dedicated capacity (provisioned throughput on Bedrock, provisioned deployments on Azure OpenAI) that keeps the same BAA story as architecture 1 while reserving capacity for your workload.

Who holds the BAA. The model vendor, which means a new business associate and a vendor review.

What it requires operationally. Strict adherence to the covered configuration, which is narrower than the product. OpenAI's BAA covers only zero-retention-eligible endpoints on an approved org, with third-party GPTs and plugins excluded. Anthropic's covers a defined feature subset of the Messages API. Feature drift is the failure mode here: an engineer enables an excluded capability and your covered deployment quietly stops being covered. Pin the allowed surface in code review, not in a wiki.
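
A minimal sketch of what "in code review" can mean: every outbound request passes a check against an allowlist that lives in the repository, so widening the covered surface takes a reviewed diff. The model ID and feature names are placeholders, not the actual scope of OpenAI's or Anthropic's BAA:

```python
from dataclasses import dataclass, field

# Hypothetical allowlist: fill in from the vendor's actual BAA scope, and review changes like code.
APPROVED_MODELS = frozenset({"example-model-2026-01"})
APPROVED_FEATURES = frozenset({"messages", "system_prompt"})


class UncoveredRequest(Exception):
    """Raised when a request would leave the BAA-covered configuration."""


@dataclass
class ModelRequest:
    model: str
    features: set[str] = field(default_factory=set)


def enforce_covered_surface(req: ModelRequest) -> None:
    """Fail closed: reject anything not explicitly on the allowlist."""
    if req.model not in APPROVED_MODELS:
        raise UncoveredRequest(f"model {req.model!r} is not on the covered list")
    extra = req.features - APPROVED_FEATURES
    if extra:
        raise UncoveredRequest(f"features outside BAA scope: {sorted(extra)}")


enforce_covered_surface(ModelRequest("example-model-2026-01", {"messages"}))  # passes silently
try:
    # "web_search" is a hypothetical feature name standing in for an excluded capability.
    enforce_covered_surface(ModelRequest("example-model-2026-01", {"messages", "web_search"}))
except UncoveredRequest as err:
    print(err)  # features outside BAA scope: ['web_search']
```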

Choose it if you need a vendor-specific capability that the cloud endpoints do not expose, and you are staffed to manage another business associate.

Architecture 3: open-weights models, self-hosted in your cloud

What it is. Serving open-weights models (Qwen, DeepSeek, Mistral, Llama and similar) yourself on GPU instances in your VPC, behind vLLM or similar. No model provider exists in the chain, so no model vendor ever sees a prompt.

Who holds the BAA. Only your cloud provider, because PHI still lives on its infrastructure. There is no model BAA because there is no model vendor.

What it requires operationally. Everything. You own the serving stack's Security Rule story end to end: TLS termination, authentication in front of the endpoint, audit logging of every request, patching the serving framework, and capacity planning. No-training guarantees become trivial (nobody is training on anything) but retention becomes your code: if your gateway logs prompts in plaintext, you built the leak yourself. Current open-weights models handle analyst-style work well when the system around them is structured, which is most of the argument in our piece on the private AI data analyst.
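
One way to keep the gateway from becoming the leak: the audit store is the one deliberate home for prompt content, and ordinary operational logs never see it. A sketch using Python's standard `logging.Filter`; the `prompt` and `response` attribute names are assumptions about how your gateway passes structured fields, and the filter cannot catch PHI that code interpolates directly into the message string:

```python
import io
import logging


class RedactPHIFields(logging.Filter):
    """Overwrite PHI-bearing attributes before the handler formats the record."""

    PHI_FIELDS = ("prompt", "response")  # hypothetical names passed via `extra=`

    def filter(self, record: logging.LogRecord) -> bool:
        for name in self.PHI_FIELDS:
            # Redact if present; fill a placeholder if absent so the formatter never fails.
            setattr(record, name, "[REDACTED]" if hasattr(record, name) else "-")
        return True  # keep the record, minus the PHI


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s prompt=%(prompt)s"))
handler.addFilter(RedactPHIFields())  # on the handler, so propagated records are covered too

log = logging.getLogger("gateway")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

log.info("inference ok", extra={"prompt": "Jane Doe, DOB 1970-01-01"})
print(stream.getvalue())  # INFO inference ok prompt=[REDACTED]
```

Attaching the filter to the handler rather than the logger is deliberate: logger-level filters are skipped for records that propagate up from child loggers.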

Choose it if policy forbids any model vendor from seeing PHI, you want token costs to flatten into a GPU bill, or you need deployment portability across clouds.

Architecture 4: on-prem GPUs

What it is. Open-weights models on hardware you own, in your data center. The air-gap tier.

Who holds the BAA. Nobody. This is the only architecture with no third party in the inference path at all, which makes it the only true "no BAA required" answer. (Worth repeating the inverse: any cloud variant of "self-hosted" still requires the cloud provider's BAA, because the infrastructure holding PHI is theirs.)

What it requires operationally. Architecture 3's burden plus hardware: procurement, racking, power, failover, and physical security, which the Security Rule also cares about. Model updates arrive on your schedule, which is a feature for change control and a tax on capability. Egress rules can go to zero.

Choose it if you are a hospital system, payer, or research org with an existing data-center practice and a policy that says PHI does not leave the building. Defense and public-sector teams end up here for the same structural reasons.

The decision table

Four architectures for running a HIPAA-compliant LLM deployment, compared
| | Cloud endpoint (Bedrock / Azure OpenAI / Google) | Vendor API with BAA | Open weights in your VPC | On-prem GPUs |
|---|---|---|---|---|
| Who holds the BAA | Your cloud provider | The model vendor (new business associate) | Your cloud provider only | No one; no third party exists |
| Where prompts are processed | Managed service in your cloud boundary | Vendor's infrastructure | Your VPC | Your building |
| Model quality ceiling | Frontier | Frontier | Strong, a step behind frontier | Strong, a step behind frontier |
| Ops burden | Low | Low, plus vendor management | High | Highest |
| Cost shape | Per token, on your cloud bill | Per token, new invoice | GPU instances, flat-ish | Capex, then power |
| Works air-gapped | No | No | No (private network, not air gap) | Yes |
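
The "Choose it if" lines above reduce to a precedence order: the data-location constraint first, then vendor exposure, then capability. A hypothetical helper encoding that reading; it deliberately leaves out the cost and portability reasons for architecture 3, and it assumes you are staffed for another business associate when it returns architecture 2:

```python
from enum import Enum


class Architecture(Enum):
    CLOUD_ENDPOINT = "1: cloud model endpoint under your cloud BAA"
    VENDOR_API = "2: dedicated capacity or a vendor API with a BAA"
    OPEN_WEIGHTS_VPC = "3: open-weights models, self-hosted in your cloud"
    ON_PREM = "4: on-prem GPUs"


def pick_architecture(
    phi_must_stay_in_building: bool,
    no_model_vendor_may_see_phi: bool,
    needs_vendor_only_capability: bool,
) -> Architecture:
    """Hypothetical encoding of the 'Choose it if' rules, strictest constraint first."""
    if phi_must_stay_in_building:
        return Architecture.ON_PREM  # the only option with no third party; works air-gapped
    if no_model_vendor_may_see_phi:
        return Architecture.OPEN_WEIGHTS_VPC  # the cloud BAA still applies
    if needs_vendor_only_capability:
        return Architecture.VENDOR_API  # adds a business associate to manage
    return Architecture.CLOUD_ENDPOINT  # the default for most healthcare teams


print(pick_architecture(False, False, False).value)
```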

The checklist every architecture still owes

Architecture picks who you trust. These six items apply no matter which one you picked.

  1. BAAs signed before the first PHI request, with every third party in the path.
  2. No-training and retention terms in writing. Defaults are not commitments; get the clause.
  3. Audit logging of every prompt and response, retained per your policy, queryable when the HHS Office for Civil Rights (OCR) asks.
  4. Access controls that enforce minimum necessary. The LLM sees what the calling user is allowed to see, not what the service account can see.
  5. Data residency you can state in one sentence, region by region.
  6. An incident path that includes the model endpoint. If the endpoint misbehaves, who gets paged and who gets notified?
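
Item 4 is the one most often implemented against the wrong identity. A minimal sketch of scoping context to the calling user before anything is sent to the model; the entitlement map, clinic names, and `PatientRow` type are hypothetical, and in production the same rule would more naturally live in the database (for example, as row-level security) than in application code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRow:
    patient_id: str
    clinic: str
    note: str


# Hypothetical entitlements: which clinics each human user may see.
USER_CLINICS = {"dr.lee": {"north"}, "analyst.kim": {"north", "south"}}


def rows_for_prompt(user_id: str, rows: list[PatientRow]) -> list[PatientRow]:
    """Filter by the calling user's entitlements, never the service account's."""
    allowed = USER_CLINICS.get(user_id, set())  # unknown users see nothing
    return [row for row in rows if row.clinic in allowed]


rows = [
    PatientRow("p1", "north", "visit note"),
    PatientRow("p2", "south", "visit note"),
]
context = rows_for_prompt("dr.lee", rows)
print([row.patient_id for row in context])  # ['p1']
```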

An LLM endpoint is not an analytics stack

Here is the trap at the end of this project. You stand up a beautifully compliant model endpoint, and then someone asks the question that motivated the whole thing: "can it look at our patient data?" Now a database, a semantic layer, a BI tool, and an agent runtime all enter the PHI path, and most teams bolt those on as SaaS, reopening every boundary question the endpoint just closed.

The model is one component. A private AI data analyst is the whole system: agent, semantic layer, lakehouse, and BI running inside your environment, calling whichever of the four endpoints above you chose. That pairing is the actual unit of compliance, and it is the architecture described on our private deployment page. Definite ships it as one stack (connectors, a DuckDB and DuckLake lakehouse, BI, and Fi, the AI analyst), self-hosted in your cloud or on-prem, with the model on your Bedrock, Azure OpenAI, or Google endpoint, or on your own GPUs. We hold a SOC 2 Type II attestation (trust.definite.app) and sign HIPAA BAAs, including for Definite Cloud. The wider deployment story is in our piece on the self-hostable data stack, and the buyer's view of this whole space is in our guide to HIPAA-compliant AI tools.

FAQ

Is there a HIPAA-compliant LLM? Not off the shelf. No LLM is HIPAA compliant by itself, and no HIPAA certification exists for any software. What exists are HIPAA-compliant LLM deployments: a model served through an endpoint covered by a BAA (or on hardware you operate), wrapped in access controls, audit logging, and no-training guarantees, inside an organization doing its own HIPAA work.

Is Amazon Bedrock HIPAA eligible? Yes. Amazon Bedrock is on AWS's HIPAA-eligible services list, which means you can run PHI workloads through it once your AWS BAA is in place and the workload is configured per HIPAA requirements. Azure OpenAI is similarly in scope for Microsoft's BAA, and Google Cloud publishes its own covered-services list for Gemini.

Can I use an open-source LLM for HIPAA workloads? Yes, and it is the strongest data-control option. Serving an open-weights model (Qwen, DeepSeek, Mistral, Llama and similar) on infrastructure you operate means no model provider ever touches PHI, so there is no model BAA to sign. You take on the full Security Rule burden for the serving stack: encryption, access control, audit logging, and patching are all yours.

Can I run an LLM on PHI with no BAA at all? Only on hardware you own and operate, in your own facility. The moment PHI lives or is processed on cloud infrastructure, the cloud provider is a business associate and you need its BAA, even if the model weights are open and the serving stack is yours. On-prem GPUs are the single architecture with no BAA in the chain.

What does a HIPAA-compliant LLM deployment require operationally? Six things regardless of architecture: a signed BAA with every third party that touches PHI, no-training and retention guarantees in writing, audit logs of every prompt and response, access controls that enforce minimum necessary, data residency you can state precisely, and an incident response path that covers the model endpoint. Skip any one and the architecture choice does not save you.

If you are partway through this build and want to see the finished version, grab 30 minutes and I'll show you Fi running on a Bedrock endpoint inside a single tenant, audit logs included.
