Locally hosted models, private cloud deployments, redaction pipelines, and zero-retention contracts. We design the right balance for regulated work and ship it end to end.
If your work involves data that should never sit on a vendor's servers, you're in the right place.
Privileged communications, contract review, discovery, deal data rooms.
Client books, deal pipelines, trade secrets, restricted research.
Patient records, clinical notes, lab results, prescription history.
Client returns, audit working papers, sensitive financial records.
R&D notes, source code, design specs, unfiled patent material.
Restricted info policies, listed-company disclosure controls, classified bid documents.
If your team has ever asked "can we even use AI for this?", this page is for you.
"Private AI" is not one thing. It's a spectrum. We pick the right pattern for each workload and often mix two of them in production.
Pattern 01
Open-source models on hardware you own. No internet route. LAN-only or fully air-gapped.
Court-discoverable matter, classified info, IP-heavy R&D, MAS / HSA regulated workloads.
Hardware capex is real. Open-source models trail the frontier by 6 to 12 months.
Pattern 02
Hyperscaler-hosted in a virtual private cloud. Azure OpenAI, AWS Bedrock private, or GCP Vertex.
Regulated firms that accept hyperscaler residency, BAA-eligible workloads, EU data residency needs.
You inherit the hyperscaler's terms. Still a hyperscaler.
Pattern 03
Local redactor strips PII before anything leaves your network. Cloud frontier model processes anonymised text. Output rehydrated locally.
Teams that want frontier capability on workloads where the meaning is sensitive but the identities are the actual PII.
Adds latency. Redactor model needs tuning per domain.
Pattern 04
Frontier models under enterprise agreements: no training on your data, no retention, audit logs, signed DPA, named accountability.
Regulated firms whose risk officer accepts a contractual stance over architectural separation.
Trust is contractual, not architectural. Some buyers will not accept this.
Most regulated clients end up with two of these stacked: a hybrid pipeline for everyday work, an on-prem fallback for the most sensitive matter. We design the right mix for you.
We work with open-source models, modern inference engines, and the enterprise platforms that take data residency and DPAs seriously.
We're vendor-agnostic. We will tell you when one of these is wrong for you.
Private AI is not free. Anyone who says otherwise is selling you something. Here are the four things every prospective client should hear.
A self-hosted Qwen 3.6 27B or DeepSeek V4 is excellent for summarisation, retrieval, structured extraction, and even agentic coding. It is not Claude Opus or GPT-5 for the very hardest reasoning. We will tell you the capability gap up front, not after the contract.
A serious on-prem rig is a five-figure capex plus power and maintenance. For some firms that's cheaper than three years of API spend. For others it isn't. We model both before we recommend either.
Pure on-prem feels safe but ships slower. Pure cloud is fastest but loses some buyers' trust. A hybrid redaction pipeline gives you frontier capability on the parts that need it and full local control on the parts that don't.
If your data isn't actually sensitive, a strong zero-retention contract may be cheaper and faster. We will not sell you a server you don't need. Saying no to over-engineering is part of our job.
Anonymised composites drawn from scoping calls. We can share named references under NDA after our first conversation.
Pattern 01 · On-prem
Air-gapped Qwen 3.6 27B running on a single GPU server in the firm's office. Junior associates draft contract summaries from a curated precedent library. Nothing leaves the network. Senior partner reviews on the same machine. Discovery-friendly audit logs.
Pattern 03 · Hybrid redaction
Local redactor strips client names, account numbers, and identifiers. Anonymised text goes to a frontier model for analysis and drafting. Output is rehydrated locally before the advisor sees it. Frontier reasoning quality with no PII leaving the network.
Pattern 02 · Private cloud
Clinical notes processed inside an Azure tenant under a Business Associate Agreement. Patient identifiers never leave the tenant. Doctors get structured summaries written back into the EMR. Audit trail satisfies the compliance officer.
All scenarios anonymised. Real references shared on call.
A compact map. The right answer depends on your regulator, your DPO, and your risk appetite. We bring the architecture options, you bring the legal context.
| Concern | On-prem | Private cloud | Hybrid | Zero-retention |
|---|---|---|---|---|
| PDPA (Singapore) residency | Strongest | Region-locked | Strong | Vendor-dependent |
| GDPR (EU) residency | Strongest | EU region only | Strong | EU-eligible vendors only |
| Attorney-client privilege | Strongest | Strong | Strong | Contractual only |
| Sector overlays (MAS, HSA, HIPAA) | Best fit | Case-by-case | Case-by-case | Contractual only |
| Audit trail | Yours to build | Platform-native | Mixed | Vendor logs |
We are not your lawyers or your DPO. We work alongside your counsel and compliance team. We can sign mutual NDAs and DPAs before engagement.
We move at the pace your compliance team can stomach. Most engagements ship a pilot in 2 to 4 weeks and reach production in 8 to 12.
30 minutes. NDA signed before the call if you need one. We listen, you tell us what's sensitive and what isn't.
We walk through your workflows and label what's actually sensitive vs what's been treated that way out of habit. You'd be surprised how much of "all of it" is really "the parts with names attached".
One of the four patterns, or a mix. Written. Includes the model, the runtime, the hardware or hyperscaler choice, the integration approach, and the costs at three different volumes.
Two to four weeks. One workflow, fully built. Measured against an eval set we agree on up front. You see what real performance looks like before any big commitments.
Documentation, ops runbook, training for your team. You can run independently from day one, or keep us on a light retainer for model updates and tuning.
If you have a ninth, the scoping call is the place. We have not yet heard a sensitive-data question that surprised us.
Yes. A fully on-premise deployment runs entirely on hardware you own. Models, inference engine, vector store, and the application all sit on your network.
We can configure it as fully air-gapped (no internet route at all) or LAN-only (no outbound but reachable from inside your network). Updates and model swaps are done by physically loading new files - we'll show your IT team how during handover.
For summarisation, structured extraction, retrieval, classification, agentic coding, and most business workflows: yes. The current open-weights flagship is Qwen 3.6 27B (dense, Apache 2.0, released April 2026), which outperforms Qwen 3.5's 397B MoE on every major coding benchmark. For long-context work, DeepSeek V4 Pro (MIT, 1M tokens) is the pick. Llama 4 Scout stretches to a 10M token context.
For the very hardest frontier reasoning, agentic chains over many tools, or work that needs the very best instruction following, open-source still trails Claude Opus and GPT-5 by a margin. We will tell you the gap before you commit, and where it matters most for your specific workload.
This is exactly the gap the hybrid redaction pipeline (Pattern 03) is designed to bridge: keep the PII local, use the frontier model on the rest.
It varies a lot. A modest single-GPU rig for a small team starts around USD 8,000 to 15,000 in hardware plus our setup fee. A serious 70B-class production rig with redundancy is a five-figure capex range plus power and maintenance.
Private cloud is opex monthly with a smaller setup fee. Hybrid redaction is somewhere in between. Zero-retention contracts are pure opex.
We will cost a configuration to your workload during scoping, not before. The honest answer to "how much" is "it depends what you're doing" - and the call is the right place to find out.
You do. The hardware is yours, the model weights are yours (the open-source licences we use permit commercial deployment), the configuration is yours, and your data never belonged to us in the first place.
We document everything for handover. You can run independently from day one, or keep us on a light retainer for tuning and updates. There's no lock-in.
Open-source models are shipping roughly every 2 to 4 months at the moment. Qwen released 3.5 in February 2026 and 3.6 in April 2026. DeepSeek shipped V4 Pro earlier this year. Llama 4 landed last year. The pace is faster than most internal IT roadmaps.
We design deployments so model swaps are a configuration change, not a rebuild. Your prompts, knowledge base, tool integrations, and access control stay the same. We swap in the new model, re-run your eval set, and only promote if the new model is measurably better on your workloads. If it isn't, we don't change anything.
Yes. We use a mutual NDA. Tick the box on the form below to flag this and we'll send a template across the same day.
If your firm has its own NDA, send it to us instead and we'll review and sign promptly. We don't drag this out.
Yes. We expect to. Most regulated firms have an IT lead, MSP, or DPO who needs to be in the loop from day one.
We bring the AI deployment expertise: model selection, prompt engineering, evaluation, runtime tuning. Your team owns the network, identity, and procurement. We document for handover, train your team, and stay engaged for ongoing tuning if you want.
Yes. We work alongside your counsel and compliance team. We will not pretend to be your lawyer.
We will tell you which deployment pattern maps best to your specific overlay, what the residual risks are, and where you should ask your DPO or general counsel to sign off. We sign DPAs and BAAs where applicable, and we expect your compliance team to want to see them.
The compliance crosswalk above is a starting map, not legal advice.
Six fields and a few honest checkboxes. We reply within one business day. If you tick "NDA first", we send the template before we say anything else.
If you ticked "NDA first", we'll send a mutual NDA template within one business day before any other conversation. Otherwise we'll reply with a couple of times for the scoping call.
We are not your lawyers. We work alongside your counsel.
We can sign NDAs and DPAs before any engagement.
Vendor-agnostic. We'll recommend against private AI when it's not needed.