Private AI for sensitive data. Your data stays where you want it.

Locally hosted models, private cloud deployments, redaction pipelines, and zero-retention contracts. We design the right balance for regulated work and ship it end to end.

NDA-first scoping · PDPA & GDPR aware · Vendor-agnostic

Teams that have asked, "can we even use AI for this?"

If your work involves data that should never sit on a vendor's servers, you're in the right place.

Law firms

Privileged communications, contract review, discovery, deal data rooms.

Financial services

Client books, deal pipelines, trade secrets, restricted research.

Healthcare and clinics

Patient records, clinical notes, lab results, prescription history.

Accountancy and tax

Client returns, audit working papers, sensitive financial records.

IP-heavy SMEs

R&D notes, source code, design specs, unfiled patent material.

Government suppliers

Restricted info policies, listed-company disclosure controls, classified bid documents.

If your team has ever asked "can we even use AI for this?", this page is for you.

Four ways to keep your data where it belongs.

"Private AI" is not one thing. It's a spectrum. We pick the right pattern for each workload and often mix two of them in production.

Diagram of fully on-premise deployment: a glowing server housed inside a building outline. AI infrastructure sits entirely within the client's own walls with no external network route.

Pattern 01

Fully on-premise / air-gapped

Open-source models on hardware you own. No internet route. LAN-only or fully air-gapped.

Your data → Your server → Your users

Best for

Court-discoverable matter, classified info, IP-heavy R&D, MAS / HSA regulated workloads.

Tradeoff

Hardware capex is real. Open-source models trail the frontier by 6 to 12 months.

One-off hardware + setup · low monthly

Diagram of private cloud deployment: a database connects to a server and a user group inside a virtual private cloud perimeter. Hyperscaler-hosted AI inside a tenant the client controls.

Pattern 02

Private cloud (VPC)

Hyperscaler-hosted in a virtual private cloud: Azure OpenAI, AWS Bedrock (private), or GCP Vertex AI.

Your data → Your tenant in Azure / AWS → Your users

Best for

Regulated firms that accept hyperscaler residency, BAA-eligible workloads, EU data residency needs.

Tradeoff

You inherit the hyperscaler's terms. Still a hyperscaler.

Monthly consumption + setup

Diagram of hybrid redaction pipeline: raw data flows from a database through a local redactor into anonymised text, out to a cloud LLM, and back through a rehydration step to a user group. The local-control half sits inside a perimeter, and only the redacted text leaves the network.

Pattern 03

Hybrid redaction pipeline

Local redactor strips PII before anything leaves your network. Cloud frontier model processes anonymised text. Output rehydrated locally.

Raw data → Local redactor → Anonymised → Cloud LLM → Local rehydration → Your users

Best for

Teams that want frontier capability on workloads where the sensitivity lies in the identities (the actual PII) rather than in the surrounding text.

Tradeoff

Adds latency. Redactor model needs tuning per domain.

One-off setup + light monthly
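
The redact, process, rehydrate flow can be sketched in a few lines. This is a minimal illustration, not our production redactor: the regex patterns, the placeholder format, and `call_cloud_model` (a stub standing in for any frontier API) are all assumptions made for the example. Real deployments use a redactor tuned per domain, since names and free-text identifiers are beyond simple regexes.

```python
import re

# Illustrative patterns only. Names and free-text identifiers would need
# a tuned redactor model; regexes are a stand-in here.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ACCOUNT": re.compile(r"\b\d{8,12}\b"),
}


def redact(text):
    """Swap each match for a placeholder; return the text and the lookup table."""
    mapping = {}  # placeholder -> original value, kept on the local network
    seen = {}     # original value -> placeholder, so repeats share one token

    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            if value not in seen:
                token = f"[{label}_{len(seen) + 1}]"
                seen[value] = token
                mapping[token] = value
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, mapping


def rehydrate(text, mapping):
    """Put the original values back into the model's output, locally."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def call_cloud_model(prompt):
    """Hypothetical stub for the cloud LLM call; it only ever sees placeholders."""
    return f"Summary: {prompt}"


raw = "Email jane.tan@example.com about account 123456789."
anonymised, mapping = redact(raw)
reply = call_cloud_model(anonymised)
final = rehydrate(reply, mapping)
```

The point of the design is that `mapping` never leaves your network: the cloud side receives `anonymised` and nothing else.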
Diagram of zero-retention enterprise contracts: raw data flows from a database, through a signed contract, into a frontier vendor's server. The trust boundary is contractual rather than architectural.

Pattern 04

Zero-retention enterprise contracts

Frontier models under enterprise agreements: no training on your data, no retention, audit logs, signed DPA, named accountability.

Your data → Frontier vendor under enterprise contract → Your users

Best for

Regulated firms whose risk officer accepts a contractual stance over architectural separation.

Tradeoff

Trust is contractual, not architectural. Some buyers will not accept this.

Monthly consumption

Most regulated clients end up with two of these stacked: a hybrid pipeline for everyday work, an on-prem fallback for the most sensitive matter. We design the right mix for you.
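
As a rough sketch of what "stacked" means in practice, a small router can send each request down one of the two paths. The labels and route names below are hypothetical, and the one design choice worth noting is that anything unlabelled falls back to the on-prem path rather than the cloud one.

```python
# Hypothetical sensitivity labels mapped to deployment patterns.
ROUTES = {
    "everyday": "hybrid_redaction",  # Pattern 03
    "most_sensitive": "on_prem",     # Pattern 01
}


def choose_route(label):
    """Pick a pattern for a request; unknown labels fail closed to on-prem."""
    return ROUTES.get(label, "on_prem")
```

For example, `choose_route("everyday")` picks the hybrid pipeline, while a typo or a missing label keeps the request on your own hardware.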

Concrete proof, not vague vendor speak.

We work with open-source models, modern inference engines, and the enterprise platforms that take data residency and DPAs seriously.

Open-source models

Open-weights, commercial-friendly

  • Qwen 3.6 (27B dense / 35B MoE, Apache 2.0)
  • DeepSeek V4 Pro (MIT, 1M context)
  • Llama 4 (Scout / Maverick)
  • Mistral Large 3
  • Gemma 4 26B (small, on-device)
  • Domain-specific fine-tunes

Inference engines

Runtimes we trust in production

  • vLLM (high-throughput servers)
  • Ollama (developer-friendly)
  • llama.cpp (CPU + small-GPU)
  • LM Studio (desktop)
  • TGI by Hugging Face
  • Custom Triton when needed

Enterprise platforms

When private cloud is the right call

  • Azure OpenAI Service
  • AWS Bedrock (private)
  • GCP Vertex AI
  • Anthropic Enterprise
  • OpenAI Enterprise
  • Sovereign / regional clouds

We're vendor-agnostic. We will tell you when one of these is wrong for you.

What we tell you before you spend a dollar.

Private AI is not free. Anyone who says otherwise is selling you something. Here are the four things every prospective client should hear.

Open-source models are good. Not frontier-good.

A self-hosted Qwen 3.6 27B or DeepSeek V4 is excellent for summarisation, retrieval, structured extraction, and even agentic coding. It is not Claude Opus or GPT-5 for the very hardest reasoning. We will tell you the capability gap up front, not after the contract.

Hardware costs are real.

A serious on-prem rig is a five-figure capex plus power and maintenance. For some firms that's cheaper than three years of API spend. For others it isn't. We model both before we recommend either.
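
The comparison is simple arithmetic. Here is a minimal sketch of it, with made-up figures: the capex, running cost, and API spend below are placeholders for illustration, not quotes or typical prices.

```python
import math


def break_even_month(capex, monthly_running, monthly_api_spend):
    """First month in which cumulative on-prem cost is no higher than API cost.

    Returns None when on-prem running costs alone match or exceed the API bill,
    because the capex is then never recovered.
    """
    monthly_saving = monthly_api_spend - monthly_running
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)


# Illustrative inputs only.
heavy_usage = break_even_month(capex=30_000, monthly_running=400, monthly_api_spend=1_500)
light_usage = break_even_month(capex=30_000, monthly_running=400, monthly_api_spend=900)
```

With these placeholder numbers the heavy-usage case breaks even in month 28, inside a three-year horizon, while the light-usage case takes 60 months and the API route stays cheaper over the same period.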

Hybrid is usually the right answer.

Pure on-prem feels safe but ships slower. Pure cloud is fastest but loses some buyers' trust. A hybrid redaction pipeline gives you frontier capability on the parts that need it and full local control on the parts that don't.

We will tell you when private AI is overkill.

If your data isn't actually sensitive, a strong zero-retention contract may be cheaper and faster. We will not sell you a server you don't need. Saying no to over-engineering is part of our job.

Three real shapes of private AI work.

Anonymised composites drawn from scoping calls. We can share named references under NDA after our first conversation.

Vignette of a mid-sized law firm running an air-gapped private AI deployment. Source documents feed into a server contained inside the firm's own premises, with no external network route. Junior associates and a senior partner work from the same private machine.

Pattern 01 · On-prem

Mid-sized law firm, contract review

Air-gapped Qwen 3.6 27B running on a single GPU server in the firm's office. Junior associates draft contract summaries from a curated precedent library. Nothing leaves the network. Senior partner reviews on the same machine. Discovery-friendly audit logs.

Vignette of a wealth advisory firm running a hybrid redaction pipeline. Client names and account numbers are stripped locally before anonymised text reaches a frontier model in the cloud. The output is rehydrated back inside the firm's network before an advisor sees it.

Pattern 03 · Hybrid redaction

Wealth advisory, deal memos

Local redactor strips client names, account numbers, and identifiers. Anonymised text goes to a frontier model for analysis and drafting. Output is rehydrated locally before the advisor sees it. Frontier reasoning quality with no PII leaving the network.

Vignette of a specialty clinic running clinical-notes AI inside a private Azure tenant under a Business Associate Agreement. A patient record on the left flows through a private cloud server in the middle and is returned as a structured clinical summary on the right, with the whole pipeline contained inside the clinic's tenant boundary.

Pattern 02 · Private cloud

Specialty clinic, clinical notes

Clinical notes processed inside an Azure tenant under a Business Associate Agreement. Patient identifiers never leave the tenant. Doctors get structured summaries written back into the EMR. Audit trail satisfies the compliance officer.

All scenarios anonymised. Real references shared on call.

How each pattern maps to the rules you live under.

A compact map. The right answer depends on your regulator, your DPO, and your risk appetite. We bring the architecture options, you bring the legal context.

Concern | On-prem | Private cloud | Hybrid | Zero-retention
PDPA (Singapore) residency | Strongest | Region-locked | Strong | Vendor-dependent
GDPR (EU) residency | Strongest | EU region only | Strong | EU-eligible vendors only
Attorney-client privilege | Strongest | Strong | Strong | Contractual only
Sector overlays (MAS, HSA, HIPAA) | Best fit | Case-by-case | Case-by-case | Contractual only
Audit trail | Yours to build | Platform-native | Mixed | Vendor logs

We are not your lawyers or your DPO. We work alongside your counsel and compliance team. We can sign mutual NDAs and DPAs before engagement.

Five steps, no surprises.

We move at the pace your compliance team can stomach. Most engagements ship a pilot in 2 to 4 weeks and reach production in 8 to 12.

Diagram of the five-step engagement process: confidential scoping call, sensitivity mapping, architecture recommendation, pilot deployment, and production handover. Each step shown as a labelled circle along a horizontal timeline.

01

Confidential scoping call

30 minutes. NDA signed before the call if you need one. We listen, you tell us what's sensitive and what isn't.

02

Sensitivity mapping

We walk through your workflows and label what's actually sensitive vs what's been treated that way out of habit. You'd be surprised how much of "all of it" is really "the parts with names attached".

03

Architecture recommendation

One of the four patterns, or a mix. Written. Includes the model, the runtime, the hardware or hyperscaler choice, the integration approach, and the costs at three different volumes.

04

Pilot deployment

Two to four weeks. One workflow, fully built. Measured against an eval set we agree on up front. You see what real performance looks like before any big commitments.

05

Production + handover

Documentation, ops runbook, training for your team. You can run independently from day one, or keep us on a light retainer for model updates and tuning.

Eight questions we get every time.

If you have a ninth, the scoping call is the place. We have not yet heard a sensitive-data question that surprised us.

Yes. A fully on-premise deployment runs entirely on hardware you own. Models, inference engine, vector store, and the application all sit on your network.

We can configure it as fully air-gapped (no internet route at all) or LAN-only (no outbound route, but reachable from inside your network). Updates and model swaps are done by physically loading new files. We'll show your IT team how during handover.
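
One small application-level guard that pairs well with a LAN-only setup is refusing to start if the configured inference endpoint is not a private or loopback address. The sketch below is an assumption about how such a check could look, and the example URLs are hypothetical. It complements firewall and routing rules, which remain the real enforcement.

```python
import ipaddress
from urllib.parse import urlparse


def is_lan_endpoint(url):
    """Return True only when the URL's host is a literal private or loopback IP.

    Hostnames are not resolved here, so they are treated as unverified
    and rejected.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False
    return address.is_private or address.is_loopback
```

For example, `is_lan_endpoint("http://10.0.0.5:8000/v1")` passes, while a public address or a bare hostname is rejected.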

For summarisation, structured extraction, retrieval, classification, agentic coding, and most business workflows: yes. The current open-weights flagship is Qwen 3.6 27B (dense, Apache 2.0, released April 2026), which outperforms Qwen 3.5's 397B MoE on every major coding benchmark. For long-context work, DeepSeek V4 Pro (MIT, 1M tokens) is the pick. Llama 4 Scout stretches to a 10M token context.

For the very hardest frontier reasoning, agentic chains over many tools, or work that needs the very best instruction following, open-source still trails Claude Opus and GPT-5 by a margin. We will tell you the gap before you commit, and where it matters most for your specific workload.

This is exactly the gap the hybrid redaction pipeline (Pattern 03) is designed to bridge: keep the PII local, use the frontier model on the rest.

It varies a lot. A modest single-GPU rig for a small team starts around USD 8,000 to 15,000 in hardware plus our setup fee. A serious 70B-class production rig with redundancy sits in the five-figure capex range, plus power and maintenance.

Private cloud is opex monthly with a smaller setup fee. Hybrid redaction is somewhere in between. Zero-retention contracts are pure opex.

We will cost a configuration to your workload during scoping, not before. The honest answer to "how much" is "it depends on what you're doing", and the call is the right place to find out.

You do. The hardware is yours, the model weights are yours (the open-source licences we use permit commercial deployment), the configuration is yours, and your data never belonged to us in the first place.

We document everything for handover. You can run independently from day one, or keep us on a light retainer for tuning and updates. There's no lock-in.

Open-source models are shipping roughly every 2 to 4 months at the moment. Qwen released 3.5 in February 2026 and 3.6 in April 2026. DeepSeek shipped V4 Pro earlier this year. Llama 4 landed last year. The pace is faster than most internal IT roadmaps.

We design deployments so model swaps are a configuration change, not a rebuild. Your prompts, knowledge base, tool integrations, and access control stay the same. We swap in the new model, re-run your eval set, and only promote if the new model is measurably better on your workloads. If it isn't, we don't change anything.
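
The promotion rule can be sketched as a simple gate. Everything here is illustrative: models are represented as plain callables, scoring is exact match, and the `min_gain` threshold is a hypothetical default. A real eval set would use whatever metric we agreed on during the pilot.

```python
def accuracy(model, eval_set):
    """Share of eval items where the model's answer matches the expected one."""
    correct = sum(1 for prompt, expected in eval_set if model(prompt) == expected)
    return correct / len(eval_set)


def should_promote(current, candidate, eval_set, min_gain=0.02):
    """Promote only if the candidate beats the current model by at least min_gain."""
    return accuracy(candidate, eval_set) - accuracy(current, eval_set) >= min_gain


# Toy stand-ins for two model versions and a four-item eval set.
eval_set = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")]


def current_model(prompt):
    return prompt.upper() if prompt in "ab" else "?"


def candidate_model(prompt):
    return prompt.upper() if prompt in "abc" else "?"
```

Here the candidate scores 3 of 4 against the current model's 2 of 4, so it would be promoted. Comparing a model against itself yields no gain, and nothing changes.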

Yes. We use a mutual NDA. Tick the box on the form below to flag this and we'll send a template across the same day.

If your firm has its own NDA, send it to us instead and we'll review and sign promptly. We don't drag this out.

Yes. We expect to. Most regulated firms have an IT lead, MSP, or DPO who needs to be in the loop from day one.

We bring the AI deployment expertise: model selection, prompt engineering, evaluation, runtime tuning. Your team owns the network, identity, and procurement. We document for handover, train your team, and stay engaged for ongoing tuning if you want.

Yes. We work alongside your counsel and compliance team. We will not pretend to be your lawyer.

We will tell you which deployment pattern maps best to your specific overlay, what the residual risks are, and where you should ask your DPO or general counsel to sign off. We sign DPAs and BAAs where applicable, and we expect your compliance team to want to see them.

The compliance crosswalk above is a starting map, not legal advice.

Confidential scoping call

Tell us a little. We'll send the NDA first.

Six fields and a few honest checkboxes. We reply within one business day. If you tick "NDA first", we send the template before we say anything else.

Your details are not used for marketing. We reply within one business day to the email you provided.

We are not your lawyers. We work alongside your counsel.

We can sign NDAs and DPAs before any engagement.

Vendor-agnostic. We'll recommend against private AI when it's not needed.