Are open-source models good enough?

For summarisation, structured extraction, retrieval, classification, agentic coding, and most business workflows: yes. As of May 2026, Qwen 3.6 27B (dense, Apache 2.0) is the current open-weights flagship for agentic coding, outperforming Qwen 3.5's 397B MoE on every major coding benchmark. DeepSeek V4 Pro leads on long-context reasoning with a 1M token context and MIT licensing. Llama 4 Scout has a 10M token context window. For most regulated workloads these are good enough. For the very hardest frontier reasoning, open-source still trails Claude Opus and GPT-5 by a margin. We will tell you the gap honestly before you commit.

How much does on-premise AI cost?

It varies a lot. A modest single-GPU rig for a small team starts around USD 8,000 to 15,000 in hardware plus our setup fee. A serious 70B-class production rig with redundancy is in the five-figure capex range plus power and maintenance. Private cloud is monthly opex with a smaller setup fee. We will cost a configuration to your workload during scoping, not before.

Who owns the model and the deployment?

You do. The hardware is yours, the model weights are yours (open-source licences permit commercial use), the configuration is yours, and the data never belonged to us in the first place. We document everything for handover. You can run independently or keep us on a light retainer for tuning and updates.

Will you sign an NDA before we talk?

Yes. We use a mutual NDA. Tick the NDA-first checkbox on the scoping form and we will send a template before any other conversation. If you have your own NDA, attach it in your reply and we will review and sign promptly.

Can you work with our existing IT or MSP team?

Yes. We expect to. Most regulated firms have an IT lead, MSP, or DPO who needs to be in the loop. We bring the AI deployment expertise. Your team owns the network, identity, and procurement. We can scope, deploy, document, and hand over to your team, or stay engaged for ongoing tuning.

Private AI for sensitive data

Who this is for

Teams that have asked, "can we even use AI for this?"

If your work involves data that should never sit on a vendor's servers, you're in the right place.

Law firms

Privileged communications, contract review, discovery, deal data rooms.

Financial services

Client books, deal pipelines, trade secrets, restricted research.

Healthcare and clinics

Patient records, clinical notes, lab results, prescription history.

Accountancy and tax

Client returns, audit working papers, sensitive financial records.

IP-heavy SMEs

R&D notes, source code, design specs, unfiled patent material.

Government suppliers

Restricted info policies, listed-company disclosure controls, classified bid documents.

If your team has ever asked "can we even use AI for this?", this page is for you.

The Private AI spectrum

Four ways to keep your data where it belongs.

"Private AI" is not one thing. It's a spectrum. We pick the right pattern for each workload and often mix two of them in production.

Diagram of fully on-premise deployment: a glowing server housed inside a building outline. AI infrastructure sits entirely within the client's own walls with no external network route.

Pattern 01

Fully on-premise / air-gapped

Open-source models on hardware you own. No internet route. LAN-only or fully air-gapped.

Your data → Your server → Your users

Best for

Court-discoverable matter, classified info, IP-heavy R&D, MAS / HSA regulated workloads.

Tradeoff

Hardware capex is real. Open-source models trail the frontier by 6 to 12 months.

One-off hardware + setup · low monthly

Diagram of private cloud deployment: a database connects to a server and a user group inside a virtual private cloud perimeter. Hyperscaler-hosted AI inside a tenant the client controls.

Pattern 02

Private cloud (VPC)

Hyperscaler-hosted in a virtual private cloud. Azure OpenAI, AWS Bedrock (private), or GCP Vertex AI.

Your data → Your tenant in Azure / AWS / GCP → Your users

Best for

Regulated firms that accept hyperscaler residency, BAA-eligible workloads, EU data residency needs.

Tradeoff

You inherit the hyperscaler's terms. Still a hyperscaler.

Monthly consumption + setup

Pattern 03

Hybrid redaction pipeline

Local redactor strips PII before anything leaves your network. Cloud frontier model processes anonymised text. Output rehydrated locally.

Raw data → Local redactor → Anonymised → Cloud LLM → Local rehydration → Your users

Best for

Teams that want frontier capability on workloads where the sensitive part is the identities (names, account numbers, identifiers) rather than the substance of the text.

Tradeoff

Adds latency. Redactor model needs tuning per domain.

One-off setup + light monthly
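The redact, send, rehydrate flow above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a simple regex-based redactor: the patterns, the placeholder format, and the function names `redact` and `rehydrate` are hypothetical, and a real redactor would be a model tuned per domain rather than two regular expressions.

```python
import re

# Illustrative patterns only (assumptions); real identifiers vary by domain.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ACCOUNT": re.compile(r"\b\d{8,12}\b"),
}


def redact(text, known_names=()):
    """Replace identifiers with placeholders; return (anonymised_text, mapping)."""
    mapping = {}   # placeholder -> original value, kept inside your network
    counters = {}  # per-kind counter used to number placeholders

    def placeholder(kind, value):
        # Reuse the same placeholder when the same value appears again.
        for token, original in mapping.items():
            if original == value and token.startswith(f"[{kind}_"):
                return token
        counters[kind] = counters.get(kind, 0) + 1
        token = f"[{kind}_{counters[kind]}]"
        mapping[token] = value
        return token

    # Names come from a list you maintain (for example, a client register).
    for name in known_names:
        if name in text:
            text = text.replace(name, placeholder("NAME", name))
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: placeholder(k, m.group(0)), text)
    return text, mapping


def rehydrate(text, mapping):
    """Swap placeholders in the model output back to the original values."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Only the anonymised text would be sent to the cloud model. The mapping never leaves the local machine, which is what makes local rehydration possible.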

Pattern 04

Zero-retention enterprise contracts

Frontier models under enterprise agreements: no training on your data, no retention, audit logs, signed DPA, named accountability.

Your data → Frontier vendor under enterprise contract → Your users

Best for

Regulated firms whose risk officer accepts a contractual stance over architectural separation.

Tradeoff

Trust is contractual, not architectural. Some buyers will not accept this.

Monthly consumption

Most regulated clients end up with two of these stacked: a hybrid pipeline for everyday work, an on-prem fallback for the most sensitive matter. We design the right mix for you.

What we actually deploy

Concrete proof, not vague vendor speak.

We work with open-source models, modern inference engines, and the enterprise platforms that take data residency and DPAs seriously.

Open-source models

Open-weights, commercial-friendly

Qwen 3.6 (27B dense / 35B MoE, Apache 2.0)
DeepSeek V4 Pro (MIT, 1M context)
Llama 4 (Scout / Maverick)
Mistral Large 3
Gemma 4 26B (small, on-device)
Domain-specific fine-tunes

Inference engines

Runtimes we trust in production

vLLM (high-throughput servers)
Ollama (developer-friendly)
llama.cpp (CPU + small-GPU)
LM Studio (desktop)
TGI by Hugging Face
Custom Triton when needed

Enterprise platforms

When private cloud is the right call

Azure OpenAI Service
AWS Bedrock (private)
GCP Vertex AI
Anthropic Enterprise
OpenAI Enterprise
Sovereign / regional clouds

We're vendor-agnostic. We will tell you when one of these is wrong for you.

Honest tradeoffs

What we tell you before you spend a dollar.

Private AI is not free. Anyone who says otherwise is selling you something. Here are the four things every prospective client should hear.

Open-source models are good. Not frontier-good.

A self-hosted Qwen 3.6 27B or DeepSeek V4 is excellent for summarisation, retrieval, structured extraction, and even agentic coding. It is not Claude Opus or GPT-5 for the very hardest reasoning. We will tell you the capability gap up front, not after the contract.

Hardware costs are real.

A serious on-prem rig is a five-figure capex plus power and maintenance. For some firms that's cheaper than three years of API spend. For others it isn't. We model both before we recommend either.
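To make that comparison concrete, here is a rough sketch of the break-even arithmetic in Python. The function names are hypothetical, and any figures you feed it are yours to supply: this shows the shape of the calculation, not a quote or a benchmark.

```python
import math


def on_prem_total(capex, monthly_running, months):
    """Cumulative on-prem cost: one-off hardware plus power and maintenance."""
    return capex + monthly_running * months


def api_total(monthly_api_spend, months):
    """Cumulative cost of paying per-use API fees instead."""
    return monthly_api_spend * months


def break_even_month(capex, monthly_running, monthly_api_spend):
    """First whole month where on-prem is no more expensive than API spend.

    Returns None when on-prem never catches up, i.e. when running the rig
    costs at least as much per month as the API would.
    """
    monthly_saving = monthly_api_spend - monthly_running
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)
```

With made-up placeholder inputs of 40,000 capex, 500 a month to run, and 2,000 a month in API spend, `break_even_month(40_000, 500, 2_000)` returns 27, inside a three-year window. Drop the API spend to 400 a month and it returns `None`: the rig never pays for itself.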

Hybrid is usually the right answer.

Pure on-prem feels safe but ships slower. Pure cloud is fastest but loses some buyers' trust. A hybrid redaction pipeline gives you frontier capability on the parts that need it and full local control on the parts that don't.

We will tell you when private AI is overkill.

If your data isn't actually sensitive, a strong zero-retention contract may be cheaper and faster. We will not sell you a server you don't need. Saying no to over-engineering is part of our job.

Use case vignettes

Three real shapes of private AI work.

Anonymised composites drawn from scoping calls. We can share named references under NDA after our first conversation.

Vignette of a mid-sized law firm running an air-gapped private AI deployment. Source documents feed into a server contained inside the firm's own premises, with no external network route. Junior associates and a senior partner work from the same private machine.

Pattern 01 · On-prem

Mid-sized law firm, contract review

Air-gapped Qwen 3.6 27B running on a single GPU server in the firm's office. Junior associates draft contract summaries from a curated precedent library. Nothing leaves the network. Senior partner reviews on the same machine. Discovery-friendly audit logs.

Vignette of a wealth advisory firm running a hybrid redaction pipeline. Client names and account numbers are stripped locally before anonymised text reaches a frontier model in the cloud. The output is rehydrated back inside the firm's network before an advisor sees it.

Pattern 03 · Hybrid redaction

Wealth advisory, deal memos

Local redactor strips client names, account numbers, and identifiers. Anonymised text goes to a frontier model for analysis and drafting. Output is rehydrated locally before the advisor sees it. Frontier reasoning quality with no PII leaving the network.

Vignette of a specialty clinic running clinical-notes AI inside a private Azure tenant under a Business Associate Agreement. A patient record on the left flows through a private cloud server in the middle and is returned as a structured clinical summary on the right, with the whole pipeline contained inside the clinic's tenant boundary.

Pattern 02 · Private cloud

Specialty clinic, clinical notes

Clinical notes processed inside an Azure tenant under a Business Associate Agreement. Patient identifiers never leave the tenant. Doctors get structured summaries written back into the EMR. Audit trail satisfies the compliance officer.

All scenarios anonymised. Real references shared on call.

Compliance crosswalk

How each pattern maps to the rules you live under.

A compact map. The right answer depends on your regulator, your DPO, and your risk appetite. We bring the architecture options, you bring the legal context.

Concern	On-prem	Private cloud	Hybrid	Zero-retention
PDPA (Singapore) residency	Strongest	Region-locked	Strong	Vendor-dependent
GDPR (EU) residency	Strongest	EU region only	Strong	EU-eligible vendors only
Attorney-client privilege	Strongest	Strong	Strong	Contractual only
Sector overlays (MAS, HSA, HIPAA)	Best fit	Case-by-case	Case-by-case	Contractual only
Audit trail	Yours to build	Platform-native	Mixed	Vendor logs

We are not your lawyers or your DPO. We work alongside your counsel and compliance team. We can sign mutual NDAs and DPAs before engagement.

How an engagement works

Five steps, no surprises.

We move at the pace your compliance team can stomach. Most engagements ship a pilot in 2 to 4 weeks and reach production in 8 to 12.

01

Confidential scoping call

30 minutes. NDA signed before the call if you need one. We listen, you tell us what's sensitive and what isn't.

02

Sensitivity mapping

We walk through your workflows and label what's actually sensitive vs what's been treated that way out of habit. You'd be surprised how much of "all of it" is really "the parts with names attached".

03

Architecture recommendation

One of the four patterns, or a mix. Written. Includes the model, the runtime, the hardware or hyperscaler choice, the integration approach, and the costs at three different volumes.

04

Pilot deployment

Two to four weeks. One workflow, fully built. Measured against an eval set we agree on up front. You see what real performance looks like before any big commitments.

05

Production + handover

Documentation, ops runbook, training for your team. You can run independently from day one, or keep us on a light retainer for model updates and tuning.

FAQ

Eight questions we get every time.

If you have a ninth, the scoping call is the place. We have not yet heard a sensitive-data question that surprised us.

Yes. A fully on-premise deployment runs entirely on hardware you own. Models, inference engine, vector store, and the application all sit on your network.

We can configure it as fully air-gapped (no internet route at all) or LAN-only (no outbound route, but reachable from inside your network). Updates and model swaps are done by physically loading new files. We'll show your IT team how during handover.
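When files are carried in by hand, one sensible precaution is to check them against known checksums before loading them. The sketch below is an assumption about how such a check could look, not a description of a specific handover procedure. The manifest format (file name mapped to an expected SHA-256 hex digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(model_dir, manifest):
    """Return a list of (file_name, problem) pairs; an empty list means all good.

    `manifest` maps a file name relative to `model_dir` to its expected
    SHA-256 hex digest, recorded on a trusted machine before transfer.
    """
    problems = []
    for name, expected in manifest.items():
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif sha256_of(path) != expected.lower():
            problems.append((name, "hash mismatch"))
    return problems
```

A model directory would only be loaded into the inference engine when `verify_manifest` returns an empty list.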

Are open-source models good enough?

For summarisation, structured extraction, retrieval, classification, agentic coding, and most business workflows: yes. The current open-weights flagship is Qwen 3.6 27B (dense, Apache 2.0, released April 2026), which outperforms Qwen 3.5's 397B MoE on every major coding benchmark. For long-context work, DeepSeek V4 Pro (MIT, 1M tokens) is the pick. Llama 4 Scout stretches to a 10M token context.

For the very hardest frontier reasoning, agentic chains over many tools, or work that needs the very best instruction following, open-source still trails Claude Opus and GPT-5 by a margin. We will tell you the gap before you commit, and where it matters most for your specific workload.

This is exactly the gap the hybrid redaction pipeline (Pattern 03) is designed to bridge: keep the PII local, use the frontier model on the rest.

How much does on-premise AI cost?

It varies a lot. A modest single-GPU rig for a small team starts around USD 8,000 to 15,000 in hardware plus our setup fee. A serious 70B-class production rig with redundancy is in the five-figure capex range plus power and maintenance.

Private cloud is monthly opex with a smaller setup fee. Hybrid redaction is somewhere in between. Zero-retention contracts are pure opex.

We will cost a configuration to your workload during scoping, not before. The honest answer to "how much" is "it depends on what you're doing", and the call is the right place to find out.

Who owns the model and the deployment?

You do. The hardware is yours, the model weights are yours (the open-source licences we use permit commercial deployment), the configuration is yours, and your data never belonged to us in the first place.

We document everything for handover. You can run independently from day one, or keep us on a light retainer for tuning and updates. There's no lock-in.

Open-source models are shipping roughly every 2 to 4 months at the moment. Qwen released 3.5 in February 2026 and 3.6 in April 2026. DeepSeek shipped V4 Pro earlier this year. Llama 4 landed last year. The pace is faster than most internal IT roadmaps.

We design deployments so model swaps are a configuration change, not a rebuild. Your prompts, knowledge base, tool integrations, and access control stay the same. We swap in the new model, re-run your eval set, and only promote if the new model is measurably better on your workloads. If it isn't, we don't change anything.
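The "only promote if measurably better" rule can be sketched as a small gate. This is a simplified illustration under stated assumptions: models are plain callables, scoring is exact match against an agreed eval set, and the `min_gain` threshold is a hypothetical knob. Real evals are usually richer than exact match.

```python
def score(model, eval_set):
    """Fraction of eval cases where the model's answer matches the expected one."""
    correct = sum(1 for prompt, expected in eval_set if model(prompt) == expected)
    return correct / len(eval_set)


def choose_model(current, candidate, eval_set, min_gain=0.02):
    """Decide which model to serve after re-running the eval set.

    The candidate is promoted only if it beats the current model by at
    least `min_gain`; otherwise nothing changes.
    Returns (decision, current_score, candidate_score).
    """
    current_score = score(current, eval_set)
    candidate_score = score(candidate, eval_set)
    promote = candidate_score >= current_score + min_gain
    decision = "candidate" if promote else "current"
    return decision, current_score, candidate_score
```

Because prompts, knowledge base, and integrations are untouched, the only thing that differs between the two runs is the model behind the callable.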

Will you sign an NDA before we talk?

Yes. We use a mutual NDA. Tick the box on the form below to flag this and we'll send a template across the same day.

If your firm has its own NDA, send it to us instead and we'll review and sign promptly. We don't drag this out.

Can you work with our existing IT or MSP team?

Yes. We expect to. Most regulated firms have an IT lead, MSP, or DPO who needs to be in the loop from day one.

We bring the AI deployment expertise: model selection, prompt engineering, evaluation, runtime tuning. Your team owns the network, identity, and procurement. We document for handover, train your team, and stay engaged for ongoing tuning if you want.

Yes. We work alongside your counsel and compliance team. We will not pretend to be your lawyer.

We will tell you which deployment pattern maps best to your specific overlay, what the residual risks are, and where you should ask your DPO or general counsel to sign off. We sign DPAs and BAAs where applicable, and we expect your compliance team to want to see them.

The compliance crosswalk above is a starting map, not legal advice.

Confidential scoping call

Tell us a little. We'll send the NDA first.

Six fields and a few honest checkboxes. We reply within one business day. If you tick "NDA first", we send the template before we say anything else.

Full name

Work email

Firm / company

Sector

Region or country (optional)

Brief: what would you like AI to do? (optional)

Up to 500 characters. Plain language is fine. We'll dig into specifics on the call.

A few honest checkboxes

I'd like to sign a mutual NDA before our scoping call.
We are a regulated entity (PDPA, MAS, HSA, GDPR, HIPAA, or similar). Please treat with care.
We have an existing IT or MSP team we'd want involved.
We're evaluating against another AI vendor or considering a build-it-yourself approach.
I'm authorised to discuss this on behalf of my organisation. (required)

Your details are not used for marketing. We reply within one business day to the email you provided.

Private AI for sensitive data. Your data stays where you want it.
