
AI in regulated sectors — what NHS, Police and defence actually need

A practical look at how AI gets adopted in regulated UK environments. Data residency, DSPT, SC clearance, supplier assurance — and why the off-the-shelf large model isn't always the answer.

There’s a curious double-bind that public-sector organisations sit in around AI. The senior leadership tier is under pressure — political, operational, financial — to “do AI”. Meanwhile, the information governance team is, quite rightly, refusing to let any data leave the perimeter. The result is a stuck conversation. The leadership wants productivity gains. The IG team wants UK data residency and a clean DPIA. The vendor presents something that looks like neither.

This is solvable. The path through it requires understanding three things: where the actual risk lives, what the regulators actually require, and which architectures get to yes.

Where the risk lives

In a regulated environment, AI risk is rarely about model accuracy. The risk is about data flow. Specifically:

Where does the input data go? When a clinician asks an AI summarisation tool to compress a discharge note, the tool either processes the note locally, sends it to a UK-resident hosted service with a Data Processing Agreement, or sends it to a US-hosted SaaS that may or may not have a UK data centre and may or may not let the input flow into model training. The legal posture of those three options is wildly different.

What happens to the output? Outputs need to be auditable, reversible, and attributable. If an AI suggests an action and a human acts on it, who carries the clinical or operational responsibility? The answer should be in writing before the system is live, not invented after an incident.

How does the human stay in the loop? Human-in-the-loop is the phrase regulators use for systems where AI suggests and humans decide. It’s currently the only AI architecture that holds up cleanly under most public-sector accountability frameworks. Human-on-the-loop (the AI acts and a human reviews) is acceptable for narrow, low-risk tasks. A system with no human in the loop at all should not exist in a regulated environment in 2026.

The rest follows from these three. If the input data flow is wrong, no amount of model accuracy will save you. If the output isn’t auditable, you can’t investigate when something goes sideways. If the human isn’t actually empowered to override, the whole system is just an automated mistake.
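As a rough illustration of the human-in-the-loop pattern described above, here is a minimal Python sketch in which a model suggestion can only become a final output through a named reviewer. The class and function names (Suggestion, Decision, decide) are hypothetical and not taken from any particular product; a real system would sit behind proper authentication and persistent storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Suggestion:
    """What the model proposed. Never acted on directly."""
    case_id: str
    text: str
    model: str


@dataclass(frozen=True)
class Decision:
    """What a named human decided, and when (hypothetical record shape)."""
    case_id: str
    reviewer: str
    accepted: bool
    final_text: str
    decided_at: str


def decide(suggestion: Suggestion, reviewer: str, accepted: bool,
           edited_text: Optional[str] = None) -> Decision:
    """Record a human decision on a model suggestion.

    The reviewer can accept as-is, accept with edits, or reject.
    A rejection with no replacement text yields an empty final_text,
    so nothing from the model flows downstream without sign-off.
    """
    if not reviewer:
        raise ValueError("a named reviewer is required")
    if accepted:
        final_text = edited_text if edited_text is not None else suggestion.text
    else:
        final_text = edited_text or ""
    return Decision(
        case_id=suggestion.case_id,
        reviewer=reviewer,
        accepted=accepted,
        final_text=final_text,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
```

The design point is that the override is structural rather than advisory: downstream code only ever sees a Decision, which always carries a reviewer and a timestamp.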

What the regulators actually require

Let’s be specific about the UK frameworks that matter for AI projects in 2026.

Data Security and Protection Toolkit (DSPT) v8. Mandatory for any organisation handling NHS data. Version 8 substantially tightened the requirements around AI and automated decision-making — specifically, organisations now need to evidence that they’ve done a DPIA for any AI system that processes patient data, that the system has clearly documented retention and deletion behaviour, and that there’s a human escalation path. “We use ChatGPT” without that evidence will fail an assessment.

Cyber Assessment Framework (CAF) v3.4. The NCSC framework underpinning the public-sector cyber posture. The relevant principles for AI projects are A4 (Supply Chain: your AI vendor is now part of your supply chain and inherits your obligations), B2 (Identity and Access Control: agentic AI tools need bounded permissions), and D1 (Response and Recovery Planning: you need a plan for what happens when the AI does something wrong).
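One way to read "bounded permissions" for an agentic tool is a deny-by-default allowlist around every tool call. The sketch below is an assumption about how that might look, not a CAF-prescribed control; BoundedAgent, ToolNotPermitted and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


@dataclass
class BoundedAgent:
    name: str
    allowed_tools: FrozenSet[str]
    # All tools that exist in the system; the agent may use only a subset.
    registry: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def call(self, tool: str, **kwargs):
        # Deny by default: a tool existing in the registry is not enough.
        if tool not in self.allowed_tools:
            raise ToolNotPermitted(f"{self.name} may not call {tool}")
        return self.registry[tool](**kwargs)


# Hypothetical tools for illustration only.
registry = {
    "search_policy": lambda query: f"results for {query}",
    "delete_record": lambda record_id: f"deleted {record_id}",
}
agent = BoundedAgent("policy-assistant", frozenset({"search_policy"}), registry)
```

Here agent.call("search_policy", query="annual leave") succeeds, while agent.call("delete_record", record_id="42") raises ToolNotPermitted even though the tool exists.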

UK GDPR + Data Protection Act 2018. The well-known one. The relevant provisions for AI are Article 22 (the right not to be subject to solely automated decisions, including profiling) and the choice of lawful basis under Article 6. Most public-sector AI use sits under public task or vital interests, not consent. Choosing the wrong basis is a common DPIA failure point.

MOD JSP 440 / 604 / 700-series. If the organisation is defence-adjacent, these supplant or layer on top of the above. The classification of the data being processed dictates a great deal — most genuinely useful AI projects in defence end up at OFFICIAL or OFFICIAL-SENSITIVE, where on-premise inference becomes the only viable architecture.

Public Records Act 1958. Outputs of an AI system that inform a decision likely qualify as public records. They need to be retained, retrievable, and produceable to a tribunal. If your AI vendor cannot give you the full conversation history, with timestamps, at 12 hours’ notice, you have a Public Records Act problem.
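To make "retrievable, with timestamps" concrete, here is a minimal sketch of a timestamped interaction record and a time-window export. The record fields and the function names make_record and export_window are assumptions for illustration; a real deployment would follow the organisation's records-management schema and store records durably.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def make_record(user: str, prompt: str, output: str, when: datetime) -> dict:
    """Build one interaction record with a UTC ISO 8601 timestamp."""
    if when.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return {
        "timestamp": when.astimezone(timezone.utc).isoformat(),
        "user": user,
        "prompt": prompt,
        "output": output,
    }


def export_window(records: Iterable[dict], start: datetime, end: datetime) -> str:
    """Return records with start <= timestamp < end as JSON Lines, oldest first."""
    def parsed(record: dict) -> datetime:
        return datetime.fromisoformat(record["timestamp"])

    selected = [r for r in records if start <= parsed(r) < end]
    selected.sort(key=parsed)
    return "\n".join(json.dumps(r, sort_keys=True) for r in selected)
```

If an export like this is a single function call over data the organisation already holds, a short-notice disclosure request becomes routine rather than a vendor negotiation.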

The good news is that all of this is solvable with engineering. The bad news is that it cannot be ticked off later. The architecture you choose at the start determines whether you’ll pass or fail an assessment.

Three architectures that get to yes

Three patterns work cleanly inside UK public-sector compliance posture.

1. UK-hosted commercial SaaS with the right contract

Anthropic, OpenAI and Google all offer UK or EU data residency tiers in their enterprise products. Properly contracted, with a UK Data Processing Agreement, model-training opt-out, audit-log access, and an appropriate transfer mechanism (an IDTA, or SCCs with the UK Addendum) in case any sub-processor sits outside the UK, this can be appropriate for OFFICIAL data and most NHS patient-identifiable workflows.

What works: drafting, summarisation, internal-document Q&A, training material generation, code generation for non-classified systems.

What doesn’t: anything OFFICIAL-SENSITIVE or above; anything where the source data must demonstrably never leave the perimeter; any workflow where the supply-chain risk is unacceptable to the SIRO.

Cost shape: subscription-based, low-friction, fast to start. Procurement and DPIA take 4–12 weeks; setup is a day.

2. UK-resident managed inference

A model hosted on UK infrastructure (commonly AWS London, Azure UK South, or a UK sovereign cloud), running the same open-weight models you’d use commercially — Llama, Qwen, Gemma, Mistral. The provider runs the GPUs; the data never leaves the UK; the organisation controls the prompts and the logs.

What works: patient-identifiable summarisation, regulated decision-support, anywhere the SaaS option is too loose.

What doesn’t: OFFICIAL-SENSITIVE workloads where on-premise is mandated; offline/air-gapped contexts.

Cost shape: higher per-query than SaaS but no procurement-and-DPIA-of-a-US-vendor friction. Build phase typically 6–12 weeks.

3. Air-gapped on-premise inference

The model runs on hardware inside the organisation’s perimeter: a workstation with a GPU, or a small cluster in a server room. Tools like Ollama or vLLM serve the inference; the data never touches an external network. The model weights themselves are downloaded once, then stay local.
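For a sense of what local inference looks like in practice, here is a minimal sketch that calls an Ollama server on the same machine using only the Python standard library. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name "llama3.1", the prompt wording, and the function names are illustrative assumptions.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarise(note: str, model: str = "llama3.1", timeout: float = 120.0) -> str:
    """Ask the local model for a summary; requires a running Ollama server."""
    prompt = "Summarise the following note in three sentences:\n\n" + note
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object with a "response" field.
    return payload["response"]
```

In an air-gapped deployment the same call works unchanged once the weights are on disk, which is much of the appeal: the integration code is small and the network boundary is easy to reason about.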

What works: OFFICIAL-SENSITIVE and above; defence-adjacent workloads; NHS Trusts that have explicit policy banning cloud AI; police use cases involving operationally sensitive data.

What doesn’t: anything that needs the absolute frontier of model capability — current open-weight models are roughly six months behind the commercial frontier on reasoning-heavy tasks.

Cost shape: higher capital cost (hardware), lower running cost, very high regulatory comfort. Build phase typically 8–16 weeks.

This third option is what most defence and NHS-Trust-level engagements actually require, despite vendors often pushing the SaaS route. It’s not the cheapest, but it’s the only architecture that survives the IG conversation cleanly for sensitive workloads.
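The selection logic across the three patterns can be sketched as a small decision function. This is a deliberate simplification under stated assumptions: the input flags are hypothetical, and in practice the DPIA and the SIRO decide, not a function.

```python
from enum import Enum


class Architecture(Enum):
    COMMERCIAL_SAAS = "UK-hosted commercial SaaS"
    MANAGED_INFERENCE = "UK-resident managed inference"
    ON_PREMISE = "Air-gapped on-premise inference"


def choose_architecture(
    on_premise_required: bool,   # mandated on-prem, air-gapped, or cloud AI banned
    official_sensitive: bool,    # data handled as OFFICIAL-SENSITIVE
    saas_acceptable_to_siro: bool,
) -> Architecture:
    """Pick the least restrictive pattern the constraints allow."""
    if on_premise_required:
        return Architecture.ON_PREMISE
    if official_sensitive or not saas_acceptable_to_siro:
        # SaaS is ruled out, but UK-resident hosting is still permitted.
        return Architecture.MANAGED_INFERENCE
    return Architecture.COMMERCIAL_SAAS
```

For example, choose_architecture(False, True, True) returns Architecture.MANAGED_INFERENCE, while any case with on_premise_required set returns Architecture.ON_PREMISE regardless of the other flags.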

Supplier assurance: the SC clearance question

A common stumbling block in public-sector AI projects is supplier assurance. Procurement routes (G-Cloud 14, Crown Commercial Service, NHS Shared Business Services) increasingly ask for vendor staff to hold relevant clearances, particularly for any work that touches sensitive data or production systems.

The clearance levels that matter:

  • BPSS (Baseline Personnel Security Standard). Minimum bar for working on government contracts at OFFICIAL.
  • CTC (Counter-Terrorist Check). Required for some sensitive government roles.
  • SC (Security Check). The standard bar for defence-adjacent and sensitive central-government work. Five-year vetting, financial and personal background.
  • DV (Developed Vetting). Above SC. Required for the most sensitive material.

Suppliers without cleared personnel typically can’t bid for sensitive workloads. The procurement team will simply mark the response down. This is one of the strongest filters in the public-sector AI market right now — and one of the reasons regulated-sector AI tends to come from smaller specialist suppliers rather than the global SaaS incumbents.

Launchpad Technology runs the GovOptimise division specifically for public-sector contracts, with SC-cleared personnel, NHS and Police operational experience, and a track record across regulated environments. That combination is rare enough in the market that it shifts what’s deliverable.

What good adoption looks like

A well-run public-sector AI adoption typically follows this shape:

  1. Use case selection. Pick a workflow with a quantifiable outcome (time saved per case, error rate reduction, throughput uplift). Avoid “we should have AI” projects with no metric.
  2. DPIA first. Information governance assessment before any procurement decision. The DPIA tells you which architecture is viable.
  3. Architecture chosen by the IG and risk profile. Don’t pick the model first; pick the deployment architecture first, then choose the model that fits.
  4. Pilot with a clear stop criterion. “We’ll run for 12 weeks; if accuracy on the audit sample is below X% or staff acceptance is below Y%, we stop.” Pre-commit to the stop criterion.
  5. Audit logging from day one. Every prompt, every output, every override, every escalation. Retained per the records-management policy.
  6. Independent assurance. Someone who isn’t the vendor reviews the system periodically — annually at minimum — for drift, bias, and continued fitness for purpose.
  7. Decommissioning plan. What happens to the data, the logs, and the customer relationship if the system is retired? Written on day one.
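The pre-committed stop criterion in step 4 can be written down as code before the pilot starts, which makes it harder to renegotiate afterwards. A minimal sketch follows; the thresholds stand in for the X% and Y% above and are placeholders to be set by the project board, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopCriterion:
    """Thresholds agreed before the pilot starts (values are placeholders)."""
    min_accuracy: float          # fraction of the audit sample judged correct
    min_staff_acceptance: float  # fraction of surveyed staff who accept the tool


def should_stop(criterion: StopCriterion, correct: int, sampled: int,
                accepting: int, surveyed: int) -> bool:
    """Return True if the pilot has failed either pre-agreed threshold."""
    if sampled <= 0 or surveyed <= 0:
        raise ValueError("audit sample and survey must be non-empty")
    accuracy = correct / sampled
    acceptance = accepting / surveyed
    return (accuracy < criterion.min_accuracy
            or acceptance < criterion.min_staff_acceptance)
```

With a criterion of StopCriterion(min_accuracy=0.9, min_staff_acceptance=0.7), chosen purely for illustration, an audit sample of 170 correct out of 200 gives 85% accuracy, so should_stop returns True even if staff acceptance is high.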

This is not the most exciting playbook. It’s much less compelling than “just plug in ChatGPT and watch the productivity gains”. But it’s the playbook that actually gets to a system that’s live, used, and not creating an information-governance incident in eighteen months’ time.

Where to start if you’re inside an NHS Trust, force, or department

Three pragmatic openings:

The discharge-summary or report-drafting workflow. Universal across NHS and Police; high time-cost; relatively contained risk profile if architected right. Reported uplift: RCP-published trials suggest 30–50% time saving on drafts where a human edits and signs off.

The internal-document Q&A workflow. “Where in our policy library does it say…” Saves a great deal of time for staff who currently search PDFs. Architecture choice depends on data classification.

The intake-triage workflow. Inbound enquiries (NHS 111, Police call-handling, council customer service) classified by AI to route to the right team. Modest accuracy improvements over rule-based triage; substantial time saving.

In every case, the question is the same: which architecture lets us do this lawfully, auditably, and within our IG framework? Get that right, and the rest is engineering.

If you’d like a hand thinking through that question for a specific workflow inside an NHS Trust, force, or department, that’s exactly what GovOptimise — Launchpad Technology’s public-sector division — exists to do.