Same Infrastructure, Swappable Brain: The Headless Fleet Pattern
Back to Blog
AI Architecture

Same Infrastructure, Swappable Brain: The Headless Fleet Pattern

Pete Gypps
Pete Gypps
Published: 2nd July 2026
5 min read

Six AI tools, headless, right now

On my M5 Max, 128GB, right now: six AI tools running headless. Not one of them has a chat window open.

Gemma 4 31B is summarising transcripts. Qwen 3 4B is doing voice checks. Nomic is indexing 200 LinkedIn posts. mlx-whisper is transcribing audio. Codex is auditing a codebase. Bughunt is running adversarial QA against a deployed URL. All API-only. No UI, no chat. Claude Code is driving the lot — some of it through a harness I built called COR CLI.

That’s what most “AI-first” talk misses. Everyone treats AI as a chat window. Wrong framing. The killer workflow is a headless fleet running in parallel, with one orchestrator making the judgement calls.

Three tiers, and the actual economics

Local (£0). Gemma, Qwen, Nomic and mlx-whisper, all on my own hardware. No cloud bill. This tier handles what used to burn Claude tokens: whole codebases, 90-minute transcripts, twenty competitor sites, documentation ingestion.

Subscription-covered (£0 marginal). Codex for code audit, Bughunt for adversarial QA — running on the ChatGPT subscription I already pay for. This afternoon that was 11 real advisories in 1,000 lines of Go, in 11 minutes, for £0.

Claude Code conducts. It picks the tool, chains the outputs, and handles the judgement, the client work and the final polish.

The layer I didn’t expect to need: a harness

LM Studio gives Gemma a brain but not hands. She can generate text; she can’t read files, run grep, or call git. So I built COR CLI — a Rust CLI that wraps any model (LM Studio, Ollama, or Claude itself) in a full agent loop with 28 tools: file I/O, bash, git, search, LSP and an MCP client.

Here is the win. COR CLI inherits everything Claude Code already has — CLAUDE.md files, skills, hooks, MCP servers — and they all load regardless of which model you point it at. Same infrastructure. Swappable brain. You keep the framework, and you stop being locked to one provider.

Why this is AI First Principles, not a hack

This is the whole point of the AI First Principles framework: the guarantees live in the structure — the files, skills and registry the AI reads — not in any one model’s head. Get the structure right and the model becomes an interchangeable runtime. That is exactly what makes a swappable brain possible: the same CLAUDE.md, the same skills and hooks drive a local Gemma, a hosted Claude, or whatever comes next — with no rebuild.

My real stack

  • Gemma through COR CLI for autonomous research agents — it reads, greps and writes reports.
  • Qwen through raw LM Studio for instant classification, where no tools are needed.
  • Codex for code audit.
  • Claude Code as the orchestrator on top, reading everyone else’s output.

What changes when bulk compute is free and the infrastructure is portable

  • You stop rationing token budget.
  • You stop being locked to one vendor.
  • You script experiments instead of avoiding them.
  • Overnight audits across thousands of files become normal.

Still iterating — still one version at a time. But “AI-first” just got a lot cheaper, a lot more interesting, and a lot more portable.

Pete Gypps

Written by

Pete Gypps

Founder & AI-Native Builder

About This Article

Six AI tools running headless on one laptop — local models doing the bulk work at £0, Claude Code making the judgement calls. Same infrastructure, swappable brain: AI-first as it actually works.

Let's Connect

Have questions about this article or need help with your IT strategy?

Book a Consultation
P
Pete Bot
Business Solutions Assistant
P

Let's Get Started!

Enter your details to begin chatting with Pete Bot

💬 Got questions? Let's chat!
P
Pete Bot
Hi! 👋 Ready to boost your business online? I'm here to help with web design, SEO, and AI solutions!