Six AI tools, headless, right now
On my M5 Max, 128GB, right now: six AI tools running headless. Not one of them has a chat window open.
Gemma 4 31B is summarising transcripts. Qwen 3 4B is doing voice checks. Nomic is indexing 200 LinkedIn posts. mlx-whisper is transcribing audio. Codex is auditing a codebase. Bughunt is running adversarial QA against a deployed URL. All API-only. No UI, no chat. Claude Code is driving the lot — some of it through a harness I built called COR CLI.
That’s what most “AI-first” talk misses. Everyone treats AI as a chat window. Wrong framing. The killer workflow is a headless fleet running in parallel, with one orchestrator making the judgement calls.
Three tiers, and the actual economics
Local (£0). Gemma, Qwen, Nomic and mlx-whisper, all on my own hardware. No cloud bill. This tier handles what used to burn Claude tokens: whole codebases, 90-minute transcripts, twenty competitor sites, documentation ingestion.
Subscription-covered (£0 marginal). Codex for code audit, Bughunt for adversarial QA — running on the ChatGPT subscription I already pay for. This afternoon that was 11 real advisories in 1,000 lines of Go, in 11 minutes, for £0.
Claude Code conducts. It picks the tool, chains the outputs, and handles the judgement, the client work and the final polish.
The layer I didn’t expect to need: a harness
LM Studio gives Gemma a brain but not hands. She can generate text; she can’t read files, run grep, or call git. So I built COR CLI — a Rust CLI that wraps any model (LM Studio, Ollama, or Claude itself) in a full agent loop with 28 tools: file I/O, bash, git, search, LSP and an MCP client.
Here is the win. COR CLI inherits everything Claude Code already has — CLAUDE.md files, skills, hooks, MCP servers — and they all load regardless of which model you point it at. Same infrastructure. Swappable brain. You keep the framework, and you stop being locked to one provider.
Why this is AI First Principles, not a hack
This is the whole point of the AI First Principles framework: the guarantees live in the structure — the files, skills and registry the AI reads — not in any one model’s head. Get the structure right and the model becomes an interchangeable runtime. That is exactly what makes a swappable brain possible: the same CLAUDE.md, the same skills and hooks drive a local Gemma, a hosted Claude, or whatever comes next — with no rebuild.
My real stack
- Gemma through COR CLI for autonomous research agents — it reads, greps and writes reports.
- Qwen through raw LM Studio for instant classification, where no tools are needed.
- Codex for code audit.
- Claude Code as the orchestrator on top, reading everyone else’s output.
What changes when bulk compute is free and the infrastructure is portable
- You stop rationing token budget.
- You stop being locked to one vendor.
- You script experiments instead of avoiding them.
- Overnight audits across thousands of files become normal.
Still iterating — still one version at a time. But “AI-first” just got a lot cheaper, a lot more interesting, and a lot more portable.


