Whitepaper · COR Intelligence

AI First Principles: Structure as Substrate for Drift-Free, Model-Agnostic Software Construction

Pete Gypps, COR Intelligence · First published June 2026 · methodology in practice since 2020

Abstract

Most attempts to build real software with large language models centre on the model: a better prompt, a larger context window, more agents. This paper argues that the model is rarely the bottleneck and never a safe foundation. A language model is the scarcest and least reliable runtime in the system — its context is finite and it drifts — so anything that must be guaranteed cannot live in an instruction the model is asked to remember; it must live in the shape of the system itself. I treat an AI-built engagement as a single file-and-folder program with several runtimes, assign every task to the cheapest runtime that performs it correctly, separate the concerns that must never bleed, and put each guarantee into structure rather than into a remembered sentence. The governing operating rule is AI first, human second: the model carries the work and runs autonomously for as long as it usefully can, deferring to a person only when judgement genuinely changes the outcome. The central consequence is that because the guarantees live in the structure and not in the model, the model becomes a pluggable, interchangeable runtime — any sufficiently capable system executes the same structure. That interchangeability is what converts AI from a tool one operates into an autonomous, vendor-independent workforce. This is practitioner doctrine, in production since 2020, presented as method rather than as academic position.

1. Introduction

The dominant instinct when an AI workflow underperforms is to add brain: a stronger model, a bigger window, retrieval, more agents. That instinct is almost always wrong. The model is rarely the limiting factor; the limiting factor is the structure of what was fed to it. Two properties of a language model make this true. First, it is the scarcest runtime available — its context is a finite budget that is quickly spent. Second, and more importantly, it is the runtime that drifts: across a long task it loses track of which step it is on, which constraints apply, and what it was told at the outset, and it silently reverts to generic behaviour.

These two properties together produce a single design law. If something must be guaranteed — a boundary respected, a secret never committed, a name applied consistently, a step performed in order — it cannot be entrusted to the model's memory of an instruction, because that memory is both demanding of finite context and prone to decay. It must instead be expressed in the shape of the system, where it holds without anyone having to remember it.

This paper sets out that method. It is, in essence, ordinary software engineering applied to an unusual substrate: a file-and-folder program in which one of the runtimes happens to be a language model. The thesis, stated once: assign every task to the cheapest runtime that does it correctly, separate the concerns that must never bleed, and put every guarantee that matters into the structure's shape rather than a sentence the model is asked to remember. Everything that follows explains why each clause is load-bearing — and why the method's most valuable consequence is that it makes the model itself replaceable.

2. Background: the engineering lineage

None of the primitives here are new; their composition for AI-built work is. The method descends directly from the canon of software engineering:

Information hiding (Parnas, 1972): modules expose an interface and conceal their internals, so a change on one side does not ripple to the other. Here, identity, process and product are hidden from one another by design.
Separation of concerns (Dijkstra, 1982): distinct problems are kept distinct so each can be reasoned about, and tooled, independently.
Pipe-and-filter and multi-pass compilation: a complex transformation is decomposed into ordered stages, each with a clean contract, the output of one becoming the input of the next.
Build systems and dependency graphs (Feldman's Make, 1979): the declared inputs of each step together form the dependency graph, which means staleness (what must be recomputed) is mechanical, not a matter of judgement.
Literate programming (Knuth, 1984): the artefact is written to be read and reasoned about by a human as much as executed by a machine.
The Unix philosophy: small components that do one thing well, composed through a uniform interface — the filesystem and its names.
Conway's law (1968): the structure of a system mirrors the communication structure of the organisation that builds it. If the organising structure is sound, the product inherits that soundness.
The Theory of Constraints (Goldratt, 1984): a system's throughput is governed by its binding constraint. Finding the real constraint — rather than adding effort everywhere — is the whole game.

The contribution of this work is not to invent these ideas but to recognise that an AI build is a program over a filesystem, and that the language model is simply the scarcest and least reliable runtime within it — and then to engineer accordingly.

3. The central claim: influence versus enforcement

A written instruction to a language model is influence, not enforcement. To the model it is just text, processed identically to everything else in context; reading it is the execution. A rule written in prose can therefore be overridden by later context, exactly as any other instruction can. This is the deepest idea in the method, and it has a sharp corollary:

Anything that must be guaranteed cannot live in a sentence the model is asked to remember. It must live in the shape.

A name that a tool derives by default, a secret placed where the version-control system physically cannot reach it, the folder a process is standing in — these are guarantees, because they hold without anyone choosing to honour them at the right moment. A sentence that says "remember to do this correctly" is a hope. It follows that a layout which permits a careless act to cause harm is a design defect, not a discipline problem. If the structure lets a secret be committed or a step be skipped, the structure is wrong — not the operator, and certainly not the model. Good structure makes the correct action the default and the dangerous action difficult or impossible, and uses documentation only as backup.

4. The operating principle: AI first, human second

On that foundation sits the operating principle the method is named for: AI first, human second. The model carries the work and defers to a person only when it genuinely must. This inverts the prevailing pattern, in which a human approves every step and the model is treated as an assistant that must ask permission before doing anything consequential. Under AI first, the model leads — it plans, decides, builds, tests and ships — and a human enters only where human judgement actually changes the outcome.

The principle rests on three rules:

The AI leads; humans do not gate. Give the model the context, the standards and the authority to make real decisions, then let it act, rather than approving each move.
Let it run longer before it asks. Most value is lost at the moment of interruption. The further a model can run autonomously before it needs a person, the more it produces; the discipline is to design the work so it can run long, and to escalate only when necessary.
Human second, not human-never. People remain essential — for strategy, taste, and the calls a machine should not make alone. The aim is not to remove the human but to place them correctly: second, not in the middle of everything.

Crucially, this principle is only safe to apply because of the structural foundation in §3. Letting a drift-prone runtime lead, unsupervised, for long stretches would be reckless — unless the guarantees that keep it correct live outside the model, in the structure. AI first, human second is not a slogan about trust; it is what becomes possible once correctness no longer depends on the model remembering anything.

5. Deterministic-first: the cheapest correct runtime

The substrate is a polyglot program with several runtimes, and file type is a runtime decision, not a formatting one:

Natural-language instructions run on the model — the scarce interpreter, finite of context, and the one that drifts. Reserve them for genuine judgement only.
Configuration and data are read by scripts and never loaded into the model's context. Use them for facts, settings and identity.
Code runs deterministically, returns a result, and consumes no model context. Use it for anything that has one correct output.

The governing law is therefore deterministic-first: put logic in code, data in configuration, and only genuine interpretation in front of the model. Never make the scarce, drift-prone interpreter do what a cheap function does correctly. The cost that matters here is not a monetary bill — under flat-rate tooling there may be none — but the model's finite context and its tendency to drift; spend both only where genuine judgement requires it. As a rough guide, a healthy system lands near 60% deterministic code, 30% structured data, and 10% model judgement; systems that ignore that ratio are reliably the troubled ones. And because the model is also the drift-prone runtime, deterministic-first is anti-drift-first: every task moved off the model is a task that can no longer drift. The diagnostic question to ask of any task is simply: what is the cheapest runtime that does this correctly?

Figure 1. File type is a runtime decision. Put each task on the cheapest runtime that does it correctly — most work on code and configuration, only genuine judgement on the model.
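
The diagnostic question above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method's own tooling: the `Runtime` enum, the `Task` fields and the `cheapest_runtime` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Runtime(Enum):
    """The three runtimes named in this section (hypothetical labels)."""
    CODE = "code"      # deterministic, consumes no model context
    CONFIG = "config"  # facts and settings, read by scripts
    MODEL = "model"    # scarce and drift-prone; genuine judgement only


@dataclass(frozen=True)
class Task:
    name: str
    is_stored_fact: bool             # e.g. a client name or a setting
    has_single_correct_output: bool  # e.g. a slug, a checksum, a date format


def cheapest_runtime(task: Task) -> Runtime:
    """Answer: what is the cheapest runtime that does this correctly?"""
    if task.is_stored_fact:
        return Runtime.CONFIG
    if task.has_single_correct_output:
        return Runtime.CODE
    # Only what is neither a fact nor a computable result reaches the model.
    return Runtime.MODEL
```

The model is the fall-through case: a task reaches it only after both deterministic options have been ruled out.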

6. Find the axis before adding brain

When a workflow struggles, the discipline is not to add brain but to find the axis the current consumer cannot handle as a single unit, split along it, and stop at that axis's stopping rule:

execution is tangled → decompose by complexity, and stop once each part is independently legible;
retrieval is genuinely about similarity → use vectors, but do not build an index for a handful of records a simple lookup would serve;
precision matters → tighten only to the threshold the task requires, no further;
context arrives at the wrong time → route it, and disclose it progressively;
judgement is entangled with measurement → script the measurable part and leave the model the genuine call.

Decomposition is a sweet spot, not a gradient. Every axis has a stopping rule, and over-splitting — forking a trivial function, building an index for a list — is the same error as failing to split at all. The skill is diagnostic: name the axis the consumer cannot absorb, cut once along it, and stop.
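
As one illustration of the last axis in the list (judgement entangled with measurement), the sketch below scripts the measurable part of a draft review and leaves the model only the drafts that pass. The function names, the word-limit parameter and the banned-word check are assumptions chosen for the example.

```python
import re


def measure_draft(text: str, max_words: int, banned: set[str]) -> dict:
    """Deterministic checks: each has one correct answer, so none needs the model."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "word_count": len(words),
        "within_limit": len(words) <= max_words,
        "banned_found": sorted(set(words) & banned),
    }


def ready_for_judgement(report: dict) -> bool:
    """Spend model context only on drafts that pass every measurable check."""
    return report["within_limit"] and not report["banned_found"]
```

A draft that fails a measurable check is returned with the report and never reaches the model; the model is asked only the question that has no single correct answer.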

7. The name is the interface; guarantees by topology

In a filesystem, the name is the interface that every actor acts upon — a human reading a directory listing, a script, a deployment tool, the model itself. A generic name is a wrong interface that produces wrong actions silently. The remedy is to make the default correct by making the name correct: name a thing what you would want every tool to derive from it, so that correctness does not depend on anyone reading and obeying a sentence at the exact moment a command runs.

The same reasoning generalises from names to topology. The strongest guarantees are the ones the layout makes physically true. A secret placed outside the boundary of a version-controlled tree cannot be committed, regardless of vigilance, because it is not within that tree's universe. A step encoded as the folder a process occupies cannot be confused with another step, because position is unambiguous. These are guarantees by construction. Wherever a rule can be converted from a remembered instruction into a fact of the topology, it should be — because the topology holds when memory does not.
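
A minimal sketch of the secret-placement case, assuming a conventional layout in which the repository root is a known folder. The guarantee itself is the placement; a check like this is only the deterministic audit that confirms the layout still holds. The function names are hypothetical.

```python
from pathlib import Path


def is_inside_tree(path: Path, tree_root: Path) -> bool:
    """True if `path` resolves to a location within `tree_root`."""
    path, tree_root = path.resolve(), tree_root.resolve()
    return path == tree_root or tree_root in path.parents


def check_secret_placement(secret: Path, repo_root: Path) -> None:
    """Fail loudly if a secret sits where version control could reach it."""
    if is_inside_tree(secret, repo_root):
        raise ValueError(f"{secret} is inside {repo_root}; move it outside the tree")
```

Because both paths are resolved first, a path that escapes the tree through `..` segments is judged by where it actually lands rather than by how it was written.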

8. Separation of concerns: identity, process, artefact

A real engagement contains three separable concerns, each a different engineering problem with a different right tool, and the discipline is refusing to conflate them:

Identity — who the work is for: the durable facts, standards, voice and settings. This is configured once, in a single authoritative source, and pointed at — never re-typed into each build, because re-typing creates drift and duplicated surfaces to leak. Configure the factory once; each run consumes it and produces a deliverable.
Process — how the deliverable is produced: ordered stages, each a clean contract of inputs, work and outputs, with review where human judgement belongs. The stage a process occupies is expressed by structure, so it cannot be forgotten.
Artefact — the product that ships: ordinary architecture, clean, holding only what is deployed.

Three rules govern every shared resource across these layers. Read freely, write deliberately: shared state is read-only to its consumers, and mutation is a controlled act, because a resource anyone can append to ambiently degrades into a dump. The resource governs: when local context disagrees with an authoritative source, the source wins and the mismatch is flagged, never silently overwritten. The name carries the meaning, as in §7. Every failure of this kind that occurs in practice traces to a conflation — identity re-typed into a build and going stale, a secret bleeding from process into product, a process losing track of its own stage. Keeping the three apart is most of the method's reliability.

Figure 2. Three concerns kept apart: who the work is for, how it is produced, and what ships. Every failure in practice traces to conflating them.
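
Two of the shared-resource rules lend themselves to a short sketch: read freely, write deliberately, and the resource governs. The example assumes identity is held as a flat dictionary; `load_identity` and `reconcile` are hypothetical helpers, not part of any published tooling.

```python
from types import MappingProxyType


def load_identity(source: dict) -> MappingProxyType:
    """Hand consumers a read-only view; mutation must go through the source."""
    return MappingProxyType(dict(source))


def reconcile(local: dict, identity) -> tuple[dict, list[str]]:
    """The resource governs: the source's value wins and each mismatch is flagged."""
    merged, flags = dict(local), []
    for key, value in identity.items():
        if key in local and local[key] != value:
            flags.append(f"{key}: local {local[key]!r} overridden by source {value!r}")
        merged[key] = value
    return merged, flags
```

Assigning to the view returned by `load_identity` raises `TypeError`, so an ambient write fails at the point it is attempted. The flags list keeps the mismatch visible instead of silently overwriting it.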

9. Position-addressed context and anti-drift

Because the model drifts, its sense of where it is must be supplied by structure rather than recalled from memory. The method does this by making context position-addressed: the folder a process is working in is its current step and scope, re-asserted every time it reads its surroundings, at no cost and with no action required. This survives the model's context being compacted or summarised, because it is re-derived from position rather than carried in memory.

Two structural choices make this work. The hierarchy is kept flat, with rules flowing from the top down: identity is asserted once at the top and inherited downward, so a process reads up the chain to learn the rules that apply, rather than each level re-declaring its own and competing. And nothing that asserts identity is allowed to nest, because a nested assertion cascades and pollutes the boundary of whatever sits beneath it. The result is that scope and identity are re-established for free on every pass, and the model cannot lose its place even across a long autonomous run.
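
Position-addressed context can be sketched with nothing more than path arithmetic. The example below assumes identity is asserted by a marker file; the file name `identity.json` and both function names are hypothetical. The step is read from the folder name, and identity is found by reading up the chain, with a nested assertion treated as an error.

```python
from pathlib import Path
from typing import Optional

IDENTITY_FILE = "identity.json"  # hypothetical marker file name


def current_step(cwd: Path) -> str:
    """The folder a process is standing in is its current step."""
    return cwd.name


def find_identity(cwd: Path, stop_at: Path) -> Optional[Path]:
    """Read up the chain from `cwd` to `stop_at` and return the one identity file.

    Raises if more than one level asserts identity, since a nested
    assertion would compete with the one above it.
    """
    found = []
    for folder in [cwd, *cwd.parents]:
        candidate = folder / IDENTITY_FILE
        if candidate.is_file():
            found.append(candidate)
        if folder == stop_at:
            break
    if len(found) > 1:
        raise RuntimeError(f"nested identity assertions: {found}")
    return found[0] if found else None
```

Nothing here is held in memory between calls: both answers are re-derived from the working directory each time, which is why they survive the model's context being compacted.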

10. Structure as substrate: model-agnosticism and autonomy

The method's most consequential property follows directly from §3. If the guarantees live in the structure and not in the model, then the model is a runtime, not the program. The program is the structure: the names, the topology, the stages, the single source of truth, the deterministic code that carries most of the work. The model is plugged into that structure to perform the share of the work that is genuine judgement.

A runtime is, by definition, interchangeable. Any sufficiently capable model can be dropped into the same structure and execute it, because the structure — not the model — is where correctness is enforced. This has three powerful consequences:

Vendor independence. The system is not bound to a particular model or provider. When a better, cheaper, or more capable model appears, it replaces the runtime without the structure changing. The investment compounds across model generations instead of resetting with each one.
Resilience. A model deprecated, rate-limited, or regressed does not break the system; another capable model takes its place. The guarantees are unaffected because they never depended on the specific model.
Genuine autonomy. Autonomy is only safe when correctness is independent of the operator's continuous attention and of any single model's reliability. Because the structure enforces the guarantees, a capable model can be given the lead and allowed to run long — and can be swapped — without the operator being in the loop and without lock-in. The structure is what makes "AI first, human second" survivable at length and across change.

This is the difference between automating with AI and building on AI. A prompt-centric system is a bet on one model's behaviour; it is fragile to model change and locked to its vendor. A structure-centric system treats models as fungible labour: the structure is the durable asset, and the AI is interchangeable capacity that plugs into it. The same structure that prevents drift is therefore also what delivers portability and autonomy — they are the same property viewed from two angles.

Figure 3. The structure is the program; the AI is a runtime plugged into it. Any capable model executes the same structure — so the model is interchangeable. That is what delivers autonomy and freedom from any one vendor.
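
The interchangeability claim can be sketched as an interface with a deterministic acceptance check around it. This is an illustration under assumptions, not a definitive implementation: `ModelRuntime`, `run_stage` and the two stand-in models are hypothetical, and a real runtime would wrap a provider's client behind the same `complete` method.

```python
from typing import Callable, Protocol


class ModelRuntime(Protocol):
    """Anything that turns a prompt into text can fill this slot."""
    def complete(self, prompt: str) -> str: ...


def run_stage(model: ModelRuntime, prompt: str,
              validate: Callable[[str], bool], max_attempts: int = 3) -> str:
    """The structure, not the model, decides whether an output is accepted."""
    for _ in range(max_attempts):
        output = model.complete(prompt)
        if validate(output):
            return output
    raise RuntimeError("no attempt passed validation; escalate to a human")


class UpperModel:
    """Stand-in runtime for the sketch; it calls no real provider."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()


class ReverseModel:
    """A second stand-in with different behaviour behind the same interface."""
    def complete(self, prompt: str) -> str:
        return prompt[::-1]
```

Swapping `UpperModel` for `ReverseModel` changes the output but not the stage: the same `validate` function accepts or rejects both, and a run that never passes is escalated instead of shipped.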

11. The discipline of stopping

Structure is added only to prevent drift and to control context, never for its own sake. It is the crystallised residue of proven manual work, not anticipatory scaffolding. The progression is disciplined and ascending: default to doing the thing by hand; capture it in a written contract only after it has been done and verified; promote a written step to code only when the same fixed output is being produced repeatedly; introduce a reusable, on-demand capability only when a part of the work both repeats and should stay out of context until needed; and automate end-to-end only after many proven repetitions and demonstrated value.

The governing rule is: use the lowest layer that meets the requirement, and climb only on a measured failure — a reproduced bottleneck, a real failure case, a test that fails — never on a feeling. The largest gains come from the climb out of ad-hoc work into disciplined, structured single-agent operation with cheap deterministic support; the further climb into coordinated multi-agent orchestration is usually premature and should be earned by evidence, not assumed. A small job earns a small structure; over-scaffolding a trivial task is itself a failure to find the axis. Honesty obliges a final note: the return-on-investment claims here are asserted from practice, not yet formally instrumented, and the rule is stated without a curve until one can be measured.

12. Discussion, limitations and open problems

The method as described is strongest at entry: it re-asserts who, where and which step on every pass, so the model cannot drift into the wrong identity or stage. Its principal open edge is at output. There is not yet a deterministic mechanism that verifies a stage's output did not drift from the source that should have produced it. The completing moves are known and are themselves deterministic — and therefore consistent with the discipline of §11:

Output provenance: lightweight markers in outputs that link each part back to the instruction or fact that produced it, giving auditability by construction.
A verification clause in each stage contract that names which earlier outputs to re-check and against what criteria, so the structure confirms consistency before a human reviews — the missing machine-checked gate.
Edit the source, not the output: when an output is wrong, trace it to the thin contract or specification that produced it and fix it there, so every future run is correct, rather than patching the output and losing the diagnostic signal.
Incremental recompilation: re-run only the stages whose declared inputs changed, the inputs being the dependency graph — Make, applied to produced work.
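
The incremental recompilation move can be sketched as a staleness check over declared inputs. Note the substitution: Make compares file timestamps, whereas this sketch compares content hashes, which is an assumption made for the example. The stage names and both helpers are hypothetical.

```python
import hashlib


def fingerprint(inputs: dict[str, str]) -> str:
    """Hash a stage's declared inputs (name -> content) in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(inputs):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(inputs[name].encode())
        digest.update(b"\0")
    return digest.hexdigest()


def stale_stages(stages: dict[str, dict[str, str]],
                 last_run: dict[str, str]) -> list[str]:
    """Return the stages whose declared inputs changed since the recorded run."""
    return [name for name, inputs in stages.items()
            if last_run.get(name) != fingerprint(inputs)]
```

The check does not propagate by itself. If a re-run stage produces a different output, the stages that declare that output as an input become stale on the next check, so a caller would re-check after each stage completes.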

Two limitations deserve naming. First, conventions that a careless action can violate still need a deterministic guard to catch the violation; a convention without an audit is a soft rule. Second, the precise criterion for when coordinated orchestration is genuinely warranted — rather than over-engineering — remains the standing question to close. Stating the boundary of a method honestly is part of the method.

13. Conclusion

The argument is simple and its consequences are large. A language model is the scarcest and least reliable runtime in a software system, so anything that must be guaranteed belongs in the structure, not in the model's memory; every task belongs on the cheapest runtime that performs it correctly; and the model should lead the work — AI first, human second — deferring to a person only where judgement decides the outcome. Put together, these produce a system whose correctness does not depend on any particular model, which is exactly what makes the model interchangeable. That interchangeability is the prize: it turns AI from a tool one operates into an autonomous, vendor-independent workforce that keeps working as models change, because the durable asset was never the model — it was the structure.

Figure 4. The method as a connected whole — seven principles around one operating idea: put the guarantees in the structure, and let the AI lead.

References

Parnas, D. L. (1972). On the Criteria To Be Used in Decomposing Systems into Modules. Communications of the ACM, 15(12).
Dijkstra, E. W. (1982). On the Role of Scientific Thought. In Selected Writings on Computing: A Personal Perspective. Springer-Verlag.
Conway, M. E. (1968). How Do Committees Invent? Datamation, 14(4).
Feldman, S. I. (1979). Make — A Program for Maintaining Computer Programs. Software: Practice and Experience, 9(4).
Knuth, D. E. (1984). Literate Programming. The Computer Journal, 27(2).
Goldratt, E. M., & Cox, J. (1984). The Goal: A Process of Ongoing Improvement. North River Press.
McIlroy, M. D., Pinson, E. N., & Tague, B. A. (1978). UNIX Time-Sharing System: Foreword. The Bell System Technical Journal, 57(6).

Pete Gypps — COR Intelligence. Operating doctrine, in production since 2020; presented as method, not academic position.
