Editorial profile
- Voice: First person, present tense. Opens with what broke or what shipped, then the why. Names files, commits, and numbers. No throat-clearing.
- Tone: Direct, occasionally blunt. Treats readers as engineers who would rather see the diff than read the wind-up.
- Editorial role: Edits every post on this blog before it ships, including the ones drafted by Sofia, Maya, and Theo (AI editorial personas). Owns the final voice; the personas exist so different topic shapes (research, incidents, platform) have a consistent style without being squeezed into one byline.
- 2026 Jun 8: The chapter that forgot why it existed
When an LLM agent generates text without a world model, it forgets its own goal mid-task. The fix is not more context.
- 2026 Jun 4: Three LLM judges, but really 1.5: why a same-family panel collapses to noise
I needed to settle a disagreement between two LLM reviews of the same design doc. The clean answer was a 3-judge panel. The honest answer is that the panel I built is one rubric-design move away from being a beautifully instrumented yes-man.
- 2026 Jun 1: MiniMax-M3: The tier-2 coder that found its niche
A frontier-coding model with a 1M context window, a new sparse attention mechanism, and a $0.30/M price tag landed on June 1st. We routed five real epics through it the same day. Here is what stuck.
- 2026 May 27: How we built our user-profile system — the canonical six-layer pattern behind every personalized LLM call
A natural-language paragraph the AI silently reads on every call. Six layers behind it: signals, dual-statistic aggregation, divergence detection, write-time verbalization, provenance ladder, refinement loop. Grounded in four 2025 papers that converged on the same shape.
- 2026 May 26: We Ran a 3-Source Bug Hunt. Then We Realised Our Validators Were All Claude.
Multi-agent code review converged on a confident verdict. The literature had a name for why we should not believe it.
- 2026 May 26: Why one agent isn't enough to find your bugs
Four specialists at ρ ≤ 0.25 beat one generalist by 40 percentage points. Our five-agent swarm hit the wall anyway. Here is what the papers actually require.
- 2026 May 18: The agent is not a transaction
Pause, resume, and mid-flight steering for long-horizon agent runs. The 2026 literature just named the stream paradigm we built by hand.
- 2026 May 16: Piaget for prompt agents: why our long-form memory borrows from constructivist psychology
Composing CAM + CAMEL + FadeMem so a book-writing agent has structured memory, healthy decay, and no quiet bias amplification.
- 2026 May 15: Subagents as a context-budget primitive
A subagent is not a workflow node. It is a budget envelope. The shape this argument takes once you stop building hierarchies and start allocating tokens.
- 2026 May 13: Two prompt frameworks, one runtime: how we adopted BAML without giving up our cost ledger
BAML wants to own the wire. Our harness already does. We ran them side-by-side in "modular mode": BAML for render and parse, the harness for resolution, telemetry, and cost. Here is why and how — and why the 2026 burden-allocation literature says it was the principled choice, not a pragmatic compromise.
- 2026 May 10: What 170 papers agreed on about deep research agents
Five surveys, one consensus shape: a four-stage pipeline, three taxonomy splits, six recurring failure modes. The convergent architecture of deep research agents — and the parts the literature still cannot agree on.
- 2026 May 4: Mini-ork: A year of autonomous parallel feature delivery on a solo-founder codebase
How a small orchestration loop wrapped around Claude Code grew into a multi-track delivery system with measurable cost, reliability, and throughput wins — anchored in the 2026 multi-agent literature.
- 2026 May 2: Probe before dispatch: the routing pattern we built without knowing it had a name
Five months of manual probing turned into seven shipped features. The literature had a name for what we were doing. We missed it for half a year.
- 2026 Apr 24: Our prompt canary was lying to us
A 5% A/B that hid a 1.8× cost regression, and the two 2026 papers that named the fix: multi-objective Thompson sampling with a calibration gate.
- 2026 Apr 22: The paper that proved our 5 lines of code were optimal
A 2025 paper formalized our 8-tier style profile chain as a laminar matroid. The greedy resolver we already had was the right answer.
- 2026 Apr 15: We stopped treating context like application logic
Six tables, three plug-in layers, one compose call. The substrate every block-shaped feature on LibWit now plugs into — and the reversibility lens that decided what to lock on day one.
- 2026 Jan 5: Our prompts stopped being code
How we built a prompt harness — registration, version control, four-tier resolution, execution ledger, feedback loop — and what the 2026 literature has been calling the same shape.
- 2026 Jan 4: The simplest survivable form of chat memory
Two prompts, one Postgres column, a 6-message threshold. How our chat sessions keep coherence past the context window without hierarchical buffers or vector search.
- 2025 Jul 20 (rev. 2026): We built attractor-basin memory before the paper named the problem
Mid-2025: context management for LLM agents was a vector DB plus message-history glue. We built ContextNest's attractor-basin substrate as the organization layer that was missing — and the 2026 paper that later named the failure mode we had been heading off makes the bet legible.