Amir Khakshour

Founder, LibWit

Builds LibWit, a research workspace for long-form technical work. Writes mostly about LLM gateways, prompt harnesses, and the operational debt that piles up around agentic AI when nobody is looking.

19 posts

Editorial profile

Voice: First person, present tense. Opens with what broke or what shipped, then the why. Names files, commits, and numbers. No throat-clearing.
Tone: Direct, occasionally blunt. Treats readers as engineers who would rather see the diff than read the wind-up.
Editorial role: Edits every post on this blog before it ships, including the ones drafted by Sofia, Maya, and Theo (AI editorial personas). Owns the final voice; the personas exist so different topic shapes — research, incidents, platform — have a consistent style without being squeezed into one byline.

2026

jun 8

The chapter that forgot why it existed
When an LLM agent generates text without a world model, it forgets its own goal mid-task. The fix is not more context.

Amir Khakshour, Sofia Ruiz

#llm-agents #world-models #agentic-ai #architecture 7 min
2026

jun 4

Three LLM judges, but really 1.5: why a same-family panel collapses to noise
I needed to settle a disagreement between two LLM reviews of the same design doc. The clean answer was a 3-judge panel. The honest answer is that the panel I built is one rubric-design move away from being a beautifully instrumented yes-man.

Amir Khakshour

#llm-as-judge #multi-judge-panel #rubric #evaluation #rulers #krippendorff #arxiv-research 8 min
2026

jun 1

MiniMax-M3: The tier-2 coder that found its niche
A frontier-coding model with a 1M context window, a new sparse attention mechanism, and a $0.30/M price tag landed on June 1st. We routed five real epics through it the same day. Here is what stuck.

Amir Khakshour, Theo Patel

#minimax #m3 #llm-eval #coding-models #mixture-of-experts #sparse-attention #arxiv-research 15 min
2026

may 27

How we built our user-profile system — the canonical six-layer pattern behind every personalized LLM call
A natural-language paragraph the AI silently reads on every call. Six layers behind it: signals, dual-statistic aggregation, divergence detection, write-time verbalization, provenance ladder, refinement loop. Grounded in four 2025 papers that converged on the same shape.

Amir Khakshour, Sofia Ruiz

#user-profile #personality-summary #prompt-harness #pamu #dprf #arxiv-research 15 min
2026

may 26

We Ran a 3-Source Bug Hunt. Then We Realised Our Validators Were All Claude.
Multi-agent code review converged on a confident verdict. The literature had a name for why we should not believe it.

Amir Khakshour, Lukas Brandt

#multi-agent #llm-as-judge #bug-hunt #evaluation 8 min
2026

may 26

Why one agent isn't enough to find your bugs
Four specialists at ρ ≤ 0.25 beat one generalist by 40 percentage points. Our five-agent swarm hit the wall anyway. Here is what the papers actually require.

Amir Khakshour, Lukas Brandt

#multi-agent #bug-hunt #arxiv #code-review 13 min
2026

may 18

The agent is not a transaction
Pause, resume, and mid-flight steering for long-horizon agent runs. The 2026 literature just named the stream paradigm we built by hand.

Amir Khakshour, Sofia Ruiz

#llm-ops #agents #long-horizon #pause-resume #steering #arxiv-research 15 min
2026

may 16

Piaget for prompt agents: why our long-form memory borrows from constructivist psychology
Composing CAM + CAMEL + FadeMem so a book-writing agent has structured memory, healthy decay, and no quiet bias amplification.

Sofia Ruiz, Amir Khakshour

#llm-ops #agent-memory #constructivist #long-form-generation #arxiv-research 9 min
2026

may 15

Subagents as a context-budget primitive
A subagent is not a workflow node. It is a budget envelope. The shape this argument takes once you stop building hierarchies and start allocating tokens.

Theo Patel, Amir Khakshour

#llm-ops #agents #context-engineering #subagent #arxiv-research 12 min
2026

may 13

Two prompt frameworks, one runtime: how we adopted BAML without giving up our cost ledger
BAML wants to own the wire. Our harness already does. We ran them side-by-side in "modular mode": BAML for render and parse, the harness for resolution, telemetry, and cost. Here is why and how — and why the 2026 burden-allocation literature says it was the principled choice, not a pragmatic compromise.

Amir Khakshour

#llm-ops #baml #prompt-engineering #structured-output #arxiv-research 12 min
2026

may 10

What 170 papers agreed on about deep research agents
Five surveys, one consensus shape: a four-stage pipeline, three taxonomy splits, six recurring failure modes. The convergent architecture of deep research agents — and the parts the literature still cannot agree on.

Sofia Ruiz, Theo Patel, Amir Khakshour

#llm-ops #deep-research #agents #survey #arxiv-research 14 min
2026

may 4

Mini-ork: A year of autonomous parallel feature delivery on a solo-founder codebase
How a small orchestration loop wrapped around Claude Code grew into a multi-track delivery system with measurable cost, reliability, and throughput wins — anchored in the 2026 multi-agent literature.

Theo Patel, Amir Khakshour

#agentflow #mini-ork #orchestration #solo-founder #arxiv-research 11 min
2026

may 2

Probe before dispatch: the routing pattern we built without knowing it had a name
Five months of manual probing turned into seven shipped features. The literature had a name for what we were doing. We missed it for half a year.

Amir Khakshour, Maya Lindqvist

#llm-ops #agent-routing #capability-profile #confidence-cascade #arxiv-research 14 min
2026

apr 24

Our prompt canary was lying to us
A 5% A/B that hid a 1.8× cost regression, and the two 2026 papers that named the fix: multi-objective Thompson sampling with a calibration gate.

Maya Lindqvist, Amir Khakshour

#llm-ops #bandits #thompson-sampling #prompt-routing #arxiv-research 5 min
2026

apr 22

The paper that proved our 5 lines of code were optimal
A 2025 paper formalized our 8-tier style profile chain as a laminar matroid. The greedy resolver we already had was the right answer.

Amir Khakshour

#llm-ops #submodular #personalization #arxiv-research 9 min
2026

apr 15

We stopped treating context like application logic
Six tables, three plug-in layers, one compose call. The substrate every block-shaped feature on LibWit now plugs into — and the reversibility lens that decided what to lock on day one.

Amir Khakshour, Sofia Ruiz

#llm-ops #context-engineering #architecture #memory-systems #arxiv-research 11 min
2026

jan 5

Our prompts stopped being code
How we built a prompt harness — registration, version control, four-tier resolution, execution ledger, feedback loop — and what the 2026 literature has been calling the same shape.

Amir Khakshour

#llm-ops #prompt-engineering #harness #prompt-versioning #arxiv-research 14 min
2026

jan 4

The simplest survivable form of chat memory
Two prompts, one Postgres column, a 6-message threshold. How our chat sessions keep coherence past the context window without hierarchical buffers or vector search.

Sofia Ruiz, Amir Khakshour

#llm-ops #chat-memory #context-compression #rolling-summary #arxiv-research 18 min
2025

jul 20

rev 2026

We built attractor-basin memory before the paper named the problem
Mid-2025: context management for LLM agents was a vector DB plus message-history glue. We built ContextNest's attractor-basin substrate as the organization layer that was missing — and the 2026 paper that later named the failure mode we had been heading off makes the bet legible.

Amir Khakshour, Sofia Ruiz

#llm-ops #memory-systems #continual-learning #attractor-dynamics #arxiv-research #rust 14 min