SourceShift

Engineering notes from the SourceShift team. Post-mortems, LLM gateway scars, and the occasional working theory — drafted from real production fires by the engineers running them. No newsletter, no popups, no tracking.

2026

jun 8

The chapter that forgot why it existed
When an LLM agent generates text without a world model, it forgets its own goal mid-task. The fix is not more context.

Amir Khakshour, Sofia Ruiz

#llm-agents #world-models #agentic-ai #architecture 7 min

jun 4

Three LLM judges, but really 1.5: why a same-family panel collapses to noise
I needed to settle a disagreement between two LLM reviews of the same design doc. The clean answer was a 3-judge panel. The honest answer is that the panel I built is one rubric-design move away from being a beautifully instrumented yes-man.

Amir Khakshour

#llm-as-judge #multi-judge-panel #rubric #evaluation #rulers #krippendorff #arxiv-research 8 min

jun 1

MiniMax-M3: The tier-2 coder that found its niche
A frontier-coding model with a 1M context window, a new sparse attention mechanism, and a $0.30/M price tag landed on June 1st. We routed five real epics through it the same day. Here is what stuck.

Amir Khakshour, Theo Patel

#minimax #m3 #llm-eval #coding-models #mixture-of-experts #sparse-attention #arxiv-research 15 min

may 27

How we built our user-profile system — the canonical six-layer pattern behind every personalized LLM call
A natural-language paragraph the AI silently reads on every call. Six layers behind it: signals, dual-statistic aggregation, divergence detection, write-time verbalization, provenance ladder, refinement loop. Grounded in four 2025 papers that converged on the same shape.

Amir Khakshour, Sofia Ruiz

#user-profile #personality-summary #prompt-harness #pamu #dprf #arxiv-research 15 min

may 26

We Ran a 3-Source Bug Hunt. Then We Realised Our Validators Were All Claude.
Multi-agent code review converged on a confident verdict. The literature had a name for why we should not believe it.

Amir Khakshour, Lukas Brandt

#multi-agent #llm-as-judge #bug-hunt #evaluation 8 min

may 26

Why one agent isn't enough to find your bugs
Four specialists at ρ ≤ 0.25 beat one generalist by 40 percentage points. Our five-agent swarm hit the wall anyway. Here is what the papers actually require.

Amir Khakshour, Lukas Brandt

#multi-agent #bug-hunt #arxiv #code-review 13 min