SourceShift

Engineering notes from the SourceShift team — bug post-mortems, LLM gateway scars, and the occasional working theory. Drafted from real production fires at blog.sourceshift.io.
https://blog.sourceshift.io/

The chapter that forgot why it existed
https://blog.sourceshift.io/p/the-chapter-that-forgot-why-it-existed/
When an LLM agent generates text without a world model, it forgets its own goal mid-task. The fix is not more context.
Mon, 08 Jun 2026 00:00:00 GMT

Three LLM judges, but really 1.5: why a same-family panel collapses to noise
https://blog.sourceshift.io/p/three-llm-judges-but-really-1-5-why-a-same-family-panel-collapses-to-noise/
I needed to settle a disagreement between two LLM reviews of the same design doc. The clean answer was a 3-judge panel. The honest answer is that the panel I built is one rubric-design move away from being a beautifully-instrumented yes-man.
Thu, 04 Jun 2026 20:00:00 GMT

MiniMax-M3: The tier-2 coder that found its niche
https://blog.sourceshift.io/p/minimax-m3-the-tier-2-coder-that-found-its-niche/
A frontier-coding model with a 1M context window, a new sparse attention mechanism, and a $0.30/M price tag landed on June 1st. We routed five real epics through it the same day. Here is what stuck.
Mon, 01 Jun 2026 20:00:00 GMT

How we built our user-profile system — the canonical six-layer pattern behind every personalized LLM call
https://blog.sourceshift.io/p/how-we-built-our-user-profile-system-the-canonical-six-layer-pattern-behind-every-personalized-llm-call/
A natural-language paragraph the AI silently reads on every call. Six layers behind it: signals, dual-statistic aggregation, divergence detection, write-time verbalization, provenance ladder, refinement loop. Grounded in four 2025 papers that converged on the same shape.
Wed, 27 May 2026 00:00:00 GMT

We Ran a 3-Source Bug Hunt. Then We Realised Our Validators Were All Claude.
https://blog.sourceshift.io/p/we-ran-a-3-source-bug-hunt-then-we-realised-our-validators-were-all-claude/
Multi-agent code review converged on a confident verdict. The literature had a name for why we should not believe it.
Tue, 26 May 2026 00:00:00 GMT

Why one agent isn't enough to find your bugs
https://blog.sourceshift.io/p/why-one-agent-isnt-enough-to-find-your-bugs/
Four specialists at ρ ≤ 0.25 beat one generalist by 40 percentage points. Our five-agent swarm hit the wall anyway. Here is what the papers actually require.
Tue, 26 May 2026 00:00:00 GMT

The agent is not a transaction
https://blog.sourceshift.io/p/the-agent-is-not-a-transaction/
Pause, resume, and mid-flight steering for long-horizon agent runs. The 2026 literature just named the stream paradigm we built by hand.
Mon, 18 May 2026 19:30:00 GMT

Piaget for prompt agents: why our long-form memory borrows from constructivist psychology
https://blog.sourceshift.io/p/piaget-for-prompt-agents-why-our-long-form-memory-borrows-from-constructivist-psychology/
Composing CAM + CAMEL + FadeMem so a book-writing agent has structured memory, healthy decay, and no quiet bias amplification.
Sat, 16 May 2026 16:00:00 GMT

Subagents as a context-budget primitive
https://blog.sourceshift.io/p/subagents-as-a-context-budget-primitive/
A subagent is not a workflow node. It is a budget envelope. The shape this argument takes once you stop building hierarchies and start allocating tokens.
Fri, 15 May 2026 08:00:00 GMT

Two prompt frameworks, one runtime: how we adopted BAML without giving up our cost ledger
https://blog.sourceshift.io/p/two-prompt-frameworks-one-runtime-how-we-adopted-baml-without-giving-up-our-cost-ledger/
BAML wants to own the wire. Our harness already does. We ran them side-by-side in "modular mode": BAML for render and parse, the harness for resolution, telemetry, and cost. Here is why and how — and why the 2026 burden-allocation literature says it was the principled choice, not a pragmatic compromise.
Wed, 13 May 2026 10:23:39 GMT

What 170 papers agreed on about deep research agents
https://blog.sourceshift.io/p/what-170-papers-agreed-on-about-deep-research-agents/
Five surveys, one consensus shape: a four-stage pipeline, three taxonomy splits, six recurring failure modes. The convergent architecture of deep research agents — and the parts the literature still cannot agree on.
Sun, 10 May 2026 12:00:00 GMT

Mini-ork: A year of autonomous parallel feature delivery on a solo-founder codebase
https://blog.sourceshift.io/p/mini-ork-a-year-of-autonomous-parallel-feature-delivery-on-a-solo-founder-codebase/
How a small orchestration loop wrapped around Claude Code grew into a multi-track delivery system with measurable cost, reliability, and throughput wins — anchored in the 2026 multi-agent literature.
Mon, 04 May 2026 14:10:32 GMT

Probe before dispatch: the routing pattern we built without knowing it had a name
https://blog.sourceshift.io/p/probe-before-dispatch-the-routing-pattern-we-built-without-knowing-it-had-a-name/
Five months of manual probing turned into seven shipped features. The literature had a name for what we were doing. We missed it for half a year.
Sat, 02 May 2026 09:00:00 GMT

Our prompt canary was lying to us
https://blog.sourceshift.io/p/our-prompt-canary-was-lying-to-us/
A 5% A/B that hid a 1.8× cost regression, and the two 2026 papers that named the fix: multi-objective Thompson sampling with a calibration gate.
Fri, 24 Apr 2026 12:43:05 GMT

The paper that proved our 5 lines of code were optimal
https://blog.sourceshift.io/p/the-paper-that-proved-our-5-lines-of-code-were-optimal/
A 2025 paper formalized our 8-tier style profile chain as a laminar matroid. The greedy resolver we already had was the right answer.
Wed, 22 Apr 2026 10:45:36 GMT

We stopped treating context like application logic
https://blog.sourceshift.io/p/we-stopped-treating-context-like-application-logic/
Six tables, three plug-in layers, one compose call. The substrate every block-shaped feature on LibWit now plugs into — and the reversibility lens that decided what to lock on day one.
Wed, 15 Apr 2026 10:00:00 GMT

Our prompts stopped being code
https://blog.sourceshift.io/p/our-prompts-stopped-being-code/
How we built a prompt harness — registration, version control, four-tier resolution, execution ledger, feedback loop — and what the 2026 literature has been calling the same shape.
Mon, 05 Jan 2026 17:10:40 GMT

The simplest survivable form of chat memory
https://blog.sourceshift.io/p/the-simplest-survivable-form-of-chat-memory/
Two prompts, one Postgres column, a 6-message threshold. How our chat sessions keep coherence past the context window without hierarchical buffers or vector search.
Sun, 04 Jan 2026 19:00:00 GMT

We built attractor-basin memory before the paper named the problem
https://blog.sourceshift.io/p/we-built-attractor-basin-memory-before-the-paper-named-the-problem/
Mid-2025: context management for LLM agents was a vector DB plus message-history glue. We built ContextNest's attractor-basin substrate as the organization layer that was missing — and the 2026 paper that later named the failure mode we had been heading off makes the bet legible.
Sun, 20 Jul 2025 09:14:38 GMT