llm-ops ·submodular ·personalization ·arxiv-research

The paper that proved our 5 lines of code were optimal

Apr 22, 2026 · 9 min read

Every time a value can come from more than one place, an argument starts about which place wins. Settings resolvers, feature flag stacks, permission cascades, multi-tier configuration overrides: someone on the team has to remember the precedence order and explain it at every standup. The if-else chain that encodes the order is correct, but nobody can name why it is correct.

If you have shipped any agent with multiple sources of any conditioning signal (system prompts, tool palettes, memory retrieval policies, per-call configuration), you have this problem.

Inside LibWit, our version of it looks like an eight-tier chain that picks one style profile per LLM call: block override, category override, parent-chain inheritance, document setting, user global, a semantically similar profile from elsewhere in the corpus, a curated exemplar, the system default. For a year we ran it as the hand-rolled fallback chain you would expect, and every time someone proposed adding a tier the argument restarted about where it fit. Then a December 2025 paper proved the cascade was already optimal, and named the algorithm we had been writing without noticing.

Where this problem shows up in agentic AI

The “many candidates, one winner, complicated precedence” problem is everywhere once you start looking. The names are different but the shape doesn’t change.

System concern	The tiers that compete	A framework where you’ve seen it
System prompt assembly	developer instructions, user instructions, conversation-pinned context, current tool-call context, runtime persona	Anthropic Claude API’s developer/user message hierarchy; OpenAI Assistants v2’s assistant + thread + run instructions
Tool palette selection	globally enabled tools, role-restricted tools, user-disabled tools, capability-matched tools, scope-limited tools per turn	CrewAI’s role+tool binding; AutoGen’s tool registries; LangGraph node-level tools
Memory retrieval policy	episodic memory, semantic memory, skill memory, conversation buffer, scratchpad notes, knowledge graph	Letta (formerly MemGPT)‘s memory blocks; Mem0’s hierarchical stores; Cognee’s tiered memory
Per-call LLM configuration	model choice, temperature, max tokens, response format, safety mode (each can come from per-task, per-user, per-tenant, or system default)	OpenAI Assistants v2’s instruction cascade; LangChain `RunnableConfig` merging
Safety filter stack	constitutional rules, developer-supplied policies, user opt-outs, jurisdiction-specific filters, post-hoc moderation	Anthropic’s Constitutional AI layers; Llama Guard / NeMo Guardrails composition
Multi-agent persona routing	role-restricted dispatch, capability-required dispatch, cost-tier preference, fallback agent	CrewAI’s hierarchical process; AutoGen’s GroupChat; LangGraph supervisor patterns

The pain reads the same way in all six. Adding a new layer kicks off an argument about where it sits in the precedence order. Edge cases produce surprising winners (“why did the assistant-level instruction lose to the run-level one?”). Adding observability is a project, because nothing in the cascade is named; the if-else chain doesn’t know it’s a cascade. The cascade pattern is universal and universally unnamed, which means every team has reached for a solution independently, and the range of answers is wide enough to be worth mapping.

How others handle it today

A quick survey of approaches in the wild, ordered roughly by formality.

Approach	Where you’ll see it	Strength	Weakness
Ad-hoc if-else cascade	most production agents	trivial to write, easy to read	every change is an argument; no proof the order is correct; no observability of which branch wins
Dict-merge with right-overrides-left	LangChain `RunnableConfig`, OpenAI Assistants instruction merge	one line of code; works for additive fields	wrong for any field where the “more specific” source isn’t always the right winner; silently loses information
Per-field reducers	LangGraph state reducers; Redux-style stores	explicit semantics per field; testable	combinatorial complexity as fields grow; no shared framework for “this field’s reducer is a precedence chain”
Importance + recency ranking	Letta memory blocks; many RAG ranking pipelines	adapts to access patterns; one knob to tune	heuristic, no optimality story; sensitive to embedding noise as the store grows
Dense retrieval over the union	Mem0, vanilla RAG	uniform code path; scales with embeddings	precedence dissolves into similarity; can’t express “user-level beats document-level even at lower cosine”
DAG-based config composition	Dagger, Nix, Bazel rules	formal, reproducible, type-safe	heavyweight; only worth it when config is itself the product
Constitutional layers	Anthropic Claude’s safety stack	strict precedence is the point; transparent to the user	domain-specific; not a general resolution framework

The structure, a laminar matroid, has been sitting there unnamed: most of the agentic-AI world is still running ad-hoc if-else cascades, and even the frameworks that are explicit about precedence (Letta, the Anthropic safety stack) don’t cite the underlying mathematics.

The paper that names it

The setup Platnick, Alirezaie, and Rahnama describe in Structured Personalization: Modeling Constraints as Matroids for Data-Minimal LLM Agents is the same one we had. Personalizing an agent means picking which user-specific facts to condition on. The utility of more facts diminishes (this is the submodular property, where each new fact helps less than the last). But the choice is constrained: some facts depend on others, some categories cap how many can be selected, some hierarchies allow at most one child per parent.

The paper’s contribution is the recognition that this whole space of constraints (quotas, hierarchies, dependencies) has a name. It’s a laminar matroid, a structure mathematicians have been studying since the 1930s. Once you recognize that, three things follow:

Greedy works. Sort the candidates by marginal utility, pick the first one that doesn’t violate any constraint, repeat. Plain greedy gives you at least 1/2 of the optimal. The continuous-greedy variant gives you 1 − 1/e ≈ 63.2%.
The proof is older than most of CS. Edmonds wrote the matroid-greedy theorem in the 1960s, and submodular maximization is a well-trodden field. We didn’t have to prove anything about our resolver; we just had to recognize what it was.
The constraint is the load-bearing part of the design, not the algorithm. Most engineering effort should go into modeling the constraints (which tier can override which, which fallbacks are valid) rather than into clever resolution logic.

The third point is the one that changed how we think about the resolver.

How we applied it: 8 tiers, in priority order

We took the framing and pointed it at the chain we already had, which by then had grown to eight tiers. The priorities are integers; lower wins. The flowchart below maps these priorities to color-coded paths.

flowchart TD
  REQ([style request]) --> PARALLEL[parallel lookups<br/>5 base tiers]
  PARALLEL --> T1[1. block override]
  PARALLEL --> T2[2. category override]
  PARALLEL --> T3[3. parent-chain walk]
  PARALLEL --> T4[4. document setting]
  PARALLEL --> T5[5. user global]

  T1 & T2 & T3 & T4 & T5 --> ANY{any non-null?}
  ANY -- yes --> PICK[priority-sort, pick first]
  ANY -- no --> LAZY[lazy fallback path<br/>embed selected text]
  LAZY --> T6[6. semantic neighbor<br/>user's own past profile]
  LAZY --> T7[7. curated exemplar]
  T6 & T7 --> GATE{exemplar similarity<br/>>= 0.55?}
  GATE -- yes --> PICK
  GATE -- no --> T8[8. system default]
  T8 --> PICK
  PICK --> OUT([resolved profile])
  PICK --> METRIC[(record tier_win,<br/>resolution_duration)]

  classDef gate fill:#7a5a1f,stroke:#fff,color:#fff
  classDef alloc fill:#1f3a7a,stroke:#fff,color:#fff
  classDef serve fill:#1f5e3a,stroke:#fff,color:#fff
  classDef store fill:#4a1f7a,stroke:#fff,color:#fff
  class ANY,GATE gate
  class PARALLEL,PICK,LAZY alloc
  class REQ,OUT serve
  class T1,T2,T3,T4,T5,T6,T7,T8 store
  class METRIC alloc

The blue is the algorithm. The yellow is the constraint logic (which the paper says is the load-bearing part). The purple is the candidate pool. The green is what the caller actually receives.

Two things in this diagram weren’t there before the matroid framing made them obvious:

The five base tiers are independent lookups, so they run in parallel. We were doing them sequentially, accepting whichever fired first. The matroid framing says they’re all candidates for the same selection, so why wait?
The two fallback tiers (semantic neighbor and exemplar) are expensive (they need an embedding pass), so they run only when all five base tiers miss. The “lazy” path is one branch, not five.

With five independent lookups and a single lazy gate holding the two expensive tiers, the candidate pool already knows its own shape, and when the constraints are that clean, the selection code is almost nothing.

The algorithm fits in five lines

Once the tier chain is modeled correctly, the resolver is the smallest function in the file. Sort by priority. Walk the sorted list. Return the first non-null profile. Mark it selected for the audit trail.

function resolveGreedy(tiers, purpose) {
  const sorted = [...tiers].sort((a, b) => a.priority - b.priority);
  for (const tier of sorted) {
    if (tier.profile !== null) {
      tier.selected = true;
      return tier.profile;
    }
  }
  return getDefaultProfileForPurpose(purpose);
}

Under a laminar matroid (at most one profile per tier, with strict precedence) greedy is provably optimal; the paper’s general-case constant-factor bound collapses to “exactly optimal” for our trivial-cardinality case.

What we got out of the framing

The framing changed the conversations around the resolver, not the code; the 5-line function above was almost word-for-word what we already had.

Adding a new tier became a constraint-modeling exercise, not an algorithm change. When we added the semantic-neighbor tier mid-year, the question stopped being “where in the if-else does this go?” and became “where in the laminar order does it sit, and what’s the at-most-one constraint with respect to its neighbors?” The answer flowed from the model; the code change was a single new entry in the priority array.
The confidence gate became a first-class object. The exemplar tier has a synthesized similarity score that’s not directly comparable to the real cosine matches the semantic-neighbor tier produces. The matroid framing forced us to be explicit about when an exemplar match is admissible at all (similarity ≥ 0.55) versus when it should be skipped to fall through to the default. Before the framing, this was buried in an ad-hoc threshold check inside the lookup function.
Tier wins became a metric, not a debug log. Once each tier is a named element in a formal structure, “how often does each tier win?” is a sensible thing to measure. We now record a counter per tier per resolution, plus a duration histogram. Six weeks of data later, the block-tier win rate is around 4%, the category-tier is around 12%, document around 25%, user-global around 30%, default around 28%, and the two fallback tiers together cover the missing 1%. Numbers like these tell us where to invest in better profile elicitation, and they only became visible because the structure made them name-able.

The tier win rates answered one class of questions the framing opened up; the richer machinery in the paper, continuous greedy over a submodular objective, addresses a class we haven’t needed yet.

What this didn’t solve

One thing we expected the matroid framing to give us, that it didn’t.

What we expected was the headline bound, but our resolver picks exactly one profile (the trivial cardinality case) so the paper’s (1 − 1/e) continuous-greedy bound collapses to “highest-priority non-null wins”; the framing was the contribution, not the math. The richer machinery the paper develops (compiling a user’s knowledge graph into macro-facets, running continuous greedy over the matroid) is not what we do, and we don’t need it for this surface. If we ever want to compose multiple profile fragments instead of picking one, the paper is there.

The detection fingerprint

If your service has a precedence chain with at-most-one-per-tier semantics (a settings resolver, a feature-flag override stack, a permissions cascade, a config merge with strict overrides), you probably have a laminar matroid.

The thing that catches it is the conversation pattern: every time someone proposes adding a new layer to the chain, the discussion is about precedence ordering and override semantics, not about retrieval. That’s the matroid asking to be named. Once you name it, the algorithm stays small and the constraint model gets the attention it deserves.

The five lines were always the right amount of code. We just didn’t have the vocabulary to say so.