llm-ops ·agent-memory ·constructivist ·long-form-generation ·arxiv-research

Piaget for prompt agents: why our long-form memory borrows from constructivist psychology

May 16, 2026 · 9 min read

For a one-shot LLM call, “memory” is whatever you fit in the context window. For an agent that writes a 200,000-word book across weeks, with feedback loops and re-runs and the occasional half-rewrite, the context window stops being a memory at all. What you have instead is a problem that looks a lot like the one developmental psychology has been studying for ninety years: how does a mind build up structured knowledge from a messy stream of experience, and how does it keep that knowledge from rotting?

Jean Piaget’s answer was constructivism. Knowledge is not copied in from the world; it is actively built and organized into schemas. New information is assimilated when it fits an existing schema and accommodated, with the schemas themselves restructuring, when it doesn’t. A 2025 paper from a Beijing group caught our attention because it took this framing literally. Its architecture, CAM (Constructivist Agentic Memory, 2510.05520), is designed around three Piagetian properties: structured schemata, flexible assimilation, and dynamic accommodation. We’ve been building toward something like this for our book generation pipeline for months, and CAM gave us the spine to organize the rest around.

This is the design we’re shipping, the four other 2026 papers that fill the gaps CAM leaves open, and where the implementation goes beyond the prototype.

Why constructivism, specifically

The honest answer to why this memory system reaches back to a 1936 epistemology is that the alternative architectures don’t survive their own scaling.

Flat memory does not scale: dense retrieval treats every memory as an island, and at a hundred thousand entries embedding noise starts to win over signal. A 2026 paper called CLAG (2603.15421) measured this for small language models and found that knowledge dilution becomes catastrophic once the store grows beyond a modest size.

Figure: knowledge dilution in dense retrieval. Left panel, a few thousand memories: the query’s top three neighbors are all correct (signal wins). Right panel, a hundred thousand memories: the same query’s top three neighbors are mostly noise (noise wins). Same query, same vector space, different store size, different answer. The store grew and the cosine-nearest set changed shape with it; the answer the system retrieves on the right looks right and isn’t.

Hierarchical memory is the usual next step, and it is how most long-context agentic systems handle the problem: group similar memories into clusters, summarize the clusters, then summarize the summaries. Talebirad et al.’s recent theory paper (2603.21564) gave the genre a unifying formalism, but even a unified formalism cannot hide the core problem: most hierarchical schemes are static, built once and queried forever. When new information arrives that doesn’t fit, the system either jams it into the wrong cluster (assimilation gone wrong) or simply files it as a new orphan (accommodation never happens). Bi-Mem (2601.06490) named this failure mode for personalized memory: clustering noise gets amplified, and locally aggregated mistakes propagate.

Where Bi-Mem named the problem, CAM reframes the question from “how do we cluster memories?” to “what properties does a memory system need so that it keeps working as the world the agent operates in keeps changing?” Three properties, taken straight from Piaget:

Structured schemata — memories live inside organized scaffolds, not in a flat pool.
Flexible assimilation — incoming experience finds the right scaffold without forcing a fit.
Dynamic accommodation — when something genuinely new arrives, the scaffolds themselves restructure.

Figure: the three Piagetian properties of constructivist memory (structured schemata, flexible assimilation, dynamic accommodation), connected in a cycle; a working memory must hit all three. Drop any one and the system stops being constructivist and becomes a flat pool with extra steps.

CAM’s contribution is an incremental overlapping clustering algorithm that delivers all three. Memories get grouped, regrouped, and re-summarized as the corpus grows. Retrieval activates the relevant scaffold rather than ranking a flat pool. The whole thing supports both coherent hierarchical summarization and online batch integration, which is exactly the read/write shape a long-form generation agent needs.
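To make the assimilation/accommodation split concrete, here is a minimal sketch of incremental overlapping clustering in Python. It is not CAM’s published algorithm: the cosine threshold, the running-mean centroid, and the names `IncrementalOverlappingClusters` and `insert` are our own assumptions. A memory joins every scaffold it is close enough to (overlap is allowed), and only when none fits does the scaffold set grow a new schema. CAM also regroups and re-summarizes existing clusters, which this sketch leaves out.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Scaffold:
    """One schema: a running-mean centroid plus the ids of its members."""
    centroid: list[float]
    members: set[str] = field(default_factory=set)

    def absorb(self, mem_id: str, vec: list[float]) -> None:
        # Incremental mean: the centroid drifts toward each new member.
        n = len(self.members)
        self.centroid = [(c * n + v) / (n + 1) for c, v in zip(self.centroid, vec)]
        self.members.add(mem_id)


class IncrementalOverlappingClusters:
    """Hypothetical sketch of the idea, not CAM's published algorithm."""

    def __init__(self, join_threshold: float = 0.8) -> None:
        self.join_threshold = join_threshold
        self.scaffolds: list[Scaffold] = []

    def insert(self, mem_id: str, vec: list[float]) -> list[int]:
        """Place one memory; return the indices of the scaffolds that now hold it."""
        # Assimilation: join every scaffold that is close enough (overlap allowed).
        hits = [i for i, s in enumerate(self.scaffolds)
                if cosine(s.centroid, vec) >= self.join_threshold]
        for i in hits:
            self.scaffolds[i].absorb(mem_id, vec)
        if hits:
            return hits
        # Accommodation: nothing fits, so grow a new schema instead of forcing
        # the memory into the nearest wrong cluster.
        self.scaffolds.append(Scaffold(centroid=list(vec), members={mem_id}))
        return [len(self.scaffolds) - 1]
```

A higher `join_threshold` means fewer forced fits but more new scaffolds; tuning it is the assimilation-versus-accommodation trade-off expressed as one number.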

That’s the spine. But CAM-the-paper is a prototype, and four papers from the past year name the gaps it leaves open; the interesting work was composing them. Structured schemata and incremental clustering also stay abstract until you trace them through the path the pipeline actually runs on every observation: write gate, clustering, decay scoring, and scaffold activation.

How a memory lands and how it comes back

```mermaid
flowchart TD
  WRITE([New observation]) --> TYPED[BAML-typed extraction<br/>not free-form text]
  TYPED --> CALIB{calibrator:<br/>bias risk?}
  CALIB -- low --> SCHEMA[match into existing schema]
  CALIB -- high --> QUARANT[quarantine + review]
  SCHEMA --> CLUSTER[incremental<br/>overlapping cluster]
  CLUSTER --> STORE[(episodic / semantic /<br/>skill store)]

  STORE --> DECAY{score below<br/>floor?}
  DECAY -- yes --> DELETE[FadeMem weekly delete]
  DECAY -- no --> KEEP[keep]

  QUERY([Generation needs context]) --> ACTIVATE[activate scaffold<br/>by query]
  ACTIVATE --> CALIB2{retrieval calibrator:<br/>spurious match?}
  CALIB2 -- ok --> INJECT[inject into prompt]
  CALIB2 -- flagged --> DROP[drop from context]
  STORE --> ACTIVATE
  INJECT --> OUT([response])
  OUT --> WRITE

  classDef gate fill:#7a5a1f,stroke:#fff,color:#fff
  classDef alloc fill:#1f3a7a,stroke:#fff,color:#fff
  classDef serve fill:#1f5e3a,stroke:#fff,color:#fff
  classDef store fill:#4a1f7a,stroke:#fff,color:#fff
  class CALIB,CALIB2,DECAY gate
  class TYPED,SCHEMA,CLUSTER,ACTIVATE,INJECT alloc
  class STORE,DELETE,KEEP store
  class WRITE,QUERY,OUT,QUARANT,DROP serve
```

The blue spine is CAM; the yellow gates and the purple decay path are the additions. The next section covers which paper closes which failure mode.
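The read path in the diagram (activate scaffold, retrieval calibrator, inject) can be sketched the same way. In this hypothetical version, `build_context` activates the single scaffold whose centroid best matches the query, ranks only that scaffold’s members, and lets a retrieval calibrator (here an arbitrary `is_spurious` callback) drop flagged memories before they reach the prompt. Activating exactly one scaffold, the `min_sim` floor, and the `max_items` budget are illustrative assumptions, not details taken from CAM.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Memory:
    mem_id: str
    text: str
    vec: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical layout: scaffold name -> (centroid, member memories).
Scaffolds = dict[str, tuple[list[float], list[Memory]]]


def build_context(query: list[float], scaffolds: Scaffolds,
                  is_spurious: Callable[[Memory], bool],
                  min_sim: float = 0.5, max_items: int = 5) -> list[str]:
    """Return memory texts to inject into the prompt, best match first."""
    if not scaffolds:
        return []
    # 1. Activate the scaffold whose centroid best matches the query.
    name = max(scaffolds, key=lambda n: cosine(scaffolds[n][0], query))
    centroid, members = scaffolds[name]
    if cosine(centroid, query) < min_sim:
        return []  # nothing relevant enough: inject no memory at all
    # 2. Rank only the activated scaffold's members, not the whole store.
    ranked = sorted(members, key=lambda m: cosine(m.vec, query), reverse=True)
    # 3. Retrieval calibrator: flagged memories never reach the prompt.
    kept = [m for m in ranked if not is_spurious(m)]
    return [m.text for m in kept[:max_items]]
```

Ranking inside one activated scaffold is what keeps a hundred-thousand-entry store from behaving like the noisy right-hand panel of the dilution figure.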

The four gaps and the four papers that close them

Gap 1: spurious correlation amplification. When an LLM writes its own memory, it inherits the same biases it had when generating the original response. Store enough of those and the bias compounds. The CAMEL paper (2605.09330) names this the dominant failure mode for self-rewriting memory systems and prescribes a calibrator that scores each candidate memory for bias risk at write time. Our implementation wires the calibrator in twice: once at the write gate (refuse to store memories that score above a threshold for likely spurious correlation) and once at the read gate (drop retrieved memories whose pattern matches a known false positive). CAM has no such guard. Without one, the failure mode documented in Useful Memories Become Faulty (2605.12978), in which an agentic memory bank that LLMs continuously update quietly corrupts over time, shows up within months rather than years.
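A minimal sketch of one calibrator wired at both gates, assuming a bias-risk scorer that returns a value in [0, 1]. CAMEL’s actual scoring model is not reproduced here: `bias_risk` is a stand-in callback, the 0.7 threshold is arbitrary, and the read gate’s known-false-positive patterns are simplified to substring matches. Rejected writes go to a quarantine list for review, matching the diagram, and the share of quarantined writes is the flag rate worth tracking over time.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CalibratedStore:
    """Hypothetical sketch: one bias-risk calibrator, wired at both gates."""
    bias_risk: Callable[[str], float]  # stand-in scorer, assumed to return [0, 1]
    write_threshold: float = 0.7       # arbitrary; tune against reviewed samples
    known_false_positives: set[str] = field(default_factory=set)
    stored: list[str] = field(default_factory=list)
    quarantine: list[str] = field(default_factory=list)

    def write(self, memory: str) -> bool:
        """Write gate: risky candidates go to review, not into the store."""
        if self.bias_risk(memory) >= self.write_threshold:
            self.quarantine.append(memory)
            return False
        self.stored.append(memory)
        return True

    def read(self, retrieved: list[str]) -> list[str]:
        """Read gate: drop retrievals that match a known false-positive pattern."""
        # Substring match is a simplification of real pattern matching.
        return [m for m in retrieved
                if not any(p in m for p in self.known_false_positives)]

    def flag_rate(self) -> float:
        """Share of attempted writes the calibrator quarantined."""
        attempts = len(self.stored) + len(self.quarantine)
        return len(self.quarantine) / attempts if attempts else 0.0
```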

Gap 2: nothing ever leaves. CAM grows; it does not shrink. For an agent that writes books, this means the memory store from a January book about Roman naval logistics is still active when the May book is about transformer architectures, and the cross-talk is real. The FadeMem paper (2601.18642) takes the Ebbinghaus forgetting curve seriously and proposes a biologically inspired decay schedule that deletes low-scoring memories on a cadence. Our weekly decay cron applies the same idea to our memory stores, scoped per book and per user. The Oblivion follow-up (2604.00131) shows that decay-driven reactivation outperforms always-on retrieval on long-running agents; we expect similar gains but need more longitudinal data before we can measure them.
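The Ebbinghaus curve is usually written as retention R = exp(-t / S), where t is elapsed time and S is memory strength. The sketch below applies that shape to a weekly, scope-limited delete pass. FadeMem’s exact scoring is not reproduced: weighting by an `importance` field, stretching S with each access, the 14-day base strength, and the 0.05 floor are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    scope: tuple[str, str]  # (book_id, user_id)
    importance: float       # assumed to lie in [0, 1]
    age_days: float
    access_count: int = 0


def retention(m: MemoryRecord, base_strength_days: float = 14.0) -> float:
    """Ebbinghaus-style R = exp(-t / S), weighted by importance.

    Letting each access stretch the strength S is a hypothetical choice.
    """
    strength = base_strength_days * (1 + m.access_count)
    return m.importance * math.exp(-m.age_days / strength)


def weekly_decay(store: list[MemoryRecord], scope: tuple[str, str],
                 floor: float = 0.05) -> tuple[list[MemoryRecord], list[MemoryRecord]]:
    """One cron pass over a single (book, user) scope; other scopes are untouched."""
    kept, deleted = [], []
    for m in store:
        if m.scope == scope and retention(m) < floor:
            deleted.append(m)
        else:
            kept.append(m)
    return kept, deleted
```

Because each pass is scoped, an old, rarely used memory from one book is deleted while an equally old memory in another scope survives until that scope’s own pass.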

Gap 3: memories get stale, not just old. Staleness is different from age. A memory that “the user prefers dense academic prose” is fresh on day one and wrong on day ninety, because the user told you in March they wanted something lighter. The STALE benchmark (2605.06527) calls this implicit conflict: a later observation invalidates an earlier memory without explicitly negating it. Our calibrator does double duty here. The same write-time scoring that catches bias also flags older memories whose content is directly contradicted by a memory written within the last 24 hours. The contradicted older memory gets demoted, not deleted, so the audit trail survives if the user changes their mind back.
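One way to implement demote-not-delete, assuming memories have already been extracted into (subject, attribute, value) slots: a recent write that assigns a different value to the same slot implicitly contradicts the older one. The slot-equality test is a deliberately crude stand-in for a real contradiction detector, and the `rank` field and 0.1 demotion weight are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    written_at: datetime
    rank: float = 1.0  # retrieval weight; demoted facts stay in the store


def demote_contradicted(facts: list[Fact], now: datetime,
                        window: timedelta = timedelta(hours=24),
                        demoted_rank: float = 0.1) -> list[Fact]:
    """Demote (never delete) older facts implicitly contradicted by a recent write."""
    recent = [f for f in facts if now - f.written_at <= window]
    demoted: list[Fact] = []
    for new in recent:
        for old in facts:
            same_slot = (old.subject, old.attribute) == (new.subject, new.attribute)
            if (old is not new and old.written_at < new.written_at and same_slot
                    and old.value != new.value and old.rank > demoted_rank):
                old.rank = demoted_rank  # the record, and its audit trail, survive
                demoted.append(old)
    return demoted
```

If the user later reverts, the demoted record is still there to be promoted again rather than re-extracted.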

Gap 4: free-form text is the wrong storage primitive. Almost every agentic memory system stores memories as natural-language sentences. The Schema-Constrained Generation paper (2604.20117) makes the case that this is constructivism in name only: real schemata are typed structures with constrained fields, not prose that an embedding has to disambiguate later. Our memory writes pass through BAML-typed extractors first. The store holds typed records (episodic events with timestamps and entities; semantic facts with sources and confidences; skills with trigger predicates and postconditions) rather than prose sentences. Retrieval is structural, not just semantic, which means it survives the kinds of paraphrase that ruin dense-only systems.
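The three record types can be sketched with Python dataclasses standing in for BAML-generated types. The field names follow the description above, but the exact shapes, and the simplification of a trigger predicate to a set of required context tags, are our assumptions. The point is that queries filter on typed fields, so rewording a summary cannot change what they match.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union


@dataclass(frozen=True)
class EpisodicEvent:
    timestamp: datetime
    entities: tuple[str, ...]
    summary: str


@dataclass(frozen=True)
class SemanticFact:
    claim: str
    source: str
    confidence: float  # assumed calibrated to [0, 1]


@dataclass(frozen=True)
class Skill:
    name: str
    trigger: frozenset[str]          # trigger predicate, simplified to required tags
    postconditions: tuple[str, ...]


Record = Union[EpisodicEvent, SemanticFact, Skill]


def events_mentioning(store: list[Record], entity: str, since: datetime) -> list[EpisodicEvent]:
    # Structural match on typed fields: rewording `summary` cannot change the result.
    return [r for r in store
            if isinstance(r, EpisodicEvent) and entity in r.entities and r.timestamp >= since]


def facts_above(store: list[Record], min_confidence: float) -> list[SemanticFact]:
    return [r for r in store if isinstance(r, SemanticFact) and r.confidence >= min_confidence]


def applicable_skills(store: list[Record], context_tags: set[str]) -> list[Skill]:
    # A skill fires only when every tag in its trigger is present in the context.
    return [r for r in store if isinstance(r, Skill) and r.trigger <= context_tags]
```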

Figure: architecture composition. CAM (Constructivist Agentic Memory) forms the central spine: incremental overlapping clusters, structured schemata, dynamic accommodation. Four guards branch off it: CAMEL (bias-risk calibrator at the write and read gates), FadeMem + Oblivion (Ebbinghaus-style decay, weekly delete), STALE (implicit-conflict detector; contradictions demote rather than delete), and Schema-Constrained Generation (typed records, not free-form text). None of the four guards works without the spine; the spine without the guards corrupts in months.

Together, the closed gaps should show up as three measurable properties: a flat bias flag rate, accurate cross-chapter references, and new books that start warm from prior episodic stores. A quarter of production data should be enough to confirm or refute all three.

What we expect this to buy us

Three measurable shifts, and we’ll know if we got them within a quarter:

A new book on a related topic should benefit from prior books’ memories from day one, instead of starting cold. The episodic store carries the “what worked” from prior runs; the semantic store carries the “what’s true” we already paid to extract; the skill store carries the procedural lessons.
Long-form coherence across chapters should improve, because the memory module remembers what was claimed in chapter 3 when it generates chapter 17. We measure this through the existing chapter rubric’s cross-reference consistency score.
Bias amplification, measured by the calibrator’s flag rate on memory writes over time, should stay flat or fall. If it climbs, the calibrator is acting as a canary: the failure mode CAMEL warns about is starting, and we still have time to intervene.
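The canary check in the last item can be as simple as a least-squares slope over weekly flag rates. This sketch assumes you log flagged and total memory writes per week; the 0.002-per-week tolerance is a placeholder to tune against your own baseline, not a recommended value.

```python
def weekly_flag_rates(flagged: list[int], total: list[int]) -> list[float]:
    """Per-week share of memory writes the calibrator flagged (0.0 for empty weeks)."""
    return [f / t if t else 0.0 for f, t in zip(flagged, total)]


def trend_slope(values: list[float]) -> float:
    """Ordinary least-squares slope of values against week index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def canary_tripped(flagged: list[int], total: list[int], tolerance: float = 0.002) -> bool:
    """True when the flag rate is climbing faster than the (hypothetical) tolerance."""
    return trend_slope(weekly_flag_rates(flagged, total)) > tolerance
```

A slope rather than a single-week threshold keeps one noisy week from paging anyone while still catching a steady climb.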

Each component is downstream of published work: CAM is the spine, CAMEL the safety layer, FadeMem and Oblivion the decay layer, STALE the staleness detector, and Schema-Constrained Generation the storage primitive. What we’re shipping on top is the composition, the per-book, per-user scoping, and the calibrator wired in at both write and read time.

When you’d reach for this

Avoiding slow memory corruption is no longer a research project. If your agent’s memory grows monotonically, holds free-form text, and has no decay or bias-amplification check, the composition described here is the direct fix. The first symptom is usually not a single memory going wrong but a slow erosion of output quality over weeks, with no single commit or prompt change to blame.