Piaget for prompt agents: why our long-form memory borrows from constructivist psychology
For a one-shot LLM call, “memory” is whatever you fit in the context window. For an agent that writes a 200,000-word book across weeks, with feedback loops and re-runs and the occasional half-rewrite, the context window stops being a memory at all. What you have instead is a problem that looks a lot like the one developmental psychology has been studying for ninety years: how does a mind build up structured knowledge from a messy stream of experience, and how does it keep that knowledge from rotting?
Jean Piaget’s answer was constructivism. Knowledge is not copied in from the world; it’s actively built, organized into schemas, assimilated when new information fits, and accommodated (the schemas themselves restructure) when it doesn’t. A 2025 paper from a Beijing group caught our attention because it took this framing literally. Their architecture, CAM (Constructivist Agentic Memory, 2510.05520), builds an agent memory system around three Piagetian properties: structured schemata, flexible assimilation, and dynamic accommodation. We’ve been building toward something like this for our book generation pipeline for months, and CAM gave us the spine to organize the rest around.
This post covers the design we’re shipping, the four 2026 papers that fill the gaps CAM leaves open, and where our implementation goes beyond the prototype.
Why constructivism, specifically
Why does this memory system reach back to a 1936 epistemology? The honest answer is that the alternative architectures don’t survive their own scaling.
Flat memory does not scale: dense retrieval treats every memory as an island, and at a hundred thousand entries embedding noise starts winning over signal. A 2026 paper called CLAG (2603.15421) measured this for small language models and found knowledge dilution becomes catastrophic by the time the store gets non-trivial.
Most hierarchical schemes are static: built once and queried forever. The usual recipe in long-context agentic systems is to group similar memories into clusters, summarize the clusters, then summarize the summaries. Talebirad et al.’s recent theory paper (2603.21564) gave the genre a unifying formalism, but a formalism cannot hide the underlying problem. When new information arrives that doesn’t fit, the system either jams it into the wrong cluster (assimilation gone wrong) or files it as a new orphan (accommodation never happens). Bi-Mem (2601.06490) named this failure mode for personalized memory: clustering noise gets amplified, and locally aggregated mistakes propagate.
Where Bi-Mem named the problem, CAM reframes the question from “how do we cluster memories?” to “what properties does a memory system need so that it keeps working as the world the agent operates in keeps changing?” Three properties, taken straight from Piaget:
- Structured schemata — memories live inside organized scaffolds, not in a flat pool.
- Flexible assimilation — incoming experience finds the right scaffold without forcing a fit.
- Dynamic accommodation — when something genuinely new arrives, the scaffolds themselves restructure.
CAM’s contribution is an incremental overlapping clustering algorithm that delivers all three. Memories get grouped, regrouped, and re-summarized as the corpus grows. Retrieval activates the relevant scaffold rather than ranking a flat pool. The whole thing supports both coherent hierarchical summarization and online batch integration, which is exactly the read/write shape a long-form generation agent needs.
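To make the three properties concrete, here is a minimal sketch of incremental overlapping clustering. It is not the algorithm from the CAM paper: the cosine-similarity rule, the `join_threshold` value, and the names `Cluster` and `integrate` are all assumptions made for illustration, and the re-summarization step is left out.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Cluster:
    """Hypothetical scaffold: a centroid plus the ids of its member memories."""
    centroid: list[float]
    members: list[str] = field(default_factory=list)

    def add(self, memory_id: str, vec: list[float]) -> None:
        # Running mean keeps the centroid current without re-reading members.
        n = len(self.members)
        self.centroid = [(c * n + v) / (n + 1) for c, v in zip(self.centroid, vec)]
        self.members.append(memory_id)


def integrate(clusters: list[Cluster], memory_id: str, vec: list[float],
              join_threshold: float = 0.8) -> list[int]:
    """Place one memory; return the indices of every cluster it joined."""
    joined = [i for i, c in enumerate(clusters)
              if cosine(c.centroid, vec) >= join_threshold]
    if not joined:
        # Accommodation: nothing fits, so the structure itself grows.
        clusters.append(Cluster(centroid=list(vec)))
        joined = [len(clusters) - 1]
    for i in joined:
        # Assimilation with overlap: one memory may belong to several clusters.
        clusters[i].add(memory_id, vec)
    return joined
```

Calling `integrate` once per new memory is what makes the sketch incremental: nothing is rebuilt from scratch, and a memory that sits between two scaffolds is allowed to join both instead of being forced into one.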
That’s the spine. But CAM-the-paper is a prototype, and four papers from the past year name the gaps it leaves open. The interesting work was composing the lot. Structured schemata and incremental clustering stay abstract until they are traced through the path the pipeline runs on every observation: write gate, clustering, decay scoring, and scaffold activation.
How a memory lands and how it comes back
```mermaid
flowchart TD
    WRITE([New observation]) --> TYPED[BAML-typed extraction<br/>not free-form text]
    TYPED --> CALIB{calibrator:<br/>bias risk?}
    CALIB -- low --> SCHEMA[match into existing schema]
    CALIB -- high --> QUARANT[quarantine + review]
    SCHEMA --> CLUSTER[incremental<br/>overlapping cluster]
    CLUSTER --> STORE[(episodic / semantic /<br/>skill store)]
    STORE --> DECAY{score below<br/>floor?}
    DECAY -- yes --> DELETE[FadeMem weekly delete]
    DECAY -- no --> KEEP[keep]
    QUERY([Generation needs context]) --> ACTIVATE[activate scaffold<br/>by query]
    ACTIVATE --> CALIB2{retrieval calibrator:<br/>spurious match?}
    CALIB2 -- ok --> INJECT[inject into prompt]
    CALIB2 -- flagged --> DROP[drop from context]
    STORE --> ACTIVATE
    INJECT --> OUT([response])
    OUT --> WRITE
    classDef gate fill:#7a5a1f,stroke:#fff,color:#fff
    classDef alloc fill:#1f3a7a,stroke:#fff,color:#fff
    classDef serve fill:#1f5e3a,stroke:#fff,color:#fff
    classDef store fill:#4a1f7a,stroke:#fff,color:#fff
    class CALIB,CALIB2,DECAY gate
    class TYPED,SCHEMA,CLUSTER,ACTIVATE,INJECT alloc
    class STORE,DELETE,KEEP store
    class WRITE,QUERY,OUT,QUARANT,DROP serve
```
The blue spine is CAM; the yellow gates and the purple decay path are the additions. The next section covers which paper closes which failure mode.
The four gaps and the four papers that close them
Gap 1: spurious correlation amplification. When an LLM writes its own memory, it inherits the same biases it had when generating the original response. Store enough of those and the bias compounds. The CAMEL paper (2605.09330) names this as the dominant failure mode for self-rewriting memory systems and prescribes a calibrator that scores each candidate memory for bias risk at write time. Our implementation wires this twice: once at the write gate (refuse to store memories that score above a threshold for likely spurious correlation) and once at the read gate (drop retrieved memories whose pattern matches a known false positive). CAM has no such guard. Without one, the failure mode that Useful Memories Become Faulty (2605.12978) documents — an agentic memory bank that LLMs continuously update and that quietly corrupts over time — shows up on a timescale measured in months, not years.
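The two gates described above can be sketched as a pair of filters. This is an illustration under stated assumptions, not CAMEL's calibrator or the production code: the scorer is passed in as a plain callable, the `max_risk` threshold of 0.5 is arbitrary, and `write_gate`, `read_gate`, and `GateResult` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical scorer: maps a candidate memory to a bias-risk score in [0, 1].
RiskScorer = Callable[[str], float]


@dataclass
class GateResult:
    stored: list[str]
    quarantined: list[str]


def write_gate(candidates: list[str], score: RiskScorer,
               max_risk: float = 0.5) -> GateResult:
    """Store low-risk memories; quarantine the rest for review."""
    result = GateResult(stored=[], quarantined=[])
    for memory in candidates:
        target = result.quarantined if score(memory) > max_risk else result.stored
        target.append(memory)
    return result


def read_gate(retrieved: list[str],
              is_known_false_positive: Callable[[str], bool]) -> list[str]:
    """Drop retrieved memories that match a known false-positive pattern."""
    return [m for m in retrieved if not is_known_false_positive(m)]
```

Keeping the scorer behind a callable means the gate logic stays the same whether the calibrator is a small classifier, a rule set, or a second model call.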
Gap 2: nothing ever leaves. CAM grows; it does not shrink. For an agent that writes books, this means the memory store from a January book about Roman naval logistics is still active when the May book is on transformer architectures. The cross-talk is real. The FadeMem paper (2601.18642) takes the Ebbinghaus forgetting curve seriously and proposes a biologically-inspired decay schedule that deletes low-score memories on a cadence. Our weekly decay cron does the same thing for our memory stores, scoped per book and per user. The Oblivion follow-up (2604.00131) shows that decay-driven reactivation outperforms always-on retrieval on long-running agents; we expect similar gains once we have enough longitudinal data to measure it.
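A decay sweep of this kind can be sketched with the textbook Ebbinghaus form, R = exp(-t / S). FadeMem's actual schedule is not reproduced here: the exponential form, the `floor` of 0.2, and the names `StoredMemory`, `retention`, and `weekly_sweep` are assumptions for illustration, and `stability_days` is assumed to be positive.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredMemory:
    memory_id: str
    user_id: str
    book_id: str
    days_since_access: float
    stability_days: float  # larger means slower forgetting


def retention(m: StoredMemory) -> float:
    """Ebbinghaus-style exponential retention, R = exp(-t / S)."""
    return math.exp(-m.days_since_access / m.stability_days)


def weekly_sweep(store: list[StoredMemory], user_id: str, book_id: str,
                 floor: float = 0.2) -> tuple[list[StoredMemory], list[StoredMemory]]:
    """Split the store into (kept, deleted) for one user and one book.

    Memories outside the given scope are always kept.
    """
    kept: list[StoredMemory] = []
    deleted: list[StoredMemory] = []
    for m in store:
        in_scope = m.user_id == user_id and m.book_id == book_id
        if in_scope and retention(m) < floor:
            deleted.append(m)
        else:
            kept.append(m)
    return kept, deleted
```

Scoping the sweep by user and book is what keeps one project's decay from touching another's store; resetting `days_since_access` on retrieval would be one way to model reactivation.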
Gap 3: memories get stale, not just old. Staleness is different from age. A memory that “the user prefers dense academic prose” is fresh on day one and wrong on day ninety, because the user told you in March they wanted something lighter. The STALE benchmark (2605.06527) calls this implicit conflict: a later observation invalidates an earlier memory without explicit negation. Our calibrator does double duty here. The same write-time scoring that catches bias also flags older memories that are contradicted by a memory written within the last 24 hours. The contradicted older memory gets demoted, not deleted, so the audit trail survives if the user changes their mind back.
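The demote-not-delete rule can be sketched as follows. Real implicit-conflict detection needs a semantic judgment about whether two memories disagree; this sketch substitutes a much simpler stand-in (same subject and attribute, different value), and the `Preference` record and `demote_stale` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Preference:
    subject: str      # e.g. "user"
    attribute: str    # e.g. "prose_style"
    value: str
    written_at: datetime
    demoted: bool = False


def demote_stale(store: list[Preference], now: datetime,
                 window: timedelta = timedelta(hours=24)) -> list[Preference]:
    """Demote older records contradicted by a record written inside the window."""
    recent = [p for p in store if now - p.written_at <= window and not p.demoted]
    newly_demoted: list[Preference] = []
    for new in recent:
        for old in store:
            same_slot = (old.subject, old.attribute) == (new.subject, new.attribute)
            contradicted = old.written_at < new.written_at and old.value != new.value
            if same_slot and contradicted and not old.demoted:
                # Demoted, not deleted: the record stays in the store for audit.
                old.demoted = True
                newly_demoted.append(old)
    return newly_demoted
```

Because the old record is only flagged, restoring it when the user changes their mind back is a matter of clearing `demoted` rather than re-extracting the memory.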
Gap 4: free-form text is the wrong storage primitive. Almost every agentic memory system stores memories as natural-language sentences. The Schema-Constrained Generation paper (2604.20117) makes the case that this is constructivism in name only: real schemata are typed structures with constrained fields, not prose that an embedding has to disambiguate later. Our memory writes pass through BAML-typed extractors first. The store holds typed records (episodic events with timestamps and entities; semantic facts with sources and confidences; skills with trigger predicates and postconditions) rather than dense sentences. Retrieval is structural, not just semantic, which means it survives the kinds of paraphrase that ruin dense-only systems.
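As an illustration of what typed records buy, here is a sketch using plain Python dataclasses. It is not BAML and not the production schema: the three record types follow the fields named above, but every class, field, and function name is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EpisodicEvent:
    timestamp: datetime
    entities: tuple[str, ...]
    description: str


@dataclass(frozen=True)
class SemanticFact:
    subject: str
    predicate: str
    obj: str
    source: str
    confidence: float


@dataclass(frozen=True)
class Skill:
    name: str
    trigger: str          # predicate describing when the skill applies
    postcondition: str    # what should hold after the skill runs


def facts_about(store: list[SemanticFact], subject: str,
                min_confidence: float = 0.0) -> list[SemanticFact]:
    """Structural lookup: match on typed fields, not on sentence wording."""
    return [f for f in store
            if f.subject == subject and f.confidence >= min_confidence]
```

A lookup like `facts_about` returns the same records however the original observation was phrased, which is the paraphrase robustness that dense-only retrieval lacks.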
The closed gaps imply three measurable properties: a flat bias flag rate, accurate cross-chapter references, and new books starting warm from prior episodic stores. A quarter of production data is enough to confirm or deny all three.
What we expect this to buy us
Three measurable shifts, and we’ll know if we got them within a quarter:
- A new book on a related topic should benefit from prior books’ memories from day one, instead of starting cold. The episodic store carries the “what worked” from prior runs; the semantic store carries the “what’s true” we already paid to extract; the skill store carries the procedural lessons.
- Long-form coherence across chapters should improve, because the memory module remembers what was claimed in chapter 3 when it generates chapter 17. We measure this through the existing chapter rubric’s cross-reference consistency score.
- Bias amplification, measured by the calibrator’s flag rate on memory writes over time, should stay flat or fall. If it climbs, the calibrator is the canary; the failure mode CAMEL warns about is starting and we have time to intervene.
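The canary check in the last bullet could be as simple as a trend line over weekly flag rates. The sketch below is one possible reading, not the production metric: it fits a least-squares slope to the rates, and the `tolerance` value and the names `flag_rate`, `slope`, and `is_climbing` are assumptions.

```python
def flag_rate(flagged: int, total: int) -> float:
    """Share of memory writes the calibrator flagged in one week."""
    return flagged / total if total else 0.0


def slope(rates: list[float]) -> float:
    """Least-squares slope of rate against week index (0, 1, 2, ...)."""
    n = len(rates)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(rates) / n
    num = sum((i - mean_x) * (r - mean_y) for i, r in enumerate(rates))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_climbing(weekly_counts: list[tuple[int, int]],
                tolerance: float = 0.005) -> bool:
    """True when the flag rate rises by more than `tolerance` per week."""
    rates = [flag_rate(flagged, total) for flagged, total in weekly_counts]
    return slope(rates) > tolerance
```

Each entry in `weekly_counts` is a `(flagged, total)` pair for one week, so a quarter of data is about thirteen pairs.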
What we’re shipping is the composition, the per-book per-user scoping, and the calibrator wired at both write and read time. Each component is downstream of published work: CAM is the spine, CAMEL the safety layer, FadeMem and Oblivion the decay layer, STALE the staleness detector, Schema-Constrained Generation the storage primitive.
When you’d reach for this
The architecture to avoid slow memory corruption is no longer a research project. If your agent’s memory grows monotonically, holds free-form text, and has no decay or bias-amplification check, the composition described here is the direct fix. The first symptom is usually not a single memory going wrong but a slow erosion of output quality over weeks, with no single commit or prompt change to blame.