Agent Memory Architecture: Working, Episodic, Semantic
"Agent memory" is one of the most overloaded terms in the field. People mean radically different things: a chat-history buffer, a vector store of past sessions, a fact graph, or some custom hybrid. This matters because picking the wrong memory shape for the wrong job is the most common reason agents that demo well don't ship.
The cleanest mental model borrows from cognitive science. The field has converged on three or four memory types that map fairly directly to what your stack needs.
The four memory types
Working memory is the current conversation context — the messages, system prompt, and anything you've explicitly stuffed into the LLM call. It's bounded by the model's context window and is the cheapest, fastest memory layer. Most "memory bugs" people report are actually working-memory truncation problems.
Procedural memory is the system prompt and decision logic that defines how the agent behaves. It's typically static or version-controlled, not learned at runtime. Treat it as code, not data.
Semantic memory holds general factual knowledge: user preferences ("Anuj prefers Python"), domain entities ("the project uses PostgreSQL 16"), and stable relationships. It changes slowly and is best stored in a structured store (Postgres, Neo4j) with explicit schemas.
Episodic memory is timestamped records of past interactions — "the user reported the bug on Tuesday and we tried fix X." Vector databases are the standard backing store, retrieved by semantic or hybrid search.
The most-cited live example is MemGPT / Letta, which models the agent's memory as an OS: in-context "core memory" (RAM-like), "recall" memory (the conversation database), and "archival" memory (long-term searchable storage). The agent uses tool calls to read, write, and migrate between layers — what Letta calls "agentic memory control."
Why the distinction matters
The frequent mistake is to dump everything into one vector store. Three failure modes follow:
Splitting episodic from semantic forces you to answer the right question for each store: episodic is "what happened, when?" semantic is "what is currently true?"
Write policies
The write step is the part most teams underbuild. Common policies:
A mature memory layer separates store (write a candidate memory), update (merge into an existing one), and ignore (filler / questions) as distinct decisions, not a single "save everything" path.
Read policies
On the read side, the rule is "retrieve once, retrieve narrowly." The default of dropping the top-10 episodic chunks into context every turn does more harm than good past short conversations. Better patterns:
recall_memory(query) as an explicit tool. The model decides when to call it. This is the Claude Code / Letta pattern.Conflations to avoid
A few traps:
A reasonable starting stack
For most production agents in 2026, this stack is enough:
user_preferences and entities — written by a reflection step at end-of-sessionrecall(query) tool the agent calls when it needs prior contextYou can layer Letta on top later if you outgrow it; you almost certainly won't need to.
