The AI Agent Memory System I Built Was Broken — Here Is the Redesign — Maksym Kuzmitskyi

A few weeks ago I wrote about giving your AI agent a persistent memory — a simple system of markdown files the agent reads and writes across sessions. I was excited about it. It worked. I rolled it out on my personal project, recommended it to colleagues, and eventually submitted a pull request adding the memory system to our main company repository.

That PR triggered something I did not expect.

A manager reached out and suggested organizing a meeting — not to approve or reject the PR, but because several people at the company had already been experimenting with similar ideas independently. Before the meeting even happened, there was a chat where people started sharing their experiences. Engineers who had been quietly running local memory systems of their own, not as a global company-wide thing, just something they had hacked together for themselves. They shared what worked and, more importantly, what did not.

What I heard in that chat changed how I think about this entirely.

What Broke

The first version of the system had one rule: append. Learned something? Append it. Fixed a bug? Append it. Session ended? Append a summary. Append, append, append.

This works fine for the first few days. Then the files get long. Then they get contradictory. Then they become a liability.

One story from the chat stuck with me. A colleague had been working in a long session — the kind where you get deep into something and lose track of time. The agent was faithfully logging each action to a sessions file, just as instructed. Then midway through, he asked it to do something simple. Small task. Nothing complicated.

The agent picked up the session logs, scanned the history, and concluded that several things from earlier in the session had not been completed yet. So it started doing them. All of them. On top of the actual task. Everything tangled together. The agent was trying to be helpful by using the context it had been given — and it made a mess precisely because of that context.

He rolled everything back and started a fresh session. Simple task, no history, done in two minutes.

That story captures the failure mode better than any metric I could give you. The agent did exactly what you told it to do. It dutifully loaded the history, applied it to the task, and produced chaos. The problem was not the agent. The problem was that we had built a system that grew without any mechanism for forgetting.

Here is what we also observed across different setups:

Context explosion. lessons.md hit 200 lines in three weeks. The agent was loading all of it into context on every session. Most of it was stale or irrelevant to the current task. We were paying context budget for noise.

Drift and contradictions. Decisions change. The agent would append a new decision without removing the old one. Now both exist. Which is current? Depends on where in the file the agent looks.

Entropy wins. The longer a system runs without maintenance discipline, the worse it gets. Append-only systems drift toward entropy by design.

We realized the core mistake: we treated memory like a log, not like a knowledge base.

The Redesign Principles

The meeting itself was candid in a way that work meetings rarely are. People with months of hands-on experience with these systems, and everyone had the same conclusion: the idea is right, but the implementation costs too much to maintain. The systems degraded. They required manual cleanup. They occasionally made things worse. The good news was that everyone had independently arrived at similar fixes — which meant the fixes were probably right.

The rewrite started with constraints rather than architecture. We asked: what properties does memory need to have to not degrade?

Memory is not a log. Do not store what happened. Store what matters. A session where nothing changed — nothing gets written.

If memory grows continuously, it is a bug. Not a feature, not normal accumulation. A symptom of missing update rules.

Overwrite, do not append. When knowledge changes, replace the old entry. The file should represent current state, not a history of states.

Load index first, fetch content lazily. The agent should never blindly load all memory into context. It should load a small index, decide what is relevant, then fetch only that.

One source of truth per topic. No two entries should describe the same thing differently. If they do, one is wrong and should be removed.

The New File Structure

.ai-memory/
  index.md          — What exists and when to load it
  state.md          — Current project state (short, always overwritten)
  decisions.md      — Architecture and product decisions (overwritten, not appended)
  facts.md          — Stable technical facts (rarely changes)
  tasks.md          — Active tasks only (no completed tasks)

Five files. No sessions folder. No history.

Notice what is gone: the sessions directory. We removed it. It was generating the most noise for the least value. If you need an audit trail, use git — that is what git is for. Memory is not git.

index.md

This is the only file the agent loads unconditionally at the start of a session. Everything else is fetched on demand.

# Memory Index

## Always load
- state.md — current project state, active focus

## Load when relevant
- decisions.md — when making architecture or tech stack decisions
- facts.md — when working with the API, data schema, or deployment
- tasks.md — when planning work or checking what is in progress

## Topics → files
- auth flow → facts.md#authentication
- API endpoints → facts.md#api
- styling rules → decisions.md#styling
- current sprint → tasks.md

The index is a routing table. The agent reads it, decides what it needs, loads only that.

state.md

One file, always overwritten. Never appended.

# Current State

Updated: 2026-04-11

## Active focus
Redesigning the memory system after observing degradation in production.
Building the hardened v2. Targeting minimal file count and no append logic.

## Recent context
- Completed: initial audit of degraded memory files
- Completed: team discussion and principles alignment
- In progress: implementing new structure across active projects

## Known blockers
None currently.

This is a snapshot, not a timeline. Next session, the agent rewrites it. Old state disappears. That is by design.

decisions.md

Decisions are updated in place. When a decision changes, the entry is rewritten — not a new entry added below.

# Decisions

## Styling approach
**Decision:** Tailwind CSS utility classes only.
**Reasoning:** Utility-first is fast for a small team, eliminates CSS naming arguments.
**Updated:** 2026-03-10

## Blog post storage
**Decision:** TypeScript files with markdown in template literals.
**Reasoning:** Type-safe content, no CMS needed, works with SSG.
**Updated:** 2026-03-15

Note the Updated field on every entry. This is the anti-drift mechanism. When you see a date from six months ago, you know to verify it is still current. When you update a decision, you update the date. Stale dates are a signal to review.

facts.md

Stable technical facts that rarely change. API shapes, schema, deployment configuration, environment behavior. Written once, updated when things change.

# Facts

## Authentication
- JWT tokens, 15-minute expiry
- Refresh tokens stored in httpOnly cookies
- Endpoint: POST /api/auth/refresh

## Deployment
- Static export via GitHub Actions
- Deployed to GitHub Pages
- Custom domain: ma-x.im
- Build command: next build (output: export)

## Data schema
- Blog posts: TypeScript files in src/data/posts/
- Each post exports { post: BlogPost }
- Registered in src/data/posts/index.ts

tasks.md

Active tasks only. Completed tasks are deleted, not archived.

# Active Tasks

## In progress
- Implement hardened memory structure across ma-x.im project

## Blocked
(none)

## Next
- Write documentation for the new structure
- Review and prune decisions.md across older projects

---
No completed tasks in this file. Completed = deleted.

This is a working board, not a history. Once a task is done, it disappears.

Update Rules: When to Write and When to Stay Silent

This was the most important thing we formalized. The first system had vague rules: "record things when appropriate." Agents interpreted this as "record everything always." The new rules are explicit about silence:

Write when:

A decision was made that did not exist before
An existing decision was changed
A new stable fact was discovered
The current focus shifted significantly
A bug was encountered that reveals a non-obvious constraint

Do NOT write when:

You completed a routine task (the task file handles this, no memoir needed)
You repeated something already in memory
The session produced no new knowledge
You would only be adding a timestamp without changing content

Overwrite rules:

state.md — always overwrite entirely
tasks.md — remove completed tasks, add new ones
decisions.md — edit the existing entry, update the date
facts.md — edit in place, same as decisions

If you cannot identify which existing entry to update — stop and check if the new information belongs in memory at all.

Consolidation: What We Called "Dreaming"

One thing we kept from the meeting discussion was the idea of an explicit consolidation step. Call it memory maintenance, pruning, or — more evocatively — dreaming. The agent is not doing real work during this step. It is reviewing its own memory for quality.

The consolidation prompt looks like this:

Review all memory files. For each file:
1. Remove entries that are no longer current or relevant
2. Merge entries that describe the same thing
3. Resolve contradictions — keep the most recent or most accurate entry
4. Compress verbose entries into concise ones
5. Check that facts.md and decisions.md have Updated dates
6. Verify tasks.md contains only active tasks

Do not add new information during consolidation.
Output a summary of what was changed and why.

Run this manually every two to four weeks, or when memory files feel like they are growing again. It takes a few minutes and resets the quality level. Think of it as code review, but for your agent's knowledge.

Loading Strategy: Index First, Lazy Fetch

The old system: agent reads everything, every session. Expensive and noisy.

The new system: agent reads index.md first, then decides what to load.

Pseudo-logic the agent follows:

session_start:
  load index.md
  load state.md  (always — it is small and always relevant)

  if task involves technology decisions:
    load decisions.md
  
  if task involves API, schema, or deployment:
    load facts.md
  
  if task involves planning or sprint work:
    load tasks.md

session_end:
  if state has changed → overwrite state.md
  if a decision was made or changed → update decisions.md
  if new stable fact was discovered → update facts.md
  if tasks changed → update tasks.md
  if nothing materially changed → write nothing

The index also contains a topic-to-file routing table. If the task is about authentication, the agent looks up "auth" in the index, sees it maps to facts.md#authentication, and loads only that section if the file supports section loading.

Good Memory vs Bad Memory

Here is what the degraded system looked like, and what the hardened version looks like for the same information.

Bad — lessons.md after three weeks:

- Fixed a bug with template literals and backticks - need to escape them
- Tailwind is our CSS approach
- Remember to update the blog index when adding posts
- Spent 2 hours debugging the sitemap — turned out to be a missing trailing slash
- The agent keeps using CSS modules even though we use Tailwind — told it again
- Fixed template literal backtick issue again in a different file
- Reminder: blog posts need to be registered in index.ts
- Auth tokens expire after 15 min — need refresh logic
- AGAIN: escape backticks in template literals

Three separate entries about the same backtick bug. Two reminders about the blog index. This is what append-only looks like in practice.

Good — facts.md with a single consolidated entry:

## Blog post authoring
- Posts are TypeScript files in src/data/posts/, registered in index.ts
- Content is a template literal — backticks inside content must be escaped as \`
- Dates use format: "Apr 11, 2026"

One entry. Everything relevant. Zero repetition.

Honest Trade-offs

This system is better, but it is not free:

You lose history. There is no record of what was decided and then changed. If you care about that trail, decisions.md has Updated dates, but it does not track the previous value. For audit purposes, commit your memory files to version control — git gives you the history.

Consolidation requires effort. The dreaming step will not happen automatically unless you explicitly schedule it. Without it, even this system will degrade eventually — just much more slowly.

The agent still needs good instructions. Vague instructions produce vague memory behavior. The more specific your rules (especially the "do NOT write" cases), the more disciplined the output.

Index routing is manual. The topic-to-file mappings in index.md need to be maintained by hand. A stale index routes the agent to the wrong file. Keep it accurate.

Summary

I submitted that PR thinking I was sharing something solid. The pre-meeting chat humbled me a little — people had hit the same walls I had not hit yet, they just had not written about it publicly.

The honest conclusion from all of that experience: the naive version works until it does not, and when it stops working it does so quietly. You do not get an error. You get an agent that does almost the right thing, confidently, based on stale context you forgot was there. That is harder to debug than a crash.

The hardened version applies one core idea: memory should represent current knowledge, not accumulated history. Everything else follows from that — overwrite instead of append, index-based loading instead of bulk dumps, explicit silence rules, no sessions folder.

The agent does not need a journal. It needs a working knowledge base that stays reliable over time.

Build that instead.

The AI Agent Memory System I Built Was Broken — Here Is the Redesign