Introduction
Overview
Current AI memory solutions face significant scalability challenges. Most rely on "explicit modeling", requiring humans to continuously specify which information is important and which is not. This approach fundamentally limits an AI system's ability to understand what truly matters to each user, and makes it difficult for the system to retain the most critical, user-specific information.
In addition, existing solutions often adopt a "one-size-fits-all" strategy, applying the same memory mechanism across all scenarios. Yet conversation memory, workspace knowledge, and learned skills start from different source material and are read back in different ways. Forcing them through one pipeline and one store couples unrelated data, wastes work on unnecessary rebuilds, and leaves gaps in provenance.
What is MemU?
MemU is an agentic memory framework designed for LLMs and AI agents. It organizes memory into three independent lines — chat, workspace, and skill — each fed by its own source, owning its own store, and rendering its own markdown output. All three lines share a single layered kernel: record stores, embedding, hybrid search (embedding + BM25), ranking, and markdown rendering.
Every line transforms its raw sources into human-readable markdown — MEMORY.md for conversations, INDEX.md for workspace files, SKILL.md for execution traces — while keeping fine-grained, searchable items in the store. Every retrieved result can be traced back through "item → document → raw source", so memory stays transparent and auditable.
Three Independent Memory Lines
Each line is end-to-end self-contained: source → ingest → store → read → output. Lines never trigger each other — a new conversation never rebuilds the workspace index or re-synthesizes skills.
| Line | Source | Output | What it captures |
|---|---|---|---|
| Chat | Conversation logs | MEMORY.md | User facts, preferences, events — classified into memory categories |
| Workspace | Workspace files (multimodal) | INDEX.md | Documents, images, audio, video — captioned and indexed |
| Skill | Execution / tool traces | SKILL.md | Reusable skills distilled from agent runs |
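To make the separation concrete, here is a minimal sketch of three self-contained lines, each owning its own store and output file. The `MemoryLine` class, its fields, and the source globs are hypothetical illustrations, not MemU's actual API; the point is only that ingesting into one line leaves the other lines' stores untouched.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryLine:
    """Hypothetical description of one memory line (names are illustrative)."""
    name: str
    source_glob: str   # assumed location of this line's L0 sources
    output_file: str   # markdown file this line renders
    # default_factory gives every line its own private store
    store: dict = field(default_factory=dict)

    def ingest(self, source_id: str, content: str) -> None:
        # Writes only to this line's store; no other line is triggered.
        self.store[source_id] = content


LINES = {
    "chat": MemoryLine("chat", "conversations/**/*.json", "MEMORY.md"),
    "workspace": MemoryLine("workspace", "workspace/**/*", "INDEX.md"),
    "skill": MemoryLine("skill", "traces/**/*.jsonl", "SKILL.md"),
}

# A new conversation touches the chat line only.
LINES["chat"].ingest("conv-001", "User prefers dark mode.")
```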
The L0 / L1 / L2 Layer Model
Every line runs the same three representation layers, each derived from the one below (L0 → L1 → L2):
| Layer | Role | Description |
|---|---|---|
| L0 | Resource | The raw source — chat corpus, multimodal files, agent-run logs |
| L1 | Document | A coarse, readable document derived from L0 — memory category file, caption paragraph, or skill markdown |
| L2 | Item | Fine slices/extracts of the L1 document — the embed/search unit |
The layers maintain full traceability: every L2 item is a slice of an L1 document, and every L1 document derives from an L0 resource. Retrieval matches L2 items and then rolls each hit up to its L1 document (and that document's L0 resource), so every result is transparent, interpretable, and carries its provenance.
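The traceability chain can be modeled with three small record types linked by IDs. This is a sketch under the assumption that each record stores its parent's ID; the class and field names (`Resource`, `Document`, `Item`, `roll_up`) are hypothetical and not taken from MemU's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:   # L0: the raw source
    resource_id: str
    uri: str


@dataclass(frozen=True)
class Document:   # L1: readable document derived from one resource
    document_id: str
    resource_id: str
    text: str


@dataclass(frozen=True)
class Item:       # L2: searchable slice of one document
    item_id: str
    document_id: str
    text: str


def roll_up(item, documents, resources):
    """Follow item -> document -> resource links for a retrieved L2 item."""
    doc = documents[item.document_id]
    res = resources[doc.resource_id]
    return doc, res


resources = {"r1": Resource("r1", "conversations/session-01.json")}
documents = {"d1": Document("d1", "r1", "Preferences: the user prefers dark mode.")}
hit = Item("i1", "d1", "prefers dark mode")

doc, res = roll_up(hit, documents, resources)
```

Because every record keeps only its parent's ID, provenance costs two lookups per hit and no extra index.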
Core Processes
Each line runs the same core processes over its own store:
- Memorization — Runs the line's ingest pipeline: preprocess (raw L0 → L1 document) → slice/extract (L1 → L2 items) → embed. Fully autonomous, no manual labeling.
- Retrieval — A single hybrid pass over the L2 items: cosine embedding similarity and BM25 keyword scores are normalized and fused into one rank; top hits roll up to their L1 document.
- Independent Triggers — Each line watches its own source manifest. A change under one line's source rebuilds only that line.
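The memorization steps above can be sketched as three composable functions. Everything here is a stand-in: `preprocess` only normalizes whitespace (a real line would summarize, caption, or distill), `slice_document` splits on sentence boundaries, and `embed` is a toy hashed bag-of-words vector used in place of a real embedding model.

```python
import hashlib
import math
import re


def preprocess(raw: str) -> str:
    """L0 -> L1 stand-in: collapse whitespace into a readable document."""
    return " ".join(raw.split())


def slice_document(doc: str) -> list[str]:
    """L1 -> L2: split after sentence-ending punctuation (illustrative rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words embedding, unit-normalized (placeholder model)."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def memorize(raw: str):
    """Run preprocess -> slice -> embed and return the L1 doc plus embedded L2 items."""
    document = preprocess(raw)
    items = slice_document(document)
    return document, [(item, embed(item)) for item in items]


doc, items = memorize("User likes tea.   They live in Berlin!\nMeeting on Friday?")
```

Each stage only consumes the previous stage's output, which is what lets the same kernel serve all three lines with different `preprocess` steps.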
MemU vs. Traditional Memory Systems
Different memory approaches serve different purposes. Here's how MemU compares:
| Feature | Traditional Systems | MemU |
|---|---|---|
| Structure | One store for everything | Three independent lines, each with its own store and markdown output |
| Memory Formation | Explicit modeling (manual) | Autonomous per-line ingest pipelines |
| Retrieval | Embedding search only | Hybrid: embedding + BM25 keyword, fused in one pass |
| Multimodal | Limited support | Full multimodal (text, image, audio, video) |
| Traceability | No | Full item → document → source roll-up |
| Change Handling | Global rebuilds | Per-line manifest diff — only the affected line rebuilds |
| Skill Memory | No | Dedicated skill line with its own provenance and deletion |
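The per-line manifest diff can be illustrated with content hashes. This sketch assumes a manifest is simply a mapping from source path to a SHA-256 digest; the function names are hypothetical, and MemU's real manifest format may differ.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each source path to a SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_manifest(old: dict[str, str], new: dict[str, str]):
    """Return (added, removed, changed) path sets between two manifests."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, removed, changed


def lines_to_rebuild(old_manifests: dict, new_manifests: dict) -> list[str]:
    """Names of lines whose own manifest changed; all other lines are left alone."""
    stale = []
    for line in sorted(set(old_manifests) | set(new_manifests)):
        added, removed, changed = diff_manifest(
            old_manifests.get(line, {}), new_manifests.get(line, {})
        )
        if added or removed or changed:
            stale.append(line)
    return stale


old = {
    "chat": build_manifest({"a.json": b"hi"}),
    "workspace": build_manifest({"doc.pdf": b"x"}),
    "skill": build_manifest({"run.log": b"ok"}),
}
new = dict(old, chat=build_manifest({"a.json": b"hi", "b.json": b"new chat"}))

stale = lines_to_rebuild(old, new)
```

Only `chat` changed, so only the chat line would be rebuilt; the workspace index and skill files stay as they are.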
Hybrid Retrieval (Embedding + BM25)
All three lines retrieve the same way — one shared mechanism, no per-line forks:
- Embedding score — Cosine similarity between the query vector and each L2 item's embedding captures semantic matches
- BM25 score — Keyword ranking over the same items captures exact-term matches (identifiers, names, error codes)
- Fusion — Both scores are min-max normalized across candidates and fused into a single rank; top items roll up to their L1 document
This is a standard, cheap, single-pass design: no graph traversal, no multi-hop loops, no per-query LLM ranking cost. It covers both semantic and exact-term queries while staying fast and deterministic.
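A minimal, self-contained sketch of the single-pass fusion follows. It uses textbook Okapi BM25 (`k1 = 1.5`, `b = 0.75`), cosine similarity over precomputed vectors, and min-max normalization; the weighted sum with `alpha = 0.5` is an assumed fusion rule, since the exact weighting MemU uses is not specified here. The item texts and two-dimensional vectors are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every doc for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per term
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # a flat signal contributes nothing to the rank
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, query_vec, items, item_vecs, alpha=0.5, top_k=3):
    """Fuse normalized semantic and keyword scores into one ranking (assumed weighted sum)."""
    sem = min_max([cosine(query_vec, v) for v in item_vecs])
    kw = min_max(bm25_scores(query, items))
    fused = [alpha * s + (1 - alpha) * k for s, k in zip(sem, kw)]
    order = sorted(range(len(items)), key=lambda i: fused[i], reverse=True)
    return [(items[i], fused[i]) for i in order[:top_k]]


items = ["error code E1234 in the parser", "user prefers dark mode", "parser failed with timeout"]
item_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
ranked = hybrid_rank("E1234", [1.0, 0.0], items, item_vecs)
```

The third item is semantically close to the query vector but lacks the exact identifier, so BM25 pulls the item containing `E1234` to the top. The roll-up to each hit's L1 document would follow as a separate lookup.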