Introduction

Overview

Current AI memory solutions face significant scalability challenges. Most rely on "explicit modeling," which requires humans to continuously specify which information is important and which is not. This approach fundamentally limits an AI system's ability to learn what actually matters to each user and makes it hard for the system to retain the most critical, user-specific information.

In addition, existing solutions often adopt a "one-size-fits-all" strategy, applying the same memory mechanism across all scenarios. Conversation memory, workspace knowledge, and learned skills come from different source material, are processed differently, and are read back in different ways. Forcing all three through one pipeline and one store couples them together, wastes work (an update to one kind of memory reprocesses the others), and leaves gaps in provenance.

What is MemU?

MemU is an agentic memory framework designed for LLMs and AI agents. It organizes memory into three independent lines — chat, workspace, and skill — each fed by its own source, owning its own store, and rendering its own markdown output. All three lines share a single layered kernel: record stores, embedding, hybrid search (embedding + BM25), ranking, and markdown rendering.

Every line transforms its raw sources into human-readable markdown — MEMORY.md for conversations, INDEX.md for workspace files, SKILL.md for execution traces — while keeping fine-grained, searchable items in the store. Every retrieved result can be traced back through "item → document → raw source", so memory stays transparent and auditable.
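To make the markdown-output idea concrete, here is a minimal Python sketch of how a chat line might render its stored items into a MEMORY.md-style file while keeping every bullet traceable. The `Item` fields, the grouping by category, and `render_memory_md` are hypothetical illustrations, not MemU's actual API or file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical L2 item: one extracted fact plus its provenance links."""
    text: str
    category: str     # memory category, which names the L1 category file
    document_id: str  # L1 document this item was sliced from
    resource_id: str  # L0 raw source that document derives from


def render_memory_md(items: list[Item]) -> str:
    """Group items by category and render a MEMORY.md-style markdown string."""
    by_category: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        by_category[item.category].append(item)

    lines = ["# MEMORY"]
    for category in sorted(by_category):
        lines += ["", f"## {category}"]
        for item in by_category[category]:
            # Each bullet carries its provenance: item -> document -> raw source.
            lines.append(f"- {item.text} (doc: {item.document_id}, source: {item.resource_id})")
    return "\n".join(lines) + "\n"


items = [
    Item("Prefers dark mode", "preferences", "doc-preferences", "chat-2024-05-01"),
    Item("Works as a data engineer", "profile", "doc-profile", "chat-2024-05-02"),
]
print(render_memory_md(items))
```

Keeping a provenance tag on each bullet is what lets a reader follow a line in the rendered file back to the document and raw conversation it came from.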

Three Independent Memory Lines

Each line is end-to-end self-contained: source → ingest → store → read → output. Lines never trigger each other — a new conversation never rebuilds the workspace index or re-synthesizes skills.

Line	Source	Output	What it captures
Chat	Conversation logs	`MEMORY.md`	User facts, preferences, events — classified into memory categories
Workspace	Workspace files (multimodal)	`INDEX.md`	Documents, images, audio, video — captioned and indexed
Skill	Execution / tool traces	`SKILL.md`	Reusable skills distilled from agent runs
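The isolation between lines can be sketched as one object per line, each owning its own store and output file. This is a hypothetical Python model: the `MemoryLine` class and the `LINES` registry are assumptions made for illustration, not part of MemU's API. It only shows that ingesting into one line leaves the other two untouched.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryLine:
    """Hypothetical model of one memory line: own source, own store, own output."""
    name: str
    source: str        # where this line's raw (L0) material comes from
    output_file: str   # markdown file this line renders
    store: list[str] = field(default_factory=list)  # private record store

    def ingest(self, record: str) -> None:
        # Writes land only in this line's store; no other line is notified.
        self.store.append(record)


LINES = {
    "chat": MemoryLine("chat", "conversation logs", "MEMORY.md"),
    "workspace": MemoryLine("workspace", "workspace files", "INDEX.md"),
    "skill": MemoryLine("skill", "execution traces", "SKILL.md"),
}

# A new conversation only touches the chat line.
LINES["chat"].ingest("User prefers concise answers")
print({name: len(line.store) for name, line in LINES.items()})
# {'chat': 1, 'workspace': 0, 'skill': 0}
```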

The L0 / L1 / L2 Layer Model

Every line uses the same three representation layers, each derived from the one below it (L0 → L1 → L2):

Layer	Role	Description
L0	Resource	The raw source — chat corpus, multimodal files, agent-run logs
L1	Document	A coarse, readable document derived from L0 — memory category file, caption paragraph, or skill markdown
L2	Item	Fine slices/extracts of the L1 document — the embed/search unit

The layers maintain full traceability: every L2 item is a slice of an L1 document, and every L1 document derives from an L0 resource. Retrieval matches L2 items, then rolls each hit up to its L1 document (and that document's L0 resource) to form the result. This keeps results transparent and interpretable and gives each one robust provenance.
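A minimal Python sketch of the L0 → L1 → L2 derivation and the roll-up used for provenance. The class names, the ID scheme, and the naive `preprocess` and `slice_items` steps are hypothetical stand-ins: in a real line the L0 → L1 step would produce a category file, caption, or skill document, and the L1 → L2 step would extract items rather than split on periods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """L0: a raw source, e.g. a chat transcript."""
    id: str
    raw: str


@dataclass(frozen=True)
class Document:
    """L1: a readable document derived from exactly one resource."""
    id: str
    resource_id: str
    text: str


@dataclass(frozen=True)
class Item:
    """L2: a slice of exactly one document; the unit that gets embedded and searched."""
    id: str
    document_id: str
    text: str


def preprocess(resource: Resource) -> Document:
    # Hypothetical L0 -> L1 step: only normalizes whitespace here.
    text = " ".join(resource.raw.split())
    return Document(id=f"doc:{resource.id}", resource_id=resource.id, text=text)


def slice_items(document: Document) -> list[Item]:
    # Hypothetical L1 -> L2 step: one item per sentence.
    sentences = [s.strip() for s in document.text.split(".") if s.strip()]
    return [
        Item(id=f"{document.id}#{i}", document_id=document.id, text=f"{s}.")
        for i, s in enumerate(sentences)
    ]


def trace(item: Item, documents: dict[str, Document], resources: dict[str, Resource]):
    """Roll an L2 item up to its L1 document and that document's L0 resource."""
    document = documents[item.document_id]
    return item, document, resources[document.resource_id]


res = Resource("chat-001", "User lives in Berlin.\n  User prefers tea.")
doc = preprocess(res)
items = slice_items(doc)
_, parent, source = trace(items[1], {doc.id: doc}, {res.id: res})
print(items[1].text, "->", parent.id, "->", source.id)
# User prefers tea. -> doc:chat-001 -> chat-001
```

Because each layer stores only a pointer to its parent, provenance costs two dictionary lookups per hit and never requires re-reading the raw source.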

Core Processes

Each line runs the same core processes over its own store:

Memorization — Runs the line's ingest pipeline: preprocess (raw L0 → L1 document) → slice/extract (L1 → L2 items) → embed. Fully autonomous, no manual labeling.
Retrieval — A single hybrid pass over the L2 items: cosine embedding similarity and BM25 keyword scores are normalized and fused into one rank; top hits roll up to their L1 document.
Independent Triggers — Each line watches its own source manifest. A change under one line's source rebuilds only that line.
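The Independent Triggers process can be sketched with one manifest of content hashes per line: diff the old and new manifests, and rebuild only the lines whose manifest changed. The source text does not say how MemU builds or compares its manifests, so the SHA-256 digests and the `build_manifest` and `changed_lines` helpers below are assumptions for illustration.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file path under a line's source to a SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_lines(
    old: dict[str, dict[str, str]], new: dict[str, dict[str, str]]
) -> set[str]:
    """Return lines whose manifest differs (file added, removed, or edited)."""
    return {line for line in old.keys() | new.keys() if old.get(line) != new.get(line)}


sources = {
    "chat": {"conv/2024-05-01.json": b"hello"},
    "workspace": {"docs/readme.md": b"# Project"},
    "skill": {"runs/run-1.log": b"tool: search"},
}
before = {line: build_manifest(files) for line, files in sources.items()}

# A new conversation arrives: only the chat line's source changes.
sources["chat"]["conv/2024-05-02.json"] = b"new message"
after = {line: build_manifest(files) for line, files in sources.items()}

print(changed_lines(before, after))  # {'chat'}
```

Hashing content rather than comparing timestamps means a file that is touched but not edited does not trigger a rebuild.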

MemU vs. Traditional Memory Systems

Different memory approaches serve different purposes. Here's how MemU compares:

Feature	Traditional Systems	MemU
Structure	One store for everything	Three independent lines, each with its own store and markdown output
Memory Formation	Explicit modeling (manual)	Autonomous per-line ingest pipelines
Retrieval	Embedding search only	Hybrid: embedding + BM25 keyword, fused in one pass
Multimodal	Limited support	Full multimodal (text, image, audio, video)
Traceability	No	Full item → document → source roll-up
Change Handling	Global rebuilds	Per-line manifest diff — only the affected line rebuilds
Skill Memory	No	Dedicated skill line with its own provenance and deletion

Hybrid Retrieval (Embedding + BM25)

All three lines retrieve the same way — one shared mechanism, no per-line forks:

Embedding score — Cosine similarity between the query vector and each L2 item's embedding captures semantic matches
BM25 score — Keyword ranking over the same items captures exact-term matches (identifiers, names, error codes)
Fusion — Both scores are min-max normalized across candidates and fused into a single rank; top items roll up to their L1 document

This is a standard, cheap, single-pass design: no graph traversal, no multi-hop loops, no per-query LLM ranking cost. It covers both semantic and exact-term queries while staying fast and deterministic.
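The retrieval pass can be sketched in plain Python. This is a minimal, self-contained illustration, not MemU's implementation. The 3-dimensional vectors are toy stand-ins for real embeddings. BM25 uses the common defaults k1 = 1.5 and b = 0.75 with naive whitespace tokenization. The fusion rule (a weighted sum with a hypothetical `alpha` of 0.5) and the max-score roll-up to documents are assumptions, since the text only says the normalized scores are "fused into a single rank".

```python
import math
from collections import Counter


def bm25_scores(query: str, texts: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each text for the query, using lowercase whitespace tokens."""
    docs = [t.lower().split() for t in texts]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def min_max(xs: list[float]) -> list[float]:
    """Rescale scores to [0, 1]; a constant signal carries no ranking information."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_search(query, query_vec, items, alpha=0.5, top_k=3):
    """Rank L1 documents. items: list of (document_id, text, embedding) for L2 items."""
    if not items:
        return []
    emb = min_max([cosine(query_vec, vec) for _, _, vec in items])
    kw = min_max(bm25_scores(query, [text for _, text, _ in items]))
    fused = [alpha * e + (1 - alpha) * k for e, k in zip(emb, kw)]
    # Roll up: each L1 document takes the best fused score among its items.
    best: dict[str, float] = {}
    for (doc_id, _, _), score in zip(items, fused):
        best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    return sorted(best, key=best.get, reverse=True)[:top_k]


items = [
    ("doc:errors", "timeout error E1042 in payment service", [0.1, 0.9, 0.0]),
    ("doc:network", "retry policy for flaky network calls", [0.2, 0.8, 0.1]),
    ("doc:prefs", "user prefers dark mode in the editor", [0.9, 0.1, 0.2]),
]
# The first two items are nearly tied on cosine similarity; BM25's exact match
# on "E1042" separates them.
print(hybrid_search("E1042", [0.15, 0.85, 0.05], items))
# ['doc:errors', 'doc:network', 'doc:prefs']
```

Min-max normalization is what makes the unbounded BM25 scores and the cosine similarities comparable before they are mixed; `alpha` then controls how much weight semantic similarity gets relative to exact-term matching.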