# SimpleMem

**Repository Path**: jerryzu/SimpleMem

## Basic Information

- **Project Name**: SimpleMem
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-25
- **Last Updated**: 2026-02-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README
## Efficient Lifelong Memory for LLM Agents

Store, compress, and retrieve long-term memories with semantic lossless compression. Works across Claude, Cursor, LM Studio, and more.

Works with any AI platform that supports MCP or Python integration

Claude Desktop • Cursor • LM Studio • Cherry Studio • PyPI Package • + Any MCP Client

[🇨🇳 中文](./docs/i18n/README.zh-CN.md) • [🇯🇵 日本語](./docs/i18n/README.ja.md) • [🇰🇷 한국어](./docs/i18n/README.ko.md) • [🇪🇸 Español](./docs/i18n/README.es.md) • [🇫🇷 Français](./docs/i18n/README.fr.md) • [🇩🇪 Deutsch](./docs/i18n/README.de.md) • [🇧🇷 Português](./docs/i18n/README.pt-br.md)
[🇷🇺 Русский](./docs/i18n/README.ru.md) • [🇸🇦 العربية](./docs/i18n/README.ar.md) • [🇮🇹 Italiano](./docs/i18n/README.it.md) • [🇻🇳 Tiếng Việt](./docs/i18n/README.vi.md) • [🇹🇷 Türkçe](./docs/i18n/README.tr.md)
[![Project Page](https://img.shields.io/badge/🎬_INTERACTIVE_DEMO-Visit_Our_Website-FF6B6B?style=for-the-badge&labelColor=FF6B6B&color=4ECDC4&logoColor=white)](https://aiming-lab.github.io/SimpleMem-Page)



[Overview](#-overview) • [Quick Start](#-quick-start) • [Cross-Session Memory](#-cross-session-memory) • [MCP Server](#-mcp-server) • [Evaluation](#-evaluation) • [Citation](#-citation)

## 🔥 News

- **[02/09/2026]** 🚀 **Cross-Session Memory is Here: Outperforming Claude-Mem by 64%!** SimpleMem now supports **persistent memory across conversations**. On the LoCoMo benchmark, SimpleMem achieves a **64% performance boost** over Claude-Mem. Your agents can now recall context, decisions, and learnings from previous sessions automatically. [View Cross-Session Documentation →](cross/README.md)
- **[01/20/2026]** **SimpleMem is now available on PyPI!** 📦 Install directly via `pip install simplemem`. [View Package Usage Guide →](docs/PACKAGE_USAGE.md)
- **[01/19/2026]** **Added Local Memory Storage for SimpleMem Skill!** 💾 SimpleMem Skill now supports local memory storage within Claude Skills.
- **[01/18/2026]** **SimpleMem now supports Claude Skills!** 🚀 Use SimpleMem in claude.ai for long-term memory across conversations. Register at [mcp.simplemem.cloud](https://mcp.simplemem.cloud), configure your token, and import the skill!
- **[01/14/2026]** **SimpleMem MCP Server is now LIVE and Open Source!** 🎉 Cloud-hosted memory service at [mcp.simplemem.cloud](https://mcp.simplemem.cloud). Integrates with LM Studio, Cherry Studio, Cursor, Claude Desktop via **Streamable HTTP** MCP protocol. [View MCP Documentation →](MCP/README.md)
- **[01/08/2026]** 🔥 Join our [Discord](https://discord.gg/KA2zC32M) and [WeChat Group](fig/wechat_logo3.JPG) to collaborate and exchange ideas!
- **[01/05/2026]** SimpleMem paper was released on [arXiv](https://arxiv.org/abs/2601.02553)!
---

## 📑 Table of Contents

- [🌟 Overview](#-overview)
- [🎯 Key Contributions](#-key-contributions)
- [🚀 Performance Highlights](#-performance-highlights)
- [📦 Installation](#-installation)
- [⚡ Quick Start](#-quick-start)
- [🧠 Cross-Session Memory](#-cross-session-memory)
- [🔌 MCP Server](#-mcp-server)
- [📊 Evaluation](#-evaluation)
- [📝 Citation](#-citation)
- [📄 License](#-license)
- [🙏 Acknowledgments](#-acknowledgments)

---

## 🌟 Overview
**Performance vs Efficiency Trade-off**

*SimpleMem achieves superior F1 score (43.24%) with minimal token cost (~550), occupying the ideal top-left position.*
**SimpleMem** is an efficient memory framework based on **semantic lossless compression** that addresses the fundamental challenge of **efficient long-term memory for LLM agents**. Unlike existing systems that either passively accumulate redundant context or rely on expensive iterative reasoning loops, SimpleMem maximizes **information density** and **token utilization** through a three-stage pipeline:
### 🔍 Stage 1: Semantic Structured Compression

Distills unstructured interactions into compact, multi-view indexed memory units.

### 🗂️ Stage 2: Online Semantic Synthesis

Intra-session process that instantly integrates related context into unified abstract representations to eliminate redundancy.

### 🎯 Stage 3: Intent-Aware Retrieval Planning

Infers search intent to dynamically determine retrieval scope and construct precise context efficiently.
**SimpleMem Framework**

*The SimpleMem Architecture: (1) Semantic Structured Compression filters low-utility dialogue and converts informative windows into compact, context-independent memory units. (2) Online Semantic Synthesis consolidates related fragments during writing, maintaining a compact and coherent memory topology. (3) Intent-Aware Retrieval Planning infers search intent to adapt retrieval scope and query forms, enabling parallel multi-view retrieval and token-efficient context construction.*
---

### 🏆 Performance Comparison
**Speed Comparison Demo** *SimpleMem vs. Baseline: Real-time speed comparison demonstration*
**LoCoMo-10 Benchmark Results (GPT-4.1-mini)**

| Model | ⏱️ Construction Time | 🔎 Retrieval Time | ⚡ Total Time | 🎯 Average F1 |
|:------|:--------------------:|:-----------------:|:-------------:|:-------------:|
| A-Mem | 5140.5s | 796.7s | 5937.2s | 32.58% |
| LightMem | 97.8s | 577.1s | 675.9s | 24.63% |
| Mem0 | 1350.9s | 583.4s | 1934.3s | 34.20% |
| **SimpleMem** ⭐ | **92.6s** | **388.3s** | **480.9s** | **43.24%** |
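The relative figures quoted for this benchmark can be recomputed directly from the table. The short script below does only that arithmetic on the table's own numbers; the helper names (`relative_gain`, `percent_faster`) are illustrative and not part of SimpleMem.

```python
# LoCoMo-10 results (GPT-4.1-mini), copied from the table above.
RESULTS = {
    "A-Mem":     {"retrieval_s": 796.7, "total_s": 5937.2, "f1": 32.58},
    "LightMem":  {"retrieval_s": 577.1, "total_s": 675.9,  "f1": 24.63},
    "Mem0":      {"retrieval_s": 583.4, "total_s": 1934.3, "f1": 34.20},
    "SimpleMem": {"retrieval_s": 388.3, "total_s": 480.9,  "f1": 43.24},
}


def relative_gain(new: float, old: float) -> float:
    """Percent improvement of `new` over `old` for a higher-is-better metric."""
    return (new / old - 1.0) * 100.0


def percent_faster(new_s: float, old_s: float) -> float:
    """Percent reduction in wall-clock time of `new_s` relative to `old_s`."""
    return (1.0 - new_s / old_s) * 100.0


ours = RESULTS["SimpleMem"]
summary = {
    "F1 gain vs Mem0 (%)": relative_gain(ours["f1"], RESULTS["Mem0"]["f1"]),
    "F1 gain vs LightMem (%)": relative_gain(ours["f1"], RESULTS["LightMem"]["f1"]),
    "Retrieval faster than LightMem (%)": percent_faster(ours["retrieval_s"], RESULTS["LightMem"]["retrieval_s"]),
    "Retrieval faster than Mem0 (%)": percent_faster(ours["retrieval_s"], RESULTS["Mem0"]["retrieval_s"]),
    "Retrieval faster than A-Mem (%)": percent_faster(ours["retrieval_s"], RESULTS["A-Mem"]["retrieval_s"]),
    "End-to-end speedup vs A-Mem (x)": RESULTS["A-Mem"]["total_s"] / ours["total_s"],
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Recomputed this way, the 51.3% retrieval saving is measured against A-Mem (against Mem0 it is about 33.4%), and the end-to-end speedup over A-Mem is about 12.3×.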
> **💡 Key Advantages:**
> - 🏆 **Highest F1 Score**: 43.24% (+26.4% vs. Mem0, +75.6% vs. LightMem)
> - ⚡ **Fastest Retrieval**: 388.3s (32.7% faster than LightMem, 33.4% faster than Mem0, 51.3% faster than A-Mem)
> - 🚀 **Fastest End-to-End**: 480.9s total processing time (≈12.3× faster than A-Mem)

---

## 🎯 Key Contributions

### 1️⃣ Semantic Structured Compression

SimpleMem applies an **implicit semantic density gating** mechanism integrated into the LLM generation process to filter redundant interaction content. The system reformulates raw dialogue streams into **compact memory units**: self-contained facts with resolved coreferences and absolute timestamps. Each unit is indexed through three complementary representations for flexible retrieval:
| 🔍 Layer | 📊 Type | 🎯 Purpose | 🛠️ Implementation |
|---------|---------|------------|-------------------|
| **Semantic** | Dense | Conceptual similarity | Vector embeddings (1024-d) |
| **Lexical** | Sparse | Exact term matching | BM25-style keyword index |
| **Symbolic** | Metadata | Structured filtering | Timestamps, entities, persons |
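To make the three layers concrete, here is a minimal sketch of a memory unit that carries all three views, with one scoring function per layer. It is an illustration under stated assumptions, not SimpleMem's implementation: the `MemoryUnit` class and the function names are hypothetical, the 3-d vectors stand in for the 1024-d embeddings, and the lexical score is plain term frequency rather than BM25.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class MemoryUnit:
    """Hypothetical memory unit holding the three index views."""
    unit_id: str
    text: str                 # self-contained fact (coreferences already resolved)
    embedding: List[float]    # semantic view (toy 3-d vector; SimpleMem uses 1024-d)
    timestamp: str            # symbolic view: absolute ISO-8601 time
    persons: List[str] = field(default_factory=list)  # symbolic view: people mentioned


def semantic_score(unit: MemoryUnit, query_vec: Sequence[float]) -> float:
    """Dense layer: cosine similarity between unit and query embeddings."""
    dot = sum(a * b for a, b in zip(unit.embedding, query_vec))
    norm = math.sqrt(sum(a * a for a in unit.embedding)) * math.sqrt(sum(b * b for b in query_vec))
    return dot / norm if norm else 0.0


def lexical_score(unit: MemoryUnit, terms: Sequence[str]) -> int:
    """Sparse layer: raw term-frequency match (a simplified stand-in for BM25)."""
    tokens = unit.text.lower().split()
    return sum(tokens.count(term.lower()) for term in terms)


def symbolic_match(unit: MemoryUnit, person: Optional[str] = None,
                   date_prefix: Optional[str] = None) -> bool:
    """Metadata layer: exact filters on persons and on the timestamp prefix."""
    if person is not None and person not in unit.persons:
        return False
    if date_prefix is not None and not unit.timestamp.startswith(date_prefix):
        return False
    return True


meeting = MemoryUnit("m1", "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00",
                     [0.9, 0.1, 0.0], "2025-11-16T14:00:00", ["Alice", "Bob"])
coffee = MemoryUnit("m2", "User prefers hot coffee with oat milk",
                    [0.1, 0.9, 0.2], "2025-11-15T09:00:00")

print(lexical_score(meeting, ["Starbucks"]))                              # 1
print(symbolic_match(meeting, person="Alice", date_prefix="2025-11-16"))  # True
```

Each layer answers a different question (how similar, which exact words, which structured facts), which is why a single unit is worth indexing three ways.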
**✨ Example Transformation:**

```diff
- Input:  "He'll meet Bob tomorrow at 2pm"                            [❌ relative, ambiguous]
+ Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00"  [✅ absolute, atomic]
```

---

### 2️⃣ Online Semantic Synthesis

Unlike traditional systems that rely on asynchronous background maintenance, SimpleMem performs synthesis **on-the-fly during the write phase**. Related memory units are synthesized into higher-level abstract representations within the current session scope, allowing repetitive or structurally similar experiences to be **denoised and compressed immediately**.

**✨ Example Synthesis:**

```diff
- Fragment 1: "User wants coffee"
- Fragment 2: "User prefers oat milk"
- Fragment 3: "User likes it hot"
+ Consolidated: "User prefers hot coffee with oat milk"
```

This proactive synthesis ensures the memory topology remains compact and free of redundant fragmentation.

---

### 3️⃣ Intent-Aware Retrieval Planning

Instead of fixed-depth retrieval, SimpleMem leverages the reasoning capabilities of the LLM to generate a **comprehensive retrieval plan**. Given a query, the planning module infers **latent search intent** to dynamically determine retrieval scope and depth:

$$\{ q_{\text{sem}}, q_{\text{lex}}, q_{\text{sym}}, d \} \sim \mathcal{P}(q, H)$$

Here $q_{\text{sem}}$, $q_{\text{lex}}$, and $q_{\text{sym}}$ are the queries issued against the semantic, lexical, and symbolic indexes, and $d$ is the retrieval depth. The system then executes **parallel multi-view retrieval** across the three indexes and merges results through ID-based deduplication:
**🔹 Simple Queries**
- Direct fact lookup via single memory unit
- Minimal retrieval depth
- Fast response time

**🔸 Complex Queries**
- Aggregation across multiple events
- Expanded retrieval depth
- Comprehensive coverage
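The plan-then-retrieve flow can be sketched as follows. The `RetrievalPlan` fields mirror the planner output (q_sem, q_lex, q_sym, d); everything else is an assumption for illustration: the three retrievers are toy callables returning fixed rankings of unit IDs (in practice they would query the dense, sparse, and metadata indexes), a thread pool stands in for parallel execution, and merging keeps each ID the first time it appears.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RetrievalPlan:
    """Hypothetical container for the planner output {q_sem, q_lex, q_sym, d}."""
    q_sem: str
    q_lex: List[str]
    q_sym: Dict[str, str] = field(default_factory=dict)
    depth: int = 1  # d: how many results to take from each view


Retriever = Callable[[RetrievalPlan], List[str]]  # returns ranked memory-unit IDs


def multi_view_retrieve(plan: RetrievalPlan, retrievers: List[Retriever]) -> List[str]:
    """Query every view in parallel, cut each list to depth d, merge with ID-based dedup."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        ranked_lists = list(pool.map(lambda retrieve: retrieve(plan)[: plan.depth], retrievers))
    merged: List[str] = []
    seen = set()
    for ranked in ranked_lists:      # views are visited in a fixed order
        for unit_id in ranked:
            if unit_id not in seen:  # ID-based deduplication
                seen.add(unit_id)
                merged.append(unit_id)
    return merged


# Toy retrievers with fixed rankings (stand-ins for the real indexes).
def semantic_view(plan: RetrievalPlan) -> List[str]:
    return ["m1", "m2", "m3"]


def lexical_view(plan: RetrievalPlan) -> List[str]:
    return ["m2", "m1"]


def symbolic_view(plan: RetrievalPlan) -> List[str]:
    return ["m1"] if plan.q_sym.get("person") == "Alice" else []


views = [semantic_view, lexical_view, symbolic_view]
simple = RetrievalPlan("meeting time", ["meet"], {"person": "Alice"}, depth=1)
complex_plan = RetrievalPlan("everything about Alice", ["Alice"], {"person": "Alice"}, depth=3)
print(multi_view_retrieve(simple, views))        # ['m1', 'm2']
print(multi_view_retrieve(complex_plan, views))  # ['m1', 'm2', 'm3']
```

The simple plan (d = 1) returns two units, while the complex plan (d = 3) pulls in a third, which mirrors the minimal versus expanded depth contrast between simple and complex queries.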
**📈 Result**: 43.24% F1 score with **30× fewer tokens** than full-context methods.

---

## 🚀 Performance Highlights

### 📊 Benchmark Results (LoCoMo)
**🏆 Cross-Session Memory Comparison**

| System | LoCoMo Score | SimpleMem Advantage |
|:-------|:------------:|:-------------------:|
| **SimpleMem** | **48** | n/a |
| Claude-Mem | 29.3 | **+64%** |
**🔬 High-Capability Models (GPT-4.1-mini)**

| Task Type | SimpleMem F1 | Mem0 F1 | Improvement |
|:----------|:------------:|:-------:|:-----------:|
| **MultiHop** | 43.46% | 30.14% | **+43.8%** |
| **Temporal** | 58.62% | 48.91% | **+19.9%** |
| **SingleHop** | 51.12% | 41.3% | **+23.8%** |
**⚙️ Efficient Models (Qwen2.5-1.5B)**

| Metric | SimpleMem | Mem0 | Notes |
|:-------|:---------:|:----:|:------|
| **Average F1** | 25.23% | 23.77% | Competitive with 99× smaller model |
---

## 📦 Installation

### 📝 Notes for First-Time Users

- Ensure you are using **Python 3.10 in your active environment**, not just installed globally.
- An OpenAI-compatible API key must be configured **before running any memory construction or retrieval**; otherwise initialization may fail.
- When using non-OpenAI providers (e.g., Qwen or Azure OpenAI), verify both the model name and `OPENAI_BASE_URL` in `config.py`.
- For large dialogue datasets, enabling parallel processing can significantly reduce memory construction time.

### 📋 Requirements

- 🐍 Python 3.10
- 🔑 OpenAI-compatible API (OpenAI, Qwen, Azure OpenAI, etc.)

### 🛠️ Setup

```bash
# 📥 Clone repository
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem

# 📦 Install dependencies
pip install -r requirements.txt

# ⚙️ Configure API settings
cp config.py.example config.py
# Edit config.py with your API key and preferences
```

### ⚙️ Configuration Example

```python
# config.py
OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = None  # or custom endpoint for Qwen/Azure
LLM_MODEL = "gpt-4.1-mini"
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B"  # State-of-the-art retrieval
```

---

## ⚡ Quick Start

### 🧠 Understanding the Basic Workflow

At a high level, SimpleMem works as a long-term memory system for LLM-based agents. The workflow consists of three simple steps:

1. **Store information**: Dialogues or facts are processed and converted into structured, atomic memories.
2. **Index memory**: Stored memories are organized using semantic embeddings and structured metadata.
3. **Retrieve relevant memory**: When a query is made, SimpleMem retrieves the most relevant stored information based on meaning, not just keyword overlap.

This design allows LLM agents to maintain context, recall past information efficiently, and avoid repeatedly processing redundant history.
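The first-time-user notes in the Installation section can be checked before running anything. The function below is a hypothetical helper, not part of SimpleMem: it reads `OPENAI_API_KEY` from `config.py` (or from any object you pass in), flags the placeholder value from the configuration example, and compares the active interpreter with the Python 3.10 requirement.

```python
import importlib
import sys
from types import SimpleNamespace
from typing import List, Optional, Tuple

PLACEHOLDER_KEY = "your-api-key"  # value shown in the configuration example


def preflight_problems(version: Optional[Tuple[int, int]] = None,
                       config: Optional[object] = None) -> List[str]:
    """Return human-readable setup problems; an empty list means the checks passed."""
    problems: List[str] = []
    major, minor = tuple(version or sys.version_info[:2])
    if (major, minor) != (3, 10):
        problems.append(f"Active interpreter is Python {major}.{minor}; SimpleMem expects Python 3.10.")
    if config is None:
        try:
            config = importlib.import_module("config")  # the config.py created from config.py.example
        except ImportError:
            problems.append("config.py not found; copy config.py.example to config.py first.")
            return problems
    api_key = getattr(config, "OPENAI_API_KEY", None)
    if not api_key or api_key == PLACEHOLDER_KEY:
        problems.append("OPENAI_API_KEY is missing or still set to the placeholder value.")
    return problems


# Example: a config object that still has the placeholder key, checked as if on Python 3.12.
demo_config = SimpleNamespace(OPENAI_API_KEY=PLACEHOLDER_KEY, OPENAI_BASE_URL=None)
for problem in preflight_problems(version=(3, 12), config=demo_config):
    print("-", problem)
```

Calling `preflight_problems()` with no arguments checks the real interpreter and imports `config.py` from the current directory.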
### 🎓 Basic Usage

```python
from main import SimpleMemSystem

# 🚀 Initialize system
system = SimpleMemSystem(clear_db=True)

# 💬 Add dialogues (Stage 1: Semantic Structured Compression)
system.add_dialogue("Alice", "Bob, let's meet at Starbucks tomorrow at 2pm", "2025-11-15T14:30:00")
system.add_dialogue("Bob", "Sure, I'll bring the market analysis report", "2025-11-15T14:31:00")

# ✅ Finalize atomic encoding
system.finalize()

# 🔎 Query with intent-aware retrieval (Stage 3: Intent-Aware Retrieval Planning)
answer = system.ask("When and where will Alice and Bob meet?")
print(answer)
# Output: "16 November 2025 at 2:00 PM at Starbucks"
```

---

### 🚄 Advanced: Parallel Processing

For large-scale dialogue processing, enable parallel mode:

```python
system = SimpleMemSystem(
    clear_db=True,
    enable_parallel_processing=True,  # ⚡ Parallel memory building
    max_parallel_workers=8,
    enable_parallel_retrieval=True,   # 🔍 Parallel query execution
    max_retrieval_workers=4
)
```

> **💡 Pro Tip**: Parallel processing significantly reduces latency for batch operations!

---

## ❓ Common Setup Issues & Troubleshooting

If you encounter issues while setting up or running SimpleMem for the first time, check the following common cases:

### 1️⃣ API Key Not Detected

- Ensure your API key is correctly set in `config.py`
- For OpenAI-compatible providers (Qwen, Azure, etc.), verify that `OPENAI_BASE_URL` is configured correctly
- Restart your Python environment after updating the key

### 2️⃣ Python Version Mismatch

- SimpleMem requires **Python 3.10**
- Check your version using:

```bash
python --version
```

---

## 🧠 Cross-Session Memory

**SimpleMem-Cross** extends SimpleMem with persistent cross-conversation memory capabilities. Agents can recall context, decisions, and observations from previous sessions, enabling continuity across conversations without manual context re-injection.
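Cross-session context is injected at session start under a token budget. The exact selection logic lives in the project's context injector and is not described in this README; the sketch below is a hypothetical greedy packer that only illustrates the idea. It assumes each past memory already carries a relevance score and approximates tokens by whitespace-separated words, which real tokenizers do not do. The memory texts are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PastMemory:
    """Hypothetical memory from an earlier session with a precomputed relevance score."""
    text: str
    relevance: float


def approx_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def build_context(memories: List[PastMemory], budget: int) -> str:
    """Greedily pack the most relevant memories that still fit in the token budget."""
    chosen: List[str] = []
    used = 0
    for memory in sorted(memories, key=lambda m: m.relevance, reverse=True):
        cost = approx_tokens(memory.text)
        if used + cost <= budget:  # skip items that would overflow, keep trying the rest
            chosen.append(memory.text)
            used += cost
    return "\n".join(f"- {text}" for text in chosen)


past = [
    PastMemory("User prefers pytest over unittest for all new test modules in this project", 0.4),
    PastMemory("Decided to use JWT for REST API authentication", 0.9),
    PastMemory("JWTHandler lives in auth/jwt.py", 0.8),
]
print(build_context(past, budget=15))
```

With a 15-token budget, the two most relevant entries (8 + 4 words) fit and the 13-word preference is dropped; a production injector would use the model's tokenizer and its own ranking.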
### Key Features

| Feature | Description |
|---------|-------------|
| **Session Lifecycle** | Full session management with start/record/stop/end lifecycle |
| **Automatic Context Injection** | Token-budgeted context from previous sessions injected at session start |
| **Event Collection** | Record messages, tool uses, file changes with automatic redaction |
| **Observation Extraction** | Heuristic extraction of decisions, discoveries, and learnings |
| **Provenance Tracking** | Every memory entry links back to source evidence |
| **Consolidation** | Decay, merge, and prune old memories to maintain quality |

### Quick Example

```python
import asyncio

from cross.orchestrator import create_orchestrator


async def main():
    orch = create_orchestrator(project="my-project")

    # Start session: previous context is injected automatically
    result = await orch.start_session(
        content_session_id="session-001",
        user_prompt="Continue building the REST API",
    )
    print(result["context"])  # Relevant context from previous sessions

    # Record events during the session
    await orch.record_message(result["memory_session_id"], "User asked about JWT")
    await orch.record_tool_use(
        result["memory_session_id"],
        tool_name="read_file",
        tool_input="auth/jwt.py",
        tool_output="class JWTHandler: ...",
    )

    # Finalize: extracts observations, generates summary, stores memories
    report = await orch.stop_session(result["memory_session_id"])
    print(f"Stored {report.entries_stored} memory entries")

    await orch.end_session(result["memory_session_id"])
    orch.close()


# Run the coroutine; defining main() alone does not execute it
asyncio.run(main())
```

### Architecture

```
        Agent Frameworks (Claude Code / Cursor / custom)
                            |
             +--------------+--------------+
             |                             |
    Hook/Lifecycle Adapter       HTTP/MCP API (FastAPI)
             |                             |
             +--------------+--------------+
                            |
                   CrossMemOrchestrator
                            |
          +-----------------+------------------+
          |                 |                  |
   Session Manager   Context Injector     Consolidation
      (SQLite)      (budgeted bundle)  (decay/merge/prune)
          |                 |                  |
          +---------+-------+                  |
                    |                          |
    Cross-Session Vector Store (LanceDB) <-----+
```

### Module Reference

| Module | Description |
|--------|-------------|
| `cross/types.py` | Pydantic models, enums, records |
| `cross/storage_sqlite.py` | SQLite backend for sessions, events, observations |
| `cross/storage_lancedb.py` | LanceDB vector store with provenance |
| `cross/hooks.py` | Lifecycle hooks (SessionStart/ToolUse/End) |
| `cross/collectors.py` | Event collection with 3-tier redaction |
| `cross/session_manager.py` | Full session lifecycle orchestration |
| `cross/context_injector.py` | Token-budgeted context builder |
| `cross/orchestrator.py` | Top-level facade and factory |
| `cross/api_http.py` | FastAPI REST endpoints |
| `cross/api_mcp.py` | MCP tool definitions |
| `cross/consolidation.py` | Memory maintenance worker |

> 📖 For detailed API documentation, see [Cross-Session README](cross/README.md)

---

## 🔌 MCP Server

SimpleMem is available as a **cloud-hosted memory service** via the Model Context Protocol (MCP), enabling seamless integration with AI assistants like Claude Desktop, Cursor, and other MCP-compatible clients.
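The hosted service speaks MCP over Streamable HTTP (JSON-RPC 2.0), so a client's first call is an `initialize` request. The sketch below only builds that request with Python's standard library and does not send it; in practice an MCP client library performs this handshake for you. The `clientInfo` values are placeholders, the field layout follows the public MCP 2025-03-26 specification rather than anything SimpleMem-specific, and `YOUR_TOKEN` stands for the token you obtain from mcp.simplemem.cloud.

```python
import json
import urllib.request

MCP_URL = "https://mcp.simplemem.cloud/mcp"  # endpoint used in the client configuration


def build_initialize_request(token: str, url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 `initialize` call for a Streamable HTTP MCP server."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder values
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # same bearer token as in the client config
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an SSE stream
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request("YOUR_TOKEN")
print(request.get_method(), request.full_url)
print(json.loads(request.data)["method"])
```

Sending it (for example with `urllib.request.urlopen`) is left out on purpose, since the server's exact responses are documented in the MCP guide rather than here.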
**🌐 Cloud Service**: [mcp.simplemem.cloud](https://mcp.simplemem.cloud)

### Key Features

| Feature | Description |
|---------|-------------|
| **Streamable HTTP** | MCP 2025-03-26 protocol with JSON-RPC 2.0 |
| **Multi-tenant Isolation** | Per-user data tables with token authentication |
| **Hybrid Retrieval** | Semantic search + keyword matching + metadata filtering |
| **Production Optimized** | Faster response times with OpenRouter integration |

### Quick Configuration

```json
{
  "mcpServers": {
    "simplemem": {
      "url": "https://mcp.simplemem.cloud/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}
```

> 📖 For detailed setup instructions and self-hosting guide, see [MCP Documentation](MCP/README.md)

---

## 📊 Evaluation

### 🧪 Run Benchmark Tests

```bash
# 🎯 Full LoCoMo benchmark
python test_locomo10.py

# 📉 Subset evaluation (5 samples)
python test_locomo10.py --num-samples 5

# 💾 Custom output file
python test_locomo10.py --result-file my_results.json
```

---

### 🔬 Reproduce Paper Results

Use the exact configurations in `config.py`:

- **🚀 High-capability**: GPT-4.1-mini, Qwen3-Plus
- **⚙️ Efficient**: Qwen2.5-1.5B, Qwen2.5-3B
- **🔍 Embedding**: Qwen3-Embedding-0.6B (1024-d)

---

## 📝 Citation

If you use SimpleMem in your research, please cite:

```bibtex
@article{simplemem2025,
  title={SimpleMem: Efficient Lifelong Memory for LLM Agents},
  author={Liu, Jiaqi and Su, Yaofeng and Xia, Peng and Zhou, Yiyang and Han, Siwei and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2601.02553},
  year={2025},
  url={https://github.com/aiming-lab/SimpleMem}
}
```

---

## 📄 License

This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
---

## 🙏 Acknowledgments

We would like to thank the following projects and teams:

- 🔍 **Embedding Model**: [Qwen3-Embedding](https://github.com/QwenLM/Qwen) - State-of-the-art retrieval performance
- 🗄️ **Vector Database**: [LanceDB](https://lancedb.com/) - High-performance columnar storage
- 📊 **Benchmark**: [LoCoMo](https://github.com/snap-research/locomo) - Long-context memory evaluation framework