
Getting Started

mcp-memory is a drop-in replacement for Anthropic’s MCP Memory server. It provides a persistent knowledge graph where AI agents store entities, observations, and relationships — and retrieve them across sessions.

It keeps full API compatibility with Anthropic’s 8 tools while adding semantic search, hybrid retrieval, and a dynamic scoring engine. All data is stored in SQLite with WAL mode for safe concurrent access. See the Architecture page for a deep dive into how it works.

The official Anthropic server stores the entire knowledge graph in a single JSONL file. This works for demos, but breaks under real usage:

| Dimension | JSONL (Anthropic) | mcp-memory |
| --- | --- | --- |
| Indexing | None — full file scan on every query | SQLite indexes on name, type, and content |
| Semantic search | Not available | KNN with ONNX embeddings (94+ languages) |
| Hybrid search | Not available | KNN + FTS5 via RRF |
| Query routing | Not available | Dynamic 3-strategy routing (COSINE_HEAVY / LIMBIC_HEAVY / HYBRID_BALANCED) |
| Limbic scoring | Not available | Salience + temporal decay + co-occurrence |
| Entity splitting | Not available | Automatic TF-IDF-based splitting with approval workflow |
| A/B testing | Not available | Shadow mode with NDCG@K metrics |
| Auto-tuning | Not available | Grid search for GAMMA/BETA_SAL optimization |
| Concurrency | Confirmed race conditions | SQLite WAL with 5-second busy timeout |
| Scale | Degrades linearly with file size | O(log n) indexed queries |
| Data corruption | Documented in issues #1819, #2579 (May 2025, still open) | ACID transactions with auto-rollback |

The official server rewrites the entire file on every operation. Without locking or atomic writes, concurrent operations interleave and corrupt the file, producing merged JSON fragments and duplicated lines. mcp-memory solves these problems at the root with a storage engine designed for persistent data.
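The two concurrency guarantees named above — WAL journaling and a 5-second busy timeout — can be seen in a minimal sketch using Python's standard `sqlite3` module. This is illustrative of the mechanism, not mcp-memory's actual code; the table name is invented:

```python
import os
import sqlite3
import tempfile

# Open a file-backed database with a 5-second busy timeout, as the docs
# describe. timeout=5.0 makes writers wait instead of failing immediately
# when another connection holds the write lock.
path = os.path.join(tempfile.mkdtemp(), "memory.db")
conn = sqlite3.connect(path, timeout=5.0)

# Switch to write-ahead logging: readers no longer block the writer.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # "wal"

# Hypothetical table, just to show normal use continues unchanged.
conn.execute("CREATE TABLE IF NOT EXISTS entities (name TEXT PRIMARY KEY)")
conn.commit()
```

With WAL enabled, a crashed writer rolls back cleanly on the next open — this is the property that replaces the JSONL file's rewrite-everything approach.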

Prerequisites

  • Python >= 3.12
  • uv (recommended) or pip for dependency management
  • Git for cloning the repository
  • ~465 MB disk space if you download the embedding model (optional)
  • ~50 MB for test suite (313 tests passing)
1. Clone the repository

git clone https://github.com/Yarlan1503/mcp-memory.git
cd mcp-memory

2. Install dependencies

uv sync

uv sync creates a virtual environment, resolves all dependencies from pyproject.toml, and generates the mcp-memory entry point.

3. Download the embedding model (optional)

uv run python scripts/download_model.py

This downloads the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 model (~465 MB) to ~/.cache/mcp-memory-v2/models/:

| File | Purpose |
| --- | --- |
| model.onnx | ONNX-exported model for CPU inference |
| tokenizer.json | HuggingFace fast tokenizer (Rust) |
| tokenizer_config.json | Tokenizer configuration |
| special_tokens_map.json | Special token mappings |
4. Start the server

uv run mcp-memory

The server starts as a stdio process. It registers as "memory" in the MCP protocol, listens for JSON-RPC on stdin, and writes logs to stderr (no interference with MCP communication).
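To make the stdio transport concrete, here is a hedged sketch of the message shape an MCP client writes to the server's stdin: one JSON-RPC 2.0 object per line. The method name `tools/list` comes from the MCP protocol itself, not from mcp-memory:

```python
import json

# A JSON-RPC 2.0 request as an MCP client would frame it over stdio:
# serialized to a single line, terminated by a newline.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
line = json.dumps(request) + "\n"  # newline-delimited framing
print(line, end="")
```

Because responses travel back on stdout, the server's choice to log only to stderr is what keeps the protocol stream parseable.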

Add to the mcp section of your opencode.json:

{
  "mcp": {
    "memory": {
      "command": "uv",
      "args": ["--directory", "/path/to/mcp-memory", "run", "mcp-memory"]
    }
  }
}

Replace /path/to/mcp-memory with the absolute path to the cloned repository.

Add to your Claude Desktop config file:

{
  "mcpServers": {
    "memory": {
      "command": "uv",
      "args": ["run", "mcp-memory"],
      "cwd": "/path/to/mcp-memory"
    }
  }
}

Replace /path/to/mcp-memory with the absolute path to the cloned repository.

If you prefer not to clone the repo, run directly from GitHub:

{
  "mcpServers": {
    "memory": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/Yarlan1503/mcp-memory", "mcp-memory"]
    }
  }
}

Store knowledge as entities with a name, type, and observations:

{
  "entities": [
    {
      "name": "My Project",
      "entityType": "Project",
      "observations": [
        "Built with Astro and Starlight",
        "Deployed on Vercel",
        "Uses Pagefind for search"
      ]
    }
  ]
}

If an entity already exists, create_entities merges observations instead of overwriting. Duplicates are discarded.
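The merge-instead-of-overwrite behavior can be sketched in a few lines. This is an illustration of the semantics described above, not the server's actual implementation:

```python
# Merge semantics: keep existing observations, append only new ones,
# discard exact duplicates (order of first appearance is preserved).
def merge_observations(existing: list[str], incoming: list[str]) -> list[str]:
    seen = set(existing)
    merged = list(existing)
    for obs in incoming:
        if obs not in seen:
            merged.append(obs)
            seen.add(obs)
    return merged

current = ["Built with Astro and Starlight", "Deployed on Vercel"]
update = ["Deployed on Vercel", "Uses Pagefind for search"]
print(merge_observations(current, update))
# ['Built with Astro and Starlight', 'Deployed on Vercel', 'Uses Pagefind for search']
```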

Connect entities with typed relationships:

{
  "relations": [
    {
      "from": "My Project",
      "to": "Astro",
      "relationType": "uses"
    },
    {
      "from": "My Project",
      "to": "Vercel",
      "relationType": "deployed_on"
    }
  ]
}

Both entities must exist before creating a relation between them.
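A client can pre-check that precondition before calling the tool. The following validator is a hypothetical helper, not part of mcp-memory's API:

```python
# Check that both endpoints of every relation refer to known entities,
# mirroring the precondition stated in the docs.
def validate_relations(entity_names: set[str], relations: list[dict]) -> list[str]:
    errors = []
    for rel in relations:
        for endpoint in (rel["from"], rel["to"]):
            if endpoint not in entity_names:
                errors.append(f"unknown entity: {endpoint}")
    return errors

names = {"My Project", "Astro"}
rels = [{"from": "My Project", "to": "Vercel", "relationType": "deployed_on"}]
print(validate_relations(names, rels))  # ['unknown entity: Vercel']
```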

Find entities by keyword across names, types, and observation content:

{
  "query": "project"
}

search_nodes uses LIKE pattern matching. It requires no embedding model and returns all entities whose name, type, or observations contain the query string.
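The underlying query pattern is simple to demonstrate with an in-memory database. The schema below is invented for the example; the real table layout may differ:

```python
import sqlite3

# Simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT, entity_type TEXT, observations TEXT)")
conn.execute("INSERT INTO entities VALUES ('My Project', 'Project', 'Built with Astro')")
conn.execute("INSERT INTO entities VALUES ('Astro', 'Framework', 'Static site generator')")

# LIKE pattern matching across name, type, and observation content.
query = "project"
pattern = f"%{query}%"
rows = conn.execute(
    """SELECT name FROM entities
       WHERE name LIKE ? OR entity_type LIKE ? OR observations LIKE ?""",
    (pattern, pattern, pattern),
).fetchall()
print([r[0] for r in rows])  # ['My Project'] (LIKE is case-insensitive for ASCII)
```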

Find entities that are semantically related to your query, even without matching keywords:

{
  "query": "web framework deployment",
  "limit": 5
}

search_semantic encodes the query into a 384-dimensional vector and finds the nearest neighbors by cosine similarity. Results are re-ranked by the Limbic Scoring engine, which considers access frequency, recency, and co-occurrence patterns.
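The retrieval step can be sketched as a toy nearest-neighbor search by cosine similarity (3-dimensional vectors here instead of 384, and invented entity vectors). The Limbic re-ranking stage is not reproduced:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embedding store: entity name -> vector (invented for illustration).
store = {
    "My Project": [0.9, 0.1, 0.0],
    "Astro":      [0.8, 0.2, 0.1],
    "Groceries":  [0.0, 0.1, 0.9],
}

# Rank all entities by similarity to the encoded query, take top-k.
query_vec = [1.0, 0.0, 0.0]
ranked = sorted(store, key=lambda n: cosine(query_vec, store[n]), reverse=True)
print(ranked[:2])  # ['My Project', 'Astro']
```

In the real pipeline this score is only the first signal; the Limbic engine then adjusts the ordering using access frequency, recency, and co-occurrence.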

Entities with many observations can be automatically split into focused sub-entities:

{
  "entity_name": "My Project"
}

analyze_entity_split evaluates whether an entity exceeds its type threshold (Sesion=15, Proyecto=25, others=20) and uses TF-IDF to group observations into topics. If splitting is recommended, propose_entity_split returns suggested new entity names and the relations to create.
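Both ingredients of that analysis — the per-type threshold check and TF-IDF term weighting — can be sketched briefly. The thresholds mirror the values above; everything else (function names, sample observations) is invented for illustration:

```python
import math
from collections import Counter

# Per-type observation thresholds from the docs; unknown types default to 20.
THRESHOLDS = {"Sesion": 15, "Proyecto": 25}

def needs_split(entity_type: str, n_observations: int) -> bool:
    return n_observations > THRESHOLDS.get(entity_type, 20)

print(needs_split("Proyecto", 30))  # True
print(needs_split("Sesion", 10))    # False

# Minimal TF-IDF over tokenized observations: rarer terms score higher,
# which is the signal used to group observations into topics.
docs = [obs.lower().split() for obs in
        ["Deploy on Vercel", "Vercel build settings", "SQLite schema design"]]
df = Counter(term for doc in docs for term in set(doc))

def tfidf(term: str, doc: list[str]) -> float:
    tf = doc.count(term) / len(doc)
    idf = math.log(len(docs) / df[term])
    return tf * idf
```

Here "sqlite" (one document) outweighs "vercel" (two documents), so the SQLite observation would seed its own topic cluster.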

{
  "entity_name": "My Project",
  "approved_splits": [
    {
      "name": "My Project - Architecture",
      "entity_type": "Project",
      "observations": ["Stack: FastMCP + SQLite", "MCP Memory v2"]
    }
  ]
}

execute_entity_split creates the new entities, moves observations, and establishes contiene/parte_de relations — all within an atomic transaction.
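The atomicity guarantee is the important part: if any step fails, none of the changes land. A minimal sketch with SQLite (simplified schema invented for the example, not the server's real code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO entities VALUES ('My Project')")
conn.commit()

try:
    # The connection context manager opens a transaction and rolls
    # everything back if an exception escapes the block.
    with conn:
        conn.execute("INSERT INTO entities VALUES ('My Project - Architecture')")
        raise RuntimeError("simulated failure mid-split")
except RuntimeError:
    pass

count = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
print(count)  # 1 — the partial insert was rolled back
```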

The server works without the embedding model downloaded. Here’s what changes:

| Feature | Without model | With model |
| --- | --- | --- |
| create_entities | ✅ Works | ✅ Works + generates embedding |
| create_relations | ✅ Works | ✅ Works |
| add_observations | ✅ Works | ✅ Works + regenerates embedding |
| delete_entities | ✅ Works | ✅ Works + removes embedding |
| delete_observations | ✅ Works | ✅ Works + regenerates embedding |
| delete_relations | ✅ Works | ✅ Works |
| search_nodes | ✅ Works | ✅ Works |
| open_nodes | ✅ Works | ✅ Works |
| migrate | ✅ Works | ✅ Works + generates embeddings |
| search_semantic | ❌ Error | ✅ Works |
| find_duplicate_observations | ❌ Error | ✅ Works |
| consolidation_report | ✅ Works | ✅ Works |
| end_relation | ✅ Works | ✅ Works |
| add_reflection | ✅ Works | ✅ Works + generates embedding |
| search_reflections | ❌ Error | ✅ Works |

When the model is not available, search_semantic returns a clear error message instructing you to run the download script. All other tools function normally.
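That degradation pattern can be sketched as follows. The function name and return shape are hypothetical; only the model path and download command come from this page:

```python
import tempfile
from pathlib import Path

# Model location from the docs.
MODEL_DIR = Path.home() / ".cache" / "mcp-memory-v2" / "models"

def search_semantic(query: str, model_dir: Path = MODEL_DIR) -> dict:
    """Hypothetical tool handler: error out with instructions, don't crash."""
    if not (model_dir / "model.onnx").exists():
        return {"error": "Embedding model not found. "
                         "Run: uv run python scripts/download_model.py"}
    return {"results": []}  # the real vector search would run here

missing = Path(tempfile.mkdtemp())  # empty directory: no model.onnx inside
print("error" in search_semantic("web framework", missing))  # True
```

Returning a structured error keeps the MCP session alive, so keyword tools like search_nodes remain usable in the same conversation.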

  • Architecture — understand the storage engine, embedding pipeline, and data flow
  • Tools Reference — parameters, responses, and edge cases for all 19 tools
  • Semantic Search — how vector search, hybrid retrieval, and Limbic Scoring work together
  • Maintenance & Operations — deduplication, entity splitting, consolidation reports, and best practices
  • Auto-tuning — optimize GAMMA and BETA_SAL via grid search