What is mcp-memory?

mcp-memory is a drop-in replacement for Anthropic’s MCP Memory server. It provides a persistent knowledge graph where AI agents store entities, observations, and relationships — and retrieve them across sessions.

It keeps full API compatibility with Anthropic’s 8 tools while adding semantic search, hybrid retrieval, and a dynamic scoring engine. All data is stored in SQLite with WAL mode for safe concurrent access. See the Architecture page for a deep dive into how it works.

Why it exists

The official Anthropic server stores the entire knowledge graph in a single JSONL file. This works for demos, but breaks under real usage:

| Dimension | JSONL (Anthropic) | mcp-memory |
| --- | --- | --- |
| Indexing | None — full file scan on every query | SQLite indexes on name, type, and content |
| Semantic search | Not available | KNN with ONNX embeddings (94+ languages) |
| Hybrid search | Not available | KNN + FTS5 via RRF |
| Query routing | Not available | Dynamic 3-strategy routing (COSINE_HEAVY/LIMBIC_HEAVY/HYBRID_BALANCED) |
| Limbic scoring | Not available | Salience + temporal decay + co-occurrence |
| Entity splitting | Not available | Automatic splitting via semantic clustering, with an approval workflow |
| A/B testing | Not available | Shadow mode with NDCG@K metrics |
| Auto-tuning | Not available | Grid search for GAMMA/BETA_SAL optimization |
| Concurrency | Race conditions confirmed | SQLite WAL with 5-second busy timeout |
| Scale | Degrades linearly with file size | O(log n) indexed queries |
| Data corruption | Documented in issues #1819, #2579 (May 2025, still open) | ACID transactions with auto-rollback |

The official server rewrites the entire file on every operation. Without locking or atomic writes, concurrent operations can interleave, producing malformed JSON and duplicated lines. mcp-memory solves these problems at the root with a storage engine designed for persistent data.
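The WAL-plus-busy-timeout combination described above can be sketched in a few lines of Python. This is illustrative, not the server's actual code; the helper name is hypothetical.

```python
import sqlite3

def connect(db_path: str) -> sqlite3.Connection:
    # timeout=5.0 makes sqlite3 wait up to 5 seconds for a competing
    # writer instead of raising "database is locked" immediately.
    conn = sqlite3.connect(db_path, timeout=5.0)
    # WAL mode lets readers proceed concurrently with a single writer,
    # and every transaction commits or rolls back atomically.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")  # milliseconds, matches timeout above
    return conn
```

With WAL enabled, two processes opening the same database file no longer corrupt each other's writes; the second writer simply waits its turn.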

Requirements

  • Python >= 3.12
  • uv (recommended) or pip for dependency management
  • Git for cloning the repository
  • ~465 MB disk space if you download the embedding model (optional)
  • ~50 MB for test suite (402 tests passing)

Installation

1. Clone the repository

```bash
git clone https://github.com/Yarlan1503/mcp-memory.git
cd mcp-memory
```

2. Install dependencies

```bash
uv sync
```

uv sync creates a virtual environment, resolves all dependencies from pyproject.toml, and generates the mcp-memory entry point.

3. Download the embedding model

```bash
uv run python scripts/download_model.py
```

The model is also downloaded automatically on first use; this script is provided for manual or offline setups.

This downloads the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 model (~465 MB) to ~/.cache/mcp-memory-v2/models/:

| File | Purpose |
| --- | --- |
| model.onnx | ONNX-exported model for CPU inference |
| tokenizer.json | HuggingFace fast tokenizer (Rust) |
| tokenizer_config.json | Tokenizer configuration |
| special_tokens_map.json | Special token mappings |

:::tip The model download is optional. The server starts and runs all 19 tools without it. Only search_semantic, find_duplicate_observations, and search_reflections require the model. See Without the model below. :::

4. Verify the installation

```bash
uv run mcp-memory
```

The server starts as a stdio process. It registers as "memory" in the MCP protocol, listens for JSON-RPC on stdin, and writes logs to stderr (no interference with MCP communication).

:::note You won’t see output on stdout — that’s correct. The server communicates via the MCP protocol (JSON-RPC over stdio). Logs go to stderr. :::
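For a feel of what travels over stdin, here is a minimal JSON-RPC 2.0 request of the kind an MCP client writes to the server. The helper function is hypothetical; in practice your MCP client library handles message framing and the initialize handshake for you.

```python
import json

def make_request(request_id: int, method: str, params: dict) -> str:
    # JSON-RPC 2.0 envelope as used by the MCP protocol.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })

# e.g. ask the server which tools it exposes
msg = make_request(1, "tools/list", {})
```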

Configuration

OpenCode

Add to the mcp section of your opencode.json:

```json
{
  "mcp": {
    "memory": {
      "command": "uv",
      "args": ["--directory", "/path/to/mcp-memory", "run", "mcp-memory"]
    }
  }
}
```

Replace /path/to/mcp-memory with the absolute path to the cloned repository.

Claude Desktop

Add to your Claude Desktop config file:

```json
{
  "mcpServers": {
    "memory": {
      "command": "uv",
      "args": ["run", "mcp-memory"],
      "cwd": "/path/to/mcp-memory"
    }
  }
}
```

Replace /path/to/mcp-memory with the absolute path to the cloned repository.

uvx (no clone required)

If you prefer not to clone the repo, run directly from GitHub:

```json
{
  "mcpServers": {
    "memory": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/Yarlan1503/mcp-memory", "mcp-memory"]
    }
  }
}
```

:::caution The uvx method does not support downloading the embedding model. If you need semantic search, clone the repository instead and follow the installation steps above. :::

First steps

Create entities

Store knowledge as entities with a name, type, and observations:

```json
{
  "entities": [
    {
      "name": "My Project",
      "entityType": "Project",
      "observations": [
        "Built with Astro and Starlight",
        "Deployed on Vercel",
        "Uses Pagefind for search"
      ]
    }
  ]
}
```

If an entity already exists, create_entities merges observations instead of overwriting. Duplicates are discarded.
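The merge rule can be pictured as a small pure function, assuming (as described above) that order is preserved and exact-duplicate observations are dropped:

```python
def merge_observations(existing: list[str], incoming: list[str]) -> list[str]:
    # Append only observations not already present, preserving order.
    seen = set(existing)
    merged = list(existing)
    for obs in incoming:
        if obs not in seen:
            merged.append(obs)
            seen.add(obs)
    return merged
```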

Create relations

Connect entities with typed relationships:

```json
{
  "relations": [
    {
      "from": "My Project",
      "to": "Astro",
      "relationType": "uses"
    },
    {
      "from": "My Project",
      "to": "Vercel",
      "relationType": "deployed_on"
    }
  ]
}
```

Both entities must exist before creating a relation between them.

Search by substring

Find entities by keyword across names, types, and observation content:

```json
{
  "query": "project"
}
```

search_nodes uses LIKE pattern matching. It requires no embedding model and returns all entities whose name, type, or observations contain the query string.
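A sketch of what LIKE-based substring matching looks like in SQLite. The single-table schema here is hypothetical and simplified for illustration:

```python
import sqlite3

def search_nodes(conn: sqlite3.Connection, query: str) -> list[str]:
    # LIKE with %...% wildcards scans for the substring in any of the
    # three columns; for ASCII, LIKE is case-insensitive by default.
    pattern = f"%{query}%"
    rows = conn.execute(
        """SELECT name FROM entities
           WHERE name LIKE ? OR entity_type LIKE ? OR observations LIKE ?""",
        (pattern, pattern, pattern),
    ).fetchall()
    return [r[0] for r in rows]
```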

Search by meaning

Find entities that are semantically related to your query, even without matching keywords:

```json
{
  "query": "web framework deployment",
  "limit": 5
}
```

search_semantic encodes the query into a 384-dimensional vector and finds the nearest neighbors by cosine similarity. Results are re-ranked by the Limbic Scoring engine, which considers access frequency, recency, and co-occurrence patterns.
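The nearest-neighbor step boils down to cosine similarity between embedding vectors. A minimal pure-Python sketch (without the Limbic re-ranking stage); function names are hypothetical:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], entity_vecs: dict, k: int = 5) -> list[str]:
    # Rank entity names by similarity of their embedding to the query.
    ranked = sorted(
        entity_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

In the real server the vectors are 384-dimensional ONNX model outputs; the math is the same.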

Split large entities automatically

Entities with many observations can be automatically split into focused sub-entities:

```json
{
  "entity_name": "My Project"
}
```

analyze_entity_split checks whether an entity exceeds its per-type threshold (Sesion=15, Proyecto=25, all other types=20) and uses semantic clustering (Agglomerative, with a c-TF-IDF fallback) to group observations into topics. If splitting is recommended, propose_entity_split returns suggested new entity names and the relations to create.
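The threshold rule is easy to sketch. The constants mirror the per-type limits quoted above; the function name is hypothetical:

```python
# Per-type observation-count thresholds, as described in the text.
THRESHOLDS = {"Sesion": 15, "Proyecto": 25}
DEFAULT_THRESHOLD = 20  # all other entity types

def needs_split(entity_type: str, observation_count: int) -> bool:
    # An entity becomes a split candidate once its observation count
    # exceeds the threshold for its type.
    return observation_count > THRESHOLDS.get(entity_type, DEFAULT_THRESHOLD)
```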

```json
{
  "entity_name": "My Project",
  "approved_splits": [
    {
      "name": "My Project - Architecture",
      "entity_type": "Project",
      "observations": ["Stack: FastMCP + SQLite", "MCP Memory v2"]
    }
  ]
}
```

execute_entity_split creates the new entities, moves observations, and establishes contiene/parte_de (contains/part-of) relations, all within a single atomic transaction.
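The all-or-nothing behavior can be sketched with sqlite3's transaction context manager, which commits on success and rolls back on any exception. Schema and function name here are hypothetical:

```python
import sqlite3

def execute_split(conn: sqlite3.Connection, parent: str,
                  child: str, moved: list[str]) -> None:
    # `with conn:` wraps everything in one transaction: if any statement
    # fails, all changes roll back and the graph stays consistent.
    with conn:
        conn.execute("INSERT INTO entities(name) VALUES (?)", (child,))
        for obs in moved:
            # Move each approved observation from parent to child.
            conn.execute(
                "UPDATE observations SET entity = ? WHERE entity = ? AND content = ?",
                (child, parent, obs),
            )
        # Link the new entity back to its parent.
        conn.execute(
            "INSERT INTO relations(src, dst, type) VALUES (?, ?, 'contiene')",
            (parent, child),
        )
```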

:::tip For the full entity splitting workflow and semantic clustering topic extraction details, see the Tools Reference page. :::

Without the model

The server works without the embedding model downloaded. Here’s what changes:

| Feature | Without model | With model |
| --- | --- | --- |
| create_entities | ✅ Works | ✅ Works + generates embedding |
| create_relations | ✅ Works | ✅ Works |
| add_observations | ✅ Works | ✅ Works + regenerates embedding |
| delete_entities | ✅ Works | ✅ Works + removes embedding |
| delete_observations | ✅ Works | ✅ Works + regenerates embedding |
| delete_relations | ✅ Works | ✅ Works |
| search_nodes | ✅ Works | ✅ Works |
| open_nodes | ✅ Works | ✅ Works |
| migrate | ✅ Works | ✅ Works + generates embeddings |
| search_semantic | ❌ Error | ✅ Works |
| find_duplicate_observations | ❌ Error | ✅ Works |
| consolidation_report | ✅ Works | ✅ Works |
| end_relation | ✅ Works | ✅ Works |
| add_reflection | ✅ Works | ✅ Works + generates embedding |
| search_reflections | ❌ Error | ✅ Works |

When the model is not available, the three model-dependent tools (search_semantic, find_duplicate_observations, and search_reflections) return a clear error message instructing you to run the download script. All other tools function normally.

Next steps

  • Architecture — understand the storage engine, embedding pipeline, and data flow
  • Tools Reference — parameters, responses, and edge cases for all 19 tools
  • Semantic Search — how vector search, hybrid retrieval, and Limbic Scoring work together
  • Maintenance & Operations — deduplication, entity splitting, consolidation reports, and best practices
  • Auto-tuning — optimize GAMMA and BETA_SAL via grid search