What is mcp-memory?

mcp-memory is a drop-in replacement for Anthropic’s MCP Memory server. It provides a persistent knowledge graph where AI agents store entities, observations, and relationships — and retrieve them across sessions.

It keeps full API compatibility with Anthropic’s 8 tools while adding semantic search, hybrid retrieval, and a dynamic scoring engine. All data is stored in SQLite with WAL mode for safe concurrent access. See the Architecture page for a deep dive into how it works.

Why it exists

The official Anthropic server stores the entire knowledge graph in a single JSONL file. This works for demos, but breaks under real usage:

| Dimension | JSONL (Anthropic) | mcp-memory |
| --- | --- | --- |
| Indexing | None — full file scan on every query | SQLite indexes on name, type, and content |
| Semantic search | Not available | KNN with ONNX embeddings (94+ languages) |
| Hybrid search | Not available | KNN + FTS5 via RRF |
| Query routing | Not available | Dynamic 3-strategy routing (COSINE_HEAVY/LIMBIC_HEAVY/HYBRID_BALANCED) |
| Limbic scoring | Not available | Salience + temporal decay + co-occurrence |
| Entity splitting | Not available | Automatic splitting via semantic clustering, with an approval workflow |
| A/B testing | Not available | Shadow mode with NDCG@K metrics |
| Auto-tuning | Not available | Grid search for GAMMA/BETA_SAL optimization |
| Concurrency | Race conditions confirmed | SQLite WAL with 5-second busy timeout |
| Scale | Degrades linearly with file size | O(log n) indexed queries |
| Data corruption | Documented in issues #1819, #2579 (May 2025, still open) | ACID transactions with auto-rollback |

The official server rewrites the entire file on every operation. Without locking or atomic writes, concurrent operations can interleave, producing malformed JSON and duplicated lines. mcp-memory solves these problems at the root with a storage engine designed for persistent data.
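The WAL-plus-busy-timeout combination described above can be sketched in a few lines of Python. This is illustrative, not the server's actual code; the helper name is hypothetical.

```python
import sqlite3

def connect(db_path: str) -> sqlite3.Connection:
    # timeout=5.0 makes sqlite3 wait up to 5 seconds for a competing
    # writer instead of raising "database is locked" immediately.
    conn = sqlite3.connect(db_path, timeout=5.0)
    # WAL mode lets readers proceed concurrently with a single writer,
    # and every transaction commits or rolls back atomically.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")  # milliseconds, matches timeout above
    return conn
```

With WAL enabled, two processes opening the same database file no longer corrupt each other's writes; the second writer simply waits its turn.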

Requirements

  • Python >= 3.12
  • uv (recommended) or pip for dependency management
  • Git for cloning the repository
  • ~465 MB disk space if you download the embedding model (optional)
  • ~50 MB for test suite (402 tests passing)

Installation

1. Clone the repository

```bash
git clone https://github.com/Yarlan1503/mcp-memory.git
cd mcp-memory
```

2. Install dependencies

```bash
uv sync
```

uv sync creates a virtual environment, resolves all dependencies from pyproject.toml, and generates the mcp-memory entry point.

3. Download the embedding model

```bash
uv run python scripts/download_model.py
```

The model is also downloaded automatically on first use; this script is provided for manual or offline setups.

This downloads the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 model (~465 MB) to ~/.cache/mcp-memory-v2/models/:

| File | Purpose |
| --- | --- |
| model.onnx | ONNX-exported model for CPU inference |
| tokenizer.json | HuggingFace fast tokenizer (Rust) |
| tokenizer_config.json | Tokenizer configuration |
| special_tokens_map.json | Special token mappings |

:::tip The model download is optional. The server starts and runs all 19 tools without it. Only search_semantic, find_duplicate_observations, and search_reflections require the model. See Without the model below. :::

4. Verify the installation

```bash
uv run mcp-memory
```

The server starts as a stdio process. It registers as "memory" in the MCP protocol, listens for JSON-RPC on stdin, and writes logs to stderr (no interference with MCP communication).

:::note You won’t see output on stdout — that’s correct. The server communicates via the MCP protocol (JSON-RPC over stdio). Logs go to stderr. :::
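For a feel of what travels over stdin, here is a minimal JSON-RPC 2.0 request of the kind an MCP client writes to the server. The helper function is hypothetical; in practice your MCP client library handles message framing and the initialize handshake for you.

```python
import json

def make_request(request_id: int, method: str, params: dict) -> str:
    # JSON-RPC 2.0 envelope as used by the MCP protocol.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })

# e.g. ask the server which tools it exposes
msg = make_request(1, "tools/list", {})
```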

Configuration

OpenCode

Add to the mcp section of your opencode.json:

```json
{
  "mcp": {
    "memory": {
      "command": "uv",
      "args": ["--directory", "/path/to/mcp-memory", "run", "mcp-memory"]
    }
  }
}
```

Replace /path/to/mcp-memory with the absolute path to the cloned repository.

Claude Desktop

Add to your Claude Desktop config file:

```json
{
  "mcpServers": {
    "memory": {
      "command": "uv",
      "args": ["run", "mcp-memory"],
      "cwd": "/path/to/mcp-memory"
    }
  }
}
```

Replace /path/to/mcp-memory with the absolute path to the cloned repository.

uvx (no clone required)

If you prefer not to clone the repo, run directly from GitHub:

```json
{
  "mcpServers": {
    "memory": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/Yarlan1503/mcp-memory", "mcp-memory"]
    }
  }
}
```

:::caution The uvx method does not support downloading the embedding model. If you need semantic search, clone the repository instead and follow the installation steps above. :::

First steps

Create entities

Store knowledge as entities with a name, type, and observations:

```json
{
  "entities": [
    {
      "name": "My Project",
      "entityType": "Project",
      "observations": [
        "Built with Astro and Starlight",
        "Deployed on Vercel",
        "Uses Pagefind for search"
      ]
    }
  ]
}
```

If an entity already exists, create_entities merges observations instead of overwriting. Duplicates are discarded.
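The merge rule can be pictured as a small pure function, assuming (as described above) that order is preserved and exact-duplicate observations are dropped:

```python
def merge_observations(existing: list[str], incoming: list[str]) -> list[str]:
    # Append only observations not already present, preserving order.
    seen = set(existing)
    merged = list(existing)
    for obs in incoming:
        if obs not in seen:
            merged.append(obs)
            seen.add(obs)
    return merged
```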

Create relations

Connect entities with typed relationships:

```json
{
  "relations": [
    {
      "from": "My Project",
      "to": "Astro",
      "relationType": "uses"
    },
    {
      "from": "My Project",
      "to": "Vercel",
      "relationType": "deployed_on"
    }
  ]
}
```

Both entities must exist before creating a relation between them.

Search by substring

Find entities by keyword across names, types, and observation content:

```json
{
  "query": "project"
}
```

search_nodes uses LIKE pattern matching. It requires no embedding model and returns all entities whose name, type, or observations contain the query string.
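A sketch of what LIKE-based substring matching looks like in SQLite. The single-table schema here is hypothetical and simplified for illustration:

```python
import sqlite3

def search_nodes(conn: sqlite3.Connection, query: str) -> list[str]:
    # LIKE with %...% wildcards scans for the substring in any of the
    # three columns; for ASCII, LIKE is case-insensitive by default.
    pattern = f"%{query}%"
    rows = conn.execute(
        """SELECT name FROM entities
           WHERE name LIKE ? OR entity_type LIKE ? OR observations LIKE ?""",
        (pattern, pattern, pattern),
    ).fetchall()
    return [r[0] for r in rows]
```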

Search by meaning

Find entities that are semantically related to your query, even without matching keywords:

```json
{
  "query": "web framework deployment",
  "limit": 5
}
```

search_semantic encodes the query into a 384-dimensional vector and finds the nearest neighbors by cosine similarity. Results are re-ranked by the Limbic Scoring engine, which considers access frequency, recency, and co-occurrence patterns.
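The nearest-neighbor step boils down to cosine similarity between embedding vectors. A minimal pure-Python sketch (without the Limbic re-ranking stage); function names are hypothetical:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], entity_vecs: dict, k: int = 5) -> list[str]:
    # Rank entity names by similarity of their embedding to the query.
    ranked = sorted(
        entity_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

In the real server the vectors are 384-dimensional ONNX model outputs; the math is the same.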

Split large entities automatically

Entities with many observations can be automatically split into focused sub-entities:

```json
{
  "entity_name": "My Project"
}
```

analyze_entity_split checks whether an entity exceeds its per-type threshold (Sesion=15, Proyecto=25, all other types=20) and uses semantic clustering (Agglomerative, with a c-TF-IDF fallback) to group observations into topics. If splitting is recommended, propose_entity_split returns suggested new entity names and the relations to create.
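The threshold rule is easy to sketch. The constants mirror the per-type limits quoted above; the function name is hypothetical:

```python
# Per-type observation-count thresholds, as described in the text.
THRESHOLDS = {"Sesion": 15, "Proyecto": 25}
DEFAULT_THRESHOLD = 20  # all other entity types

def needs_split(entity_type: str, observation_count: int) -> bool:
    # An entity becomes a split candidate once its observation count
    # exceeds the threshold for its type.
    return observation_count > THRESHOLDS.get(entity_type, DEFAULT_THRESHOLD)
```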

```json
{
  "entity_name": "My Project",
  "approved_splits": [
    {
      "name": "My Project - Architecture",
      "entity_type": "Project",
      "observations": ["Stack: FastMCP + SQLite", "MCP Memory v2"]
    }
  ]
}
```

execute_entity_split creates the new entities, moves observations, and establishes contiene/parte_de (contains/part-of) relations, all within a single atomic transaction.
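The all-or-nothing behavior can be sketched with sqlite3's transaction context manager, which commits on success and rolls back on any exception. Schema and function name here are hypothetical:

```python
import sqlite3

def execute_split(conn: sqlite3.Connection, parent: str,
                  child: str, moved: list[str]) -> None:
    # `with conn:` wraps everything in one transaction: if any statement
    # fails, all changes roll back and the graph stays consistent.
    with conn:
        conn.execute("INSERT INTO entities(name) VALUES (?)", (child,))
        for obs in moved:
            # Move each approved observation from parent to child.
            conn.execute(
                "UPDATE observations SET entity = ? WHERE entity = ? AND content = ?",
                (child, parent, obs),
            )
        # Link the new entity back to its parent.
        conn.execute(
            "INSERT INTO relations(src, dst, type) VALUES (?, ?, 'contiene')",
            (parent, child),
        )
```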

:::tip For the full entity splitting workflow and semantic clustering topic extraction details, see the Tools Reference page. :::

Without the model

The server works without the embedding model downloaded. Here’s what changes:

| Feature | Without model | With model |
| --- | --- | --- |
| create_entities | ✅ Works | ✅ Works + generates embedding |
| create_relations | ✅ Works | ✅ Works |
| add_observations | ✅ Works | ✅ Works + regenerates embedding |
| delete_entities | ✅ Works | ✅ Works + removes embedding |
| delete_observations | ✅ Works | ✅ Works + regenerates embedding |
| delete_relations | ✅ Works | ✅ Works |
| search_nodes | ✅ Works | ✅ Works |
| open_nodes | ✅ Works | ✅ Works |
| migrate | ✅ Works | ✅ Works + generates embeddings |
| search_semantic | ❌ Error | ✅ Works |
| find_duplicate_observations | ❌ Error | ✅ Works |
| consolidation_report | ✅ Works | ✅ Works |
| end_relation | ✅ Works | ✅ Works |
| add_reflection | ✅ Works | ✅ Works + generates embedding |
| search_reflections | ❌ Error | ✅ Works |

When the model is not available, the three model-dependent tools (search_semantic, find_duplicate_observations, and search_reflections) return a clear error message instructing you to run the download script. All other tools function normally.

Next steps

  • Architecture — understand the storage engine, embedding pipeline, and data flow
  • Tools Reference — parameters, responses, and edge cases for all 19 tools
  • Semantic Search — how vector search, hybrid retrieval, and Limbic Scoring work together
  • Maintenance & Operations — deduplication, entity splitting, consolidation reports, and best practices
  • Auto-tuning — optimize GAMMA and BETA_SAL via grid search