Rewind Memory

Persistent, bio-inspired memory for AI agents. Local-first, production-ready. 5-layer architecture (Free) / 7-layer (Pro). Memory type taxonomy, drift detection, recency weighting, and query-intent matching included in all tiers. Ships as a Claude Code plugin or OpenClaw integration.

Quick Start

Install Rewind via pip:

pip install rewind-memory

Then diagnose your setup and build the L0 index, backfill your conversation history, and start the real-time watcher:

rewind doctor          # auto-diagnose, build L0 index, fix config
rewind ingest-chats    # backfill historical OpenClaw conversations
rewind watch           # real-time conversation indexing

Pro users get L5 semantic search automatically when Qdrant is available:

pip install rewind-memory-pro
rewind watch --qdrant-url http://localhost:6333 --embed-url http://localhost:8041/v1/embeddings

That is all that is required to get started.
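The L0 index that rewind doctor builds is a SQLite FTS5 full-text index ranked with BM25. A minimal sketch of that kind of index (the table and column names are illustrative, not Rewind's actual schema):

```python
import sqlite3

# In-memory stand-in; Rewind's real index lives under ~/.rewind/data.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE turns USING fts5(role, text)")
db.executemany(
    "INSERT INTO turns VALUES (?, ?)",
    [
        ("user", "how do I configure the qdrant url for semantic search"),
        ("assistant", "pass --qdrant-url to rewind watch"),
        ("user", "remind me about the stripe webhook events"),
    ],
)

# FTS5's bm25() returns lower values for better matches, so sort ascending.
rows = db.execute(
    "SELECT role, text FROM turns WHERE turns MATCH ? ORDER BY bm25(turns)",
    ("qdrant",),
).fetchall()
```

This is what makes L0 fast on the Free tier: pure keyword recall with no embedding model in the loop.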

Architecture

Rewind is structured as seven memory layers (the Free tier runs five of them), each modelled on a distinct component of biological memory. A central orchestrator (L2) handles fusion, ranking, and entity extraction across the other layers.

L2 — Orchestrator: fusion, ranking, and entity extraction across all layers.

L0 — Sensory Buffer: fast keyword recall (SQLite FTS5 + BM25)
L1 — Short-Term Memory: recent context (sqlite-vec)
L3 — Graph Memory: entity relationships (SQLite / Neo4j)
L4 — Workspace: active session context (sqlite-vec)
L5 — Communications: chat and email recall (Qdrant)
L6 — Documents: file and doc search (Qdrant + FTS5)

Advanced Features (Pro / Enterprise): cloud embeddings, graph extraction, cross-encoder reranking.

Layer  Name               Backend
L0     Sensory Buffer     SQLite FTS5 + BM25
L1     Short-Term Memory  sqlite-vec
L2     Orchestrator       In-process
L3     Graph Memory       SQLite / Neo4j
L4     Workspace          sqlite-vec
L5     Communications     Qdrant
L6     Documents          Qdrant + FTS5
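The orchestrator's fusion step can be pictured as reciprocal-rank fusion over per-layer result lists, a common way to merge keyword and semantic rankings. Whether Rewind uses exactly RRF is an assumption; this is a sketch of the general technique:

```python
from collections import defaultdict

def fuse(layer_results, k=60):
    """Reciprocal-rank fusion: each layer contributes 1 / (k + rank).

    layer_results maps a layer name to its ranked list of doc ids;
    k dampens the influence of top ranks (60 is the classic default).
    """
    scores = defaultdict(float)
    for ranked_ids in layer_results.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-layer rankings for the same query.
fused = fuse({
    "L0_keyword": ["doc_a", "doc_c", "doc_b"],
    "L5_semantic": ["doc_b", "doc_a"],
})
```

Here doc_a wins because it places well in both layers, even though L5 alone preferred doc_b; that cross-layer agreement is what rank fusion rewards.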

Tiers

Feature                         Free                                   Pro ($18/mo; $9/mo first 1,000)          Enterprise
Real-time conversation watcher  ✓ (L0 keyword)                         ✓ (L0 + L5 semantic)
Historical chat backfill        ✓                                      ✓
Auto-diagnosis and repair       ✓                                      ✓
Multi-channel awareness                                                ✓ (Telegram, WhatsApp, Slack, iMessage)
Memory type taxonomy            ✓ (user/feedback/project/reference)    ✓                                        ✓
Recency weighting               ✓ (type-aware decay)                   ✓                                        ✓
Query-intent matching           ✓                                      ✓                                        ✓
Memory drift detection          ✓                                      ✓                                        ✓
OpenClaw gateway autopatcher    ✓                                      ✓
LLM relevance selection                                                ✓ (side-query)
Cross-encoder reranking                                                ✓ (GPU)
Memory extraction (post-turn)                                          ✓ (auto)
Partial compaction                                                     ✓
Embedding model                 all-MiniLM-L6-v2 (768-dim, local)      NV-Embed-v2 (4096-dim, Modal cloud)      Custom
KG extraction                   Heuristic (regex) or Ollama local      Graph-PReFLexOR on Modal T4              Custom LLM
Batch extraction                                                       Yes                                      Yes
Storage                         Local SQLite                           Local SQLite + Qdrant + Neo4j            Managed
API server                      Self-hosted                            Self-hosted + cloud relay                Managed
Support                         Community                              Email                                    SLA
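Type-aware recency decay (as named in the table above) presumably gives each memory type its own decay rate; a sketch under that assumption — the half-life values here are invented for illustration, not Rewind's actual tuning:

```python
# Hypothetical half-lives in days: feedback fades fast, reference barely at all.
HALF_LIFE_DAYS = {"user": 90.0, "feedback": 7.0, "project": 30.0, "reference": 365.0}

def recency_weight(mem_type, age_days):
    """Exponential decay: the weight halves every HALF_LIFE_DAYS[mem_type] days."""
    half_life = HALF_LIFE_DAYS.get(mem_type, 30.0)
    return 0.5 ** (age_days / half_life)

week_old_feedback = recency_weight("feedback", 7)    # exactly one half-life
week_old_reference = recency_weight("reference", 7)  # barely decayed
```

Multiplying retrieval scores by such a weight lets week-old feedback fade while a week-old reference note stays almost fully ranked.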

Claude Code Plugin Setup

Install

pip install rewind-memory
git clone https://github.com/saraidefence/rewind-memory.git ~/.claude-plugins/rewind-memory

Activate

claude --plugin-dir ~/.claude-plugins/rewind-memory/plugin

Initialise

/rewind-setup

Available Commands

Command                 Description
rewind doctor           Auto-diagnose and fix common issues, build L0 index
rewind watch            Real-time session watcher with L0/L5 indexing
rewind ingest-chats     One-time historical conversation backfill
rewind watch-sessions   Real-time conversation capture from OpenClaw sessions
rewind serve            API server with background file watcher
rewind search <query>   Search all memory layers
rewind ingest <path>    Ingest files or directories into memory
rewind remember <text>  Store a manual note in memory
rewind health           Health check across all layers
rewind proxy            Memory-augmented LLM proxy server
rewind bench            Run LoCoMo benchmark
rewind migrate          Migrate backends (Pro)

Pro Setup

1. Subscribe

Install the Pro package, then visit saraidefence.com/dashboard or use the CLI to open a Stripe Checkout page:

pip install git+https://github.com/saraidefence/rewind-memory-pro.git
2. Get Your API Key

After payment completes, the confirmation page displays your key. Copy it immediately — for security it is not stored in plaintext after this page.

rwnd_live_<32 hex chars>
3. Configure

Add the key to ~/.rewind/config.yaml:

tier: pro
modal:
  auth_token: rwnd_live_<your-key>
embedding:
  provider: modal
  model: nvidia/NV-Embed-v2
  dim: 4096
kg:
  provider: modal
  model: graph-preflexor

Or use the CLI:

rewind config set tier pro
rewind config set modal.auth_token rwnd_live_<your-key>
4. Re-embed (if upgrading from Free)

If you have existing data, re-embed your chunks through NV-Embed-v2 for 4096-dim vectors:

rewind migrate --reindex
5. Verify

rewind health

To manage or cancel your subscription, visit your dashboard.

Configuration Reference

Full path: ~/.rewind/config.yaml

# Tier: free | pro | enterprise
tier: free

# Data storage root
data_dir: ~/.rewind/data

embedding:
  provider: local          # local | modal
  model: all-MiniLM-L6-v2  # or nvidia/NV-Embed-v2 for Pro
  dim: 768                 # 768 (free) | 4096 (pro)

kg:
  provider: heuristic      # heuristic | ollama | modal
  model: null              # e.g. saraidefence/graph-preflexor:latest

modal:
  auth_token: null         # rwnd_live_<key> — Pro/Enterprise only

# Optional: Neo4j backend for L3 (enterprise)
neo4j:
  uri: bolt://localhost:7687
  user: neo4j
  password: null
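With kg.provider: heuristic, L3 entity extraction is regex-based. A toy sketch of the general idea — the pattern and edge scheme are illustrative, not Rewind's actual heuristics: treat runs of capitalized words as entities, and link entities that co-occur in the same sentence.

```python
import re
from itertools import combinations

def extract_entities(text):
    # Naive heuristic: runs of capitalized words, e.g. "New York".
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text)

def co_occurrence_edges(text):
    """Undirected edges between entities that share a sentence."""
    edges = set()
    for sentence in re.split(r"[.!?]", text):
        ents = sorted(set(extract_entities(sentence)))
        edges.update(combinations(ents, 2))
    return edges

edges = co_occurrence_edges("Alice met Bob in Paris. Bob flew home.")
```

Swapping the provider to ollama or modal replaces this regex step with an LLM pass, but the resulting graph structure (entity nodes plus co-occurrence edges) is the same.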

Config Files by Tier

File                     Purpose
configs/free.yaml        Default free tier
configs/pro.yaml         Pro cloud settings
configs/enterprise.yaml  Enterprise / self-managed

CLI Reference

rewind serve                   API server + file watcher
rewind init                    Initialise data directory
rewind health                  Check layer status
rewind doctor                  Auto-diagnose and fix issues
rewind ingest <path>           Index files into memory
rewind ingest-chats            Backfill historical conversations
rewind watch                   Watch workspace for file changes
rewind watch-sessions          Real-time conversation capture
rewind search <query>          Search across all layers
rewind recall <query>          Alias for search
rewind remember <text>         Store a manual note
rewind bench                   Run LoCoMo benchmark
rewind config get <key>        Read a config value
rewind config set <key> <val>  Write a config value
rewind migrate --reindex       Re-embed chunks (768 to 4096 for Pro)
rewind export                  Export memory to JSON

Real-Time Conversation Capture

Capture conversations as they happen — no manual backfill needed. watch-sessions uses watchdog to monitor OpenClaw session JSONL files and immediately indexes new turns.

# Watch all OpenClaw session files, index new turns into L0 + L3 + L5
rewind watch-sessions

# Custom session directory
rewind watch-sessions --session-dir /path/to/sessions

# With specific backends
rewind watch-sessions --qdrant-url http://localhost:6333 --embed-url http://localhost:8041/v1/embeddings

Closed-Loop Memory

The pre-turn gateway hook reads memory before each LLM turn. watch-sessions writes new conversations into memory after each turn. Together they form a closed loop — the agent remembers what it just discussed.

New turns are indexed into L0 (BM25 keyword search), L3 (knowledge graph with entity extraction and co-occurrence edges), and L5 (Qdrant semantic vectors, if available).

Requires: pip install 'watchdog>=3.0'
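Conceptually, the watcher remembers a byte offset per session file and, on each change event, parses only the newly appended JSONL lines. A stdlib sketch of that offset-tracking step (the real command uses watchdog for change notifications; the field names here are hypothetical):

```python
import json
import os
import tempfile

offsets = {}  # path -> byte offset already processed

def read_new_turns(path):
    """Parse only the JSONL lines appended since the last call.

    Assumes writers append complete lines; a production watcher would
    hold back a trailing partial line until it is newline-terminated.
    """
    pos = offsets.get(path, 0)
    with open(path, "rb") as f:
        f.seek(pos)
        chunk = f.read()
        offsets[path] = f.tell()
    lines = chunk.decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]

# Demo: two turns exist, then one more is appended later.
tmp = tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False)
tmp.write('{"role": "user", "text": "hi"}\n{"role": "assistant", "text": "hello"}\n')
tmp.close()
first_batch = read_new_turns(tmp.name)   # both initial turns
with open(tmp.name, "a") as f:
    f.write('{"role": "user", "text": "ship it"}\n')
second_batch = read_new_turns(tmp.name)  # only the appended turn
os.remove(tmp.name)
```

Because only the delta is re-read, indexing cost stays proportional to new conversation, not session length.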

OpenClaw Integration

Route OpenClaw's memory_search through Rewind's full stack with a single config change. Two integration methods are available.

Native Hook (recommended)

Creates a native OpenClaw hook that survives npm updates. No re-apply needed.

# Create the pre-turn memory hook
rewind-openclaw hook

# Verify installation
rewind-openclaw hook --verify

# Remove
rewind-openclaw hook --remove

Gateway Patch (legacy)

Patches the OpenClaw gateway directly. Works but needs re-applying after every npm update.

rewind-openclaw patch
rewind-openclaw patch --verify
rewind-openclaw patch --restore

Config Setup

# Route memory_search through Rewind
rewind-openclaw setup

Both methods fire on every inbound message, query Rewind's HybridRAG proxy, and prepend the top results directly into the message. The agent sees relevant memory before it starts thinking.

Memory Proxy

The memory proxy auto-injects relevant context into every LLM call. No MCP needed — just change your API URL. Works with any OpenAI-compatible tool.

# Ingest your project first
rewind ingest ./my-project/

# Start the memory proxy
rewind proxy --port 8080

# Point your tool at it
OPENAI_BASE_URL=http://localhost:8080/v1 cursor .

Supports OpenAI, Anthropic, NVIDIA, local models, and any OpenAI-compatible API. Use --upstream to change the target provider.
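The proxy's core move is retrieve, prepend, forward. A sketch of just the injection step (the message shape follows the OpenAI chat format; the retrieval call is elided and the system-prefix wording is made up):

```python
def inject_memory(messages, memories):
    """Prepend retrieved memory as a system message; leave the rest intact."""
    if not memories:
        return messages
    context = "Relevant memory:\n" + "\n".join(f"- {m}" for m in memories)
    return [{"role": "system", "content": context}] + messages

augmented = inject_memory(
    [{"role": "user", "content": "what port does the API server use?"}],
    ["The self-hosted API server starts on http://localhost:8080."],
)
```

In the real proxy, the augmented message list is then forwarded to the upstream provider unchanged, which is why any OpenAI-compatible tool works without modification.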

MCP Tools

Rewind ships an MCP server exposing six memory tools. Works with Claude Code, Cursor, Windsurf, and any MCP-compatible client.

Setup

Add to your MCP client config (e.g. ~/.claude/settings.json):

{
  "mcpServers": {
    "rewind": {
      "command": "rewind-mcp"
    }
  }
}

Available Tools

Tool             Description
memory_search    Search across all memory layers with fused ranking
memory_store     Store content into the appropriate layer based on type
memory_extract   Extract structured memories from conversation text
memory_stats     Get layer health and statistics
memory_feedback  Submit retrieval feedback for learning
graph_traverse   Traverse the knowledge graph with spreading activation
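The spreading activation behind graph_traverse can be sketched as energy propagating outward from seed entities, halving per hop and stopping below a threshold. The decay factor, threshold, and graph here are illustrative, not Rewind's actual parameters:

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.05, hops=3):
    """graph maps each node to its neighbour list; seeds start at energy 1.0."""
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(hops):
        nxt = {}
        for node, energy in frontier.items():
            passed = energy * decay
            if passed < threshold:
                continue  # too weak to keep spreading
            for nb in graph.get(node, []):
                if passed > activation.get(nb, 0.0):
                    nxt[nb] = max(nxt.get(nb, 0.0), passed)
        for node, energy in nxt.items():
            activation[node] = max(activation.get(node, 0.0), energy)
        frontier = nxt
        if not frontier:
            break
    return activation

graph = {"rewind": ["qdrant", "sqlite"], "qdrant": ["embeddings"], "sqlite": []}
act = spread_activation(graph, ["rewind"])
```

Nodes two hops from the seed still receive some activation, which is how a traversal can surface entities never mentioned alongside the query term directly.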

Self-Hosted / Docker

git clone https://github.com/saraidefence/rewind-memory.git
cd rewind-memory
docker compose -f docker/docker-compose.yml up -d

The API server starts on http://localhost:8080.

Environment Variables

STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRO_PRICE_ID=price_...
REWIND_BASE_URL=https://your-domain.com
REWIND_DATA_DIR=/data

Stripe Webhook

Register the following endpoint in your Stripe dashboard:

POST https://your-domain.com/stripe/webhook

Enable these events:

  • checkout.session.completed
  • customer.subscription.deleted
  • invoice.payment_succeeded
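If you handle the endpoint yourself rather than using the stripe library, the Stripe-Signature header is an HMAC-SHA256 of "{timestamp}.{payload}" keyed with your whsec_ secret. A stdlib sketch of that check (timestamp-tolerance and error handling omitted; the secret and payload below are dummies):

```python
import hashlib
import hmac

def verify_stripe_signature(payload: bytes, sig_header: str, secret: str) -> bool:
    """Check a Stripe-Signature header of the form 't=...,v1=...'."""
    parts = dict(kv.split("=", 1) for kv in sig_header.split(","))
    signed = f"{parts['t']}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, parts["v1"])

# Self-signed round trip with a dummy secret, to show the mechanics.
secret = "whsec_dummy"
payload = b'{"type": "checkout.session.completed"}'
sig = hmac.new(secret.encode(), b"1700000000." + payload, hashlib.sha256).hexdigest()
ok = verify_stripe_signature(payload, f"t=1700000000,v1={sig}", secret)
```

A production handler should also reject timestamps outside a small tolerance window to prevent replay.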

API Endpoints

Cloud services run on Modal. All endpoints listed below are Pro / Enterprise only.