daimon-memory

Getting started

Overview

daimon-memory is a self-hostable memory backend for AI agents. It gives any tool a shared, queryable store of typed records: decisions, lessons, incidents, runbooks, conventions, personas, and more. The same memory is available to Claude Code, Codex, Hermes, and any custom agent that speaks MCP or REST. Store a decision in one tool; recall it from another.

Two things separate it from a generic vector store. First, recall is fully deterministic: embeddings plus keyword search plus Reciprocal Rank Fusion (RRF), no language model in the retrieval path. Second, capture is typed and validated at the interface: every write is checked against a per-kind schema, rejected if malformed, and routed by a code-enforced write mode (append vs. update). The agent proposes content and a kind; the server decides everything else.

Quickstart

The shortest path to a running instance with a first memory stored:

git clone https://github.com/wakbijok/daimon-memory.git
cd daimon-memory
./install.sh          # guided setup: writes .env, starts the stack, seeds defaults

The installer prompts for each value, starts Docker Compose, applies the schema migration, seeds the two bundled protocol records (behavioral discipline and save discipline), and offers to run the persona wizard. On first start the embedding model downloads once (about 130 MB).

Smoke test after the stack is up:

# health check
curl -s localhost:8080/readyz

# store a memory
curl -s -X POST localhost:8080/v1/memory \
  -H 'content-type: application/json' \
  -d '{
    "kind": "decision",
    "namespace": "resources/architecture/decisions",
    "title": "Adopt Postgres + Qdrant",
    "body": "Use Postgres as the canonical store and Qdrant as a rebuildable vector index.",
    "fields": {
      "context": "needed a shared memory store",
      "decision": "Postgres as the canonical store, Qdrant as the vector index",
      "rationale": "deterministic recall, rebuildable index"
    }
  }'

# recall it
curl -s -X POST localhost:8080/v1/recall \
  -H 'content-type: application/json' \
  -d '{"query": "how should we store memory"}'

The recall response should include the decision you just stored, ranked by RRF score.

Installation

Dependencies

daimon-memory requires PostgreSQL 17 and Qdrant. The Docker Compose path bundles both; if you bring your own, point the env vars at them.

The embedding model (bge-small-en-v1.5, 384-d, via fastembed/ONNX Runtime) requires AVX2 on x86_64. Without AVX2 the server still starts, but semantic recall is disabled and only keyword search runs. See Recall tiers below.

Docker Compose (recommended)

cp .env.example .env
# edit .env: set DAIMON_PG_PASSWORD and DAIMON_API_KEY at minimum
docker compose up --build

# apply schema and seed the system layer
docker compose exec daimon-mcp daimon migrate
docker compose exec daimon-mcp daimon protocol seed
docker compose exec -it daimon-mcp daimon persona

The migrate step is idempotent and safe to rerun. If you used ./install.sh, these steps already ran.

From source

cargo test --workspace          # all unit tests should pass
cargo run --bin daimon -- migrate
cargo run --bin daimon-mcp      # API server on :8080
cargo run --bin daimon-indexer  # outbox-to-Qdrant drainer (separate process)

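Both daimon-mcp and daimon-indexer read the standard Postgres env vars (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE) and DAIMON_QDRANT_URL (the Qdrant gRPC endpoint). The indexer and CLI default DAIMON_QDRANT_URL to http://127.0.0.1:6334; on the server, leaving it unset disables the semantic tier.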

Kubernetes

daimon-mcp is a stateless Deployment; Postgres and Qdrant are StatefulSets. Build the image (the included Dockerfile covers the full workspace), then apply your manifests or sync via GitOps. The embedder needs AVX2 on the nodes where daimon-mcp and daimon-indexer run; if your cluster is mixed, use a node selector (plus a toleration if the AVX2 nodes are tainted). After any fresh install or index rebuild, run daimon migrate and daimon reindex.

Environment configuration

Variable | Purpose | Default
DAIMON_MCP_BIND | Listen address | 0.0.0.0:8080
DAIMON_API_KEY | Bearer token for /v1/* and /mcp. Unset = open API (fine on localhost, not on a shared network). Set per-client variants with DAIMON_API_KEY_CLAUDE, DAIMON_API_KEY_IZU, etc. for independent revocation. | (unset)
PGHOST / PGPORT / PGUSER / PGPASSWORD / PGDATABASE | PostgreSQL connection | 127.0.0.1 / 5432 / daimon / (empty) / daimon_memory
DAIMON_QDRANT_URL | Qdrant gRPC endpoint. Unset on the server = semantic tier disabled (keyword-only recall). | http://127.0.0.1:6334 (indexer/CLI)
DAIMON_DEFAULT_TENANT | Tenant used when no X-Daimon-Tenant header is sent | dev UUID
RUST_LOG | Log level | info

Bearer auth setup

Set DAIMON_API_KEY to any secret string (generate one with openssl rand -hex 32). When set, all requests to /v1/* and /mcp must include Authorization: Bearer <token>. Health and readiness endpoints (/health, /readyz) stay open for probes.

For per-client revocation without rotating a shared secret, set DAIMON_API_KEY_<LABEL> env vars in addition to or instead of the base key. The server accepts any of them. Pass the matching token to each tool's installer via --api-key.
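
As an illustration, a custom client might attach the bearer token and the optional tenant header as follows. This is a minimal Python sketch, not part of daimon-memory: build_request is a hypothetical helper, and the base URL and token are placeholders. It only constructs the request, so it runs without a server.

```python
import json
import urllib.request


def build_request(base_url, path, payload, api_key=None, tenant=None):
    """Build (but do not send) a JSON POST request for the REST API."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Required on /v1/* and /mcp when DAIMON_API_KEY is set on the server.
        headers["Authorization"] = f"Bearer {api_key}"
    if tenant:
        # Optional; the server falls back to DAIMON_DEFAULT_TENANT.
        headers["X-Daimon-Tenant"] = tenant
    data = json.dumps(payload).encode("utf-8")
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_request(
    "http://localhost:8080",
    "/v1/recall",
    {"query": "how should we store memory"},
    api_key="example-token",  # placeholder, e.g. from `openssl rand -hex 32`
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Keeping one token per client (DAIMON_API_KEY_<LABEL>) means a leaked token can be revoked without touching the others.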

AVX2 note and keyword-only fallback

The embedding model (fastembed/ONNX Runtime) requires AVX2 on x86_64. Without it, the server starts cleanly but logs a clear warning and the semantic tier is disabled. Recall falls back to PostgreSQL full-text search only. Both /readyz and daimon health report a recall_tier field: hybrid, keyword, or unhealthy. To backfill semantic recall after moving to an AVX2-capable host, run daimon reindex.


Core concepts

How it works

daimon-memory is a Rust service with no language model in the request path. The architecture in one diagram:

your tools  (Claude Code, Codex, Hermes, your agents)
     |  MCP (/mcp)       REST (/v1)
     v
+------------------+
|  daimon-mcp      |  recall = RRF(Postgres FTS, Qdrant dense) + importance
|  (stateless)     |  store  = validate -> Postgres + outbox
+--------+---------+
         |  ContextMemory trait
+--------v---------+   +----------------+   +-----------------+
|  daimon-pg       |   |  daimon-vec    |<--|  daimon-indexer |
|  PostgreSQL      |   |  Qdrant + bge  |   |  outbox drainer |
|  (canonical)     |   |  (rebuildable) |   |  (singleton)    |
+------------------+   +----------------+   +-----------------+

Postgres is the canonical store and the source of truth. Qdrant is a rebuildable semantic index; if you lose it, daimon reindex rebuilds it from Postgres. The two are kept consistent via a transactional outbox: a write to Postgres commits an outbox row in the same transaction, and the indexer drains it into Qdrant asynchronously. The server and indexer are separate processes; only one indexer runs at a time (singleton).
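
The transactional outbox described above can be sketched with an in-memory SQLite database standing in for Postgres and a dict standing in for Qdrant. The table names, columns, and function names here are illustrative assumptions, not daimon-memory's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Postgres
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE outbox  (id INTEGER PRIMARY KEY, record_id INTEGER, op TEXT);
""")

vector_index = {}  # stand-in for Qdrant: record_id -> text that would be embedded


def store(title, body):
    """Insert the record and its outbox row in one transaction."""
    with conn:  # commits on success, rolls back both inserts on error
        cur = conn.execute(
            "INSERT INTO records (title, body) VALUES (?, ?)", (title, body)
        )
        record_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (record_id, op) VALUES (?, 'upsert')", (record_id,)
        )
    return record_id


def drain_outbox():
    """Singleton drainer: index each pending row, then delete it."""
    rows = conn.execute("SELECT id, record_id FROM outbox ORDER BY id").fetchall()
    for outbox_id, record_id in rows:
        title, body = conn.execute(
            "SELECT title, body FROM records WHERE id = ?", (record_id,)
        ).fetchone()
        vector_index[record_id] = f"{title}\n{body}"  # a real indexer embeds here
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (outbox_id,))
    return len(rows)


rid = store("Adopt Postgres + Qdrant", "Postgres is canonical; Qdrant is rebuildable.")
pending_before = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
drained = drain_outbox()
pending_after = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Because the outbox row commits with the record, a crash between the write and the indexing step leaves a pending row rather than a silently unindexed record.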

Recall is hybrid: PostgreSQL GIN full-text search on tsvector(title + abstract + body) and Qdrant HNSW dense vectors (bge-small-en-v1.5, 384-d, cosine), fused by Reciprocal Rank Fusion. The fused result is further boosted by a record-level importance field. Each hit carries raw component scores (rrf, raw_keyword, raw_semantic) so the ranking is explainable. There is no language model in this path.
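
To make the fusion step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists. The constant k = 60 is the value commonly used for RRF, and the multiplicative importance boost is an assumption for illustration; the source does not state daimon-memory's actual constant or boost formula. The record ids are made up.

```python
def rrf_fuse(keyword_ids, semantic_ids, importance=None, k=60):
    """Fuse two ranked id lists with RRF, then apply an importance boost.

    Each list contributes 1 / (k + rank) per id, with rank starting at 1.
    The boost (1 + importance) is a hypothetical choice, not the real formula.
    """
    importance = importance or {}
    scores = {}
    for ranked in (keyword_ids, semantic_ids):
        for rank, record_id in enumerate(ranked, start=1):
            scores[record_id] = scores.get(record_id, 0.0) + 1.0 / (k + rank)
    boosted = {
        record_id: score * (1.0 + importance.get(record_id, 0.0))
        for record_id, score in scores.items()
    }
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(boosted.items(), key=lambda item: (-item[1], item[0]))


keyword = ["dec-1", "run-7", "les-3"]   # e.g. Postgres full-text order
semantic = ["les-3", "dec-1", "inc-2"]  # e.g. Qdrant cosine order

ranked = rrf_fuse(keyword, semantic)
ranked_boosted = rrf_fuse(keyword, semantic, importance={"inc-2": 1.5})
```

RRF uses only rank positions, so the keyword and semantic scores never need to be normalized against each other.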

The Rust workspace crates:

Crate | Role
daimon-memory-core | The deterministic core: MemoryKind taxonomy (12 kinds), namespace grammar, URI scheme, write validation, ContextMemory trait. Pure logic, no I/O, unit-tested.
daimon-pg | PostgreSQL store: validated writes, content-hash dedup, Update-mode supersede, outbox in one transaction, full-text recall. Tenant-scoped, RLS.
daimon-vec | In-process fastembed bge-small-en-v1.5 embedder and Qdrant vector store.
daimon-indexer | Singleton outbox drainer: reads Postgres outbox, embeds title + body, upserts into Qdrant.
daimon-mcp | Stateless server: REST /v1 and MCP JSON-RPC /mcp, hybrid recall, 9-tool surface, graceful SIGTERM drain.
daimon-cli (daimon) | Ops and management: migrate, reindex, health, stats, export, import, and the system-layer authoring commands (persona, protocol seed, protocol import).

Memory model

Namespaces

Every record lives in a namespace, which is a validated path under one of four roots:

Root | For | Examples
user/ | Facts about the principal: profile, job, preferences, boundaries | user/profile, user/preferences
agent/ | The agent's self and work: persona, protocols, skills, lessons, decisions | agent/persona, agent/protocol, agent/lessons
resources/ | Knowledge about the world: projects, codebases, infrastructure | resources/my-project/decisions, resources/infra/runbooks
session/ | Ephemeral, TTL'd (default 30 days) | session/run-abc123

Namespace segments must be lowercase alphanumeric with hyphens, no underscores, no leading or trailing hyphens. The server validates and rejects non-conforming paths. The daimon URI scheme is daimon://{tenant}/{namespace}/{kind}/{id}.
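
A client can pre-check a namespace before sending a write. The following Python sketch encodes only the rules stated above (four roots, lowercase alphanumeric segments, hyphens allowed inside a segment). The function names are hypothetical, and accepting a bare root or consecutive hyphens is an assumption; the server's grammar remains authoritative.

```python
import re

ROOTS = {"user", "agent", "resources", "session"}

# Lowercase letters and digits, hyphens allowed only between them.
SEGMENT = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def validate_namespace(namespace):
    """Return True if the path follows the documented namespace rules."""
    segments = namespace.split("/")
    if segments[0] not in ROOTS:
        return False
    # An empty segment (double or trailing slash) fails fullmatch.
    return all(SEGMENT.fullmatch(segment) for segment in segments)


def build_uri(tenant, namespace, kind, record_id):
    """Format daimon://{tenant}/{namespace}/{kind}/{id}."""
    if not validate_namespace(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    return f"daimon://{tenant}/{namespace}/{kind}/{record_id}"


uri = build_uri("tenant-1", "agent/lessons", "agent_lesson", "42")
```

Note that the underscore restriction applies to namespace segments, not to kind names such as agent_lesson.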

A practical rule from the bundled save-discipline protocol: user beats resources beats agent for subject-based placement. A fact about the user goes in user/; a fact about a project goes in resources/<project>/; a generalizable agent lesson goes in agent/lessons.

Record kinds

There are 12 canonical record kinds. Each has enforced required fields and a write mode (append-only or update/supersede). The most common ones:

Kind | Write mode | Required fields | Use for
decision | Append-only | context, decision, rationale | Architectural or project decisions (why we chose X)
agent_lesson | Update (supersede) | name, content | Generalizable lessons the agent should carry forward
incident_summary | Append-only | severity, what, why_chain, lesson | Post-incident records with cause chain and prevention
runbook | Update (versioned) | service_ref, steps | Operational procedures, kept current
project_convention | Append-by-new (supersede) | project, rule, why | Standing rules for a project or codebase
known_failure_mode | Update (versioned) | topic, symptom, root_cause | Signature patterns for recurring failures
resource_summary | Update (versioned) | resource_ref, summary | One-card summary of a service, host, or external resource
reminder | Update | content, status | Open follow-ups; status cycles open/completed/dropped
persona | Update (supersede) | identity, voice, boundaries | Shared identity for the system layer (one per tenant)
protocol | Update (supersede) | scope, rules | Behavioral or save-discipline document

Append vs. update

Append-only kinds (decisions, incidents) never mutate. Each write is a new row; the record survives as history. Update kinds (runbooks, lessons, topology) supersede the previous record with the same title in the same namespace: the old row's status flips to superseded, the new row becomes active, and the old vector is de-indexed. This keeps "one current answer" for living knowledge without leaving stale duplicates.

The write mode is enforced by the server, not the caller. Sending an update request to an append-only kind returns a WriteModeViolation error.
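
The difference between the two write modes can be sketched with a small in-memory model. This is an illustrative Python sketch, not the server's implementation: MemoryStore and the requested_mode parameter are hypothetical, and only a few kinds from the table are included.

```python
class WriteModeViolation(Exception):
    """Raised when a caller asks for a write mode the kind does not allow."""


# Write modes for a few kinds, taken from the table of record kinds.
WRITE_MODES = {
    "decision": "append",
    "incident_summary": "append",
    "runbook": "update",
    "agent_lesson": "update",
}


class MemoryStore:
    def __init__(self):
        self.rows = []

    def write(self, kind, namespace, title, body, requested_mode=None):
        mode = WRITE_MODES[kind]  # the store picks the mode from the kind
        # requested_mode is a hypothetical way for a caller to ask for a mode.
        if requested_mode is not None and requested_mode != mode:
            raise WriteModeViolation(f"{kind} only accepts {mode} writes")
        if mode == "update":
            # Supersede the active row with the same title in the same namespace.
            for row in self.rows:
                same_slot = (row["kind"], row["namespace"], row["title"]) == (
                    kind, namespace, title)
                if same_slot and row["status"] == "active":
                    row["status"] = "superseded"
        row = {"kind": kind, "namespace": namespace, "title": title,
               "body": body, "status": "active"}
        self.rows.append(row)
        return row

    def active(self, kind):
        return [r for r in self.rows if r["kind"] == kind and r["status"] == "active"]


store = MemoryStore()
store.write("runbook", "resources/infra/runbooks", "Restart indexer", "v1 steps")
store.write("runbook", "resources/infra/runbooks", "Restart indexer", "v2 steps")
store.write("decision", "resources/architecture/decisions", "Adopt Postgres + Qdrant", "first")
store.write("decision", "resources/architecture/decisions", "Adopt Postgres + Qdrant", "second")
```

After these calls the model holds one active runbook (v2) with v1 marked superseded, and two active decisions, because append-only rows are never replaced.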

Durability and forget

PostgreSQL is the canonical store and is not rebuildable from Qdrant. Back it up like any database you care about. Qdrant is disposable: daimon reindex rebuilds the vector index from Postgres.

The forget tool is confirm-gated. A forgotten record's status flips to forgotten in Postgres and the vector is deleted from Qdrant, but the row is retained for audit purposes. Hard deletion is a separate admin operation.

The system layer

The system layer is what turns a memory store into a disciplined operating environment for an agent. It is made of three typed records stored under agent/:

Record | Kind | What it carries
Persona | persona | The shared identity: who the agent is, its voice, hard boundaries, and the user's profile
Behavioral Discipline | protocol | How the agent works: recall before reasoning, verify before claiming done, surface trade-offs, fail loudly and learn once
Memory Save Discipline | protocol | When and what to persist: which signal maps to which kind, recall-before-write dedup, curated-not-raw, right namespace

These records are authored with the daimon CLI, not by the agent at runtime. Each connected tool's session-start hook reads them, fetches the full body of each (not just the abstract), and injects them once per session as fenced operating instructions. The per-turn recall path excludes agent/persona and agent/protocol so they are not injected again on every turn.

Author or refresh the system layer:

# interactive persona wizard
docker compose exec -it daimon-mcp daimon persona

# seed the two bundled protocol documents
docker compose exec daimon-mcp daimon protocol seed

# import your own protocol from a markdown file
docker compose exec daimon-mcp daimon protocol import /path/to/protocol.md

Protocol files use YAML frontmatter for structured fields (title, namespace, scope, rules); the markdown body is the full content injected by the loader.

The save-nudge engine

daimon-memory has no auto-capture: if the agent does not call a save tool, nothing is remembered. The nudge engine addresses the under-saving problem with three deterministic levers (regex plus a counter, no model):

  1. Signal nudge: the previous turn matched a save-signal class (decision, incident, lesson, follow-up, convention) and no save tool ran. The hook fires and names the exact tool to use.
  2. Cadence nudge: after N quiet turns with no save, a capture-pass nudge. Configurable via DAIMON_NUDGE_CADENCE (default 5; set to 0 to disable).
  3. Session-end pass: sweeps the full session for uncaptured signals (Claude Code).

Set DAIMON_NUDGE=off to disable all nudges. The counter resets on any save tool call, so a saving agent is never nagged.
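
The signal and cadence levers above amount to a few regexes and a counter. The Python sketch below shows the idea under stated assumptions: the patterns, the NudgeEngine class, and the message strings are hypothetical, and resetting the counter after a cadence nudge is a guess rather than documented behavior.

```python
import re

# Hypothetical patterns per save-signal class: (class, suggested tool, regex).
SIGNALS = [
    ("decision", "log_decision", re.compile(r"\b(we decided|decided to|going with)\b", re.I)),
    ("incident", "log_incident", re.compile(r"\b(root cause|outage|postmortem)\b", re.I)),
    ("lesson", "log_lesson", re.compile(r"\b(lesson learned|next time)\b", re.I)),
    ("follow-up", "add_reminder", re.compile(r"\b(follow up|remind me)\b", re.I)),
    ("convention", "remember", re.compile(r"\b(from now on|always use)\b", re.I)),
]


class NudgeEngine:
    def __init__(self, cadence=5, enabled=True):
        self.cadence = cadence      # mirrors DAIMON_NUDGE_CADENCE (0 disables)
        self.enabled = enabled      # mirrors DAIMON_NUDGE=off when False
        self.quiet_turns = 0

    def on_turn(self, text, saved):
        """Return a nudge string for the finished turn, or None."""
        if saved:
            self.quiet_turns = 0    # any save tool call resets the counter
            return None
        if not self.enabled:
            return None
        self.quiet_turns += 1
        for signal_class, tool, pattern in SIGNALS:
            if pattern.search(text):
                return f"signal: {signal_class} detected, consider {tool}"
        if self.cadence > 0 and self.quiet_turns >= self.cadence:
            self.quiet_turns = 0    # assumption: start a fresh quiet window
            return "cadence: run a capture pass"
        return None


engine = NudgeEngine(cadence=2)
r1 = engine.on_turn("We decided to adopt Postgres", saved=False)  # signal nudge
r2 = engine.on_turn("Logged it.", saved=True)                      # counter reset
r3 = engine.on_turn("hello", saved=False)                          # quiet turn 1
r4 = engine.on_turn("more chat", saved=False)                      # quiet turn 2
```

Because the logic is pure pattern matching and counting, the same conversation always produces the same nudges.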

Recall tiers

Recall runs in one of two tiers, depending on whether Qdrant and AVX2 are available:

Hybrid tier (default when Qdrant is available and AVX2 is present): Postgres GIN full-text search ranked by ts_rank plus Qdrant HNSW dense vectors (bge-small-en-v1.5, cosine), fused by Reciprocal Rank Fusion with an importance boost. Finds records by meaning even when no keywords match. This is what you want.

Keyword-only tier (degraded): Postgres full-text search only. Engages automatically when DAIMON_QDRANT_URL is unset, Qdrant is unreachable, or the embedder cannot load (no AVX2). The server starts cleanly, logs a warning, and reports recall_tier: keyword on /readyz and daimon health. Recall still works; semantic similarity does not.

Degradation is clean: no crash, no silent failure. You can run keyword-only indefinitely on hardware without AVX2. To upgrade to hybrid, move to an AVX2-capable host and run daimon reindex to backfill the vector index.
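
The tier selection can be summarized as a small decision function. This Python sketch is an assumption-laden illustration: the function and parameter names are hypothetical, and mapping an unavailable Postgres to the unhealthy value is a guess, since the source lists unhealthy as a recall_tier value without defining when it is reported.

```python
def recall_tier(postgres_ok, qdrant_url, qdrant_reachable, embedder_loaded):
    """Pick the recall tier from the conditions described above."""
    if not postgres_ok:
        # Assumption: without the canonical store no recall is possible.
        return "unhealthy"
    if qdrant_url and qdrant_reachable and embedder_loaded:
        return "hybrid"
    # Any missing semantic prerequisite degrades to full-text search only.
    return "keyword"


tiers = {
    "full stack": recall_tier(True, "http://127.0.0.1:6334", True, True),
    "no avx2": recall_tier(True, "http://127.0.0.1:6334", True, False),
    "url unset": recall_tier(True, None, False, True),
    "qdrant down": recall_tier(True, "http://127.0.0.1:6334", False, True),
}
```

A probe or dashboard can alert on anything other than hybrid if semantic recall matters for the deployment.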


Using daimon-memory

Configuring your agent

Tools connect to daimon-memory over MCP (for agent hosts that support it) or REST (for shell hooks). The MCP endpoint is http://localhost:8080/mcp; the REST base is http://localhost:8080/v1.

If bearer auth is enabled, every request needs Authorization: Bearer <token>. The integration installers handle this when you pass --api-key.

Each tool has an install script under integrations/<tool>/install.sh that wires session-start, per-turn recall, save-nudge, and the MCP tool surface. Run the server installer first (./install.sh), then the tool installer. Generic MCP configuration (for tools without a dedicated installer) looks like:

{
  "mcpServers": {
    "daimon": {
      "type": "http",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY",
        "X-Daimon-Tenant": "YOUR_TENANT_UUID"
      }
    }
  }
}

The X-Daimon-Tenant header is optional when only one tenant is in use (the server falls back to DAIMON_DEFAULT_TENANT).

Storing memories

The remember tool is the general-purpose write surface. The guided tools are shortcuts that pre-fill the kind and name the required fields, making it harder to write a malformed record. Use the guided tools whenever the kind matches:

Signal in the conversation | Use this tool | Kind stored
A design or architecture choice was made | log_decision | decision
Something broke and was fixed; root cause understood | log_incident | incident_summary
A generalizable lesson emerged from experience | log_lesson | agent_lesson
A follow-up needs to happen later | add_reminder | reminder
Anything else typed and durable | remember | caller-specified kind

The core rule: write a distilled record, not a transcript fragment. A decision record should contain the context, the choice, and the rationale in a few sentences. The agent is the sole author of content; daimon validates structure and routes by write mode. Raw conversation text does not belong in the store.

Content-hash dedup is server-enforced. Submitting an identical record twice returns a DedupNoop (no duplicate row). This makes writes idempotent from the caller's perspective.
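
Content-hash dedup can be illustrated with a canonical serialization and a digest. This is a Python sketch under assumptions: which fields the server hashes and which hash function it uses are not stated in the source, so SHA-256 over sorted-key JSON is a stand-in, and DedupStore is hypothetical.

```python
import hashlib
import json


def content_hash(record):
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupStore:
    def __init__(self):
        self.by_hash = {}

    def write(self, record):
        digest = content_hash(record)
        if digest in self.by_hash:
            return "DedupNoop"  # identical content: no new row
        self.by_hash[digest] = record
        return "Stored"


store = DedupStore()
first = store.write({"kind": "decision", "title": "Adopt Postgres + Qdrant",
                     "body": "Postgres is canonical."})
# Same content with keys in a different order.
second = store.write({"body": "Postgres is canonical.", "kind": "decision",
                      "title": "Adopt Postgres + Qdrant"})
third = store.write({"kind": "decision", "title": "Adopt Postgres + Qdrant",
                     "body": "Postgres is canonical, Qdrant is rebuildable."})
```

This is why retrying a failed or uncertain write is safe: the retry either stores the record or reports a no-op.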

Recalling

Three tools cover the recall surface:

  • recall: hybrid (or keyword) ranked search over all active records. Returns abstracts with scores. The main tool for "what do I know about X?"
  • read: fetch the full body of a specific record by its daimon:// URI. Use after recall when you need the complete text.
  • browse: list records in a namespace prefix without ranking. Useful for dedup checks before writing ("do I already have a decision about this?").

Recall accepts filters: kind (filter to one record type), namespace_prefix (scope the search to a subtree), and since (only records newer than a timestamp). Use namespace_prefix to keep coding recall out of infra recall and vice versa.
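
As a sketch of how a client might assemble a filtered recall request, the Python helper below builds a request body from the three filters. The filter names come from the paragraph above; their placement as top-level JSON fields next to query and the ISO 8601 timestamp format are assumptions, and build_recall_body is a hypothetical helper.

```python
from datetime import datetime, timezone


def build_recall_body(query, kind=None, namespace_prefix=None, since=None):
    """Build a recall request body, including only the filters that are set."""
    body = {"query": query}
    if kind:
        body["kind"] = kind
    if namespace_prefix:
        body["namespace_prefix"] = namespace_prefix
    if since is not None:
        if since.tzinfo is None:
            raise ValueError("since must be timezone-aware")
        # Assumption: the server accepts an ISO 8601 UTC timestamp.
        body["since"] = since.astimezone(timezone.utc).isoformat()
    return body


infra_only = build_recall_body(
    "why does the indexer restart",
    kind="runbook",
    namespace_prefix="resources/infra",
    since=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
unfiltered = build_recall_body("how should we store memory")
```

Scoping with namespace_prefix is the cheapest way to keep unrelated subtrees out of a result set.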

Recall before write: before storing a new record, run recall or browse to check whether an equivalent record already exists. If it does, update it (for supersede-mode kinds) rather than appending a duplicate. This is the dedup discipline enforced by the save-discipline protocol.

Common usage patterns

Save discipline cadence: do not wait until session end to save. Save the moment a decision is made, an incident is understood, or a lesson crystallizes. Recency matters for importance ranking. The nudge engine will prompt you if you forget, but the ideal is not to need nudging.

Session-end pass: before closing a long session, do a quick sweep: were there any decisions, lessons, or incidents that were not captured? Use log_decision, log_lesson, or log_incident as appropriate. Claude Code's session-end hook automates this sweep when the plugin is installed.

Forget to retract: if you stored something incorrect, use forget. For update-mode kinds (lessons, runbooks), the cleaner path is to write an updated record, which supersedes the old one automatically. Reserve forget for records that should not exist at all; confirmation is required.

Recall before reasoning: at session start and before giving advice on a known project or system, recall what you know. The session-start hook does this automatically when the integration is installed, injecting recent high-importance context. For explicit recall mid-session, call recall with the query before committing to an approach.


Platforms and integrations

Integration installers live under integrations/ in the repo. Each wires three things: session-start persona and context injection, per-turn recall and save-nudge, and capture tooling.

Claude Code: a marketplace plugin with SessionStart, UserPromptSubmit, and SessionEnd hooks, the full MCP tool surface, a /daimon slash command, and a mirror of Claude's own auto-memory into agent/lessons. Install via integrations/claude-code/install.sh.

Codex: a fully automated plugin using the codex plugin CLI. It provides session-start, per-turn recall, save-nudge, MCP tools, and native-memory mirroring (Codex's SQLite store mirrored into agent/lessons). Install via integrations/codex/install.sh.

Hermes: a first-class external memory provider. Automatic recall each turn, persona and discipline in the system prompt, in-process save-nudge, and curated capture mirroring Hermes's own memory writes. Install via integrations/hermes/install.sh.

Detailed per-platform guides are in each integration's source directory. Refer to the README files under integrations/ for current installer flags and config.



Operations

Configuration reference

The server and indexer are configured entirely through environment variables. Set them in .env (Docker Compose path) or inject them as container/pod env. The full set:

Variable | Purpose | Default
DAIMON_MCP_BIND | Listen address for daimon-mcp | 0.0.0.0:8080
DAIMON_PORT | Port used by Docker Compose to bind the service | 8080
DAIMON_API_KEY | Bearer token for /v1/* and /mcp. Unset = open API. Generate with openssl rand -hex 32. Set per-client variants (DAIMON_API_KEY_CLAUDE, DAIMON_API_KEY_IZU, etc.) for independent revocation without rotating a shared secret. | (unset)
PGHOST / PGPORT / PGUSER / PGPASSWORD / PGDATABASE | PostgreSQL connection details | 127.0.0.1 / 5432 / daimon / (empty) / daimon_memory
DAIMON_QDRANT_URL | Qdrant gRPC endpoint (NOT the REST port). Unset on the server = semantic tier disabled (keyword-only recall). | indexer/CLI: http://127.0.0.1:6334; server: (unset)
DAIMON_DEFAULT_TENANT | Tenant used when no X-Daimon-Tenant header is sent | dev UUID
RUST_LOG | Log level for daimon-mcp and daimon-indexer | info
DAIMON_NUDGE_CADENCE | Client/plugin: quiet turns before a cadence nudge fires. Set to 0 to disable cadence nudges. Configured per-tool by the integration installers. | 5
DAIMON_NUDGE | Client/plugin: set to off to disable all nudges (signal, cadence, and session-end). | on

Client variables (DAIMON_NUDGE_CADENCE, DAIMON_NUDGE) are set in the tool's own environment (Claude Code settings.json, Hermes .env, Codex plugin config) by the integration installers -- not in the server .env. The DAIMON_ENDPOINT, DAIMON_TENANT, and DAIMON_NAMESPACE vars used by clients follow the same pattern; see the per-integration docs.

Backup and restore

PostgreSQL is the canonical store and is NOT rebuildable. Persona records, decisions, lessons, and incidents live only there. Back it up like any database you care about.

Qdrant is disposable. If you lose the vector index, daimon reindex rebuilds it from Postgres. The outbox makes the two eventually consistent; a running indexer will catch up from any backlog.

Two backup strategies:

# Logical dump via pg_dump (Docker Compose; -T keeps the stream byte-clean):
docker compose exec -T postgres pg_dump -U daimon daimon_memory | gzip > daimon-$(date +%F).sql.gz

# Restore from the dump:
gunzip -c daimon-2026-06-01.sql.gz | docker compose exec -T postgres psql -U daimon daimon_memory

# Storage-engine-agnostic export (works across Postgres versions, into a fresh stack):
docker compose exec -T daimon-mcp daimon export > memory.jsonl

# Restore from the export (idempotent -- safe to rerun):
docker compose exec -T daimon-mcp daimon import - < memory.jsonl

# Rebuild the vector index after import:
docker compose exec daimon-mcp daimon reindex

The JSONL export is a useful complement to pg_dump when you are migrating to a different Postgres version or a fresh stack. daimon import is idempotent: re-importing a record that already exists is a no-op (content-hash dedup). daimon reindex only rebuilds the vector index -- it cannot recover records lost from Postgres.

On Kubernetes, run pg_dump from a CronJob that writes to durable storage (NFS or object store). The JSONL export path works the same way with kubectl exec in place of docker compose exec.

Upgrading

General steps for upgrading an existing deployment:

  1. Pull the new image (or rebuild from source).
  2. Run daimon migrate -- it is idempotent and safe to rerun on every upgrade.
  3. Run daimon reindex once after upgrading. The semantic index payload now carries created_at (used by the since filter); points indexed by an older version lack it and are excluded from since-filtered queries until reindexed.

Pre-G1 namespace migration

If you have records stored under the retired shared-canonical/* or *-private/* namespace roots from before the G1 redesign, move them to the current roots (user/, agent/, resources/, session/). Records left under the old roots are not found by the session-start loader (it reads agent/ for persona and protocols) and will be excluded from namespace-scoped recalls.

One SQL approach per old path (run inside the Postgres container or from a migration CronJob):

-- Example: move system records from the old shared-canonical root to agent/
UPDATE memory.records
SET
  namespace = 'agent/persona',
  uri_path  = replace(uri_path, '/shared-canonical/system/', '/agent/persona/')
WHERE namespace = 'shared-canonical/system';

Alternatively, export the database with daimon export, rewrite the namespace field in the JSONL, and load the result into a fresh database with daimon import. After either approach, run daimon reindex to sync the vector index to the new paths.
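
For the export-and-rewrite route, the namespace rewrite can be a short script over the JSONL file. The Python sketch below assumes each exported line is a JSON object with a top-level namespace key, as the paragraph above implies; the rest of the record shape, the sample records, and the file names in the usage note are illustrative. If exported records also carry a URI or path field, that field would need the same rewrite.

```python
import json

# Old namespace -> new namespace; extend with one entry per retired path.
NAMESPACE_MAP = {
    "shared-canonical/system": "agent/persona",
}


def rewrite_namespaces(lines, mapping):
    """Yield JSONL lines with retired namespaces replaced; others unchanged."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        old = record.get("namespace")
        if old in mapping:
            record["namespace"] = mapping[old]
        yield json.dumps(record, ensure_ascii=False)


# Made-up sample records standing in for lines of memory.jsonl.
sample = [
    '{"namespace": "shared-canonical/system", "kind": "persona", "title": "Persona"}',
    '{"namespace": "agent/lessons", "kind": "agent_lesson", "title": "Check logs first"}',
]
migrated = [json.loads(line) for line in rewrite_namespaces(sample, NAMESPACE_MAP)]
```

To run it against a real export, read memory.jsonl line by line, write each yielded line to a new file, and pass that file to daimon import.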

Deployment notes

These notes cover shapes not fully addressed in the Installation section.

Docker / Compose

The included Dockerfile builds the full workspace (all binaries). docker-compose.yml runs daimon-mcp, daimon-indexer, Postgres, and Qdrant as a unit. daimon-mcp and daimon-indexer are separate processes -- both need to run. Only one indexer should run at a time (it is a singleton by design); running two against the same database causes duplicate vector writes.

Kubernetes

daimon-mcp is a stateless Deployment and can be scaled horizontally (or with an HPA). daimon-indexer is a singleton -- run it as a single-replica Deployment, not a DaemonSet. Postgres and Qdrant are StatefulSets. The two data services need persistent volumes; daimon-mcp needs none.

AVX2 requirement

The fastembed/ONNX Runtime embedder requires AVX2 on x86_64. On a mixed cluster, apply a node selector (plus a toleration if the AVX2 nodes are tainted) to schedule daimon-mcp and daimon-indexer onto AVX2-capable nodes. Without AVX2, recall degrades to keyword-only (the server still starts cleanly and logs a warning; no crash). After moving to an AVX2-capable node, run daimon reindex to backfill the vector index.

Graceful shutdown

Both daimon-mcp and daimon-indexer handle SIGTERM cleanly. The server drains in-flight requests before exiting; the indexer stops between indexing batches (each batch is already crash-safe, so an abrupt kill does not corrupt the outbox). This makes both safe for rolling restarts and Kubernetes evictions.

  • GitHub repository: github.com/wakbijok/daimon-memory (source, issues, PRs)
  • This page is the canonical detailed documentation. The GitHub README holds a shorter quickstart and points here.