Hermes Agent — Boole Learn Propositional Map

A propositional analysis of the Hermes Agent framework using the Boole Learn framework. Hermes is a Python-based AI companion/agent system with its own architecture distinct from OpenClaw.

Fundamental Claim

Hermes Agent is a stateful, tool-calling loop wrapped in a multi-layered prompt injection system, with pluggable memory and platform adapters.

Design Philosophy

  • Stateful, not stateless — accumulates memory, skills, and session history across turns
  • Pragmatic over pure — frozen system prompts for caching; skills as user messages, not system
  • Pluggable over monolithic — memory providers, terminal backends, platform adapters are swappable
  • Safe by default — unavailable tools are silently dropped; failing memory providers are logged rather than crashing the agent

Foundational Propositions

P1: Tool-Calling Loop is the Core Mechanism — Rock Solid

The agent's primary capability is executing tools through a synchronous, iteration-bounded loop; all output is either tool results or generated once tool-calling terminates.

Iteration budget, tool dispatch, and the main run_conversation() loop implement this. Budget enforcement prevents runaway loops, exception-safe dispatch returns error JSON instead of crashing, and subagents get their own capped budgets.
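The loop described above can be sketched as follows. `run_conversation()` is named in the source; the budget constant, message shapes, and the `dispatch_tool` helper here are hypothetical illustrations of the pattern, not the framework's actual API:

```python
import json

MAX_ITERATIONS = 10  # hypothetical budget; the real value lives in config


def dispatch_tool(tools, name, args):
    """Exception-safe dispatch: failures come back as error JSON, never as crashes."""
    try:
        return json.dumps({"result": tools[name](**args)})
    except Exception as exc:  # any tool failure is reported to the model, not raised
        return json.dumps({"error": str(exc)})


def run_conversation(model, tools, messages):
    """Minimal sketch of an iteration-bounded, synchronous tool-calling loop."""
    for _ in range(MAX_ITERATIONS):
        reply = model(messages)             # one model turn
        if not reply.get("tool_calls"):     # no tools requested: loop terminates
            return reply["content"]
        for call in reply["tool_calls"]:
            result = dispatch_tool(tools, call["name"], call["args"])
            messages.append({"role": "tool", "content": result})
    return "Iteration budget exhausted."    # budget enforcement stops runaway loops
```

Subagents would run the same loop with their own smaller `MAX_ITERATIONS`, which is how capped subagent budgets compose with the parent loop.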

P2: System Prompt is Layered and Cached; Stability Within a Session is Sacred — Contested

The system prompt is built once per session from multiple layers (identity, memory, skills, context files), cached, and never changes mid-session because Anthropic prompt caching depends on exact byte-for-byte match.

Prompt caching yields roughly a 75% cost reduction; if the byte-for-byte match breaks, cached tokens are wasted and costs climb. Contested because context files (AGENTS.md, SOUL.md) can be edited mid-session: the change is invisible until the next session. This is a designed tradeoff, not a bug.
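The build-once-then-freeze behavior can be illustrated with a minimal sketch. The layer names and the `Session` class are hypothetical; the point is that the prompt string is assembled exactly once per session, so later edits to context files cannot perturb the bytes the cache keys on:

```python
from pathlib import Path


def build_system_prompt(identity: str, memory: str, context_files: list[str]) -> str:
    """Assemble the layered prompt: identity, memory, then context files."""
    layers = [identity, memory]
    for path in context_files:
        p = Path(path)
        if p.exists():  # missing context files are simply skipped
            layers.append(p.read_text())
    return "\n\n".join(layers)


class Session:
    """Freeze the prompt at session start so every request is byte-identical."""

    def __init__(self, identity, memory, context_files):
        # Built exactly once; mid-session edits to context files are
        # invisible until the next session, preserving the cache hit.
        self._system_prompt = build_system_prompt(identity, memory, context_files)

    @property
    def system_prompt(self) -> str:
        return self._system_prompt
```

Editing SOUL.md after a `Session` is constructed leaves `session.system_prompt` unchanged, which is exactly the contested behavior described above.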

P3: Memory is Externally Pluggable; Only One External Provider Allowed — Rock Solid

Memory is delegated to a provider system; the built-in MEMORY.md/USER.md provider is always active, but only one additional external provider (Honcho, mem0) can be registered at any time.

This prevents tool-schema bloat and conflicting backends. Without the constraint, multiple providers each adding 5-10 tools would invite tool hallucination. The limit is enforced in code at registration time.
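Registration-time enforcement can be sketched as a small registry. The class and method names here are hypothetical; only the invariant (builtin always present, at most one external provider) comes from the source:

```python
class MemoryRegistry:
    """Builtin MEMORY.md/USER.md provider is always active; at most one
    external provider (e.g. Honcho, mem0) may be registered alongside it."""

    def __init__(self, builtin):
        self._builtin = builtin
        self._external = None

    def register_external(self, provider):
        if self._external is not None:
            # Enforced at registration time: refuse a second external backend
            raise ValueError(
                f"external provider {self._external!r} already registered"
            )
        self._external = provider

    @property
    def providers(self):
        return [self._builtin] + ([self._external] if self._external else [])
```

Choosing the backend once at config time, and failing fast on a second registration, is what keeps the tool schema small and the backends non-conflicting.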

P4: Channels Have Isolated Session Persistence — Rock Solid

The gateway creates an isolated SessionStore per platform, and each adapter implements delivery independently with its own rate limits, retry logic, and conversation state.

Cross-platform isolation prevents message bleeding. SQLite with WAL mode handles concurrent reads + serialized writes. Processing is serial within a session (message queue per session preserves ordering).
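A minimal sketch of per-platform isolation, assuming a `SessionStore` keyed by platform (the class name appears in the source; the schema and helper function here are hypothetical):

```python
import sqlite3


class SessionStore:
    """One isolated store per platform; WAL mode allows concurrent readers
    while writes remain serialized."""

    def __init__(self, db_path: str):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)"
        )

    def save(self, session_id: str, state: str):
        self._conn.execute(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?)", (session_id, state)
        )
        self._conn.commit()

    def load(self, session_id: str):
        row = self._conn.execute(
            "SELECT state FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else None


def make_gateway_stores(platforms, base_dir):
    # One database file per platform: no cross-platform message bleeding
    return {p: SessionStore(f"{base_dir}/{p}.db") for p in platforms}
```

Because each platform gets its own database file, a session id written through one adapter is simply absent from every other adapter's store.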

P5: Skills are Self-Improving Procedural Memory — Paradigm-Dependent

Skills are markdown files invoked procedurally (injected as user messages, not system prompt) to preserve caching, and the agent periodically writes/improves skills from task trajectories.

The architecture supports self-improvement, but actual improvement depends on whether the model chooses to call skill_manage. The mechanism is solid; the learning loop's activation is optional and model-dependent. No skill versioning or rollback exists — a corrupted skill silently affects all future conversations.
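The cache-preserving injection path can be sketched as follows. `skill_manage` is named in the source; the `inject_skill` helper and the marker convention are hypothetical illustrations of injecting skills as user messages with duplicate-stripping:

```python
def inject_skill(messages, skill_name, skill_markdown):
    """Inject a skill as a user message so the system prompt (and its
    cache) never changes; strip earlier injections of the same skill
    so the per-turn duplication cost stays bounded."""
    marker = f"<skill:{skill_name}>"
    # Drop copies of this skill injected on earlier turns
    messages = [
        m for m in messages
        if not (m["role"] == "user" and m["content"].startswith(marker))
    ]
    messages.append({"role": "user", "content": f"{marker}\n{skill_markdown}"})
    return messages
```

This is also the resolution to tension 1 below: duplicates are stripped on each turn, and the remaining single re-injection is the accepted overhead.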

P6: Tool Availability is Dynamic and Silent — Rock Solid

Each tool has a check_fn() that determines availability at schema-discovery time; tools with missing env vars or dependencies are silently dropped, and the model never sees unavailable tools.

Prevents tool hallucination. Checks run at import time and are cached. Env vars set after process start don't take effect until restart.
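The filtering can be sketched with a hypothetical tool table; `check_fn()` is named in the source, while the table layout, tool names, and env var are illustrative:

```python
import os
from functools import lru_cache

# Hypothetical tool table: each entry pairs a schema with a check_fn
TOOLS = {
    "web_search": {
        "schema": {"name": "web_search", "description": "Search the web"},
        "check_fn": lambda: bool(os.environ.get("SEARCH_API_KEY")),
    },
    "shell": {
        "schema": {"name": "shell", "description": "Run a command"},
        "check_fn": lambda: True,  # no external dependency: always available
    },
}


@lru_cache(maxsize=1)  # checks run once and are cached; env changes need a restart
def available_tool_schemas():
    """The model only ever sees tools whose check_fn passed; the rest
    are silently dropped rather than advertised and then failing."""
    return tuple(t["schema"] for t in TOOLS.values() if t["check_fn"]())
```

The `lru_cache` mirrors the documented restart requirement: setting `SEARCH_API_KEY` after the first schema discovery does not resurrect the dropped tool.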

Logical Structure

P1 (Tool-Calling Loop) ─── core mechanism
 ├─→ P6 (Tool Availability) — tools must be available or loop has nothing to call
 ├─→ P2 (System Prompt Caching) — prompt must be consistent for model coherence
 └─→ P4 (Channel Isolation) — loop runs per-channel with isolated session

P2 (System Prompt) 
 ├─→ P3 (Memory Provider) — memory injected into system prompt
 └─↔ P5 (Skills) — skills deliberately NOT in system prompt to preserve cache
                    (injected as user messages instead)

P5 (Skills) 
 ├─→ P1 (Tool Loop) — skills are used BY the loop
 └─→ P6 (Tool Availability) — skill_manage tool must be available for creation

Key Tensions

  1. P2 vs P5: Caching vs Skill Injection — Skills in system prompt would break cache. Skills in user messages are re-injected every turn (duplication cost). Resolution: strip duplicates, accept small overhead.

  2. P3 vs P1: One Provider vs Tool Diversity — Multiple providers would expose 20+ tools, causing hallucination. Resolution: accept the limitation, choose one backend at config time.

  3. P2 vs P6: Cache Stability vs Dynamic Tools — If tools become available mid-session, schema changes would break cache. Resolution: tools discovered once at import time, no mid-session discovery.

  4. P1 vs P4: Global Budget vs Channel Isolation — Each channel creates a fresh AIAgent per message with its own budget. No cross-channel resource starvation.

Falsification Summary

| Proposition | Rating | Strongest Challenge |
|---|---|---|
| P1: Tool-calling loop | Rock solid | 3000+ tests; exception-safe; multi-level guarded |
| P2: System prompt caching | Contested | Context file edits invisible mid-session |
| P3: Memory provider constraint | Rock solid | Enforced at registration; conflict resolution built in |
| P4: Channel isolation | Rock solid | SQLite WAL; serial processing per session |
| P5: Skills as self-improvement | Paradigm-dependent | Depends on model choosing to call skill_manage |
| P6: Tool availability | Rock solid | Checks at import time; cached results |

Critical Breakdown Points

  1. No skill versioning — corrupted skills silently affect all future conversations; no rollback
  2. Memory provider failure cascade — failed external provider stays registered; no fallback to builtin
  3. System prompt bloat — no token budget for system prompt; large SOUL.md silently eats context window
  4. Import-time tool checks — env vars set after process start don't take effect without restart
  5. Regex-based injection scanning — context file scanning is a speedbump, not a guarantee

What to Study Next

  1. Prompt caching mechanics — how Anthropic cache interacts with multi-layer prompts
  2. Context compression algorithm — token savings vs information loss tradeoff
  3. Skill generation heuristics — when to create vs improve; model behavioral dependency
  4. Fallback model logic — when and how the agent switches models
  5. Rate limit tracking — proactive slowdown vs reactive 429 handling

Provenance

Framework: Boole Learn
Codebase: ~/.hermes/hermes-agent/ (Python)
Generated: 2026-04-13 by CC Sam