Hermes Agent FULL GUIDE: Architecture, Setup, and the Self-Improving Loop
A complete walkthrough of how Hermes is put together — installation, model routing, terminal backends, messaging, context and memory engines — and how its self-improving loop turns conversations into permanent upgrades.
There's a new category of AI tooling quietly taking shape: agents that don't live in a chat window you open and close, but run continuously in the cloud and talk to you through a messenger — like a coworker who never logs off. Hermes is one of the more interesting implementations of this idea, and what sets it apart is a built-in self-improving loop: a system that watches your conversations, extracts useful patterns, and turns them into permanent upgrades to its own memory and skill set.
This guide walks through how Hermes is put together, how to configure it, and how that self-improvement loop actually works under the hood.
What Hermes is, and how it differs
Hermes is a cloud-resident AI agent: it runs 24/7 and you interact with it through a messaging app rather than a terminal or browser tab. Compared to similar always-on agents, three differences stand out:
- Larger built-in skill library out of the box, so you spend less time wiring up integrations yourself.
- Streamlined setup — a guided TUI handles almost everything.
- Continuous self-improvement — it doesn't just execute tasks, it accumulates procedural knowledge about how to do them better over time.
Installation and initial setup
Getting Hermes running takes a single command.
On Windows (PowerShell):
iex (irm https://hermes-agent.nousresearch.com/install.ps1)
On Linux, macOS, or WSL:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
Once installed, restart the terminal and run hermes setup to launch a guided configuration flow that walks through model selection, terminal backend, messaging gateway, and tool setup in sequence.
Choosing and routing models
The first real decision is which LLM provider powers the agent's "brain." Authentication happens via OAuth rather than raw API keys — you can even log in through an existing Claude Code or Codex CLI session instead of generating a separate key.
A well-considered design choice is that Hermes separates the model used for your main conversation from the models used for background and auxiliary tasks. By default the same model handles both, but each auxiliary task can be pointed at a different provider independently:
| Task | What it does |
|---|---|
| vision | Image analysis and description |
| web_extract | Summarizing long web pages |
| compression | Compressing an overflowing conversation context |
| title_generation | Generating session titles |
| curator | The background agent behind the self-improving loop |
| kanban_decomposer | Breaking large tasks into subtasks in Kanban mode |
| goal_judge | Checking whether a /goal has actually been achieved |
This is configured directly in config.yaml:
# Primary model for chat and complex reasoning
model:
  provider: "anthropic"
  default: "claude-4-8-sonnet"
auxiliary:
  vision:
    provider: "gemini"
    model: "gemini-2.5-flash"
  compression:
    provider: "custom"
    base_url: "http://localhost:11434/v1"
    api_key: "none"
    model: "qwen2.5:32b"
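The fallback behaviour described above (auxiliary tasks use the main model unless overridden) can be sketched in a few lines of Python. This is only an illustration of the lookup logic, not Hermes's actual code: the `CONFIG` dictionary mirrors the YAML by hand, and `resolve_model` is a hypothetical helper name.

```python
from typing import Any

# Hand-written mirror of the config.yaml above (assumption: same key layout).
CONFIG: dict[str, Any] = {
    "model": {"provider": "anthropic", "default": "claude-4-8-sonnet"},
    "auxiliary": {
        "vision": {"provider": "gemini", "model": "gemini-2.5-flash"},
        "compression": {
            "provider": "custom",
            "base_url": "http://localhost:11434/v1",
            "api_key": "none",
            "model": "qwen2.5:32b",
        },
    },
}


def resolve_model(config: dict[str, Any], task: str) -> dict[str, Any]:
    """Return provider settings for `task`, falling back to the main model.

    Hypothetical helper: illustrates the routing rule only.
    """
    override = config.get("auxiliary", {}).get(task)
    if override:
        return dict(override)  # copy so callers cannot mutate the config
    main = config["model"]
    return {"provider": main["provider"], "model": main["default"]}


if __name__ == "__main__":
    for task in ("vision", "compression", "curator"):
        print(task, "->", resolve_model(CONFIG, task))
```

Here `curator` has no entry under `auxiliary`, so it resolves to the primary model, while `vision` and `compression` use their own providers.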
Explicit routing solves a real problem with using OpenRouter as a default: the same nominal model is often deployed by many providers in different quantizations, and requests get silently shuffled between them. Within a single session you can end up talking to a rotating cast of differently-configured instances, some of which handle tool calls and prompt templates more reliably than others. Routing manually inside Hermes avoids this entirely.
To save money on the conversational model without sacrificing coding quality, Hermes also supports /claude_code and /codex commands, which delegate coding tasks directly to those CLI tools instead of handling them with the configured chat model.
Terminal backends
A core piece of the architecture is the Terminal Backend Environment, which determines where and how shell commands and Python scripts execute, and how the agent touches your filesystem. Hermes supports five:
- Local (default): commands run directly on your machine with your user's permissions, no isolation. Right for local development and trusted personal use. Safety relies on a built-in approvals system that intercepts destructive commands (rm -rf /, DROP TABLE) and asks for permission first.
- Docker: runs the agent inside an isolated sandbox so it can't touch your host system.
- SSH: executes commands and works with files on a remote server.
- Modal: runs everything in serverless cloud sandboxes, paying only for the seconds your code runs.
- Daytona: a container-management layer purpose-built for AI coding agents; faster than running Docker directly, and it handles environment setup and dependency installation automatically.
For most personal use cases, Local is genuinely sufficient — the others matter mainly if you're running untrusted code or operating at team scale.
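To make the approvals idea concrete, here is a minimal sketch of a gate that flags destructive commands before they run. The two patterns and the names `needs_approval` and `run_guarded` are illustrative assumptions; the real approvals system's rule set is not described in this guide.

```python
import re
from typing import Callable

# Illustrative patterns only (assumption): recursive rm on the filesystem
# root, and SQL DROP TABLE anywhere in the command line.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-\w*r\w*\s+/(?:\s|$)"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]


def needs_approval(command: str) -> bool:
    """True if the command matches any destructive pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def run_guarded(command: str, approve: Callable[[str], bool]) -> bool:
    """Return True if the command may run.

    Safe commands pass straight through; flagged ones are allowed only
    when the `approve` callback (for example, a prompt to the user) says so.
    """
    if needs_approval(command):
        return approve(command)
    return True


if __name__ == "__main__":
    deny_all = lambda cmd: False
    for cmd in ("ls -la", "rm -rf ./build", "rm -rf /", 'psql -c "DROP TABLE users;"'):
        print(f"{cmd!r}: allowed={run_guarded(cmd, deny_all)}")
```

Pattern matching like this is a convenience, not isolation: it only catches what the patterns anticipate, which is why the sandboxed backends exist for untrusted code.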
Messaging gateway and tool configuration
After the terminal backend, setup moves to where you'll actually talk to the agent — Telegram being the most polished option. Selecting it gives you a direct link that spins up a pre-configured bot, with no manual bot-token setup involved.
The remainder of setup walks through enabling individual tools and providers — browser automation, image generation, text-to-speech, and web search. For web search, self-hosted Firecrawl or Exa stand out for agent-oriented scraping and retrieval. Note that X search requires a Grok subscription to enable.
Slash commands worth knowing
Most commands are self-explanatory by name, but a handful are worth calling out:
- /background <prompt>: runs a task in the background without interrupting your main session.
- /goal: sets a long-term objective the agent works toward persistently (with pause/resume/clear/status subcommands); /subgoal manages smaller objectives nested under it.
- /kanban: orchestrates asynchronous, long-running work across multiple independent agents, distributing a pool of tasks through to-do, in-progress, and done.
- /github_pr_workflow: handles the full branch-to-merge cycle including CI; /github_code_review reviews PRs; /codebase_inspection analyzes a repo's language breakdown and line counts.
- /dogfood: a dedicated QA mode that hunts for bugs in a web app and produces an evidence-backed report.
- /spike: runs a quick, throwaway experiment to validate an idea; /systematic_debugging works through bugs in four phases, finding root cause before attempting a fix.
There's also a cluster of integration-specific commands — /notion, /obsidian, /airtable, /google_workspace, /arxiv, /blogwatcher, /polymarket, /ocr_and_documents, /youtube_content — plus /bundles, which groups several skills under one slash command via small YAML config files.
Cron jobs and webhooks
Two automation primitives deserve attention:
- Cron jobs schedule a script to run on a timer. Passing --no-agent runs a plain Python or bash script and forwards its output to your messenger without spending any LLM tokens.
- Webhooks let the agent react to external events rather than a timer. You can configure one so that a new GitHub PR automatically triggers an agent with a specific prompt and skill set, effectively standing up an on-call reviewer agent with zero manual intervention per PR.
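As an example of the kind of script that suits a --no-agent cron job, the sketch below prints a one-line disk usage report. It assumes only what the text states, namely that the script's standard output is what gets forwarded to your messenger; the function name `disk_report` and the 90% threshold are made up for illustration.

```python
import shutil


def disk_report(total: int, used: int, threshold: float = 0.9) -> str:
    """Format a one-line usage report, flagging usage at or above `threshold`."""
    fraction = used / total
    status = "WARNING" if fraction >= threshold else "ok"
    return f"disk usage {fraction:.0%} ({status})"


if __name__ == "__main__":
    # Whatever is printed here is the message body (assumption: stdout is forwarded).
    usage = shutil.disk_usage("/")
    print(disk_report(usage.total, usage.used))
```

Because the check is deterministic, nothing here needs a model call, which is the point of skipping the agent.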
Context engines
The context engine governs how Hermes compresses and manages conversation history as it approaches the model's token limit:
- Compressor (default) — applies lossy summarization to the middle portion of a long conversation.
- LCM (Lossless Context Management) — instead of a text summary, builds a directed acyclic graph of the conversation's key points, letting the agent navigate from a high-level compressed view down to the specific original messages that support it.
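The difference between the two engines can be sketched with a toy example. The code below compresses the middle of a conversation into a single summary node, but keeps the indices of the messages it replaced, which is the property that lets an LCM-style engine navigate back to the originals. All names here (`Message`, `SummaryNode`, `compress_middle`) are hypothetical, and the summarizer is passed in as a plain function rather than a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class Message:
    index: int
    text: str


@dataclass
class SummaryNode:
    text: str
    # Indices of the original messages this summary stands in for.
    # A purely lossy compressor would discard these.
    sources: list[int] = field(default_factory=list)


def compress_middle(
    messages: list[Message],
    keep_head: int,
    keep_tail: int,
    summarize: Callable[[list[str]], str],
) -> list[Union[Message, SummaryNode]]:
    """Replace everything between the first `keep_head` and last `keep_tail`
    messages with one summary node."""
    if len(messages) <= keep_head + keep_tail:
        return list(messages)  # nothing in the middle to compress
    split = len(messages) - keep_tail
    head, middle, tail = messages[:keep_head], messages[keep_head:split], messages[split:]
    node = SummaryNode(
        text=summarize([m.text for m in middle]),
        sources=[m.index for m in middle],
    )
    return [*head, node, *tail]


if __name__ == "__main__":
    history = [Message(i, f"message {i}") for i in range(6)]
    compact = compress_middle(history, 1, 2, lambda texts: f"{len(texts)} messages summarized")
    for item in compact:
        print(item)
```

A full DAG would have several layers of such nodes, each pointing at the nodes or messages beneath it; this sketch shows a single layer only.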
Memory engines
External memory providers run alongside Hermes's built-in local memory files (MEMORY.md and USER.md), adding semantic search and knowledge graphs. Several can be configured directly through the setup TUI:
| Engine | Approach |
|---|---|
| Honcho | Models a detailed user profile via background LLM calls across a base layer (session summaries/profiles) and a dialectical layer (current needs). |
| OpenViking | A context database building a filesystem-style knowledge hierarchy with tiered retrieval, sorting facts into six categories at each session's end. |
| Mem0 | Fully managed cloud memory; server-side fact extraction, semantic search, reranking, and dedup (the one option with a recurring cost). |
| Hindsight | GraphRAG-style long-term memory on a knowledge graph; extracts entities, builds relationships, preserves full turns, split into facts/experience/opinions/observations. |
| Holographic | Local SQLite fact store, trust-scoring, Holographic Reduced Representations for compositional queries, automatic contradiction detection. |
| RetainDB | Cloud API for team memory; hybrid vector + BM25 + reranking search, seven memory types, delta compression. |
| ByteRover | Portable local memory via CLI; hierarchical knowledge tree, extracts facts before lossy compression drops them. |
| Supermemory | Semantic long-term memory with a graph API; ingests full session logs, periodically cleans recalled facts, isolates memory per agent profile. |
For day-to-day use, the default local memory is genuinely adequate for most people — the heavier systems trade real resource cost (especially RAM for local options) for capability most workflows don't yet need.
The self-improving loop
This is the feature that most distinguishes Hermes: a set of asynchronous background processes that continuously analyze your conversations, extract useful patterns, write them into long-term memory and procedural memory (skills), and then maintain that knowledge so it doesn't decay. The system runs in parallel with your main chat and is built from three components.
The trigger system
Hermes doesn't analyze every message in real time. Two counters trigger a reflection pass once they cross a threshold:
- A memory trigger fires every ten user prompts, checking whether new facts worth saving have appeared.
- A skill trigger fires every ten tool-call iterations within a single turn — the theory being that if the agent just spent that many steps fighting through a problem, that experience is worth analyzing and possibly turning into a reusable skill.
Once either counter hits its limit, an internal function hands a snapshot of the current conversation to a background review process.
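The two counters can be sketched as a small state object. This is an assumption-laden illustration: the class and method names are invented, and the choice to reset the tool-iteration counter at the start of each new user turn is one reading of "within a single turn".

```python
from dataclasses import dataclass


@dataclass
class ReviewTriggers:
    memory_every: int = 10   # user prompts between memory reviews
    skill_every: int = 10    # tool-call iterations (per turn) before a skill review
    user_prompts: int = 0
    tool_iterations: int = 0

    def on_user_prompt(self) -> bool:
        """Count a user prompt; return True when a memory review should fire."""
        self.tool_iterations = 0  # a new turn starts (assumed reset point)
        self.user_prompts += 1
        if self.user_prompts >= self.memory_every:
            self.user_prompts = 0
            return True
        return False

    def on_tool_iteration(self) -> bool:
        """Count a tool-call iteration; return True when a skill review should fire."""
        self.tool_iterations += 1
        if self.tool_iterations >= self.skill_every:
            self.tool_iterations = 0
            return True
        return False


if __name__ == "__main__":
    triggers = ReviewTriggers()
    for n in range(1, 11):
        if triggers.on_user_prompt():
            print(f"memory review fires on prompt {n}")
```

In the real system, a `True` result would correspond to handing a conversation snapshot to the background review process.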
The background review agent
This snapshot goes to a fully separate, isolated agent process running in parallel without interrupting your main session. It works in two directions:
- Declarative: if it notices new user preferences or environment details (a preference for Supabase, a project pinned to Python 3.12), it updates MEMORY.md or USER.md.
- Procedural: if it detects that the agent just solved a non-trivial problem, it can create a new skill, edit an existing one, apply a targeted patch, or delete one. Any skill it creates is explicitly tagged as agent-generated, so its origin is always traceable.
For the curator to later judge which self-generated skills are worth keeping, Hermes maintains a hidden usage log tracking, for every skill: how many times it's been loaded into a prompt, opened to read, and edited, plus timestamps for creation, last use, and last edit.
The curator
Left unchecked, this process can produce hundreds of skills, some redundant or outdated. The curator keeps the knowledge base from degrading. It only starts when two conditions hold simultaneously: enough time has passed since its last run (seven days by default), and the main agent has been idle long enough (two hours by default) that a heavy maintenance pass won't interfere with active work. Before making any changes, it automatically backs up the entire skills directory so any unsatisfactory result can be rolled back with a single command.
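The start condition is a simple conjunction of two elapsed-time checks. A minimal sketch, using the defaults quoted above and a hypothetical function name:

```python
from datetime import datetime, timedelta


def curator_should_run(
    now: datetime,
    last_run: datetime,
    last_activity: datetime,
    min_interval: timedelta = timedelta(days=7),
    min_idle: timedelta = timedelta(hours=2),
) -> bool:
    """True only when BOTH conditions hold: enough time since the last
    curator run, and the main agent has been idle long enough."""
    return (now - last_run) >= min_interval and (now - last_activity) >= min_idle


if __name__ == "__main__":
    now = datetime(2025, 1, 10, 12, 0)
    print(curator_should_run(now, now - timedelta(days=9), now - timedelta(hours=3)))
    print(curator_should_run(now, now - timedelta(days=9), now - timedelta(minutes=30)))
```

Requiring both conditions means a busy agent simply postpones maintenance rather than competing with it.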
The curator's work happens in two phases:
- Mechanical (no LLM call) — it checks usage metrics, marks any agent-generated skill unused for more than 30 days as deprecated, and moves anything unused for more than 90 days into an archive folder. Important skills can be explicitly pinned to protect them.
- LLM review — run through a separate isolated agent instance using whichever model is configured for the curator task. For each skill it decides to keep it as-is, fix it, merge it with another skill covering the same ground (relocating associated scripts/evals/references and rewriting relative paths), or archive it. At the end it produces a detailed report including a rename map showing how old skill names mapped to new ones, so every decision is auditable.
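The mechanical phase is rule-based, so it lends itself to a short sketch. The version below follows a literal reading of the rules above: pinned skills are never touched, anything unused for more than 90 days is archived, and agent-generated skills unused for more than 30 days are deprecated. `SkillRecord` and `mechanical_pass` are hypothetical names, and the record holds only the fields this pass needs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SkillRecord:
    name: str
    last_used: datetime
    agent_generated: bool = True
    pinned: bool = False


def mechanical_pass(
    skills: list[SkillRecord],
    now: datetime,
    deprecate_after: timedelta = timedelta(days=30),
    archive_after: timedelta = timedelta(days=90),
) -> dict[str, str]:
    """Map each skill name to 'keep', 'deprecate', or 'archive' using
    usage timestamps only (no LLM call)."""
    decisions: dict[str, str] = {}
    for skill in skills:
        unused_for = now - skill.last_used
        if skill.pinned:
            decisions[skill.name] = "keep"       # pinning overrides everything
        elif unused_for > archive_after:
            decisions[skill.name] = "archive"
        elif skill.agent_generated and unused_for > deprecate_after:
            decisions[skill.name] = "deprecate"
        else:
            decisions[skill.name] = "keep"
    return decisions


if __name__ == "__main__":
    now = datetime(2025, 6, 1)
    library = [
        SkillRecord("fresh", now - timedelta(days=5)),
        SkillRecord("stale", now - timedelta(days=45)),
        SkillRecord("ancient", now - timedelta(days=120)),
        SkillRecord("ancient-but-pinned", now - timedelta(days=120), pinned=True),
    ]
    for name, decision in mechanical_pass(library, now).items():
        print(f"{name}: {decision}")
```

Only the skills that survive this pass would need to go on to the more expensive LLM review.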
Avoid going too cheap on the curator's model: the quality of these decisions has a direct downstream effect on the skill library.
Using Hermes well
Cloud agents like this are genuinely valuable for any process you can run 24/7 — coding work being the notable exception — provided you've digitized that process carefully and built a solid skill around it, including evaluations. A workflow that tends to produce good results:
- Record yourself walking through the process from start to finish, ideally with dictation so you capture it accurately. This only works if you genuinely understand the process.
- Draft a first skill by feeding those notes into a coding agent with a skill-creation tool. It won't be good enough to hand off yet.
- Build in evals — reference solutions representing a correct outcome — since they let you measure whether the skill performs well rather than guessing.
- Test and refine both the evals and the skill content based on what you observe, doing most of that editing by hand.
- Hand off only once the skill behaves consistently and deterministically. If the process depends on an external service, check whether an existing MCP server or CLI already covers it before building one.
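The eval step in the workflow above can be as simple as a list of inputs paired with reference outputs and a pass rate. The sketch below is a generic harness, not a Hermes feature: `run_evals` is a hypothetical name, the skill is modelled as a plain function from text to text, and exact string match stands in for whatever comparison your process actually needs.

```python
from typing import Callable

# One eval case: (input given to the skill, reference output).
EvalCase = tuple[str, str]


def run_evals(skill: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the skill's output matches the
    reference exactly (ignoring surrounding whitespace)."""
    if not cases:
        return 0.0
    passed = sum(1 for given, expected in cases if skill(given).strip() == expected.strip())
    return passed / len(cases)


if __name__ == "__main__":
    # Toy "skill": normalise a ticket title.
    def normalise_title(raw: str) -> str:
        return " ".join(raw.split()).capitalize()

    cases = [
        ("  fix   login bug ", "Fix login bug"),
        ("UPDATE docs", "Update docs"),
        ("add tests", "Add Tests"),  # reference the toy skill does not meet
    ]
    print(f"pass rate: {run_evals(normalise_title, cases):.0%}")
```

Running the same cases several times is also a cheap way to check the consistency the last step asks for: a skill whose pass rate moves between runs is not ready to hand off.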
The range of things you can hand to an agent like this is limited mainly by how well you can specify the work, not by the agent's raw capability. Three principles hold up across use cases: don't outsource coding work to an unsupervised 24/7 cloud agent, keep a human in the loop reviewing what the agent produces, and treat skill refinement as ongoing work rather than something you finish once and walk away from.