Give Hermes Agent a Voice with ElevenLabs
Hermes Agent ships with no voice by default. This guide adds one with ElevenLabs: Text to Speech for its replies and Speech to Text (Scribe) for transcribing what you say, both set up as provider config in Hermes.
Why give Hermes a voice
Hermes Agent runs in your terminal, in messaging apps, and on your phone, and in all of them it is text-only by default. ElevenLabs covers both directions: Text to Speech speaks Hermes's replies, and Speech to Text (Scribe) transcribes what you say. Both are configured as providers in Hermes, so no custom scripts are required.
The end result: you speak, Hermes hears you with Scribe, thinks, and answers back in your chosen ElevenLabs voice.
Setup
Get an API key from the ElevenLabs dashboard and add it to ~/.hermes/.env:
ELEVENLABS_API_KEY=your_key_here
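A quick way to confirm the key is really in place is to read the file back. The sketch below is a minimal dotenv-style parser written for this guide, not part of Hermes. The helper name read_env_value is hypothetical, and it assumes plain KEY=value lines.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: Path, name: str) -> Optional[str]:
    """Return the value assigned to `name` in a dotenv-style file, or None."""
    if not env_path.is_file():
        return None
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export ".
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            value = value.strip()
            # Remove one layer of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            return value or None
    return None


if __name__ == "__main__":
    key = read_env_value(Path.home() / ".hermes" / ".env", "ELEVENLABS_API_KEY")
    if key is None or key == "your_key_here":
        print("ELEVENLABS_API_KEY is missing or still the placeholder")
    else:
        print("ELEVENLABS_API_KEY is set")
```

The script only reports whether a value is present. It never prints the key itself.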
If the ElevenLabs dependency is missing, install the premium TTS extra into the Hermes environment:
pip install "hermes-agent[tts-premium]"
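To tell whether the dependency is missing in the first place, you can probe for it from the same Python environment Hermes runs in. This sketch assumes the dependency in question is the elevenlabs SDK package. That package name is an assumption rather than something this guide confirms, and has_module is a hypothetical helper.

```python
import importlib.util


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the tts-premium extra provides the `elevenlabs` package.
    print("elevenlabs importable:", has_module("elevenlabs"))
```

Run it with the interpreter that Hermes uses. A different virtual environment can give a misleading answer.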
Easy setup (let Hermes do it)
Hermes is built to operate your machine, so you can ask it to turn on ElevenLabs Text to Speech and Speech to Text for you. It has built-in skills for this, and they are usually reliable. Send it a prompt such as:
Set ElevenLabs as the voice mode for both TTS and STT. I have already added the API key to ~/.hermes/.env.
The manual steps below do the same thing — they're worth reading because they show how Hermes configuration works under the hood.
Text to Speech (manual)
Run the setup wizard and pick ElevenLabs at the voice step:
hermes setup
Or edit ~/.hermes/config.yaml directly:
tts:
  provider: "elevenlabs"
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"  # any voice from your library
    model_id: "eleven_flash_v2_5"     # ~75ms, built for real-time
voice_id selects the voice: choose one from the voice library or use a clone. model_id selects the model: eleven_flash_v2_5 suits live conversation (about 75 ms latency), while eleven_multilingual_v2 is a good general-purpose default. Hermes chooses the audio format from the output path.
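To sanity-check a voice_id and model_id pair outside Hermes, one option is to call the ElevenLabs Text to Speech endpoint directly. The sketch below is not how Hermes itself makes the call. It assumes the public REST endpoint POST /v1/text-to-speech/{voice_id} with an xi-api-key header, and the helper names are hypothetical. build_tts_request only constructs the request; synthesize sends it, which uses credits.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key: str, voice_id: str, model_id: str, text: str) -> urllib.request.Request:
    """Build, but do not send, a Text to Speech request for one voice and model."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def synthesize(request: urllib.request.Request, out_path: str) -> int:
    """Send the request, write the returned audio to out_path, return its size in bytes."""
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

Calling synthesize(build_tts_request(key, "pNInz6obpgDQGcFmaJgB", "eleven_flash_v2_5", "Hello from Hermes"), "test.mp3") should write a short clip if the key, voice, and model are all valid. An invalid voice_id or key surfaces as an urllib.error.HTTPError.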
Restart Hermes after changing config. In the gateway, use:
/restart
In the CLI, exit and relaunch Hermes. Then enable voice output with:
/voice on
/voice tts
Speech to Text (manual)
ElevenLabs Scribe is a built-in Hermes STT provider. You do not need to create a custom transcription script or register a command provider.
Add this to ~/.hermes/config.yaml:
stt:
  enabled: true
  provider: elevenlabs
  elevenlabs:
    model_id: scribe_v2
    language_code: ""  # optional; leave blank for auto-detect
    tag_audio_events: false
    diarize: false
That is enough. Hermes writes incoming audio to a temporary file, sends it to the ElevenLabs /speech-to-text API, and uses the returned transcript. Voice messages on Telegram, Discord, WhatsApp, Slack, and Signal will use Scribe once the gateway has restarted.
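The three steps described above (temporary file, upload, transcript) can be sketched in plain Python. This illustrates the flow and is not Hermes's actual implementation. It assumes the public endpoint POST /v1/speech-to-text with multipart fields file and model_id, and a JSON response containing a text field. The function names are hypothetical, and tag_audio_events and diarize are left out for brevity.

```python
import json
import tempfile
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple

STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def encode_multipart(fields: Dict[str, str], file_path: Path) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_path.name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + file_path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_stt_request(api_key: str, audio: bytes, suffix: str = ".ogg",
                      model_id: str = "scribe_v2", language_code: str = "") -> urllib.request.Request:
    """Write audio to a temp file and build, but do not send, the upload request."""
    # Step 1: write the incoming audio to a temporary file.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio)
        tmp_path = Path(tmp.name)
    try:
        fields = {"model_id": model_id}
        if language_code:  # a blank value leaves language detection to the API
            fields["language_code"] = language_code
        body, content_type = encode_multipart(fields, tmp_path)
    finally:
        tmp_path.unlink()  # the temp file is no longer needed once encoded
    # Step 2: prepare the POST to the speech-to-text endpoint.
    return urllib.request.Request(
        STT_URL, data=body, method="POST",
        headers={"xi-api-key": api_key, "Content-Type": content_type},
    )


def transcribe(request: urllib.request.Request) -> str:
    """Step 3: send the request and return the transcript text."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["text"]
```

Keeping request construction separate from sending makes the upload easy to inspect without spending credits. Only transcribe touches the network.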
To force a language, set language_code, for example:
stt:
  enabled: true
  provider: elevenlabs
  elevenlabs:
    model_id: scribe_v2
    language_code: eng
For names, product terms, and libraries that Scribe commonly mishears, check the ElevenLabs Speech to Text docs for the latest prompting and model options supported by the API.
Done
Speak, and Hermes hears you with Scribe, thinks, and answers in your ElevenLabs voice. Change the voice at any time by picking a new voice_id.