{"id":838,"date":"2026-06-03T15:52:55","date_gmt":"2026-06-03T15:52:55","guid":{"rendered":"https:\/\/cyb3rjan.com\/?p=838"},"modified":"2026-06-03T16:48:41","modified_gmt":"2026-06-03T16:48:41","slug":"hermes-agent-tts-bridge-give-your-ai-a-voice-clone","status":"publish","type":"post","link":"https:\/\/cyb3rjan.com\/index.php\/2026\/06\/03\/hermes-agent-tts-bridge-give-your-ai-a-voice-clone\/","title":{"rendered":"Hermes Agent TTS Bridge: Give Your AI a Voice Clone"},"content":{"rendered":"\n<meta charset=\"utf-8\">\n<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n<title>Hermes Agent TTS Bridge: Give Your AI a Voice Clone<\/title>\n<style>\n  :root {\n    --bg: #0d1117;\n    --fg: #e6edf3;\n    --accent: #58a6ff;\n    --green: #3fb950;\n    --orange: #d29922;\n    --red: #f85149;\n    --border: #30363d;\n    --code-bg: #161b22;\n    --block-bg: #161b22;\n  }\n  * { margin: 0; padding: 0; box-sizing: border-box; }\n  body {\n    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif;\n    background: var(--bg);\n    color: var(--fg);\n    line-height: 1.7;\n    padding: 2rem 1rem;\n  }\n  .container { max-width: 800px; margin: 0 auto; }\n  h1 { font-size: 2rem; margin-bottom: 0.5rem; color: #f0f6fc; }\n  .subtitle {\n    font-size: 1.1rem;\n    color: #8b949e;\n    border-bottom: 1px solid var(--border);\n    padding-bottom: 1.5rem;\n    margin-bottom: 2rem;\n  }\n  h2 {\n    font-size: 1.4rem;\n    margin: 2.5rem 0 1rem;\n    color: #f0f6fc;\n  }\n  h3 {\n    font-size: 1.1rem;\n    margin: 1.5rem 0 0.75rem;\n    color: #f0f6fc;\n  }\n  p { margin-bottom: 1rem; }\n  a { color: var(--accent); text-decoration: none; }\n  a:hover { text-decoration: underline; }\n  ul, ol { margin: 0 0 1rem 1.5rem; }\n  li { margin-bottom: 0.3rem; }\n\n  pre {\n    background: var(--code-bg);\n    border: 1px solid var(--border);\n    border-radius: 6px;\n    padding: 1rem;\n    overflow-x: auto;\n    font-size: 0.85rem;\n   
 line-height: 1.5;\n    margin-bottom: 1.5rem;\n    tab-size: 2;\n  }\n  code {\n    background: var(--code-bg);\n    padding: 0.2em 0.4em;\n    border-radius: 3px;\n    font-size: 0.9em;\n    font-family: 'SFMono-Regular', Consolas, 'Liberation Mono', Menlo, monospace;\n  }\n  pre code {\n    background: none;\n    padding: 0;\n    font-size: 0.85rem;\n  }\n  kbd {\n    background: var(--code-bg);\n    border: 1px solid var(--border);\n    border-radius: 3px;\n    padding: 0.1em 0.3em;\n    font-size: 0.85em;\n    font-family: 'SFMono-Regular', monospace;\n  }\n\n  .callout {\n    background: var(--block-bg);\n    border-left: 4px solid var(--accent);\n    border-radius: 6px;\n    padding: 1rem 1.25rem;\n    margin-bottom: 1.5rem;\n  }\n  .callout.success { border-left-color: var(--green); }\n  .callout.warning { border-left-color: var(--orange); }\n  .callout.danger { border-left-color: var(--red); }\n\n  table {\n    width: 100%;\n    border-collapse: collapse;\n    margin-bottom: 1.5rem;\n    font-size: 0.9rem;\n  }\n  th, td {\n    border: 1px solid var(--border);\n    padding: 0.5rem 0.75rem;\n    text-align: left;\n  }\n  th { background: var(--block-bg); color: #f0f6fc; font-weight: 600; }\n  tr:nth-child(even) td { background: var(--block-bg); }\n\n  .arch-diagram {\n    background: var(--block-bg);\n    border: 1px solid var(--border);\n    border-radius: 6px;\n    padding: 1.5rem;\n    font-family: 'SFMono-Regular', Consolas, monospace;\n    font-size: 0.78rem;\n    line-height: 1.4;\n    white-space: pre;\n    overflow-x: auto;\n    margin-bottom: 1.5rem;\n    color: #8b949e;\n  }\n  .arch-diagram .hl { color: var(--green); }\n  .arch-diagram .hl2 { color: var(--orange); }\n  .arch-diagram .hl3 { color: var(--accent); }\n\n  hr {\n    border: none;\n    border-top: 1px solid var(--border);\n    margin: 2.5rem 0;\n  }\n  .footer {\n    text-align: center;\n    color: #484f58;\n    font-size: 0.85rem;\n    margin-top: 3rem;\n  }\n  @media (max-width: 
600px) {\n    body { padding: 1.5rem 0.75rem; }\n    h1 { font-size: 1.5rem; }\n    table { font-size: 0.8rem; }\n    th, td { padding: 0.35rem 0.5rem; }\n  }\n<\/style>\n\n\n<div class=\"container\">\n\n<p class=\"subtitle\">Zero-shot voice cloning with Kokoro fallback \u2014 because conversations hit different when your AI sounds like someone you know \u2764\ufe0f<\/p>\n\n<p>\n  A practical guide to bridging <a href=\"https:\/\/github.com\/k2-fsa\/omnivoice\" target=\"_blank\" rel=\"noopener\">OmniVoice<\/a>\n  and <a href=\"https:\/\/github.com\/remsky\/Kokoro-FastAPI\" target=\"_blank\" rel=\"noopener\">Kokoro<\/a> for zero-shot voice cloning\n  with a local CPU fallback that keeps your AI talking when the GPU server is down, plus Telegram-ready OGG output.\n<\/p>\n\n<!-- \u2500\u2500\u2500 Problem \u2500\u2500\u2500 -->\n<h2>The Problem<\/h2>\n<p>\n  You&#8217;re chatting with your AI assistant \u2014 it&#8217;s fast, smart, helpful. But every reply\n  comes back in the same robotic voice as every other AI on the planet. There&#8217;s no\n  warmth. No personality. It doesn&#8217;t sound like <em>your<\/em> assistant.\n<\/p>\n<p>\n  With <strong>Hermes Agent<\/strong>, you can change that. This bridge gives your AI\n  a <strong>voice clone<\/strong> \u2014 a warm, familiar voice that makes every conversation\n  feel personal. Whether it&#8217;s your partner&#8217;s voice, your own, or a custom design,\n  the AI speaks like someone you know. 
It transforms dry status updates and technical\n  replies into something that actually feels like a conversation.\n<\/p>\n<p>The technical recipe:<\/p>\n<ul>\n  <li><strong>Primary<\/strong>: Voice cloning via <a href=\"https:\/\/github.com\/k2-fsa\/omnivoice\" target=\"_blank\" rel=\"noopener\">OmniVoice<\/a> \u2014 600+ languages, zero-shot\n    cloning from 3\u201310 seconds of audio, GPU-accelerated<\/li>\n  <li><strong>Fallback<\/strong>: <a href=\"https:\/\/github.com\/remsky\/Kokoro-FastAPI\" target=\"_blank\" rel=\"noopener\">Kokoro<\/a> \u2014 fast, lightweight local TTS on CPU (no GPU required),\n    so your AI never goes silent<\/li>\n  <li><strong>Output<\/strong>: OGG Opus \u2014 the format Telegram requires for native\n    voice bubbles<\/li>\n<\/ul>\n<p>\n  The bridge script ties it all together: it tries OmniVoice first, drops to Kokoro\n  if the server is down, and pipes everything through FFmpeg for the correct output format.\n<\/p>\n\n<!-- \u2500\u2500\u2500 Architecture \u2500\u2500\u2500 -->\n<h2>Architecture<\/h2>\n\n<div class=\"arch-diagram\">\n                \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                \u2502          Text Input                   \u2502\n                \u2502  (from AI agent \/ CLI \/ webhook)      \u2502\n                \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                               \u2502\n                               \u25bc\n                
\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                \u2502   Check OmniVoice ready?     \u2502\n                \u2502  (TCP connect + HTTP 200)    \u2502\u2500\u2500\u2500\u2500 No \u2500\u2500\u2500\u2500\u25b6  Kokoro Fallback\n                \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518                         \u2502\n                              \u2502 Yes                                      \u2502\n                              \u25bc                                          \u25bc\n                \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510              \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                \u2502  OmniVoice API       \u2502              \u2502  Kokoro API              \u2502\n                \u2502  POST \/_clone_fn     \u2502              \u2502  POST \/v1\/audio\/speech   \u2502\n                \u2502  poll SSE for result \u2502              \u2502  (af_sky \/ af_bella)     \u2502\n                \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518              \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                         \u2502                                       \u2502\n                         \u25bc                                       \u25bc\n                    WAV output                               WAV 
output\n                         \u2502                                       \u2502\n                         \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                            \u25bc\n                          \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                          \u2502        FFmpeg Conversion             \u2502\n                          \u2502  WAV \u2192 OGG Opus (64k, Telegram)      \u2502\n                          \u2502  or WAV \u2192 MP3 (128k, other uses)     \u2502\n                          \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                           \u25bc\n                          \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                          \u2502     Final .ogg file delivered       \u2502\n                          \u2502     to Telegram as voice bubble     \u2502\n                          \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n<\/div>\n\n<!-- \u2500\u2500\u2500 The Script \u2500\u2500\u2500 -->\n<h2>The 
Script<\/h2>\n<p>\n  <a href=\"omnivoice-tts-bridge.py\">Download <code>omnivoice-tts-bridge.py<\/code><\/a> \u2014\n  sanitized, ready to configure.\n<\/p>\n\n<h3>Features<\/h3>\n<ul>\n  <li>\u2705 Configurable via environment variables \u2014 every host, port, and path can be overridden<\/li>\n  <li>\u2705 Falls back gracefully if OmniVoice is unreachable<\/li>\n  <li>\u2705 Accepts <code>{input_path}<\/code> \/ <code>{output_path}<\/code> args \u2014 compatible\n    with Hermes Agent command providers<\/li>\n  <li>\u2705 Picks the OmniVoice output format from the file extension (<code>.ogg<\/code> \/ <code>.mp3<\/code> \/ <code>.wav<\/code>); the Kokoro fallback always writes OGG Opus<\/li>\n  <li>\u2705 Uploads the local reference audio on each run unless <code>REF_AUDIO_REMOTE<\/code> points to a copy already on the server<\/li>\n  <li>\u2705 Polls up to 30 times at 3-second intervals (about 90 seconds of waiting, plus request time) before falling back<\/li>\n<\/ul>\n\n<h3>Quick Start<\/h3>\n<pre><code># 1. Install dependencies (FFmpeg must also be on your PATH)\npip install requests\n\n# 2. Set up your environment\nexport OMNIVOICE_HOST=\"192.168.1.50\"\nexport OMNIVOICE_PORT=\"8001\"\nexport REF_AUDIO_LOCAL=\"\/path\/to\/your\/voice_sample.wav\"\n\n# 3. 
Run it\npython3 omnivoice-tts-bridge.py \/tmp\/input.txt \/tmp\/output.ogg<\/code><\/pre>\n\n<h3>Environment Variables<\/h3>\n<table>\n  <thead>\n    <tr><th>Variable<\/th><th>Default<\/th><th>Description<\/th><\/tr>\n  <\/thead>\n  <tbody>\n    <tr><td><code>OMNIVOICE_HOST<\/code><\/td><td><code>192.168.1.10<\/code><\/td><td>OmniVoice server host<\/td><\/tr>\n    <tr><td><code>OMNIVOICE_PORT<\/code><\/td><td><code>8001<\/code><\/td><td>OmniVoice server port<\/td><\/tr>\n    <tr><td><code>KOKORO_URL<\/code><\/td><td><code>http:\/\/localhost:8880\/v1\/audio\/speech<\/code><\/td><td>Kokoro API endpoint<\/td><\/tr>\n    <tr><td><code>REF_AUDIO_REMOTE<\/code><\/td><td><code>\"\"<\/code><\/td><td>Path to reference audio already on the OmniVoice server (empty: upload <code>REF_AUDIO_LOCAL<\/code> on each run)<\/td><\/tr>\n    <tr><td><code>REF_AUDIO_LOCAL<\/code><\/td><td><code>ref_audio.wav<\/code> next to the script<\/td><td>Local reference audio for upload<\/td><\/tr>\n    <tr><td><code>TTS_VOICE<\/code><\/td><td><code>af_sky<\/code><\/td><td>Kokoro fallback voice name<\/td><\/tr>\n    <tr><td><code>TTS_SPEED<\/code><\/td><td><code>1.0<\/code><\/td><td>Speech speed multiplier (Kokoro fallback only)<\/td><\/tr>\n  <\/tbody>\n<\/table>\n\n<!-- \u2500\u2500\u2500 Setup Guides \u2500\u2500\u2500 -->\n<h2>Setting Up OmniVoice<\/h2>\n<p>\n  OmniVoice runs best on a machine with a GPU (NVIDIA CUDA recommended, Intel Arc XPU\n  also supported).\n<\/p>\n<pre><code># Install\npip install omnivoice\n\n# Start the web demo\nomnivoice-demo --ip 0.0.0.0 --port 8001<\/code><\/pre>\n<div class=\"callout\">\n  <strong>Reference audio:<\/strong> Record 3\u201310 seconds of clean speech and save it as WAV.\n  Unless <code>REF_AUDIO_REMOTE<\/code> is set, the bridge script uploads it to the server on every run.\n<\/div>\n\n<h2>Setting Up Kokoro (Fallback)<\/h2>\n<p>Kokoro runs on CPU \u2014 lightweight, fast, always available.<\/p>\n<pre><code># Clone and install\ngit clone https:\/\/github.com\/remsky\/Kokoro-FastAPI\ncd Kokoro-FastAPI\npip install -r requirements.txt\n\n# Start the server\npython3 server.py --port 8880<\/code><\/pre>\n<p>Verify it 
works:<\/p>\n<pre><code>curl -X POST http:\/\/localhost:8880\/v1\/audio\/speech \\\n  -H \"Content-Type: application\/json\" \\\n  -d '{\"input\":\"Hello world\",\"voice\":\"af_sky\",\"response_format\":\"wav\"}' \\\n  -o test.wav &amp;&amp; file test.wav<\/code><\/pre>\n<div class=\"callout warning\">\n  The bridge requests WAV from Kokoro (<code>response_format: wav<\/code>) and handles the FFmpeg conversion to OGG itself.\n<\/div>\n\n<!-- \u2500\u2500\u2500 Hermes Integration \u2500\u2500\u2500 -->\n<h2>Integrating with Hermes Agent<\/h2>\n<p>\n  Add the bridge as a custom command TTS provider in your Hermes config:\n<\/p>\n<pre><code>hermes config set tts.provider omnivoice\nhermes config set tts.providers.omnivoice.type command\nhermes config set tts.providers.omnivoice.command \\\n  'python3 \/path\/to\/omnivoice-tts-bridge.py {input_path} {output_path}'\nhermes config set tts.providers.omnivoice.output_format ogg\nhermes config set tts.providers.omnivoice.voice_compatible true<\/code><\/pre>\n<div class=\"callout success\">\n  Now every <code>text_to_speech<\/code> call routes through the bridge \u2014\n  OmniVoice with auto-fallback to Kokoro.\n<\/div>\n\n<!-- \u2500\u2500\u2500 Why This Works \u2500\u2500\u2500 -->\n<h2>Why This Pattern Works<\/h2>\n<p><strong>Three layers of resilience:<\/strong><\/p>\n<ol>\n  <li><strong>Network check<\/strong> \u2014 TCP connect to OmniVoice host:port before even calling the API<\/li>\n  <li><strong>HTTP health check<\/strong> \u2014 Verify the server responds with 200 OK<\/li>\n  <li><strong>API polling<\/strong> \u2014 Up to 30 polls at 3-second intervals; failed polls are retried, and a reported error (from the fourth poll on) or a timeout triggers the Kokoro fallback<\/li>\n<\/ol>\n<p>\n  As long as Kokoro is up, your voice pipeline keeps working. If your GPU server is down for\n  maintenance, you still get speech \u2014 just from the fallback model. 
The caller never\n  sees a failure.\n<\/p>\n\n<hr>\n\n<h2>The Full Script<\/h2>\n<pre><code><span style=\"color:#8b949e\">#!\/usr\/bin\/env python3<\/span>\n<span style=\"color:#8b949e\">\"\"\"\nOmniVoice TTS Bridge \u2014 Voice Clone with Kokoro Fallback\n\nReads text from {input_path}, tries OmniVoice (voice cloning),\nfalls back to local Kokoro if the server is unreachable.\n\nEnvironment variables (all optional):\n  OMNIVOICE_HOST   \u2014 OmniVoice server host (default: 192.168.1.10)\n  OMNIVOICE_PORT   \u2014 OmniVoice server port (default: 8001)\n  KOKORO_URL       \u2014 Kokoro API endpoint  (default: http:\/\/localhost:8880\/v1\/audio\/speech)\n  REF_AUDIO_REMOTE \u2014 Path to reference audio on OmniVoice server (default: auto-upload)\n  REF_AUDIO_LOCAL  \u2014 Local path to reference audio for upload (default: .\/ref_audio.wav)\n  TTS_VOICE        \u2014 Kokoro fallback voice name (default: af_sky)\n  TTS_SPEED        \u2014 Speech speed (default: 1.0)\n\nUsage:\n  python3 omnivoice-tts-bridge.py &lt;input_path&gt; &lt;output_path&gt;\n\"\"\"<\/span>\n\n<span style=\"color: #ff7b72\">import<\/span> os, sys, json, time, subprocess, socket\n<span style=\"color: #ff7b72\">import<\/span> requests\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Configuration \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\nOMNIVOICE_HOST = os.getenv(<span style=\"color: #a5d6ff\">\"OMNIVOICE_HOST\"<\/span>, <span style=\"color: #a5d6ff\">\"192.168.1.10\"<\/span>)\nOMNIVOICE_PORT = int(os.getenv(<span style=\"color: #a5d6ff\">\"OMNIVOICE_PORT\"<\/span>, <span style=\"color: #a5d6ff\">\"8001\"<\/span>))\nOMNIVOICE_URL  = <span style=\"color: #79c0ff\">f\"http:\/\/{OMNIVOICE_HOST}:{OMNIVOICE_PORT}\"<\/span>\n\nKOKORO_URL  = 
os.getenv(<span style=\"color: #a5d6ff\">\"KOKORO_URL\"<\/span>,\n    <span style=\"color: #a5d6ff\">\"http:\/\/localhost:8880\/v1\/audio\/speech\"<\/span>)\nKOKORO_VOICE = os.getenv(<span style=\"color: #a5d6ff\">\"TTS_VOICE\"<\/span>, <span style=\"color: #a5d6ff\">\"af_sky\"<\/span>)\nTTS_SPEED    = float(os.getenv(<span style=\"color: #a5d6ff\">\"TTS_SPEED\"<\/span>, <span style=\"color: #a5d6ff\">\"1.0\"<\/span>))\n\nREF_REMOTE = os.getenv(<span style=\"color: #a5d6ff\">\"REF_AUDIO_REMOTE\"<\/span>, <span style=\"color: #a5d6ff\">\"\"<\/span>)\nREF_LOCAL  = os.getenv(<span style=\"color: #a5d6ff\">\"REF_AUDIO_LOCAL\"<\/span>,\n    os.path.join(os.path.dirname(__file__), <span style=\"color: #a5d6ff\">\"ref_audio.wav\"<\/span>))\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Input \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\ninput_path  = sys.argv[<span style=\"color: #79c0ff\">1<\/span>] <span style=\"color: #ff7b72\">if<\/span> len(sys.argv) &gt; <span style=\"color: #79c0ff\">1<\/span> <span style=\"color: #ff7b72\">else<\/span> <span style=\"color: #a5d6ff\">\"\/dev\/stdin\"<\/span>\noutput_path = sys.argv[<span style=\"color: #79c0ff\">2<\/span>] <span style=\"color: #ff7b72\">if<\/span> len(sys.argv) &gt; <span style=\"color: #79c0ff\">2<\/span> <span style=\"color: #ff7b72\">else<\/span> <span style=\"color: #a5d6ff\">\"\/tmp\/tts_output.ogg\"<\/span>\n\n<span style=\"color: #ff7b72\">with<\/span> <span style=\"color: #79c0ff\">open<\/span>(input_path) <span style=\"color: #ff7b72\">as<\/span> f:\n    text = f.read().strip()\n\n<span style=\"color: #ff7b72\">if<\/span> <span style=\"color: #ff7b72\">not<\/span> text:\n    <span style=\"color: 
#79c0ff\">print<\/span>(<span style=\"color: #a5d6ff\">\"Empty input\"<\/span>)\n    sys.exit(<span style=\"color: #79c0ff\">1<\/span>)\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Helpers \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\n<span style=\"color: #ff7b72\">def<\/span> <span style=\"color: #d2a8ff\">to_ogg<\/span>(wav_path, ogg_path):\n    subprocess.run([<span style=\"color: #a5d6ff\">\"ffmpeg\"<\/span>, <span style=\"color: #a5d6ff\">\"-y\"<\/span>, <span style=\"color: #a5d6ff\">\"-i\"<\/span>, wav_path,\n        <span style=\"color: #a5d6ff\">\"-c:a\"<\/span>, <span style=\"color: #a5d6ff\">\"libopus\"<\/span>, <span style=\"color: #a5d6ff\">\"-b:a\"<\/span>, <span style=\"color: #a5d6ff\">\"64k\"<\/span>, <span style=\"color: #a5d6ff\">\"-vbr\"<\/span>, <span style=\"color: #a5d6ff\">\"on\"<\/span>,\n        <span style=\"color: #a5d6ff\">\"-f\"<\/span>, <span style=\"color: #a5d6ff\">\"ogg\"<\/span>, ogg_path], capture_output=<span style=\"color: #ff7b72\">True<\/span>, timeout=<span style=\"color: #79c0ff\">30<\/span>)\n\n<span style=\"color: #ff7b72\">def<\/span> <span style=\"color: #d2a8ff\">to_mp3<\/span>(wav_path, mp3_path):\n    subprocess.run([<span style=\"color: #a5d6ff\">\"ffmpeg\"<\/span>, <span style=\"color: #a5d6ff\">\"-y\"<\/span>, <span style=\"color: #a5d6ff\">\"-i\"<\/span>, wav_path,\n        <span style=\"color: #a5d6ff\">\"-codec:a\"<\/span>, <span style=\"color: #a5d6ff\">\"libmp3lame\"<\/span>, <span style=\"color: #a5d6ff\">\"-b:a\"<\/span>, <span style=\"color: #a5d6ff\">\"128k\"<\/span>, mp3_path],\n        capture_output=<span style=\"color: #ff7b72\">True<\/span>, timeout=<span style=\"color: #79c0ff\">30<\/span>)\n\n<span 
style=\"color: #ff7b72\">def<\/span> <span style=\"color: #d2a8ff\">fallback_kokoro<\/span>():\n    <span style=\"color: #ff7b72\">try<\/span>:\n        r = requests.post(KOKORO_URL, json={\n            <span style=\"color: #a5d6ff\">\"input\"<\/span>: text, <span style=\"color: #a5d6ff\">\"voice\"<\/span>: KOKORO_VOICE,\n            <span style=\"color: #a5d6ff\">\"speed\"<\/span>: TTS_SPEED, <span style=\"color: #a5d6ff\">\"response_format\"<\/span>: <span style=\"color: #a5d6ff\">\"wav\"<\/span>,\n            <span style=\"color: #a5d6ff\">\"model\"<\/span>: <span style=\"color: #a5d6ff\">\"kokoro\"<\/span>, <span style=\"color: #a5d6ff\">\"stream\"<\/span>: <span style=\"color: #ff7b72\">False<\/span>\n        }, timeout=<span style=\"color: #79c0ff\">30<\/span>)\n        p = subprocess.Popen([<span style=\"color: #a5d6ff\">\"ffmpeg\"<\/span>, <span style=\"color: #a5d6ff\">\"-y\"<\/span>, <span style=\"color: #a5d6ff\">\"-i\"<\/span>, <span style=\"color: #a5d6ff\">\"pipe:0\"<\/span>,\n            <span style=\"color: #a5d6ff\">\"-c:a\"<\/span>, <span style=\"color: #a5d6ff\">\"libopus\"<\/span>, <span style=\"color: #a5d6ff\">\"-b:a\"<\/span>, <span style=\"color: #a5d6ff\">\"64k\"<\/span>, <span style=\"color: #a5d6ff\">\"-vbr\"<\/span>, <span style=\"color: #a5d6ff\">\"on\"<\/span>,\n            <span style=\"color: #a5d6ff\">\"-f\"<\/span>, <span style=\"color: #a5d6ff\">\"ogg\"<\/span>, output_path],\n            stdin=subprocess.PIPE, stdout=subprocess.DEVNULL,\n            stderr=subprocess.DEVNULL)\n        p.communicate(r.content)\n        <span style=\"color: #79c0ff\">print<\/span>(<span style=\"color: #79c0ff\">f\"FALLBACK kokoro: {output_path}\"<\/span>)\n        sys.exit(<span style=\"color: #79c0ff\">0<\/span>)\n    <span style=\"color: #ff7b72\">except<\/span> Exception <span style=\"color: #ff7b72\">as<\/span> e:\n        <span style=\"color: #79c0ff\">print<\/span>(<span style=\"color: #79c0ff\">f\"Fallback also failed: {e}\"<\/span>)\n       
 sys.exit(<span style=\"color: #79c0ff\">1<\/span>)\n\n<span style=\"color: #ff7b72\">def<\/span> <span style=\"color: #d2a8ff\">check_host<\/span>(host, port):\n    <span style=\"color: #ff7b72\">try<\/span>:\n        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n        s.settimeout(<span style=\"color: #79c0ff\">2<\/span>)\n        ok = s.connect_ex((host, port)) == <span style=\"color: #79c0ff\">0<\/span>\n        s.close()\n        <span style=\"color: #ff7b72\">return<\/span> ok\n    <span style=\"color: #ff7b72\">except<\/span> Exception:\n        <span style=\"color: #ff7b72\">return<\/span> <span style=\"color: #ff7b72\">False<\/span>\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Step 1: Check OmniVoice server \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\n<span style=\"color: #ff7b72\">if<\/span> <span style=\"color: #ff7b72\">not<\/span> check_host(OMNIVOICE_HOST, OMNIVOICE_PORT):\n    <span style=\"color: #79c0ff\">print<\/span>(<span style=\"color: #79c0ff\">f\"OmniVoice unreachable, falling back to Kokoro\"<\/span>)\n    fallback_kokoro()\n\n<span style=\"color: #ff7b72\">try<\/span>:\n    r = requests.get(<span style=\"color: #79c0ff\">f\"{OMNIVOICE_URL}\/\"<\/span>, timeout=<span style=\"color: #79c0ff\">3<\/span>)\n    <span style=\"color: #ff7b72\">if<\/span> r.status_code != <span style=\"color: #79c0ff\">200<\/span>:\n        fallback_kokoro()\n<span style=\"color: #ff7b72\">except<\/span> Exception:\n    fallback_kokoro()\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Step 2: Reference audio \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\nref_path = REF_REMOTE\n<span style=\"color: #ff7b72\">if<\/span> <span 
style=\"color: #ff7b72\">not<\/span> ref_path:\n    <span style=\"color: #ff7b72\">if<\/span> <span style=\"color: #ff7b72\">not<\/span> os.path.exists(REF_LOCAL):\n        fallback_kokoro()\n    <span style=\"color: #ff7b72\">with<\/span> <span style=\"color: #79c0ff\">open<\/span>(REF_LOCAL, <span style=\"color: #a5d6ff\">\"rb\"<\/span>) <span style=\"color: #ff7b72\">as<\/span> f:\n        r = requests.post(<span style=\"color: #79c0ff\">f\"{OMNIVOICE_URL}\/gradio_api\/upload\"<\/span>,\n            files={<span style=\"color: #a5d6ff\">\"files\"<\/span>: (os.path.basename(REF_LOCAL), f)}, timeout=<span style=\"color: #79c0ff\">30<\/span>)\n        ref_path = r.json()[<span style=\"color: #79c0ff\">0<\/span>]\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Step 3: Estimate duration \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\nword_count = len(text.split())\nduration = max(<span style=\"color: #79c0ff\">10<\/span>, min(<span style=\"color: #79c0ff\">120<\/span>, int(word_count * <span style=\"color: #79c0ff\">0.35<\/span>)))\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Step 4: Call OmniVoice clone \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\npayload = {\n    <span style=\"color: #a5d6ff\">\"data\"<\/span>: [\n        text, <span style=\"color: #a5d6ff\">\"Auto\"<\/span>,\n        {<span style=\"color: #a5d6ff\">\"path\"<\/span>: ref_path, <span style=\"color: #a5d6ff\">\"meta\"<\/span>: {<span style=\"color: #a5d6ff\">\"_type\"<\/span>: <span style=\"color: #a5d6ff\">\"gradio.FileData\"<\/span>}},\n        <span style=\"color: #a5d6ff\">\"\"<\/span>, <span style=\"color: #a5d6ff\">\"\"<\/span>, <span style=\"color: #79c0ff\">40<\/span>, <span 
style=\"color: #79c0ff\">2.0<\/span>, <span style=\"color: #ff7b72\">True<\/span>, <span style=\"color: #79c0ff\">0.9<\/span>, duration, <span style=\"color: #ff7b72\">True<\/span>, <span style=\"color: #ff7b72\">True<\/span>\n    ]\n}\n\n<span style=\"color: #ff7b72\">try<\/span>:\n    resp = requests.post(<span style=\"color: #79c0ff\">f\"{OMNIVOICE_URL}\/gradio_api\/call\/_clone_fn\"<\/span>,\n        json=payload, timeout=<span style=\"color: #79c0ff\">15<\/span>)\n    event_id = resp.json().get(<span style=\"color: #a5d6ff\">\"event_id\"<\/span>)\n    <span style=\"color: #ff7b72\">if<\/span> <span style=\"color: #ff7b72\">not<\/span> event_id:\n        fallback_kokoro()\n<span style=\"color: #ff7b72\">except<\/span> Exception <span style=\"color: #ff7b72\">as<\/span> e:\n    fallback_kokoro()\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Step 5: Poll for result \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\nresult_url = <span style=\"color: #79c0ff\">f\"{OMNIVOICE_URL}\/gradio_api\/call\/_clone_fn\/{event_id}\"<\/span>\n<span style=\"color: #ff7b72\">for<\/span> attempt <span style=\"color: #ff7b72\">in<\/span> <span style=\"color: #79c0ff\">range<\/span>(<span style=\"color: #79c0ff\">30<\/span>):\n    <span style=\"color: #ff7b72\">try<\/span>:\n        time.sleep(<span style=\"color: #79c0ff\">3<\/span>)\n        r = requests.get(result_url, stream=<span style=\"color: #ff7b72\">True<\/span>, timeout=<span style=\"color: #79c0ff\">15<\/span>)\n        lines = [l.decode() <span style=\"color: #ff7b72\">for<\/span> l <span style=\"color: #ff7b72\">in<\/span> r.iter_lines() <span style=\"color: #ff7b72\">if<\/span> l]\n        <span style=\"color: #ff7b72\">for<\/span> line <span style=\"color: #ff7b72\">in<\/span> lines:\n            <span style=\"color: #ff7b72\">if<\/span> 
<span style=\"color: #ff7b72\">not<\/span> line.startswith(<span style=\"color: #a5d6ff\">\"data:\"<\/span>):\n                <span style=\"color: #ff7b72\">continue<\/span>\n            result_data = json.loads(line[<span style=\"color: #79c0ff\">5<\/span>:])\n            audio_info = result_data[<span style=\"color: #79c0ff\">0<\/span>]\n            <span style=\"color: #ff7b72\">if<\/span> <span style=\"color: #ff7b72\">not<\/span> audio_info:\n                <span style=\"color: #ff7b72\">continue<\/span>\n            dl = requests.get(audio_info[<span style=\"color: #a5d6ff\">\"url\"<\/span>], timeout=<span style=\"color: #79c0ff\">30<\/span>)\n            tmp_wav = output_path + <span style=\"color: #a5d6ff\">\".wav\"<\/span>\n            <span style=\"color: #ff7b72\">with<\/span> <span style=\"color: #79c0ff\">open<\/span>(tmp_wav, <span style=\"color: #a5d6ff\">\"wb\"<\/span>) <span style=\"color: #ff7b72\">as<\/span> f:\n                f.write(dl.content)\n            fmt = output_path.split(<span style=\"color: #a5d6ff\">\".\"<\/span>)[-<span style=\"color: #79c0ff\">1<\/span>]\n            <span style=\"color: #ff7b72\">if<\/span> fmt == <span style=\"color: #a5d6ff\">\"ogg\"<\/span>:\n                to_ogg(tmp_wav, output_path)\n            <span style=\"color: #ff7b72\">elif<\/span> fmt == <span style=\"color: #a5d6ff\">\"mp3\"<\/span>:\n                to_mp3(tmp_wav, output_path)\n            <span style=\"color: #ff7b72\">else<\/span>:\n                os.rename(tmp_wav, output_path)\n            <span style=\"color: #ff7b72\">if<\/span> os.path.exists(tmp_wav):\n                os.remove(tmp_wav)\n            <span style=\"color: #79c0ff\">print<\/span>(<span style=\"color: #79c0ff\">f\"OK omnivoice: {output_path}\"<\/span>)\n            sys.exit(<span style=\"color: #79c0ff\">0<\/span>)\n        <span style=\"color: #ff7b72\">if<\/span> <span style=\"color: #a5d6ff\">\"error\"<\/span> <span style=\"color: #ff7b72\">in<\/span> content <span 
style=\"color: #ff7b72\">and<\/span> attempt &gt; <span style=\"color: #79c0ff\">2<\/span>:\n            fallback_kokoro()\n    <span style=\"color: #ff7b72\">except<\/span> Exception:\n        <span style=\"color: #ff7b72\">continue<\/span>\n\n<span style=\"color: #8b949e\"># \u2500\u2500 Timeout \u2192 fallback \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500<\/span>\n<span style=\"color: #79c0ff\">print<\/span>(<span style=\"color: #a5d6ff\">\"OmniVoice timeout, falling back to Kokoro\"<\/span>)\nfallback_kokoro()<\/code><\/pre>\n\n<hr>\n\n<h2>License<\/h2>\n<p>MIT \u2014 use it, fork it, share it. No attribution required.<\/p>\n\n<div class=\"footer\">\n  <p>Built with \u2764\ufe0f \u2014 because your AI deserves a voice that feels like home \ud83c\udfe0\u2728<\/p>\n  <p style=\"margin-top:0.5rem;font-size:0.75rem;color:#484f58\"><a href=\"https:\/\/github.com\/k2-fsa\/omnivoice\" target=\"_blank\" rel=\"noopener\">OmniVoice<\/a> \u00b7 <a href=\"https:\/\/github.com\/remsky\/Kokoro-FastAPI\" target=\"_blank\" rel=\"noopener\">Kokoro<\/a> \u00b7 <a href=\"https:\/\/hermes-agent.nousresearch.com\" target=\"_blank\" rel=\"noopener\">Hermes Agent<\/a> \u00b7 FFmpeg<\/p>\n<\/div>\n\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Hermes Agent TTS Bridge: Give Your AI a Voice Clone Zero-shot voice cloning with Kokoro fallback \u2014 because conversations hit different when your AI sounds like someone you know \u2764\ufe0f A practical guide to bridging OmniVoice and Kokoro for zero-shot voice cloning with local fallback, Telegram-ready OGG output, and zero downtime. 
The Problem You&#8217;re chatting [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":841,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,8,10],"tags":[],"class_list":["post-838","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-homelab","category-ai-tools","category-hermes-agent"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/posts\/838","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/comments?post=838"}],"version-history":[{"count":1,"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/posts\/838\/revisions"}],"predecessor-version":[{"id":840,"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/posts\/838\/revisions\/840"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/media\/841"}],"wp:attachment":[{"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/media?parent=838"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/categories?post=838"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cyb3rjan.com\/index.php\/wp-json\/wp\/v2\/tags?post=838"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}