TTS (Text-to-Speech) Resource
The TTS resource synthesizes speech from text, producing an audio file that can be accessed by downstream resources via the ttsOutput() expression function. It supports both cloud (online) and local (offline) synthesis engines and can be used as a primary resource or as an inline resource inside before / after blocks.
Basic Usage
apiVersion: kdeps.io/v1
kind: Resource
metadata:
actionId: speakResponse
name: Speak Response
run:
tts:
text: "Hello, welcome to KDeps!"
mode: offline
offline:
engine: espeakConfiguration Options
| Option | Type | Description |
|---|---|---|
text | string | Required. Text to synthesize. Supports expressions. |
mode | string | Required. online or offline. |
language | string | Optional BCP-47 language code (e.g. en-US). |
voice | string | Voice identifier (provider/engine-specific). |
speed | float | Speech rate multiplier. 1.0 is normal speed. |
outputFormat | string | Audio format: mp3 (default), wav, or ogg. |
outputFile | string | Explicit output path. If omitted, a path under /tmp/kdeps-tts/ is auto-generated. |
online | object | Cloud provider config (required when mode: online). |
offline | object | Local engine config (required when mode: offline). |
Online Provider Configuration
tts:
mode: online
online:
provider: openai-tts # Required: provider name
apiKey: sk-... # API authentication key
region: us-east-1 # Region (AWS Polly / Azure only)
subscriptionKey: ... # Azure Cognitive Services subscription key| Field | Description |
|---|---|
provider | One of: openai-tts, google-tts, elevenlabs, aws-polly, azure-tts. |
apiKey | Authentication credential for the provider. |
region | AWS region or Azure region (e.g. eastus). |
subscriptionKey | Azure-only: Cognitive Services subscription key. |
Offline Engine Configuration
tts:
mode: offline
offline:
engine: piper # Required: engine name
model: en_US-lessac-medium # Model name or path (piper / coqui-tts)| Field | Description |
|---|---|
engine | One of: piper, espeak, festival, coqui-tts. |
model | Model name or file path (used by piper and coqui-tts). |
Online Providers
OpenAI TTS
Calls the OpenAI /v1/audio/speech endpoint.
tts:
text: "Hello from OpenAI TTS"
mode: online
voice: alloy # alloy | echo | fable | onyx | nova | shimmer
outputFormat: mp3
online:
provider: openai-tts
apiKey: sk-...Google Cloud Text-to-Speech
Calls the Google Cloud TTS REST API.
tts:
text: "Hello from Google Cloud"
mode: online
language: en-US
voice: en-US-Standard-A
speed: 1.0
online:
provider: google-tts
apiKey: AIza...ElevenLabs
Calls the ElevenLabs /v1/text-to-speech endpoint.
tts:
text: "Hello from ElevenLabs"
mode: online
voice: "21m00Tcm4TlvDq8ikWAM" # ElevenLabs voice ID
online:
provider: elevenlabs
apiKey: xi-...AWS Polly
Note: AWS Polly requires SigV4 request signing. Use the
execresource to call the AWS CLI instead:yamlrun: exec: command: "aws polly synthesize-speech --output-format mp3 --voice-id Joanna --text 'Hello' /tmp/speech.mp3"
Azure Cognitive Services TTS
Calls the Azure TTS REST API.
tts:
text: "Hello from Azure"
mode: online
language: en-US
voice: en-US-JennyNeural
online:
provider: azure-tts
region: eastus
subscriptionKey: ...Offline Engines
Offline engines run locally without any network calls. They must be installed on the host system.
Piper
High-quality neural TTS. GitHub: rhasspy/piper
# Install
pip install piper-tts
# Or download binary from GitHub releasestts:
text: "Hello from Piper"
mode: offline
offline:
engine: piper
model: en_US-lessac-medium # Download from https://huggingface.co/rhasspy/piper-voiceseSpeak-NG
Fast, lightweight speech synthesizer. eSpeak NG
# Install
apt install espeak-ng # Debian/Ubuntu
brew install espeak # macOStts:
text: "Hello from eSpeak"
mode: offline
voice: en # eSpeak voice identifier
speed: 1.2 # Words per minute multiplier
offline:
engine: espeakFestival
Classic speech synthesis system. Festival Speech Synthesis System
# Install
apt install festival # Debian/Ubuntu
brew install festival # macOStts:
text: "Hello from Festival"
mode: offline
offline:
engine: festivalCoqui TTS
Open-source deep-learning TTS. Coqui TTS
# Install
pip install TTStts:
text: "Hello from Coqui"
mode: offline
offline:
engine: coqui-tts
model: tts_models/en/ljspeech/tacotron2-DDCAccessing the Output File
After the TTS resource runs, the path to the synthesized audio file is available via:
ttsOutput()— expression function in any resource evaluated after TTS runsget("ttsOutput")— unifiedget()helperinput("ttsOutput")orinput("tts")—input()helper
resources:
- metadata:
actionId: speakThenRespond
run:
tts:
text: "Here is your answer"
mode: offline
offline:
engine: espeak
- metadata:
actionId: sendAudioFile
run:
apiResponse:
success: true
response:
audioPath: "{{ttsOutput()}}"TTS as an Inline Resource
TTS can run before or after the main resource action using the before / after inline blocks:
run:
before:
- tts:
text: "Processing your request, please wait…"
mode: offline
offline:
engine: espeak
chat:
model: llama3
prompt: "{{input()}}"
after:
- tts:
text: "Processing complete."
mode: offline
offline:
engine: espeakUsing Expressions in text
The text field supports KDeps expressions:
tts:
text: "Hello {{get(\"name\")}}, your score is {{get(\"score\")}} points."
mode: offline
offline:
engine: espeakExample: Voice Assistant Response
Full example combining input transcription with TTS output:
# workflow.yaml
settings:
input:
source: audio
audio:
device: hw:0,0
activation:
phrase: "hey kdeps"
mode: offline
offline:
engine: faster-whisper
model: small
transcriber:
mode: offline
output: text
offline:
engine: faster-whisper
model: small
# resources/respond.yaml
run:
chat:
model: llama3
prompt: "{{inputTranscript()}}"
# resources/speak.yaml
run:
tts:
text: "{{get(\"respond\")}}"
mode: offline
offline:
engine: piper
model: en_US-lessac-medium