# Multi-Source Input

KDeps supports multiple input sources simultaneously: HTTP API requests, audio hardware (microphones), video hardware (cameras), telephony devices, and chat bot platforms (Discord, Slack, Telegram, WhatsApp). Sources are configured in the `settings.input` block of your `workflow.yaml`.
## Overview
| Source | Use Case |
|---|---|
| `api` | HTTP API requests (default, REST/JSON) |
| `audio` | Microphone or line-in audio capture |
| `video` | Camera or V4L2 video capture |
| `telephony` | Phone call audio (local SIP device or cloud provider) |
| `bot` | Chat bot platforms (Discord, Slack, Telegram, WhatsApp) |
Workflows can combine sources:
Microphone only:

```yaml
settings:
  input:
    sources: [audio]
```

Audio and video together:

```yaml
settings:
  input:
    sources: [audio, video]
```

API requests and microphone:

```yaml
settings:
  input:
    sources: [api, audio]
```

Phone/SIP only:

```yaml
settings:
  input:
    sources: [telephony]
```

## Source Configuration
### API Source
The default. No additional config needed. The workflow responds to HTTP requests like any standard API.
```yaml
settings:
  input:
    sources: [api]
```

### Execution Type (Audio / Video / Telephony)
Hardware sources (`audio`, `video`, `telephony`) support two execution modes via `executionType`:
| `executionType` | Description |
|---|---|
| `stateless` (default) | Capture once, run workflow once, exit |
| `polling` | Loop continuously: after each capture-execute cycle, restart from capture. Blocks until Ctrl+C |
Polling (voice assistant loop):
```yaml
settings:
  input:
    sources: [audio]
    executionType: polling
    audio:
      device: hw:0,0
```

Stateless (single capture, default):
```yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0
```

### Audio Source
Captures audio from a hardware device using `arecord` (Linux/ALSA) or `ffmpeg`.
```yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0  # ALSA: hw:<card>,<device>
```

Device identifiers by platform:
| Platform | Example |
|---|---|
| Linux (ALSA) | `hw:0,0`, `default`, `plughw:1,0` |
| macOS | `Built-in Microphone`, `default` |
| Windows | `Microphone (Realtek Audio)` |
List available audio devices:
```bash
# Linux
arecord -l

# macOS / Windows
ffmpeg -list_devices true -f avfoundation -i dummy   # macOS
ffmpeg -list_devices true -f dshow -i dummy          # Windows
```

### Video Source
Captures video from a hardware camera using `ffmpeg` with the platform's native capture driver.
```yaml
settings:
  input:
    sources: [video]
    video:
      device: /dev/video0  # V4L2 device path (Linux)
```

Device identifiers by platform:
| Platform | Example |
|---|---|
| Linux (V4L2) | `/dev/video0`, `/dev/video1` |
| macOS (AVFoundation) | `FaceTime HD Camera`, `0` |
| Windows (DirectShow) | `USB Video Device`, `0` |
List available video devices:
```bash
# Linux
v4l2-ctl --list-devices

# macOS
ffmpeg -list_devices true -f avfoundation -i dummy

# Windows
ffmpeg -list_devices true -f dshow -i dummy
```

### Telephony Source
Captures audio from a phone or SIP device. Two modes are supported:
Local — a hardware telephony device (e.g. USB modem, ATA adapter):
```yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: local
      device: /dev/ttyUSB0  # Serial device path
```

Online — a cloud telephony provider (media arrives via webhook):
```yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: online
      provider: twilio  # Currently: twilio
```

When using an online provider, configure the provider's webhook to POST audio to your workflow's API endpoint.
### Bot Source
Connects to one or more chat platforms and runs the workflow as a long-lived process (polling) or as a single-shot command (stateless). Each inbound message triggers one workflow execution; the reply is sent back to the platform automatically.
#### Execution Types
| `executionType` | Description |
|---|---|
| `polling` (default) | Long-running process: persistent connection per platform |
| `stateless` | One-shot: reads a JSON message from stdin, executes once, writes reply to stdout |
Polling mode — runs as a daemon, reconnects automatically:
```yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: polling
      telegram:
        botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"
        pollIntervalSeconds: 1
```

Stateless mode — run once from a shell script or cron job:
```yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: stateless
```

```bash
echo '{"message":"hello","chatId":"123","userId":"u1","platform":"telegram"}' \
  | kdeps run workflow.yaml

# Or use environment variables
KDEPS_BOT_MESSAGE="hello" KDEPS_BOT_PLATFORM="telegram" kdeps run workflow.yaml
```

#### Bot Reply Resource
Use the `botReply` resource type to send the reply back to the platform. It evaluates a text expression, then:
- In polling mode: calls the platform's reply API for the originating chat ID, then the dispatcher loop resumes waiting for the next message.
- In stateless mode: writes the text to stdout, then the process exits.
```yaml
run:
  botReply:
    text: "{{ get('llm') }}"
```

The `text` field supports the same expressions as any other resource (`get()`, `input()`, string interpolation, etc.).
#### Accessing Message Fields
Inside any resource, use the `input()` expression function:
| Expression | Value |
|---|---|
| `input('message')` | The user's message text |
| `input('chatId')` | Platform chat/channel ID |
| `input('userId')` | Sender's user ID |
| `input('platform')` | Source platform name (e.g. `telegram`) |
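These fields can be used anywhere an expression is accepted. As a minimal sketch (the model name and prompt wording are placeholders), an LLM resource could answer the incoming message directly:

```yaml
# Sketch: feed the inbound chat message to an LLM resource.
# Model name is illustrative; use whichever model your workflow provides.
run:
  chat:
    model: llama3.2:1b
    prompt: "Reply to this {{ input('platform') }} message: {{ input('message') }}"
```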
#### Platform Sub-Configs

Configure one or more platforms under `bot`:
Discord — connects via Discord Gateway WebSocket:
```yaml
bot:
  executionType: polling
  discord:
    botToken: "{{ env('DISCORD_BOT_TOKEN') }}"
    guildId: "123456789"  # Optional: restrict to one server
```

| Field | Required | Description |
|---|---|---|
| `botToken` | Yes | Discord bot token (`Bot ...`) |
| `guildId` | No | Restrict to a specific guild (server) |
Slack — connects via Socket Mode WebSocket:
```yaml
bot:
  executionType: polling
  slack:
    botToken: "{{ env('SLACK_BOT_TOKEN') }}"  # xoxb-...
    appToken: "{{ env('SLACK_APP_TOKEN') }}"  # xapp-... (Socket Mode)
    mode: socket
```

| Field | Required | Description |
|---|---|---|
| `botToken` | Yes | Bot OAuth token (`xoxb-...`) |
| `appToken` | No | App-level token for Socket Mode (`xapp-...`) |
| `signingSecret` | No | Signing secret for request verification |
| `mode` | No | Connection mode: `socket` (default) |
Telegram — long-polling via `getUpdates`:
```yaml
bot:
  executionType: polling
  telegram:
    botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"
    pollIntervalSeconds: 1  # Default: 1
```

| Field | Required | Description |
|---|---|---|
| `botToken` | Yes | Bot token from @BotFather |
| `pollIntervalSeconds` | No | Seconds between polls (default: 1) |
WhatsApp — embedded webhook HTTP server + WhatsApp Cloud API:
```yaml
bot:
  executionType: polling
  whatsApp:
    phoneNumberId: "{{ env('WA_PHONE_NUMBER_ID') }}"
    accessToken: "{{ env('WA_ACCESS_TOKEN') }}"
    webhookSecret: "{{ env('WA_WEBHOOK_SECRET') }}"
    webhookPort: 16396  # Default: 16396
```

| Field | Required | Description |
|---|---|---|
| `phoneNumberId` | Yes | WhatsApp Cloud API phone number ID |
| `accessToken` | Yes | Meta access token |
| `webhookSecret` | No | Webhook verification token |
| `webhookPort` | No | Local port for webhook server (default: 16396) |
WhatsApp note: Meta's Cloud API uses webhooks (not polling). You must expose `webhookPort` via a reverse proxy or HTTPS tunnel (ngrok, cloudflared) and set the webhook URL in the Meta app dashboard.
#### Multiple Platforms Simultaneously
Run on Discord + Telegram at the same time:
```yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: polling
      discord:
        botToken: "{{ env('DISCORD_BOT_TOKEN') }}"
      telegram:
        botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"
```

## Activation (Wake Phrase Detection)
Activation listens continuously for a wake phrase before triggering the main workflow. This is ideal for voice assistants and hands-free operation on edge devices.
```yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    activation:
      phrase: "hey kdeps"  # Required: the phrase to listen for
      mode: offline        # online | offline
      sensitivity: 0.9     # 0.0–1.0 (1.0 = exact match only)
      chunkSeconds: 3      # Duration of each audio probe (seconds)
      offline:
        engine: faster-whisper
        model: small
```

### How Activation Works
1. The runtime captures `chunkSeconds` of audio in a loop.
2. Each chunk is transcribed using the configured engine.
3. If the transcript matches the wake phrase (within the `sensitivity` threshold), the main workflow runs.
4. After the workflow completes, the loop resumes.
### Sensitivity
`sensitivity` controls fuzzy matching: `1.0` requires an exact phrase match, lower values allow approximate matches.
| Value | Behavior |
|---|---|
| `1.0` | Exact match only (default) |
| `0.9` | ~90% similarity required |
| `0.5` | Broader matching, more false positives |
### Online Activation
Use a cloud STT provider for the activation loop:
```yaml
activation:
  phrase: "hey kdeps"
  mode: online
  sensitivity: 0.95
  online:
    provider: deepgram
    apiKey: dg-...
```

Supported online providers: `openai-whisper`, `google-stt`, `aws-transcribe`, `deepgram`, `assemblyai`.
### Offline Activation
Run entirely on-device with no cloud calls:
```yaml
activation:
  phrase: "hey kdeps"
  mode: offline
  sensitivity: 0.9
  offline:
    engine: faster-whisper  # whisper | faster-whisper | vosk | whisper-cpp
    model: small            # tiny, base, small, medium, large
```

## Transcription (Speech-to-Text)
After audio capture (and optional activation), the transcriber converts the media signal into text that your workflow resources can use.
```yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline    # online | offline
      output: text     # text | media
      language: en-US  # Optional BCP-47 language code
      offline:
        engine: faster-whisper
        model: small
```

### Output Modes
| `output` | Description |
|---|---|
| `text` | Transcribed string (default) |
| `media` | Raw media file path (skips transcription) |
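For example, to keep the raw capture and hand it straight to a resource, set `output: media`; the file path is then available via `inputMedia`, as in the video surveillance example later on this page. A minimal sketch:

```yaml
# Sketch: skip speech-to-text and keep the raw capture.
# Downstream resources read the file path via inputMedia.
transcriber:
  mode: offline
  output: media
  offline:
    engine: faster-whisper
    model: base
```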
### Accessing Transcription Results
In any resource that runs after transcription:
```yaml
run:
  chat:
    prompt: "{{ inputTranscript }}"  # expression function
```

Equivalent accessors:
- `inputTranscript` — expression function
- `inputMedia` — path to the raw media file
- `get("inputTranscript")` — unified API
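The unified accessor is interchangeable with the expression function; this sketch rewrites the example above using `get()`:

```yaml
# Sketch: same chat resource as above, using the unified get() accessor.
run:
  chat:
    prompt: "{{ get('inputTranscript') }}"
```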
### Online Transcription Providers
| Provider | `provider` value |
|---|---|
| OpenAI Whisper API | `openai-whisper` |
| Google Cloud STT | `google-stt` |
| AWS Transcribe | `aws-transcribe` |
| Deepgram | `deepgram` |
| AssemblyAI | `assemblyai` |
```yaml
transcriber:
  mode: online
  output: text
  language: en-US
  online:
    provider: deepgram
    apiKey: dg-...
```

### Offline Transcription Engines
All engines run locally — no network calls, no data leaving the device.
| Engine | `engine` value | Notes |
|---|---|---|
| OpenAI Whisper | `whisper` | Requires Python + `openai-whisper` |
| Faster Whisper | `faster-whisper` | CTranslate2 backend, faster + lower RAM |
| Vosk | `vosk` | Lightweight, great for embedded devices |
| Whisper.cpp | `whisper-cpp` | C++ port, runs on CPU without Python |
```yaml
transcriber:
  mode: offline
  output: text
  offline:
    engine: faster-whisper
    model: small  # tiny | base | small | medium | large
```

## Combined Examples
### Offline Voice Assistant (Raspberry Pi / Jetson)
Fully offline voice assistant — no cloud required. Uses `executionType: polling` so after each request the workflow restarts and listens again:
```yaml
settings:
  input:
    sources: [audio]
    executionType: polling
    audio:
      device: hw:0,0
    activation:
      phrase: "hey kdeps"
      mode: offline
      sensitivity: 0.9
      offline:
        engine: faster-whisper
        model: tiny  # Use tiny model for fast response on edge hardware
    transcriber:
      mode: offline
      output: text
      offline:
        engine: faster-whisper
        model: small
```

Resource that processes the spoken input:
```yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: voiceChat
run:
  chat:
    model: llama3.2:1b
    prompt: "{{ inputTranscript }}"
  tts:
    text: "{{ get('voiceChat') }}"
    mode: offline
    offline:
      engine: piper
      model: en_US-lessac-medium
```

### Video Surveillance + AI Analysis
```yaml
settings:
  input:
    sources: [video]
    video:
      device: /dev/video0
    transcriber:
      mode: offline
      output: media  # Keep raw video, no transcription
      offline:
        engine: faster-whisper
        model: base
```

Resource that analyzes video frames:
```yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: analyzeFrame
run:
  chat:
    model: llama3.2-vision
    prompt: "Describe what you see in this video frame."
    images:
      - "{{ inputMedia }}"
```

### Telephony Call Handler
```yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: online
      provider: twilio
    transcriber:
      mode: online
      output: text
      online:
        provider: deepgram
        apiKey: dg-...
```
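A resource that acts on the call transcript can mirror the voice assistant pattern above. This is a sketch: the `actionId` and model are placeholders, not part of the reference configuration:

```yaml
# Sketch: answer the transcribed caller request with an LLM.
# actionId and model are illustrative; swap in your own.
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: callHandler
run:
  chat:
    model: llama3.2:1b
    prompt: "Answer the caller's request: {{ inputTranscript }}"
```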
### Multi-Source: API + Audio

Accept both HTTP requests and microphone input in the same workflow:
```yaml
settings:
  input:
    sources: [api, audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline
      output: text
      offline:
        engine: faster-whisper
        model: small
```

## Edge Device Notes
KDeps is designed to run on resource-constrained hardware. Recommendations for edge deployments:
| Device | Recommended Config |
|---|---|
| Raspberry Pi 4 | `faster-whisper` with `tiny` or `base` model, `espeak` TTS |
| NVIDIA Jetson Nano | `faster-whisper` with `small` model, `piper` TTS |
| x86 mini-PC (no GPU) | `whisper-cpp` with `base` model |
| Online-only edge | Use `deepgram` or `openai-whisper` for STT |
For fully offline/air-gapped deployments, set `offlineMode: true` in `agentSettings` and use only offline engines:
```yaml
settings:
  agentSettings:
    offlineMode: true
    models:
      - llama3.2:1b
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline
      offline:
        engine: faster-whisper
        model: small
```

## See Also
- Workflow Configuration — Full `settings.input` reference
- TTS Resource — Speech output
- LLM Resource — Language model integration
- Docker Deployment — Package for edge deployment
- Bot Tutorial — Step-by-step Telegram bot with LLM replies