
Multi-Source Input

KDeps supports multiple input sources simultaneously: HTTP API requests, audio hardware (microphones), video hardware (cameras), telephony devices, and chat bot platforms (Discord, Slack, Telegram, WhatsApp). Sources are configured in the settings.input block of your workflow.yaml.

Overview

| Source | Use Case |
| --- | --- |
| api | HTTP API requests (default, REST/JSON) |
| audio | Microphone or line-in audio capture |
| video | Camera or V4L2 video capture |
| telephony | Phone call audio (local SIP device or cloud provider) |
| bot | Chat bot platforms (Discord, Slack, Telegram, WhatsApp) |

Workflows can combine sources:

Microphone only:

yaml
settings:
  input:
    sources: [audio]

Audio and video together:

yaml
settings:
  input:
    sources: [audio, video]

API requests and microphone:

yaml
settings:
  input:
    sources: [api, audio]

Phone/SIP only:

yaml
settings:
  input:
    sources: [telephony]

Source Configuration

API Source

The default. No additional config needed. The workflow responds to HTTP requests like any standard API.

yaml
settings:
  input:
    sources: [api]

Execution Type (Audio / Video / Telephony)

Hardware sources (audio, video, telephony) support two execution modes via executionType:

| executionType | Description |
| --- | --- |
| stateless (default) | Capture once, run workflow once, exit |
| polling | Loop continuously: after each capture-execute cycle, restart from capture. Blocks until Ctrl+C |
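
The difference between the two modes can be pictured with a short Python sketch. This is illustrative only, not KDeps internals; the capture and execute callables (and the max_cycles cap, which stands in for Ctrl+C) are stand-ins:

```python
# Illustrative sketch of executionType semantics; not KDeps internals.
def run(capture, execute, execution_type="stateless", max_cycles=None):
    """Run one capture-execute cycle (stateless) or loop (polling)."""
    cycles = 0
    while True:
        execute(capture())                      # one capture-execute cycle
        cycles += 1
        if execution_type == "stateless":
            break                               # capture once, run once, exit
        if max_cycles is not None and cycles >= max_cycles:
            break                               # stand-in for Ctrl+C
    return cycles
```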

Polling (voice assistant loop):

yaml
settings:
  input:
    sources: [audio]
    executionType: polling
    audio:
      device: hw:0,0

Stateless (single capture, default):

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0

Audio Source

Captures audio from a hardware device using arecord (Linux/ALSA) or ffmpeg.

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0            # ALSA: hw:<card>,<device>

Device identifiers by platform:

| Platform | Example |
| --- | --- |
| Linux (ALSA) | hw:0,0, default, plughw:1,0 |
| macOS | Built-in Microphone, default |
| Windows | Microphone (Realtek Audio) |

List available audio devices:

bash
# Linux
arecord -l

# macOS / Windows
ffmpeg -list_devices true -f avfoundation -i dummy   # macOS
ffmpeg -list_devices true -f dshow -i dummy          # Windows

Video Source

Captures video from a hardware camera using ffmpeg with the platform's native capture driver.

yaml
settings:
  input:
    sources: [video]
    video:
      device: /dev/video0       # V4L2 device path (Linux)

Device identifiers by platform:

| Platform | Example |
| --- | --- |
| Linux (V4L2) | /dev/video0, /dev/video1 |
| macOS (AVFoundation) | FaceTime HD Camera, 0 |
| Windows (DirectShow) | USB Video Device, 0 |

List available video devices:

bash
# Linux
v4l2-ctl --list-devices

# macOS
ffmpeg -list_devices true -f avfoundation -i dummy

# Windows
ffmpeg -list_devices true -f dshow -i dummy

Telephony Source

Captures audio from a phone or SIP device. Two modes are supported:

Local — a hardware telephony device (e.g. USB modem, ATA adapter):

yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: local
      device: /dev/ttyUSB0      # Serial device path

Online — a cloud telephony provider (media arrives via webhook):

yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: online
      provider: twilio          # Currently: twilio

When using an online provider, configure the provider's webhook to POST audio to your workflow's API endpoint.

Bot Source

Connects to one or more chat platforms and runs the workflow as a long-lived process (polling) or as a single-shot command (stateless). Each inbound message triggers one workflow execution; the reply is sent back to the platform automatically.

Execution Types

| executionType | Description |
| --- | --- |
| polling (default) | Long-running process: persistent connection per platform |
| stateless | One-shot: reads a JSON message from stdin, executes once, writes reply to stdout |

Polling mode — runs as a daemon, reconnects automatically:

yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: polling
      telegram:
        botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"
        pollIntervalSeconds: 1

Stateless mode — run once from a shell script or cron job:

yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: stateless

bash
echo '{"message":"hello","chatId":"123","userId":"u1","platform":"telegram"}' \
  | kdeps run workflow.yaml

# Or use environment variables
KDEPS_BOT_MESSAGE="hello" KDEPS_BOT_PLATFORM="telegram" kdeps run workflow.yaml

Bot Reply Resource

Use the botReply resource type to send the reply back to the platform. It evaluates a text expression, then:

  • In polling mode: calls the platform's reply API for the originating chat ID, then the dispatcher loop resumes waiting for the next message.
  • In stateless mode: writes the text to stdout, then the process exits.

yaml
run:
  botReply:
    text: "{{ get('llm') }}"

The text field supports the same expressions as any other resource (get(), input(), string interpolation, etc.).
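
For instance, the reply text can mix literal text with input() and get() in a single expression. An illustrative fragment, following the botReply shape above (the 'llm' action ID is hypothetical):

```yaml
run:
  botReply:
    text: "You said: {{ input('message') }}. Reply: {{ get('llm') }}"
```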

Accessing Message Fields

Inside any resource, use the input() expression function:

| Expression | Value |
| --- | --- |
| input('message') | The user's message text |
| input('chatId') | Platform chat/channel ID |
| input('userId') | Sender's user ID |
| input('platform') | Source platform name (e.g. telegram) |
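
As an illustration, a chat resource could build its prompt from these fields. The resource shape is borrowed from the voice-assistant example later on this page; the actionId is hypothetical:

```yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: botChat           # hypothetical ID for this sketch
run:
  chat:
    model: llama3.2:1b
    prompt: "User {{ input('userId') }} on {{ input('platform') }} says: {{ input('message') }}"
```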

Platform Sub-Configs

Configure one or more platforms under bot:

Discord — connects via Discord Gateway WebSocket:

yaml
bot:
  executionType: polling
  discord:
    botToken: "{{ env('DISCORD_BOT_TOKEN') }}"
    guildId: "123456789"          # Optional: restrict to one server

| Field | Required | Description |
| --- | --- | --- |
| botToken | Yes | Discord bot token (Bot ...) |
| guildId | No | Restrict to a specific guild (server) |

Slack — connects via Socket Mode WebSocket:

yaml
bot:
  executionType: polling
  slack:
    botToken: "{{ env('SLACK_BOT_TOKEN') }}"       # xoxb-...
    appToken: "{{ env('SLACK_APP_TOKEN') }}"        # xapp-... (Socket Mode)
    mode: socket

| Field | Required | Description |
| --- | --- | --- |
| botToken | Yes | Bot OAuth token (xoxb-...) |
| appToken | No | App-level token for Socket Mode (xapp-...) |
| signingSecret | No | Signing secret for request verification |
| mode | No | Connection mode: socket (default) |

Telegram — long-polling via getUpdates:

yaml
bot:
  executionType: polling
  telegram:
    botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"
    pollIntervalSeconds: 1        # Default: 1

| Field | Required | Description |
| --- | --- | --- |
| botToken | Yes | Bot token from @BotFather |
| pollIntervalSeconds | No | Seconds between polls (default: 1) |

WhatsApp — embedded webhook HTTP server + WhatsApp Cloud API:

yaml
bot:
  executionType: polling
  whatsApp:
    phoneNumberId: "{{ env('WA_PHONE_NUMBER_ID') }}"
    accessToken: "{{ env('WA_ACCESS_TOKEN') }}"
    webhookSecret: "{{ env('WA_WEBHOOK_SECRET') }}"
    webhookPort: 16396            # Default: 16396

| Field | Required | Description |
| --- | --- | --- |
| phoneNumberId | Yes | WhatsApp Cloud API phone number ID |
| accessToken | Yes | Meta access token |
| webhookSecret | No | Webhook verification token |
| webhookPort | No | Local port for webhook server (default: 16396) |

WhatsApp note: Meta's Cloud API uses webhooks (not polling). You must expose webhookPort via a reverse proxy or HTTPS tunnel (ngrok, cloudflared) and set the webhook URL in the Meta app dashboard.

Multiple Platforms Simultaneously

Run on Discord + Telegram at the same time:

yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: polling
      discord:
        botToken: "{{ env('DISCORD_BOT_TOKEN') }}"
      telegram:
        botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"

Activation (Wake Phrase Detection)

Activation listens continuously for a wake phrase before triggering the main workflow. This is ideal for voice assistants and hands-free operation on edge devices.

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    activation:
      phrase: "hey kdeps"       # Required: the phrase to listen for
      mode: offline             # online | offline
      sensitivity: 0.9          # 0.0–1.0  (1.0 = exact match only)
      chunkSeconds: 3           # Duration of each audio probe (seconds)
      offline:
        engine: faster-whisper
        model: small

How Activation Works

  1. The runtime captures chunkSeconds of audio in a loop.
  2. Each chunk is transcribed using the configured engine.
  3. If the transcript matches the wake phrase (within sensitivity threshold), the main workflow runs.
  4. After the workflow completes, the loop resumes.
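
The four steps above can be sketched in Python. This is a stubbed illustration, not KDeps internals: a finite chunk list stands in for a live microphone, and the callables are placeholders:

```python
# Stubbed sketch of the activation loop; not KDeps internals.
def activation_loop(chunks, transcribe, is_wake, run_workflow):
    """Listen chunk by chunk, running the workflow on each wake-phrase hit."""
    triggers = 0
    for chunk in chunks:            # 1. capture chunkSeconds of audio
        text = transcribe(chunk)    # 2. transcribe with the configured engine
        if is_wake(text):           # 3. compare against the wake phrase
            run_workflow(text)      # 4. run the workflow, then keep listening
            triggers += 1
    return triggers
```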

Sensitivity

sensitivity controls fuzzy matching: 1.0 requires an exact phrase match, lower values allow approximate matches.

| Value | Behavior |
| --- | --- |
| 1.0 | Exact match only (default) |
| 0.9 | ~90% similarity required |
| 0.5 | Broader matching, more false positives |
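
KDeps does not document the exact similarity metric it uses. As a mental model, a ratio-style comparison such as Python's difflib behaves similarly (illustrative only, not the actual implementation):

```python
from difflib import SequenceMatcher

def wake_detected(transcript, phrase, sensitivity):
    """True when the transcript matches the phrase within the threshold."""
    ratio = SequenceMatcher(None, transcript.lower().strip(),
                            phrase.lower().strip()).ratio()
    return ratio >= sensitivity
```

With this model, a near-miss transcript like "hey k deps" passes at sensitivity 0.9 but fails at 1.0.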

Online Activation

Use a cloud STT provider for the activation loop:

yaml
activation:
  phrase: "hey kdeps"
  mode: online
  sensitivity: 0.95
  online:
    provider: deepgram
    apiKey: dg-...

Supported online providers: openai-whisper, google-stt, aws-transcribe, deepgram, assemblyai

Offline Activation

Run entirely on-device with no cloud calls:

yaml
activation:
  phrase: "hey kdeps"
  mode: offline
  sensitivity: 0.9
  offline:
    engine: faster-whisper     # whisper | faster-whisper | vosk | whisper-cpp
    model: small               # tiny, base, small, medium, large

Transcription (Speech-to-Text)

After audio capture (and optional activation), the transcriber converts the media signal into text that your workflow resources can use.

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline             # online | offline
      output: text              # text | media
      language: en-US           # Optional BCP-47 language code
      offline:
        engine: faster-whisper
        model: small

Output Modes

| output | Description |
| --- | --- |
| text | Transcribed string (default) |
| media | Raw media file path (skips transcription) |

Accessing Transcription Results

In any resource that runs after transcription:

yaml
run:
  chat:
    prompt: "{{ inputTranscript }}"    # expression function

Equivalent accessors:

  • inputTranscript — expression function
  • inputMedia — path to the raw media file
  • get("inputTranscript") — unified API

Online Transcription Providers

| Provider | provider value |
| --- | --- |
| OpenAI Whisper API | openai-whisper |
| Google Cloud STT | google-stt |
| AWS Transcribe | aws-transcribe |
| Deepgram | deepgram |
| AssemblyAI | assemblyai |

yaml
transcriber:
  mode: online
  output: text
  language: en-US
  online:
    provider: deepgram
    apiKey: dg-...

Offline Transcription Engines

All engines run locally — no network calls, no data leaving the device.

| Engine | engine value | Notes |
| --- | --- | --- |
| OpenAI Whisper | whisper | Requires Python + openai-whisper |
| Faster Whisper | faster-whisper | CTranslate2 backend, faster + lower RAM |
| Vosk | vosk | Lightweight, great for embedded devices |
| Whisper.cpp | whisper-cpp | C++ port, runs on CPU without Python |

yaml
transcriber:
  mode: offline
  output: text
  offline:
    engine: faster-whisper
    model: small              # tiny | base | small | medium | large

Combined Examples

Offline Voice Assistant (Raspberry Pi / Jetson)

Fully offline voice assistant — no cloud required. Uses executionType: polling so after each request the workflow restarts and listens again:

yaml
settings:
  input:
    sources: [audio]
    executionType: polling
    audio:
      device: hw:0,0
    activation:
      phrase: "hey kdeps"
      mode: offline
      sensitivity: 0.9
      offline:
        engine: faster-whisper
        model: tiny             # Use tiny model for fast response on edge hardware
    transcriber:
      mode: offline
      output: text
      offline:
        engine: faster-whisper
        model: small

Resource that processes the spoken input:

yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: voiceChat
run:
  chat:
    model: llama3.2:1b
    prompt: "{{ inputTranscript }}"
  tts:
    text: "{{ get('voiceChat') }}"
    mode: offline
    offline:
      engine: piper
      model: en_US-lessac-medium

Video Surveillance + AI Analysis

yaml
settings:
  input:
    sources: [video]
    video:
      device: /dev/video0
    transcriber:
      mode: offline
      output: media             # Keep raw video, no transcription
      offline:
        engine: faster-whisper
        model: base

Resource that analyzes video frames:

yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: analyzeFrame
run:
  chat:
    model: llama3.2-vision
    prompt: "Describe what you see in this video frame."
    images:
      - "{{ inputMedia }}"

Telephony Call Handler

yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: online
      provider: twilio
    transcriber:
      mode: online
      output: text
      online:
        provider: deepgram
        apiKey: dg-...

Multi-Source: API + Audio

Accept both HTTP requests and microphone input in the same workflow:

yaml
settings:
  input:
    sources: [api, audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline
      output: text
      offline:
        engine: faster-whisper
        model: small

Edge Device Notes

KDeps is designed to run on resource-constrained hardware. Recommendations for edge deployments:

| Device | Recommended Config |
| --- | --- |
| Raspberry Pi 4 | faster-whisper with tiny or base model, espeak TTS |
| NVIDIA Jetson Nano | faster-whisper with small model, piper TTS |
| x86 mini-PC (no GPU) | whisper-cpp with base model |
| Online-only edge | Use deepgram or openai-whisper for STT |

For fully offline/air-gapped deployments, set offlineMode: true in agentSettings and use only offline engines:

yaml
settings:
  agentSettings:
    offlineMode: true
    models:
      - llama3.2:1b
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline
      offline:
        engine: faster-whisper
        model: small
