
Multi-Source Input

KDeps supports multiple input sources simultaneously: HTTP API requests, audio hardware (microphones), video hardware (cameras), telephony devices, and chat bot platforms (Discord, Slack, Telegram, WhatsApp). Sources are configured in the settings.input block of your workflow.yaml.

Overview

| Source | Use Case |
| --- | --- |
| api | HTTP API requests (default, REST/JSON) |
| audio | Microphone or line-in audio capture |
| video | Camera or V4L2 video capture |
| telephony | Phone call audio (local SIP device or cloud provider) |
| bot | Chat bot platforms (Discord, Slack, Telegram, WhatsApp) |

Workflows can combine sources:

Microphone only:

yaml
settings:
  input:
    sources: [audio]

Audio and video together:

yaml
settings:
  input:
    sources: [audio, video]

API requests and microphone:

yaml
settings:
  input:
    sources: [api, audio]

Phone/SIP only:

yaml
settings:
  input:
    sources: [telephony]

Source Configuration

API Source

The default. No additional config needed. The workflow responds to HTTP requests like any standard API.

yaml
settings:
  input:
    sources: [api]

Execution Type (Audio / Video / Telephony)

Hardware sources (audio, video, telephony) support two execution modes via executionType:

| executionType | Description |
| --- | --- |
| stateless (default) | Capture once, run workflow once, exit |
| polling | Loop continuously: after each capture-execute cycle, restart from capture. Blocks until Ctrl+C |
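
The difference between the two modes can be pictured with a short Python sketch. This is illustrative only, not KDeps internals; the capture and execute callables (and the max_cycles cap, which stands in for Ctrl+C) are stand-ins:

```python
# Illustrative sketch of executionType semantics; not KDeps internals.
def run(capture, execute, execution_type="stateless", max_cycles=None):
    """Run one capture-execute cycle (stateless) or loop (polling)."""
    cycles = 0
    while True:
        execute(capture())                      # one capture-execute cycle
        cycles += 1
        if execution_type == "stateless":
            break                               # capture once, run once, exit
        if max_cycles is not None and cycles >= max_cycles:
            break                               # stand-in for Ctrl+C
    return cycles
```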

Polling (voice assistant loop):

yaml
settings:
  input:
    sources: [audio]
    executionType: polling
    audio:
      device: hw:0,0

Stateless (single capture, default):

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0

Audio Source

Captures audio from a hardware device using arecord (Linux/ALSA) or ffmpeg.

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0            # ALSA: hw:<card>,<device>

Device identifiers by platform:

| Platform | Example |
| --- | --- |
| Linux (ALSA) | hw:0,0, default, plughw:1,0 |
| macOS | Built-in Microphone, default |
| Windows | Microphone (Realtek Audio) |

List available audio devices:

bash
# Linux
arecord -l

# macOS / Windows
ffmpeg -list_devices true -f avfoundation -i dummy   # macOS
ffmpeg -list_devices true -f dshow -i dummy          # Windows

Video Source

Captures video from a hardware camera using ffmpeg with the platform's native capture driver.

yaml
settings:
  input:
    sources: [video]
    video:
      device: /dev/video0       # V4L2 device path (Linux)

Device identifiers by platform:

| Platform | Example |
| --- | --- |
| Linux (V4L2) | /dev/video0, /dev/video1 |
| macOS (AVFoundation) | FaceTime HD Camera, 0 |
| Windows (DirectShow) | USB Video Device, 0 |

List available video devices:

bash
# Linux
v4l2-ctl --list-devices

# macOS
ffmpeg -list_devices true -f avfoundation -i dummy

# Windows
ffmpeg -list_devices true -f dshow -i dummy

Telephony Source

Captures audio from a phone or SIP device. Two modes are supported:

Local — a hardware telephony device (e.g. USB modem, ATA adapter):

yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: local
      device: /dev/ttyUSB0      # Serial device path

Online — a cloud telephony provider (media arrives via webhook):

yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: online
      provider: twilio          # Currently: twilio

When using an online provider, configure the provider's webhook to POST audio to your workflow's API endpoint.

Bot Source

Connects to one or more chat platforms and runs the workflow as a long-lived process (polling) or as a single-shot command (stateless). Each inbound message triggers one workflow execution; the reply is sent back to the platform automatically.

Execution Types

| executionType | Description |
| --- | --- |
| polling (default) | Long-running process: persistent connection per platform |
| stateless | One-shot: reads a JSON message from stdin, executes once, writes reply to stdout |

Polling mode — runs as a daemon, reconnects automatically:

yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: polling
      telegram:
        botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"
        pollIntervalSeconds: 1

Stateless mode — run once from a shell script or cron job:

yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: stateless

bash
echo '{"message":"hello","chatId":"123","userId":"u1","platform":"telegram"}' \
  | kdeps run workflow.yaml

# Or use environment variables
KDEPS_BOT_MESSAGE="hello" KDEPS_BOT_PLATFORM="telegram" kdeps run workflow.yaml

Bot Reply Resource

Use the botReply resource type to send the reply back to the platform. It evaluates a text expression, then:

  • In polling mode: calls the platform's reply API for the originating chat ID, then the dispatcher loop resumes waiting for the next message.
  • In stateless mode: writes the text to stdout, then the process exits.

yaml
run:
  botReply:
    text: "{{ get('llm') }}"

The text field supports the same expressions as any other resource (get(), input(), string interpolation, etc.).
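
For instance, the reply text can mix literal text with input() and get() in a single expression. An illustrative fragment, following the botReply shape above (the 'llm' action ID is hypothetical):

```yaml
run:
  botReply:
    text: "You said: {{ input('message') }}. Reply: {{ get('llm') }}"
```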

Accessing Message Fields

Inside any resource, use the input() expression function:

| Expression | Value |
| --- | --- |
| input('message') | The user's message text |
| input('chatId') | Platform chat/channel ID |
| input('userId') | Sender's user ID |
| input('platform') | Source platform name (e.g. telegram) |
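
As an illustration, a chat resource could build its prompt from these fields. The resource shape is borrowed from the voice-assistant example later on this page; the actionId is hypothetical:

```yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: botChat           # hypothetical ID for this sketch
run:
  chat:
    model: llama3.2:1b
    prompt: "User {{ input('userId') }} on {{ input('platform') }} says: {{ input('message') }}"
```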

Platform Sub-Configs

Configure one or more platforms under bot:

Discord — connects via Discord Gateway WebSocket:

yaml
bot:
  executionType: polling
  discord:
    botToken: "{{ env('DISCORD_BOT_TOKEN') }}"
    guildId: "123456789"          # Optional: restrict to one server

| Field | Required | Description |
| --- | --- | --- |
| botToken | Yes | Discord bot token (Bot ...) |
| guildId | No | Restrict to a specific guild (server) |

Slack — connects via Socket Mode WebSocket:

yaml
bot:
  executionType: polling
  slack:
    botToken: "{{ env('SLACK_BOT_TOKEN') }}"       # xoxb-...
    appToken: "{{ env('SLACK_APP_TOKEN') }}"        # xapp-... (Socket Mode)
    mode: socket

| Field | Required | Description |
| --- | --- | --- |
| botToken | Yes | Bot OAuth token (xoxb-...) |
| appToken | No | App-level token for Socket Mode (xapp-...) |
| signingSecret | No | Signing secret for request verification |
| mode | No | Connection mode: socket (default) |

Telegram — long-polling via getUpdates:

yaml
bot:
  executionType: polling
  telegram:
    botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"
    pollIntervalSeconds: 1        # Default: 1

| Field | Required | Description |
| --- | --- | --- |
| botToken | Yes | Bot token from @BotFather |
| pollIntervalSeconds | No | Seconds between polls (default: 1) |

WhatsApp — embedded webhook HTTP server + WhatsApp Cloud API:

yaml
bot:
  executionType: polling
  whatsApp:
    phoneNumberId: "{{ env('WA_PHONE_NUMBER_ID') }}"
    accessToken: "{{ env('WA_ACCESS_TOKEN') }}"
    webhookSecret: "{{ env('WA_WEBHOOK_SECRET') }}"
    webhookPort: 16396            # Default: 16396

| Field | Required | Description |
| --- | --- | --- |
| phoneNumberId | Yes | WhatsApp Cloud API phone number ID |
| accessToken | Yes | Meta access token |
| webhookSecret | No | Webhook verification token |
| webhookPort | No | Local port for webhook server (default: 16396) |

WhatsApp note: Meta's Cloud API uses webhooks (not polling). You must expose webhookPort via a reverse proxy or HTTPS tunnel (ngrok, cloudflared) and set the webhook URL in the Meta app dashboard.

Multiple Platforms Simultaneously

Run on Discord + Telegram at the same time:

yaml
settings:
  input:
    sources: [bot]
    bot:
      executionType: polling
      discord:
        botToken: "{{ env('DISCORD_BOT_TOKEN') }}"
      telegram:
        botToken: "{{ env('TELEGRAM_BOT_TOKEN') }}"

Activation (Wake Phrase Detection)

Activation listens continuously for a wake phrase before triggering the main workflow. This is ideal for voice assistants and hands-free operation on edge devices.

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    activation:
      phrase: "hey kdeps"       # Required: the phrase to listen for
      mode: offline             # online | offline
      sensitivity: 0.9          # 0.0–1.0  (1.0 = exact match only)
      chunkSeconds: 3           # Duration of each audio probe (seconds)
      offline:
        engine: faster-whisper
        model: small

How Activation Works

  1. The runtime captures chunkSeconds of audio in a loop.
  2. Each chunk is transcribed using the configured engine.
  3. If the transcript matches the wake phrase (within sensitivity threshold), the main workflow runs.
  4. After the workflow completes, the loop resumes.
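
The four steps above can be sketched in Python. This is a stubbed illustration, not KDeps internals: a finite chunk list stands in for a live microphone, and the callables are placeholders:

```python
# Stubbed sketch of the activation loop; not KDeps internals.
def activation_loop(chunks, transcribe, is_wake, run_workflow):
    """Listen chunk by chunk, running the workflow on each wake-phrase hit."""
    triggers = 0
    for chunk in chunks:            # 1. capture chunkSeconds of audio
        text = transcribe(chunk)    # 2. transcribe with the configured engine
        if is_wake(text):           # 3. compare against the wake phrase
            run_workflow(text)      # 4. run the workflow, then keep listening
            triggers += 1
    return triggers
```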

Sensitivity

sensitivity controls fuzzy matching: 1.0 requires an exact phrase match, lower values allow approximate matches.

| Value | Behavior |
| --- | --- |
| 1.0 | Exact match only (default) |
| 0.9 | ~90% similarity required |
| 0.5 | Broader matching, more false positives |
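
KDeps does not document the exact similarity metric it uses. As a mental model, a ratio-style comparison such as Python's difflib behaves similarly (illustrative only, not the actual implementation):

```python
from difflib import SequenceMatcher

def wake_detected(transcript, phrase, sensitivity):
    """True when the transcript matches the phrase within the threshold."""
    ratio = SequenceMatcher(None, transcript.lower().strip(),
                            phrase.lower().strip()).ratio()
    return ratio >= sensitivity
```

With this model, a near-miss transcript like "hey k deps" passes at sensitivity 0.9 but fails at 1.0.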

Online Activation

Use a cloud STT provider for the activation loop:

yaml
activation:
  phrase: "hey kdeps"
  mode: online
  sensitivity: 0.95
  online:
    provider: deepgram
    apiKey: dg-...

Supported online providers: openai-whisper, google-stt, aws-transcribe, deepgram, assemblyai

Offline Activation

Run entirely on-device with no cloud calls:

yaml
activation:
  phrase: "hey kdeps"
  mode: offline
  sensitivity: 0.9
  offline:
    engine: faster-whisper     # whisper | faster-whisper | vosk | whisper-cpp
    model: small               # tiny, base, small, medium, large

Transcription (Speech-to-Text)

After audio capture (and optional activation), the transcriber converts the media signal into text that your workflow resources can use.

yaml
settings:
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline             # online | offline
      output: text              # text | media
      language: en-US           # Optional BCP-47 language code
      offline:
        engine: faster-whisper
        model: small

Output Modes

| output | Description |
| --- | --- |
| text | Transcribed string (default) |
| media | Raw media file path (skips transcription) |

Accessing Transcription Results

In any resource that runs after transcription:

yaml
run:
  chat:
    prompt: "{{ inputTranscript }}"    # expression function

Equivalent accessors:

  • inputTranscript — expression function
  • inputMedia — path to the raw media file
  • get("inputTranscript") — unified API

Online Transcription Providers

| Provider | provider value |
| --- | --- |
| OpenAI Whisper API | openai-whisper |
| Google Cloud STT | google-stt |
| AWS Transcribe | aws-transcribe |
| Deepgram | deepgram |
| AssemblyAI | assemblyai |

yaml
transcriber:
  mode: online
  output: text
  language: en-US
  online:
    provider: deepgram
    apiKey: dg-...

Offline Transcription Engines

All engines run locally — no network calls, no data leaving the device.

| Engine | engine value | Notes |
| --- | --- | --- |
| OpenAI Whisper | whisper | Requires Python + openai-whisper |
| Faster Whisper | faster-whisper | CTranslate2 backend, faster + lower RAM |
| Vosk | vosk | Lightweight, great for embedded devices |
| Whisper.cpp | whisper-cpp | C++ port, runs on CPU without Python |

yaml
transcriber:
  mode: offline
  output: text
  offline:
    engine: faster-whisper
    model: small              # tiny | base | small | medium | large

Combined Examples

Offline Voice Assistant (Raspberry Pi / Jetson)

Fully offline voice assistant — no cloud required. Uses executionType: polling so after each request the workflow restarts and listens again:

yaml
settings:
  input:
    sources: [audio]
    executionType: polling
    audio:
      device: hw:0,0
    activation:
      phrase: "hey kdeps"
      mode: offline
      sensitivity: 0.9
      offline:
        engine: faster-whisper
        model: tiny             # Use tiny model for fast response on edge hardware
    transcriber:
      mode: offline
      output: text
      offline:
        engine: faster-whisper
        model: small

Resource that processes the spoken input:

yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: voiceChat
run:
  chat:
    model: llama3.2:1b
    prompt: "{{ inputTranscript }}"
  tts:
    text: "{{ get('voiceChat') }}"
    mode: offline
    offline:
      engine: piper
      model: en_US-lessac-medium

Video Surveillance + AI Analysis

yaml
settings:
  input:
    sources: [video]
    video:
      device: /dev/video0
    transcriber:
      mode: offline
      output: media             # Keep raw video, no transcription
      offline:
        engine: faster-whisper
        model: base

Resource that analyzes video frames:

yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: analyzeFrame
run:
  chat:
    model: llama3.2-vision
    prompt: "Describe what you see in this video frame."
    images:
      - "{{ inputMedia }}"

Telephony Call Handler

yaml
settings:
  input:
    sources: [telephony]
    telephony:
      type: online
      provider: twilio
    transcriber:
      mode: online
      output: text
      online:
        provider: deepgram
        apiKey: dg-...

Multi-Source: API + Audio

Accept both HTTP requests and microphone input in the same workflow:

yaml
settings:
  input:
    sources: [api, audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline
      output: text
      offline:
        engine: faster-whisper
        model: small

Edge Device Notes

KDeps is designed to run on resource-constrained hardware. Recommendations for edge deployments:

| Device | Recommended Config |
| --- | --- |
| Raspberry Pi 4 | faster-whisper with tiny or base model, espeak TTS |
| NVIDIA Jetson Nano | faster-whisper with small model, piper TTS |
| x86 mini-PC (no GPU) | whisper-cpp with base model |
| Online-only edge | Use deepgram or openai-whisper for STT |

For fully offline/air-gapped deployments, set offlineMode: true in agentSettings and use only offline engines:

yaml
settings:
  agentSettings:
    offlineMode: true
    models:
      - llama3.2:1b
  input:
    sources: [audio]
    audio:
      device: hw:0,0
    transcriber:
      mode: offline
      offline:
        engine: faster-whisper
        model: small
