# LLM Backends Reference
KDeps supports multiple LLM backends, both local and cloud-based. This reference covers all available backends and their configuration.
## Backend Overview

### Local Backend
| Backend | Name | Default URL | Description |
|---|---|---|---|
| Ollama | ollama | http://localhost:11434 | Local model serving (default) |
### Cloud Backends
| Backend | Name | API URL | Env Variable |
|---|---|---|---|
| OpenAI | openai | https://api.openai.com | OPENAI_API_KEY |
| Anthropic | anthropic | https://api.anthropic.com | ANTHROPIC_API_KEY |
| Google | google | https://generativelanguage.googleapis.com | GOOGLE_API_KEY |
| Cohere | cohere | https://api.cohere.ai | COHERE_API_KEY |
| Mistral | mistral | https://api.mistral.ai | MISTRAL_API_KEY |
| Together | together | https://api.together.xyz | TOGETHER_API_KEY |
| Perplexity | perplexity | https://api.perplexity.ai | PERPLEXITY_API_KEY |
| Groq | groq | https://api.groq.com | GROQ_API_KEY |
| DeepSeek | deepseek | https://api.deepseek.com | DEEPSEEK_API_KEY |
## Local Backend

### Ollama (Default)
Ollama is the default backend for local model serving.
```yaml
run:
  chat:
    backend: ollama
    model: llama3.2:1b
    prompt: "Hello, world!"
```

Configuration:
```yaml
# Custom Ollama URL
run:
  chat:
    backend: ollama
    baseUrl: "http://custom-ollama:11434"
    model: llama3.2:1b
    prompt: "{{ get('q') }}"
```

Workflow-level Ollama configuration:
```yaml
settings:
  agentSettings:
    models:
      - llama3.2:1b
      - nomic-embed-text
    ollamaUrl: "http://ollama:11434"
    installOllama: true  # Explicitly install Ollama in Docker image
```

Docker Build:
When building Docker images, Ollama is automatically installed if:
- A Chat resource uses the `ollama` backend (or no backend is specified)
- Models are configured in `agentSettings.models`
- `installOllama: true` is explicitly set

You can also disable Ollama installation by setting `installOllama: false`.
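For example, a workflow that only uses cloud backends can skip the Ollama install entirely. A minimal sketch reusing the `agentSettings` block shown above (other workflow settings omitted):

```yaml
settings:
  agentSettings:
    installOllama: false  # Skip Ollama in the Docker image; all Chat resources use cloud backends
```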
## Cloud Backends

### OpenAI
```yaml
run:
  chat:
    backend: openai
    model: gpt-4o
    prompt: "{{ get('q') }}"
```

With explicit API key:
```yaml
run:
  chat:
    backend: openai
    apiKey: "{{ get('OPENAI_API_KEY', 'env') }}"
    model: gpt-4o
    prompt: "{{ get('q') }}"
```

Available models:

| Model | Description |
|---|---|
| gpt-4o | Latest GPT-4 Omni |
| gpt-4o-mini | Smaller, faster GPT-4 |
| gpt-4-turbo | GPT-4 Turbo |
| gpt-3.5-turbo | Fast, cost-effective |
### Anthropic (Claude)
```yaml
run:
  chat:
    backend: anthropic
    model: claude-3-5-sonnet-20241022
    prompt: "{{ get('q') }}"
    contextLength: 4096
```

Available models:

| Model | Description |
|---|---|
| claude-3-5-sonnet-20241022 | Latest Claude 3.5 Sonnet |
| claude-3-opus-20240229 | Most capable |
| claude-3-sonnet-20240229 | Balanced |
| claude-3-haiku-20240307 | Fast, efficient |
### Google (Gemini)
```yaml
run:
  chat:
    backend: google
    model: gemini-1.5-pro
    prompt: "{{ get('q') }}"
```

Available models:

| Model | Description |
|---|---|
| gemini-1.5-pro | Latest Gemini Pro |
| gemini-1.5-flash | Fast inference |
| gemini-pro | Standard Gemini |
### Mistral
```yaml
run:
  chat:
    backend: mistral
    model: mistral-large-latest
    prompt: "{{ get('q') }}"
```

Available models:

| Model | Description |
|---|---|
| mistral-large-latest | Most capable |
| mistral-medium-latest | Balanced |
| mistral-small-latest | Fast, efficient |
| open-mistral-7b | Open-source 7B |
| open-mixtral-8x7b | MoE model |
### Together AI
Access to many open-source models.
```yaml
run:
  chat:
    backend: together
    model: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
    prompt: "{{ get('q') }}"
```

Popular models:

| Model | Description |
|---|---|
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Llama 3.1 70B |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | Llama 3.1 8B |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Mixtral 8x7B |
| Qwen/Qwen2-72B-Instruct | Qwen2 72B |
### Groq
Ultra-fast inference with Groq hardware.
```yaml
run:
  chat:
    backend: groq
    model: llama-3.1-70b-versatile
    prompt: "{{ get('q') }}"
```

Available models:

| Model | Description |
|---|---|
| llama-3.1-70b-versatile | Llama 3.1 70B |
| llama-3.1-8b-instant | Llama 3.1 8B (fastest) |
| mixtral-8x7b-32768 | Mixtral with 32K context |
| gemma2-9b-it | Google Gemma 2 9B |
### Perplexity
Search-augmented LLM responses.
```yaml
run:
  chat:
    backend: perplexity
    model: llama-3.1-sonar-large-128k-online
    prompt: "{{ get('q') }}"
```

Available models:

| Model | Description |
|---|---|
| llama-3.1-sonar-large-128k-online | Large with web search |
| llama-3.1-sonar-small-128k-online | Small with web search |
| llama-3.1-sonar-large-128k-chat | Large, chat only |
### Cohere
```yaml
run:
  chat:
    backend: cohere
    model: command-r-plus
    prompt: "{{ get('q') }}"
```

Available models:

| Model | Description |
|---|---|
| command-r-plus | Most capable |
| command-r | Fast and capable |
| command | Standard |
| command-light | Fast, efficient |
### DeepSeek
```yaml
run:
  chat:
    backend: deepseek
    model: deepseek-chat
    prompt: "{{ get('q') }}"
```

Available models:

| Model | Description |
|---|---|
| deepseek-chat | General chat |
| deepseek-coder | Code generation |
## Backend Configuration

### Common Options
All backends support these options:
```yaml
run:
  chat:
    backend: openai            # Backend name
    baseUrl: "https://..."     # Custom base URL (optional)
    apiKey: "sk-..."           # API key (optional, falls back to env)
    model: gpt-4o              # Model name
    prompt: "{{ get('q') }}"   # User prompt

    # Optional settings
    contextLength: 4096        # Max tokens
    jsonResponse: true         # Request JSON output
    jsonResponseKeys:          # Expected JSON keys
      - answer
      - confidence
```

### Custom Base URL
Override the default API URL:
```yaml
# Use Azure OpenAI
run:
  chat:
    backend: openai
    baseUrl: "https://my-resource.openai.azure.com/openai/deployments/my-deployment"
    apiKey: "{{ get('AZURE_OPENAI_KEY', 'env') }}"
    model: gpt-4o
    prompt: "{{ get('q') }}"
```

```yaml
# Use OpenAI-compatible proxy
run:
  chat:
    backend: openai
    baseUrl: "https://my-proxy.example.com"
    model: gpt-4o
    prompt: "{{ get('q') }}"
```

### API Key Configuration
API keys can be provided in multiple ways:
1. Environment variable (recommended):
```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
```

2. In resource configuration:
```yaml
run:
  chat:
    backend: openai
    apiKey: "{{ get('OPENAI_API_KEY', 'env') }}"
    model: gpt-4o
```

3. From session/request:
```yaml
run:
  chat:
    backend: openai
    apiKey: "{{ get('apiKey', 'session') }}"
    model: gpt-4o
```

### Mixing Backends
Use different backends in the same workflow:
```yaml
# resources/fast-summary.yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: fastSummary
run:
  chat:
    backend: groq
    model: llama-3.1-8b-instant
    prompt: "Summarize: {{ get('q') }}"
---
# resources/deep-analysis.yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: deepAnalysis
requires:
  - fastSummary
run:
  chat:
    backend: anthropic
    model: claude-3-5-sonnet-20241022
    prompt: |
      Based on this summary: {{ get('fastSummary') }}
      Provide detailed analysis.
```

## Feature Support
| Feature | Ollama | OpenAI | Anthropic | Mistral | Groq | |
|---|---|---|---|---|---|---|
| JSON Response | Yes | Yes | Partial | Yes | Yes | Yes |
| Tools/Functions | Yes | Yes | No | Yes | Yes | Yes |
| Vision | Yes* | Yes | Yes | Yes | Yes | Yes |
| Streaming | No** | No** | No** | No** | No** | No** |
\*Requires a vision-capable model (e.g., `llama3.2-vision`)

\*\*KDeps uses non-streaming responses for reliability
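The Vision row can be exercised by selecting a multimodal Ollama model. The sketch below is illustrative only: the `files` field used to attach an image is an assumed name, not confirmed KDeps syntax, so check the LLM Resource documentation for the actual attachment option.

```yaml
run:
  chat:
    backend: ollama
    model: llama3.2-vision        # vision-capable model
    files:                        # HYPOTHETICAL field name for attaching images (assumption)
      - "/data/invoice.png"
    prompt: "Describe the contents of the attached image."
```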
## Troubleshooting

### Connection Issues
Local backend not responding:
```yaml
# Verify the backend is running
run:
  httpClient:
    url: "http://localhost:11434/api/tags"
    method: GET
```

API key errors:
```yaml
# Debug: check if API key is set
run:
  expr:
    - set('hasKey', get('OPENAI_API_KEY', 'env') != nil)
    - set('keyLength', len(default(get('OPENAI_API_KEY', 'env'), '')))
```

### Model Not Found
Ensure the model is available:
Ollama:
```bash
ollama list              # See available models
ollama pull llama3.2:1b  # Download model
```

Cloud backends: Check the provider's documentation for current model names.
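Most cloud providers also expose a model-listing endpoint you can call from an httpClient resource to confirm a model name. The sketch below targets OpenAI's `/v1/models` endpoint; note that the `headers` field is an assumption here (only `url` and `method` appear elsewhere on this page), so confirm the httpClient options in its own documentation.

```yaml
run:
  httpClient:
    url: "https://api.openai.com/v1/models"
    method: GET
    headers:                                                     # ASSUMED field name for request headers
      Authorization: "Bearer {{ get('OPENAI_API_KEY', 'env') }}"
```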
### Rate Limiting
Handle rate limits with retry configuration:
```yaml
run:
  chat:
    backend: openai
    model: gpt-4o
    prompt: "{{ get('q') }}"
    retry:
      maxAttempts: 3
      initialDelay: "1s"
      maxDelay: "30s"
      backoffMultiplier: 2
```

## See Also
- LLM Resource - Complete LLM resource documentation
- Tools - LLM function calling
- Docker Deployment - Deploying with local models