
LLM Backends Reference

KDeps supports multiple LLM backends, both local and cloud-based. This reference covers all available backends and their configuration.

Backend Overview

Local Backend

| Backend | Name | Default URL | Description |
|---------|------|-------------|-------------|
| Ollama | ollama | http://localhost:11434 | Local model serving (default) |

Cloud Backends

| Backend | Name | API URL | Env Variable |
|---------|------|---------|--------------|
| OpenAI | openai | https://api.openai.com | OPENAI_API_KEY |
| Anthropic | anthropic | https://api.anthropic.com | ANTHROPIC_API_KEY |
| Google | google | https://generativelanguage.googleapis.com | GOOGLE_API_KEY |
| Cohere | cohere | https://api.cohere.ai | COHERE_API_KEY |
| Mistral | mistral | https://api.mistral.ai | MISTRAL_API_KEY |
| Together | together | https://api.together.xyz | TOGETHER_API_KEY |
| Perplexity | perplexity | https://api.perplexity.ai | PERPLEXITY_API_KEY |
| Groq | groq | https://api.groq.com | GROQ_API_KEY |
| DeepSeek | deepseek | https://api.deepseek.com | DEEPSEEK_API_KEY |
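
If the listed environment variable is set, the corresponding backend can be used without an explicit apiKey in the resource (the key falls back to the environment, as noted under Common Options below). A minimal sketch with Groq, assuming GROQ_API_KEY is exported in the environment:

yaml
run:
  chat:
    backend: groq                # API key is read from GROQ_API_KEY
    model: llama-3.1-8b-instant
    prompt: "{{ get('q') }}"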

Local Backend

Ollama (Default)

Ollama is the default backend for local model serving.

yaml
run:
  chat:
    backend: ollama
    model: llama3.2:1b
    prompt: "Hello, world!"

Configuration:

yaml
# Custom Ollama URL
run:
  chat:
    backend: ollama
    baseUrl: "http://custom-ollama:11434"
    model: llama3.2:1b
    prompt: "{{ get('q') }}"

Workflow-level Ollama configuration:

yaml
settings:
  agentSettings:
    models:
      - llama3.2:1b
      - nomic-embed-text
    ollamaUrl: "http://ollama:11434"
    installOllama: true  # Explicitly install Ollama in Docker image

Docker Build:

When building Docker images, Ollama is automatically installed if any of the following is true:

  • A Chat resource uses the ollama backend (or specifies no backend)
  • Models are configured in agentSettings.models
  • installOllama: true is set explicitly

You can also disable Ollama installation by setting installOllama: false.
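
For example, a workflow that only calls cloud backends can keep its Docker image smaller by skipping the Ollama install. A minimal sketch, using the agentSettings fields shown above:

yaml
settings:
  agentSettings:
    installOllama: false  # cloud-only workflow; do not install Ollama in the image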

Cloud Backends

OpenAI

yaml
run:
  chat:
    backend: openai
    model: gpt-4o
    prompt: "{{ get('q') }}"

With explicit API key:

yaml
run:
  chat:
    backend: openai
    apiKey: "{{ get('OPENAI_API_KEY', 'env') }}"
    model: gpt-4o
    prompt: "{{ get('q') }}"

Available models:

| Model | Description |
|-------|-------------|
| gpt-4o | Latest GPT-4 Omni |
| gpt-4o-mini | Smaller, faster GPT-4 |
| gpt-4-turbo | GPT-4 Turbo |
| gpt-3.5-turbo | Fast, cost-effective |

Anthropic (Claude)

yaml
run:
  chat:
    backend: anthropic
    model: claude-3-5-sonnet-20241022
    prompt: "{{ get('q') }}"
    contextLength: 4096

Available models:

| Model | Description |
|-------|-------------|
| claude-3-5-sonnet-20241022 | Latest Claude 3.5 Sonnet |
| claude-3-opus-20240229 | Most capable |
| claude-3-sonnet-20240229 | Balanced |
| claude-3-haiku-20240307 | Fast, efficient |

Google (Gemini)

yaml
run:
  chat:
    backend: google
    model: gemini-1.5-pro
    prompt: "{{ get('q') }}"

Available models:

| Model | Description |
|-------|-------------|
| gemini-1.5-pro | Latest Gemini Pro |
| gemini-1.5-flash | Fast inference |
| gemini-pro | Standard Gemini |

Mistral

yaml
run:
  chat:
    backend: mistral
    model: mistral-large-latest
    prompt: "{{ get('q') }}"

Available models:

| Model | Description |
|-------|-------------|
| mistral-large-latest | Most capable |
| mistral-medium-latest | Balanced |
| mistral-small-latest | Fast, efficient |
| open-mistral-7b | Open-source 7B |
| open-mixtral-8x7b | MoE model |

Together AI

Access to many open-source models.

yaml
run:
  chat:
    backend: together
    model: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
    prompt: "{{ get('q') }}"

Popular models:

| Model | Description |
|-------|-------------|
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Llama 3.1 70B |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | Llama 3.1 8B |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Mixtral 8x7B |
| Qwen/Qwen2-72B-Instruct | Qwen2 72B |

Groq

Ultra-fast inference with Groq hardware.

yaml
run:
  chat:
    backend: groq
    model: llama-3.1-70b-versatile
    prompt: "{{ get('q') }}"

Available models:

| Model | Description |
|-------|-------------|
| llama-3.1-70b-versatile | Llama 3.1 70B |
| llama-3.1-8b-instant | Llama 3.1 8B (fastest) |
| mixtral-8x7b-32768 | Mixtral with 32K context |
| gemma2-9b-it | Google Gemma 2 9B |

Perplexity

Search-augmented LLM responses.

yaml
run:
  chat:
    backend: perplexity
    model: llama-3.1-sonar-large-128k-online
    prompt: "{{ get('q') }}"

Available models:

| Model | Description |
|-------|-------------|
| llama-3.1-sonar-large-128k-online | Large, with web search |
| llama-3.1-sonar-small-128k-online | Small, with web search |
| llama-3.1-sonar-large-128k-chat | Large, chat only |

Cohere

yaml
run:
  chat:
    backend: cohere
    model: command-r-plus
    prompt: "{{ get('q') }}"

Available models:

| Model | Description |
|-------|-------------|
| command-r-plus | Most capable |
| command-r | Fast and capable |
| command | Standard |
| command-light | Fast, efficient |

DeepSeek

yaml
run:
  chat:
    backend: deepseek
    model: deepseek-chat
    prompt: "{{ get('q') }}"

Available models:

| Model | Description |
|-------|-------------|
| deepseek-chat | General chat |
| deepseek-coder | Code generation |

Backend Configuration

Common Options

All backends support these options:

yaml
run:
  chat:
    backend: openai          # Backend name
    baseUrl: "https://..."   # Custom base URL (optional)
    apiKey: "sk-..."         # API key (optional, falls back to env)
    model: gpt-4o            # Model name
    prompt: "{{ get('q') }}" # User prompt

    # Optional settings
    contextLength: 4096      # Max tokens
    jsonResponse: true       # Request JSON output
    jsonResponseKeys:        # Expected JSON keys
      - answer
      - confidence

Custom Base URL

Override the default API URL:

yaml
# Use Azure OpenAI
run:
  chat:
    backend: openai
    baseUrl: "https://my-resource.openai.azure.com/openai/deployments/my-deployment"
    apiKey: "{{ get('AZURE_OPENAI_KEY', 'env') }}"
    model: gpt-4o
    prompt: "{{ get('q') }}"

---
# Use OpenAI-compatible proxy
run:
  chat:
    backend: openai
    baseUrl: "https://my-proxy.example.com"
    model: gpt-4o
    prompt: "{{ get('q') }}"

API Key Configuration

API keys can be provided in multiple ways:

1. Environment variable (recommended):

bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

2. In resource configuration:

yaml
run:
  chat:
    backend: openai
    apiKey: "{{ get('OPENAI_API_KEY', 'env') }}"
    model: gpt-4o

3. From session/request:

yaml
run:
  chat:
    backend: openai
    apiKey: "{{ get('apiKey', 'session') }}"
    model: gpt-4o

Mixing Backends

Use different backends in the same workflow:

yaml
# resources/fast-summary.yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: fastSummary
run:
  chat:
    backend: groq
    model: llama-3.1-8b-instant
    prompt: "Summarize: {{ get('q') }}"

---
# resources/deep-analysis.yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
  actionId: deepAnalysis
  requires:
    - fastSummary
run:
  chat:
    backend: anthropic
    model: claude-3-5-sonnet-20241022
    prompt: |
      Based on this summary: {{ get('fastSummary') }}
      Provide detailed analysis.

Feature Support

| Feature | Ollama | OpenAI | Anthropic | Google | Mistral | Groq |
|---------|--------|--------|-----------|--------|---------|------|
| JSON Response | Yes | Yes | Partial | Yes | Yes | Yes |
| Tools/Functions | Yes | Yes | No | Yes | Yes | Yes |
| Vision | Yes* | Yes | Yes | Yes | Yes | Yes |
| Streaming | No** | No** | No** | No** | No** | No** |

*Requires a vision-capable model (e.g., llama3.2-vision).
**KDeps uses non-streaming requests for reliability.
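
The JSON Response row corresponds to the jsonResponse and jsonResponseKeys options listed under Common Options. A minimal sketch requesting structured output from OpenAI (the response keys here are illustrative):

yaml
run:
  chat:
    backend: openai
    model: gpt-4o
    prompt: "Classify the sentiment of: {{ get('q') }}"
    jsonResponse: true       # ask the model to return JSON
    jsonResponseKeys:        # keys expected in the JSON response (illustrative)
      - sentiment
      - confidence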

Troubleshooting

Connection Issues

Local backend not responding:

yaml
# Verify the backend is running
run:
  httpClient:
    url: "http://localhost:11434/api/tags"
    method: GET

API key errors:

yaml
# Debug: check if API key is set
run:
  expr:
    - set('hasKey', get('OPENAI_API_KEY', 'env') != nil)
    - set('keyLength', len(default(get('OPENAI_API_KEY', 'env'), '')))

Model Not Found

Ensure the model is available:

Ollama:

bash
ollama list  # See available models
ollama pull llama3.2:1b  # Download model

Cloud backends: Check the provider's documentation for current model names.

Rate Limiting

Handle rate limits with retry configuration:

yaml
run:
  chat:
    backend: openai
    model: gpt-4o
    prompt: "{{ get('q') }}"
    retry:
      maxAttempts: 3
      initialDelay: "1s"
      maxDelay: "30s"
      backoffMultiplier: 2
