LLM Resource

The LLM (chat) resource lets you interact with language models for text generation, question answering, and other AI-powered tasks.

Basic Usage

yaml
apiVersion: kdeps.io/v1
kind: Resource

metadata:
  actionId: llmResource
  name: LLM Chat

run:
  chat:
    model: llama3.2:1b
    prompt: "{{ get('q') }}"
    timeoutDuration: 60s

Configuration Options

Complete Reference

yaml
run:
  chat:
    # Model Configuration
    model: llama3.2:1b              # Required: Model name
    backend: ollama                  # Backend: ollama, openai, anthropic, etc.
    baseUrl: http://localhost:11434  # Custom backend URL
    apiKey: "sk-..."                 # API key (or use env var)
    contextLength: 8192              # Context window size

    # Prompt Configuration
    role: user                       # Role: user, assistant, system
    prompt: "{{ get('q') }}"        # The prompt to send

    # Advanced Generation Parameters
    temperature: 0.7                 # 0.0 to 2.0 (default varies by backend)
    maxTokens: 1000                  # Max tokens to generate
    topP: 0.9                        # Nucleus sampling (0.0 to 1.0)
    frequencyPenalty: 0.0            # -2.0 to 2.0
    presencePenalty: 0.0             # -2.0 to 2.0

    # Conversation Context
    scenario:
      - role: system
        prompt: You are a helpful assistant.
      - role: assistant
        prompt: I'm ready to help!

    # Tools (Function Calling)
    tools:
      - name: calculate
        description: Perform math
        script: calcResource
        parameters:
          expression:
            type: string
            required: true

    # File Attachments (Vision)
    files:
      - "{{ get('file', 'filepath') }}"

    # Response Formatting
    jsonResponse: true
    jsonResponseKeys:
      - answer
      - confidence

    # Timeout
    timeoutDuration: 60s

Backends

KDeps supports multiple LLM backends:

Local Backend

Backend | Default URL | Description
------- | ----------- | -----------
ollama | localhost:11434 | Ollama (default)

Cloud Backends

Backend | Environment Variable | Description
------- | -------------------- | -----------
openai | OPENAI_API_KEY | OpenAI GPT models
anthropic | ANTHROPIC_API_KEY | Claude models
google | GOOGLE_API_KEY | Gemini models
mistral | MISTRAL_API_KEY | Mistral AI
together | TOGETHER_API_KEY | Together AI
groq | GROQ_API_KEY | Groq (fast inference)
perplexity | PERPLEXITY_API_KEY | Perplexity AI
cohere | COHERE_API_KEY | Cohere
deepseek | DEEPSEEK_API_KEY | DeepSeek

Backend Examples

Ollama (Default)

yaml
chat:
  model: llama3.2:1b
  backend: ollama  # Optional; ollama is the default
  prompt: "{{ get('q') }}"

OpenAI

yaml
chat:
  model: gpt-4
  backend: openai
  apiKey: "{{ get('OPENAI_API_KEY', 'env') }}"
  prompt: "{{ get('q') }}"

Anthropic (Claude)

yaml
chat:
  model: claude-3-opus-20240229
  backend: anthropic
  prompt: "{{ get('q') }}"
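
Custom Base URL

The baseUrl option from the complete reference overrides where a backend sends its requests. The sketch below is a minimal example, assuming the override also applies to the openai backend (for instance, to reach a local OpenAI-compatible server); the URL, model name, and environment variable are placeholders:

yaml
chat:
  model: my-local-model                        # placeholder model name
  backend: openai
  baseUrl: http://localhost:8000/v1            # hypothetical OpenAI-compatible endpoint
  apiKey: "{{ get('LOCAL_API_KEY', 'env') }}"  # only if the server requires a key
  prompt: "{{ get('q') }}"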

Advanced Parameters

Fine-tune the model's output generation:

  • temperature: Controls randomness. Higher values (e.g., 0.8) make output more random; lower values (e.g., 0.2) make it more focused and deterministic.
  • maxTokens: The maximum number of tokens to generate in the completion.
  • topP: An alternative to temperature sampling known as nucleus sampling: the model considers only the tokens that make up the top topP probability mass.
  • frequencyPenalty: Positive values penalize new tokens based on how often they already appear in the text so far, decreasing the model's likelihood of repeating the same line verbatim.
  • presencePenalty: Positive values penalize new tokens that already appear in the text so far, increasing the model's likelihood of talking about new topics.
yaml
chat:
  model: llama3.2:1b
  prompt: "Write a creative story"
  temperature: 0.9
  presencePenalty: 0.6
  maxTokens: 500

Context Length

Control the context window size:

yaml
chat:
  model: llama3.2:1b
  contextLength: 8192  # Options: 4096, 8192, 16384, 32768, 65536, 131072, 262144

Scenario (Conversation History)

Build multi-turn conversations:

yaml
chat:
  model: llama3.2:1b
  prompt: "{{ get('q') }}"
  scenario:
    - role: system
      prompt: |
        You are an expert software developer.
        Always provide code examples.
        Be concise and practical.

    - role: user
      prompt: What is a REST API?

    - role: assistant
      prompt: |
        A REST API is an architectural style for web services.
        It uses HTTP methods (GET, POST, PUT, DELETE) to perform operations.

JSON Response

Get structured JSON output:

yaml
chat:
  model: llama3.2:1b
  prompt: "Analyze: {{ get('q') }}"
  jsonResponse: true
  jsonResponseKeys:
    - summary
    - sentiment
    - keywords
    - confidence

Output:

json
{
  "summary": "...",
  "sentiment": "positive",
  "keywords": ["ai", "machine learning"],
  "confidence": 0.95
}

Vision (File Attachments)

Process images with vision-capable models:

yaml
chat:
  model: llama3.2-vision
  prompt: "Describe this image"
  files:
    - "{{ get('file', 'filepath') }}"  # From upload
    - "./images/example.jpg"            # From filesystem

Tools (Function Calling)

Enable LLMs to call other resources:

yaml
# Main LLM resource
metadata:
  actionId: llmWithTools

run:
  chat:
    model: llama3.2:1b
    prompt: "{{ get('q') }}"
    tools:
      - name: calculate
        description: Perform mathematical calculations
        script: calcTool  # References another resource
        parameters:
          expression:
            type: string
            description: Math expression (e.g., "2 + 2")
            required: true

      - name: search_db
        description: Search the database
        script: dbSearchTool
        parameters:
          query:
            type: string
            description: Search query
            required: true
          limit:
            type: integer
            description: Max results
            required: false

The LLM automatically decides when to call a tool, based on the prompt and each tool's description.
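
Each tool's script points at another resource by its actionId. As a minimal sketch of what the referenced calcTool resource could look like, the example below backs the tool with another chat resource; the assumption that a tool's parameters (here, expression) are readable inside that resource via get() is not confirmed by this page, so treat the sketch as illustrative only:

yaml
# Hypothetical resource referenced by "script: calcTool"
apiVersion: kdeps.io/v1
kind: Resource

metadata:
  actionId: calcTool
  name: Calculator Tool

run:
  chat:
    model: llama3.2:1b
    # Assumption: the tool's "expression" parameter is available via get()
    prompt: "Evaluate this expression and return only the result: {{ get('expression') }}"
    timeoutDuration: 30s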

Examples

Simple Q&A

yaml
run:
  chat:
    model: llama3.2:1b
    prompt: "{{ get('q') }}"
    scenario:
      - role: system
        prompt: Answer questions concisely.
    jsonResponse: true
    jsonResponseKeys:
      - answer
    timeoutDuration: 30s

Code Generation

yaml
run:
  chat:
    model: codellama
    prompt: "Write a Python function that {{ get('task') }}"
    scenario:
      - role: system
        prompt: |
          You are an expert Python developer.
          Write clean, documented code.
          Include type hints.
    jsonResponse: true
    jsonResponseKeys:
      - code
      - explanation
    timeoutDuration: 60s

Multi-Model Workflow

yaml
# Fast model for classification
metadata:
  actionId: classifier

run:
  chat:
    model: llama3.2:1b
    prompt: "Classify this query: {{ get('q') }}"
    jsonResponse: true
    jsonResponseKeys:
      - category
      - confidence

---
# Powerful model for complex queries
metadata:
  actionId: detailedResponse
  requires: [classifier]

run:
  skipCondition:
    - get('classifier').confidence < 0.8

  chat:
    model: llama3.2
    prompt: |
      Category: {{ get('classifier').category }}
      Query: {{ get('q') }}
      Provide a detailed response.
    timeoutDuration: 120s

Accessing Output

yaml
# In another resource
metadata:
  requires: [llmResource]

run:
  apiResponse:
    response:
      # Full response
      llm_output: get('llmResource')

      # Specific field (if JSON response)
      answer: get('llmResource').answer
