Vision Models

This tutorial demonstrates how to use vision-capable LLMs in KDeps v2 to analyze images, extract information, and perform multimodal tasks.

Prerequisites

KDeps installed (see Installation)
Ollama installed and running
A vision model pulled, for example with ollama pull moondream:1.8b or ollama pull llava:7b

Overview

Vision models can process images along with text prompts. KDeps supports:

Image file uploads via multipart form-data
Local image files
Multiple images in a single request
Structured JSON responses

Step 1: Install a Vision Model

Install a vision-capable model in Ollama:

bash

# Lightweight and fast
ollama pull moondream:1.8b

# More capable
ollama pull llava:7b

# Best quality (slower)
ollama pull llava:13b

Step 2: Create the Workflow

Create workflow.yaml:

yaml

apiVersion: kdeps.io/v1
kind: Workflow

metadata:
  name: vision
  description: Vision model example
  version: "1.0.0"
  targetActionId: visionResponse

settings:
  apiServerMode: true
  apiServer:
    hostIp: "127.0.0.1"
    portNum: 3000
    routes:
      - path: /api/v1/vision
        methods: [POST]
    cors:
      enableCors: true
      allowOrigins:
        - http://localhost:8080

  agentSettings:
    timezone: Etc/UTC
    pythonVersion: "3.12"
    models:
      - moondream:1.8b
      - llava:7b

Step 3: Create the Vision LLM Resource

Create resources/vision-llm.yaml:

yaml

apiVersion: kdeps.io/v1
kind: Resource

metadata:
  actionId: visionLLM
  name: Vision LLM

run:
  chat:
    model: moondream:1.8b
    role: user
    prompt: "{{ get('q', 'param') }}"
    files:
      # Get uploaded file path
      - "{{ get('file', 'filepath') }}"
    jsonResponse: true
    jsonResponseKeys:
      - description
      - objects
      - scene

Key Points:

The files field accepts an array of file paths
Use get('file', 'filepath') to get the path of an uploaded file
jsonResponse: true requests structured JSON output with the keys listed in jsonResponseKeys
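
A model can still leave out a requested key, so a client may want to check the analysis it gets back. The following is a minimal sketch of such a check, not part of KDeps: missing_keys is a hypothetical helper, and the key names simply mirror the jsonResponseKeys above.

```python
import json

# Keys requested via jsonResponseKeys in resources/vision-llm.yaml.
EXPECTED_KEYS = ("description", "objects", "scene")


def missing_keys(analysis, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from the analysis.

    `analysis` may be a dict or a JSON string. Anything that does not
    decode to a JSON object is treated as missing every key.
    """
    if isinstance(analysis, str):
        try:
            analysis = json.loads(analysis)
        except json.JSONDecodeError:
            return list(expected)
    if not isinstance(analysis, dict):
        return list(expected)
    return [key for key in expected if key not in analysis]


if __name__ == "__main__":
    sample = '{"description": "A red panda", "objects": ["panda"]}'
    print(missing_keys(sample))  # the "scene" key is absent from the sample
```

An empty list means every requested key is present; it says nothing about whether the values are accurate.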

Step 4: Create the Response Resource

Create resources/vision-response.yaml:

yaml

apiVersion: kdeps.io/v1
kind: Resource

metadata:
  actionId: visionResponse
  name: Vision Response
  requires:
    - visionLLM

run:
  apiResponse:
    success: true
    response:
      query: get('q', 'param')
      analysis: get('visionLLM')
      file_info:
        filename: get('file', 'filename')
        filetype: get('file', 'filetype')

Step 5: Test with Image Upload

Upload an image and query it:

bash

curl -X POST 'http://localhost:3000/api/v1/vision?q=What%20is%20in%20this%20image?' \
  -F "file=@image.jpg"

Expected response:

json

{
  "success": true,
  "data": {
    "query": "What is in this image?",
    "analysis": {
      "description": "A red panda sitting on a tree branch...",
      "objects": ["panda", "branch", "tree"],
      "scene": "forest"
    },
    "file_info": {
      "filename": "image.jpg",
      "filetype": "image/jpeg"
    }
  }
}
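
The same request can be sent from a script. The sketch below uses only the Python standard library and assumes the workflow from Step 2 is running on localhost:3000; build_url, build_multipart, and analyze_image are hypothetical helper names, not KDeps APIs.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:3000/api/v1/vision"  # route from workflow.yaml


def build_url(question, base_url=BASE_URL):
    """Append the question as the percent-encoded `q` query parameter."""
    query = urllib.parse.urlencode({"q": question}, quote_via=urllib.parse.quote)
    return base_url + "?" + query


def build_multipart(field, filename, content, content_type):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def analyze_image(path, question, content_type="image/jpeg"):
    """POST the image under the `file` field and return the decoded JSON reply."""
    with open(path, "rb") as handle:
        content = handle.read()
    body, header = build_multipart(
        "file", os.path.basename(path), content, content_type
    )
    request = urllib.request.Request(
        build_url(question),
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    # Vision models can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, analyze_image("image.jpg", "What is in this image?") would send the same upload as the curl command above.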

Image Sources

Uploaded Files

From multipart form-data uploads:

yaml

files:
  - "{{ get('file', 'filepath') }}"

Local Files

From the filesystem:

yaml

files:
  - "./images/photo.jpg"
  - "{{ get('image_path') }}"

Multiple Images

Process multiple images:

yaml

files:
  - "{{ get('file1', 'filepath') }}"
  - "{{ get('file2', 'filepath') }}"

Or, when several files are uploaded under the same field name (file[]), select them by index:

yaml

files:
  - "{{ get('file[]', 'filepath', 0) }}"
  - "{{ get('file[]', 'filepath', 1) }}"

Supported Models

moondream:1.8b

Best for: Fast, lightweight queries
Use cases: Simple descriptions, object detection
Speed: Very fast
Quality: Good for basic tasks

yaml

model: moondream:1.8b

llava:7b

Best for: Balanced performance
Use cases: Detailed descriptions, scene analysis
Speed: Moderate
Quality: High

yaml

model: llava:7b

llava:13b

Best for: Highest-quality results
Use cases: Complex analysis, detailed descriptions
Speed: Slower
Quality: Highest of the three

yaml

model: llava:13b

Use Cases

Image Description

yaml

run:
  chat:
    model: moondream:1.8b
    prompt: "Describe this image in detail"
    files:
      - "{{ get('file', 'filepath') }}"

Object Detection

yaml

run:
  chat:
    model: llava:7b
    prompt: "List all objects in this image"
    jsonResponse: true
    jsonResponseKeys:
      - objects
      - count
    files:
      - "{{ get('file', 'filepath') }}"

Scene Analysis

yaml

run:
  chat:
    model: llava:7b
    prompt: "Analyze the scene: location, time of day, weather, mood"
    jsonResponse: true
    jsonResponseKeys:
      - location
      - time_of_day
      - weather
      - mood
    files:
      - "{{ get('file', 'filepath') }}"

Image Comparison

yaml

run:
  chat:
    model: llava:13b
    prompt: "Compare these two images and describe the differences"
    files:
      - "{{ get('file1', 'filepath') }}"
      - "{{ get('file2', 'filepath') }}"
    jsonResponse: true
    jsonResponseKeys:
      - differences
      - similarities

OCR Alternative

yaml

run:
  chat:
    model: llava:7b
    prompt: "Extract all text from this image"
    jsonResponse: true
    jsonResponseKeys:
      - text
      - confidence
    files:
      - "{{ get('file', 'filepath') }}"

Advanced Configuration

With System Prompt

yaml

run:
  chat:
    model: llava:7b
    scenario:
      - role: system
        prompt: "You are an expert image analyst. Provide detailed, accurate descriptions."
      - role: user
        prompt: "{{ get('q') }}"
    files:
      - "{{ get('file', 'filepath') }}"

With Tools

Combine vision with function calling:

yaml

run:
  chat:
    model: llava:7b
    prompt: "{{ get('q') }}"
    files:
      - "{{ get('file', 'filepath') }}"
    tools:
      - name: save_analysis
        description: Save the image analysis
        parameters:
          description:
            type: string
            description: The image description

Image Formats

Supported image formats:

JPEG (.jpg, .jpeg)
PNG (.png)
WebP (.webp)
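
A file extension can be wrong, so a client may prefer to check the first bytes of a file before uploading it. The sketch below recognizes the three formats by their standard file signatures; detect_image_format is a hypothetical helper, not a KDeps function.

```python
def detect_image_format(data):
    """Return 'jpeg', 'png', 'webp', or None based on the leading bytes."""
    # JPEG files start with the SOI marker followed by another marker byte.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def is_supported_image(path):
    """Read only the first 12 bytes of the file and check its signature."""
    with open(path, "rb") as handle:
        return detect_image_format(handle.read(12)) is not None
```

This only confirms the container format; it does not guarantee that the image decodes cleanly or that a given model handles it well.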

Performance Tips

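Choose the Right Model: Use moondream:1.8b when speed matters most and llava:13b when quality matters most; llava:7b sits in between
Image Size: Smaller images process faster, so downscale large images before uploading when fine detail is not needed
Batch Processing: Send related images in one request instead of one request per image
Caching: Cache results for repeated queries on the same image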
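
The caching tip can be sketched on the client side as follows. This is an in-memory illustration under stated assumptions, not a KDeps feature: cache_key and AnalysisCache are hypothetical names, and the key is a SHA-256 digest of the model, the prompt, and the image bytes, so a result is reused only when all three match.

```python
import hashlib


def cache_key(image_bytes, prompt, model):
    """Derive a stable key from the model name, the prompt, and the image."""
    digest = hashlib.sha256()
    for part in (model.encode("utf-8"), prompt.encode("utf-8"), image_bytes):
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


class AnalysisCache:
    """Keep analysis results in memory, keyed by model, prompt, and image."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_compute(self, image_bytes, prompt, model, compute):
        """Return the cached result, or call compute() once and store it."""
        key = cache_key(image_bytes, prompt, model)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = compute()
        self._store[key] = result
        return result
```

Here compute would be a zero-argument callable that performs the actual request, for example a lambda wrapping the HTTP call to the vision route.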

Error Handling

Validate the upload before calling the model, and return a structured error response if processing fails:

yaml

apiVersion: kdeps.io/v1
kind: Resource

metadata:
  actionId: visionLLM
  name: Vision LLM

run:
  validations:
    - info('filecount') > 0
    - get('file', 'filetype') in ['image/jpeg', 'image/png', 'image/webp']
  chat:
    model: moondream:1.8b
    prompt: "{{ get('q') }}"
    files:
      - "{{ get('file', 'filepath') }}"
  onError:
    apiResponse:
      success: false
      response:
        error: "Failed to process image"
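
The two validations can also be mirrored on the client so that obviously bad uploads never reach the server. The sketch below is a hypothetical helper, not part of KDeps; it guesses the MIME type from the file extension, which is an assumption, whereas the workflow checks the type reported for the uploaded file.

```python
import os

# Extension-to-MIME mapping that mirrors the list in the validations block.
ALLOWED_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def check_upload(paths):
    """Return a list of problems; an empty list means both rules pass."""
    problems = []
    # Rule 1: at least one file must be attached (file count > 0).
    if not paths:
        problems.append("no file attached")
    # Rule 2: every file must map to an allowed image type.
    for path in paths:
        extension = os.path.splitext(path)[1].lower()
        if extension not in ALLOWED_TYPES:
            problems.append(f"unsupported file type: {path}")
    return problems
```

A client would call check_upload(["photo.jpg"]) before sending the request and skip the upload if the returned list is not empty.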

Complete Example

yaml

# workflow.yaml
apiVersion: kdeps.io/v1
kind: Workflow

metadata:
  name: vision-demo
  version: "1.0.0"
  targetActionId: visionResponse

settings:
  apiServerMode: true
  apiServer:
    hostIp: "127.0.0.1"
    portNum: 3000
    routes:
      - path: /api/v1/vision
        methods: [POST]

  agentSettings:
    models:
      - moondream:1.8b

---
# resources/vision-llm.yaml
apiVersion: kdeps.io/v1
kind: Resource

metadata:
  actionId: visionLLM
  name: Vision LLM

run:
  chat:
    model: moondream:1.8b
    prompt: "{{ get('q', 'param') }}"
    files:
      - "{{ get('file', 'filepath') }}"
    jsonResponse: true
    jsonResponseKeys:
      - description
      - objects

---
# resources/vision-response.yaml
apiVersion: kdeps.io/v1
kind: Resource

metadata:
  actionId: visionResponse
  name: Vision Response
  requires:
    - visionLLM

run:
  apiResponse:
    success: true
    response:
      query: get('q', 'param')
      analysis: get('visionLLM')

Testing

Single Image

bash

curl -X POST 'http://localhost:3000/api/v1/vision?q=Describe%20this%20image' \
  -F "file=@photo.jpg"

Multiple Images

bash

curl -X POST 'http://localhost:3000/api/v1/vision?q=Compare%20these%20images' \
  -F "file[]=@image1.jpg" \
  -F "file[]=@image2.jpg"
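
In a multipart body, the repeated file[] field becomes one part per file, all with the same field name. The sketch below shows how such a body could be assembled with the Python standard library; build_multipart_files is a hypothetical helper, and the sample bytes stand in for real image data.

```python
import uuid


def build_multipart_files(field, files):
    """Encode several files under one repeated field name.

    `files` is a list of (filename, content_bytes, content_type) tuples.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for filename, content, content_type in files:
        # Each file gets its own part, introduced by the boundary line.
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{field}"; '
                f'filename="{filename}"\r\n'
                f"Content-Type: {content_type}\r\n\r\n"
            ).encode("utf-8")
        )
        chunks.append(content)
        chunks.append(b"\r\n")
    # The closing boundary carries two trailing dashes.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, header = build_multipart_files(
        "file[]",
        [
            ("image1.jpg", b"<bytes of image1>", "image/jpeg"),
            ("image2.jpg", b"<bytes of image2>", "image/jpeg"),
        ],
    )
    print(header)
    print(len(body), "bytes in body")
```

The returned body and header can be sent with any HTTP client as the request data and the Content-Type header of a POST to the vision route.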

Troubleshooting

Model Not Found

Ensure the model is pulled: ollama pull moondream:1.8b
Check that the model name in the resource matches the pulled model exactly, including the tag (for example moondream:1.8b)
Verify Ollama is running
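
These checks can be scripted against Ollama's local HTTP API. The sketch below assumes Ollama listens on its default address (localhost:11434) and that its /api/tags endpoint returns a JSON object with a models list whose entries carry a name field; fetch_tags and missing_models are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default address


def fetch_tags(url=OLLAMA_TAGS_URL):
    """Fetch the list of locally available models from Ollama."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def missing_models(required, tags_reply):
    """Return the required model names that are not in the tags reply."""
    installed = {entry.get("name") for entry in tags_reply.get("models", [])}
    return [name for name in required if name not in installed]


if __name__ == "__main__":
    try:
        reply = fetch_tags()
    except OSError as error:
        # A connection error here usually means Ollama is not running.
        print("Could not reach Ollama:", error)
    else:
        for name in missing_models(["moondream:1.8b", "llava:7b"], reply):
            print("Not pulled yet: ollama pull", name)
```

The comparison is an exact string match, so a model pulled under a different tag counts as missing.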

Image Not Processed

Check file format is supported (JPEG, PNG, WebP)
Verify file path is correct
Ensure file was uploaded successfully

Slow Processing

Use smaller images
Try a faster model (moondream:1.8b)
Check system resources

Next Steps

File Uploads: Learn about file upload handling
Tools: Combine vision with function calling
Batch Processing: Process multiple images with items iteration
LLM Configuration: See LLM resource for advanced options

Related Documentation

LLM Resource - Complete LLM configuration reference
File Upload - Handling file uploads
Unified API - Accessing file data
Tools - Function calling with vision
