Vision Models
This tutorial demonstrates how to use vision-capable LLMs in KDeps v2 to analyze images, extract information, and perform multimodal tasks.
Prerequisites
- KDeps installed (see Installation)
- Ollama installed and running
- A vision model pulled:
ollama pull moondream:1.8borollama pull llava:7b
Overview
Vision models can process images along with text prompts. KDeps supports:
- Image file uploads via multipart form-data
- Local image files
- Multiple images in a single request
- Structured JSON responses
Step 1: Install a Vision Model
Install a vision-capable model in Ollama:
bash
# Lightweight and fast
ollama pull moondream:1.8b
# More capable
ollama pull llava:7b
# Best quality (slower)
ollama pull llava:13bStep 2: Create the Workflow
Create workflow.yaml:
yaml
apiVersion: kdeps.io/v1
kind: Workflow
metadata:
name: vision
description: Vision model example
version: "1.0.0"
targetActionId: visionResponse
settings:
apiServerMode: true
apiServer:
hostIp: "127.0.0.1"
portNum: 3000
routes:
- path: /api/v1/vision
methods: [POST]
cors:
enableCors: true
allowOrigins:
- http://localhost:8080
agentSettings:
timezone: Etc/UTC
pythonVersion: "3.12"
models:
- moondream:1.8b
- llava:7bStep 3: Create the Vision LLM Resource
Create resources/vision-llm.yaml:
yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
actionId: visionLLM
name: Vision LLM
run:
chat:
model: moondream:1.8b
role: user
prompt: "{{ get('q', 'param') }}"
files:
# Get uploaded file path
- "{{ get('file', 'filepath') }}"
jsonResponse: true
jsonResponseKeys:
- description
- objects
- sceneKey Points:
filesfield accepts an array of file paths- Use
get('file', 'filepath')for uploaded files jsonResponse: trueensures structured output
Step 4: Create the Response Resource
Create resources/vision-response.yaml:
yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
actionId: visionResponse
name: Vision Response
requires:
- visionLLM
run:
apiResponse:
success: true
response:
query: get('q', 'param')
analysis: get('visionLLM')
file_info:
filename: get('file', 'filename')
filetype: get('file', 'filetype')Step 5: Test with Image Upload
Upload an image and query it:
bash
curl -X POST 'http://localhost:3000/api/v1/vision?q=What%20is%20in%20this%20image?' \
-F "file=@image.jpg"Expected response:
json
{
"success": true,
"data": {
"query": "What is in this image?",
"analysis": {
"description": "A red panda sitting on a tree branch...",
"objects": ["panda", "branch", "tree"],
"scene": "forest"
},
"file_info": {
"filename": "image.jpg",
"filetype": "image/jpeg"
}
}
}Image Sources
Uploaded Files
From multipart form-data uploads:
yaml
files:
- "{{ get('file', 'filepath') }}"Local Files
From the filesystem:
yaml
files:
- "./images/photo.jpg"
- "{{ get('image_path') }}"Multiple Images
Process multiple images:
yaml
files:
- "{{ get('file1', 'filepath') }}"
- "{{ get('file2', 'filepath') }}"Or using file array:
yaml
files:
- "{{ get('file[]', 'filepath', 0) }}"
- "{{ get('file[]', 'filepath', 1) }}"Supported Models
moondream:1.8b
- Best for: Fast, lightweight queries
- Use cases: Simple descriptions, object detection
- Speed: Very fast
- Quality: Good for basic tasks
yaml
model: moondream:1.8bllava:7b
- Best for: Balanced performance
- Use cases: Detailed descriptions, scene analysis
- Speed: Moderate
- Quality: High quality
yaml
model: llava:7bllava:13b
- Best for: Best quality
- Use cases: Complex analysis, detailed descriptions
- Speed: Slower
- Quality: Highest quality
yaml
model: llava:13bUse Cases
Image Description
yaml
run:
chat:
model: moondream:1.8b
prompt: "Describe this image in detail"
files:
- "{{ get('file', 'filepath') }}"Object Detection
yaml
run:
chat:
model: llava:7b
prompt: "List all objects in this image"
jsonResponse: true
jsonResponseKeys:
- objects
- count
files:
- "{{ get('file', 'filepath') }}"Scene Analysis
yaml
run:
chat:
model: llava:7b
prompt: "Analyze the scene: location, time of day, weather, mood"
jsonResponse: true
jsonResponseKeys:
- location
- time_of_day
- weather
- mood
files:
- "{{ get('file', 'filepath') }}"Image Comparison
yaml
run:
chat:
model: llava:13b
prompt: "Compare these two images and describe the differences"
files:
- "{{ get('file1', 'filepath') }}"
- "{{ get('file2', 'filepath') }}"
jsonResponse: true
jsonResponseKeys:
- differences
- similaritiesOCR Alternative
yaml
run:
chat:
model: llava:7b
prompt: "Extract all text from this image"
jsonResponse: true
jsonResponseKeys:
- text
- confidence
files:
- "{{ get('file', 'filepath') }}"Advanced Configuration
With System Prompt
yaml
run:
chat:
model: llava:7b
scenario:
- role: system
prompt: "You are an expert image analyst. Provide detailed, accurate descriptions."
- role: user
prompt: "{{ get('q') }}"
files:
- "{{ get('file', 'filepath') }}"With Tools
Combine vision with function calling:
yaml
run:
chat:
model: llava:7b
prompt: "{{ get('q') }}"
files:
- "{{ get('file', 'filepath') }}"
tools:
- name: save_analysis
description: Save the image analysis
parameters:
description:
type: string
description: The image descriptionImage Formats
Supported image formats:
- JPEG (
.jpg,.jpeg) - PNG (
.png) - WebP (
.webp)
Performance Tips
- Choose the Right Model: Use
moondream:1.8bfor speed,llava:13bfor quality - Image Size: Smaller images process faster
- Batch Processing: Process multiple images in one request
- Caching: Cache results for repeated queries
Error Handling
Handle errors gracefully:
yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
actionId: visionLLM
name: Vision LLM
run:
validations:
- info('filecount') > 0
- get('file', 'filetype') in ['image/jpeg', 'image/png', 'image/webp']
chat:
model: moondream:1.8b
prompt: "{{ get('q') }}"
files:
- "{{ get('file', 'filepath') }}"
onError:
apiResponse:
success: false
response:
error: "Failed to process image"Complete Example
yaml
apiVersion: kdeps.io/v1
kind: Workflow
metadata:
name: vision-demo
version: "1.0.0"
targetActionId: visionResponse
settings:
apiServerMode: true
apiServer:
hostIp: "127.0.0.1"
portNum: 3000
routes:
- path: /api/v1/vision
methods: [POST]
agentSettings:
models:
- moondream:1.8b
---
# resources/vision-llm.yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
actionId: visionLLM
name: Vision LLM
run:
chat:
model: moondream:1.8b
prompt: "{{ get('q', 'param') }}"
files:
- "{{ get('file', 'filepath') }}"
jsonResponse: true
jsonResponseKeys:
- description
- objects
---
# resources/vision-response.yaml
apiVersion: kdeps.io/v1
kind: Resource
metadata:
actionId: visionResponse
name: Vision Response
requires:
- visionLLM
run:
apiResponse:
success: true
response:
query: get('q', 'param')
analysis: get('visionLLM')Testing
Single Image
bash
curl -X POST 'http://localhost:3000/api/v1/vision?q=Describe%20this%20image' \
-F "file=@photo.jpg"Multiple Images
bash
curl -X POST 'http://localhost:3000/api/v1/vision?q=Compare%20these%20images' \
-F "file[]=@image1.jpg" \
-F "file[]=@image2.jpg"Troubleshooting
Model Not Found
- Ensure the model is pulled:
ollama pull moondream:1.8b - Check model name matches exactly
- Verify Ollama is running
Image Not Processed
- Check file format is supported (JPEG, PNG, WebP)
- Verify file path is correct
- Ensure file was uploaded successfully
Slow Processing
- Use smaller images
- Try a faster model (
moondream:1.8b) - Check system resources
Next Steps
- File Uploads: Learn about file upload handling
- Tools: Combine vision with function calling
- Batch Processing: Process multiple images with items iteration
- LLM Configuration: See LLM resource for advanced options
Related Documentation
- LLM Resource - Complete LLM configuration reference
- File Upload - Handling file uploads
- Unified API - Accessing file data
- Tools - Function calling with vision