# Multi-Modal LLM Models

Kdeps can interact with multi-modal (vision) LLMs such as `llava` and `llama3.2-vision`. For the full list of supported vision models, refer to the Ollama Vision models page.

## Adding an LLM Model

Before using an LLM model, you must declare it in the `workflow.pkl` file. For example:

```apl
models {
    "llama3.3"          // text-only model
    "llama3.2-vision"   // multi-modal (vision) model
}
```
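
The snippet above shows only the `models` block itself. In a full `workflow.pkl` it typically sits inside the agent settings; the sketch below assumes the common `settings`/`agentSettings` layout, so field names may differ across schema versions:

```apl
settings {
  agentSettings {
    models {
      "llama3.3"
      "llama3.2-vision"
    }
  }
}
```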

## Interacting with an LLM Model

To interact with a vision model, provide a prompt and the file(s) to analyze. Below is an example of an LLM resource that uses the `llama3.2-vision` model with a file uploaded via the API:

```apl
actionID = "llamaVision"
chat {
  model = "llama3.2-vision"
  prompt = "Describe this image"
  JSONResponse = true        // ask the model to return structured JSON
  JSONResponseKeys {         // keys expected in the JSON output
    "description_text"
    "style_text"
    "category_text"
  }
  files {
    "@(request.files()[0])"  // uses the first uploaded file
  }
}
```
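
Other resources that depend on this output should list `llamaVision` in their `requires` block so Kdeps runs the vision step first. A minimal sketch (the `imageSummary` action ID below is hypothetical):

```apl
actionID = "imageSummary"
requires {
  "llamaVision" // run the vision model before this resource executes
}
```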

## Using Processed Image Data

Once the image has been processed, you can use the model's output in other resources. For example:

```apl
// Path to the file that stores the llamaVision output, then read its contents
local jsonPath = "@(llm.file("llamaVision"))"
local JSONData = "@(read?("\(jsonPath)")?.text)"

// Parse the JSON and pull out the keys requested via JSONResponseKeys
local imageDescription = "@(document.JSONParser(JSONData)?.description_text)"
local imageStyle = "@(document.JSONParser(JSONData)?.style_text)"
local imageCategory = "@(document.JSONParser(JSONData)?.category_text)"
```

This example demonstrates how to extract key details like description, style, and category from the processed image data for further use in your workflow.
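
Because these values are ordinary Pkl locals, they can also feed another model in the same workflow. As a minimal sketch (assuming a follow-up chat resource that depends on `llamaVision`), the parsed description could seed a caption prompt for a text-only model:

```apl
chat {
  model = "llama3.3"
  // imageDescription and imageStyle are the locals parsed above
  prompt = "Write a one-sentence caption for an image described as: \(imageDescription). Style: \(imageStyle)."
}
```

Here the same Pkl string interpolation (`\(...)`) used earlier for `jsonPath` carries the parsed values into the new prompt.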