Embedding Resource

The embedding executor is a native capability compiled into the kdeps binary. It provides a SQLite-backed keyword store for indexing, searching, upserting, and deleting text documents. Use it as the storage layer for RAG pipelines that run fully on-prem.

Configuration

yaml
run:
  embedding:
    operation: "index"                    # required: index | search | upsert | delete
    text: "document content here"         # required for index/search/upsert (optional for delete-all)
    collection: "default"                 # optional namespace (default: "default")
    dbPath: "/data/kdeps-store.db"       # optional path (default: "kdeps-embedding.db")
    limit: 10                             # optional max search results (default: 10)

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| operation | string | yes | | index, search, upsert, or delete |
| text | string | yes (optional for delete) | | Text to index, query, or delete. For delete, omit it to delete the entire collection |
| collection | string | no | "default" | Namespace for documents |
| dbPath | string | no | "kdeps-embedding.db" | SQLite database file path |
| limit | integer | no | 10 | Max results for search |

Operations

| Operation | Description |
|---|---|
| index | Insert text (ignored if duplicate) |
| upsert | Insert or replace text |
| search | Case-insensitive keyword search via LIKE |
| delete | Delete by text, or whole collection if text is empty |
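
As a sketch of the operations not covered by the pipeline example, the fragments below show an upsert, a delete by text, and a delete of a whole collection. They use only the fields documented above. The actionId values and the sample text are hypothetical, and the `---` separators are there only to keep the listing valid YAML, since each fragment would normally live in its own resource.

```yaml
# Upsert: insert or replace the text
metadata:
  actionId: refreshDoc            # hypothetical actionId
run:
  embedding:
    operation: "upsert"
    text: "Refund policy: 30 days from delivery."   # sample text
    collection: "knowledge"
---
# Delete by text
metadata:
  actionId: removeDoc             # hypothetical actionId
run:
  embedding:
    operation: "delete"
    text: "Refund policy: 30 days from delivery."
    collection: "knowledge"
---
# Delete the whole collection: text is omitted
metadata:
  actionId: clearKnowledge        # hypothetical actionId
run:
  embedding:
    operation: "delete"
    collection: "knowledge"
```

Because dbPath is left out here, these fragments would fall back to the default database file, kdeps-embedding.db.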

Output

| Key | Type | Description |
|---|---|---|
| operation | string | The operation that was performed |
| collection | string | The collection used |
| success | bool | true on success |
| results | []string | Matching texts (search only) |
| count | integer | Number of results (search only) |
| affected | integer | Rows deleted (delete only) |
| json | string | Full result as JSON string |
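
A downstream resource can presumably read these keys the same way the pipeline example on this page reads `results`. The following is a minimal sketch under that assumption: it supposes that `count` is exposed through the same `output('...')` accessor, and that a search resource named findDocs already exists. The actionId reportMatches is hypothetical.

```yaml
metadata:
  actionId: reportMatches         # hypothetical actionId
  requires: [findDocs]            # a search resource, as in the Examples section
run:
  apiResponse:
    # Assumes count is reachable the same way as results
    response: "{{ output('findDocs').count }} result(s): {{ output('findDocs').results }}"
```

For a delete resource, the analogous key would be `affected`.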

Examples

Build a RAG pipeline

yaml
# Step 1: Scrape content
metadata:
  actionId: fetch
run:
  scraper:
    url: "{{ get('url') }}"

# Step 2: Index it
metadata:
  actionId: storeDoc
  requires: [fetch]
run:
  embedding:
    operation: "index"
    text: "{{ output('fetch').content }}"
    collection: "knowledge"
    dbPath: "/data/store.db"

# Step 3: Search on user query
metadata:
  actionId: findDocs
run:
  embedding:
    operation: "search"
    text: "{{ get('query') }}"
    collection: "knowledge"
    dbPath: "/data/store.db"
    limit: 5

# Step 4: Answer with context
metadata:
  actionId: answer
  requires: [findDocs]
run:
  chat:
    model: llama3.2:1b
    prompt: |
      Context: {{ output('findDocs').results }}
      Question: {{ get('query') }}
  apiResponse:
    response: "{{ output('answer') }}"

Collections

Use collection to namespace documents, which is useful for multi-tenant or multi-topic stores:

yaml
# Index into separate collections
embedding:
  operation: "index"
  text: "..."
  collection: "contracts"

# Search only within one collection
embedding:
  operation: "search"
  text: "termination clause"
  collection: "contracts"

Note: This uses keyword (LIKE) matching, not vector similarity. For OpenAI vector embeddings, install the component:

bash
kdeps registry install embedding
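
With the built-in keyword store, the wording of the search text matters. Assuming the text is matched as a single pattern (this page says only that LIKE is used), a full-sentence question is unlikely to appear verbatim in a stored document, so passing a short keyword may return more useful results. The sketch below illustrates that idea; the `keyword` request parameter and the actionId are hypothetical.

```yaml
metadata:
  actionId: findByKeyword         # hypothetical actionId
run:
  embedding:
    operation: "search"
    text: "{{ get('keyword') }}"  # hypothetical parameter, e.g. "termination" rather than a full question
    collection: "contracts"
    limit: 5
```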

Released under the Apache 2.0 License.