LLM Inference Service

Optional local AI inference service that powers Windshift's AI features. Runs a 1.2B-parameter language model via llama.cpp: CPU-only inference in roughly 2 GB of RAM, no GPU required.

AI Features

The LLM service enables the following features in Windshift:

Feature              Description
Plan My Day          Generates a prioritized daily schedule from your assigned items
Catch Me Up          Summarizes recent activity and changes on a work item
Find Similar         Detects duplicate or related items across your workspace
Decompose            Breaks a work item into smaller sub-tasks
Release Notes        Generates release notes from a milestone's completed items
Dependency Analysis  Identifies dependencies between work items across teams and sprints
AI Chat              Interactive assistant that can look up and update items, track time, and answer questions
Daily Briefing       Morning briefing with activity recap, today's focus, and upcoming deadlines

Docker Compose

Add the LLM service to your docker-compose.yml:

services:
  windshift:
    image: ghcr.io/windshiftapp/windshift:latest
    environment:
      - BASE_URL=https://windshift.example.com
      - SSO_SECRET=${SSO_SECRET}
      - LLM_ENDPOINT=http://llm:8081
    depends_on:
      llm:
        condition: service_healthy

  llm:
    image: ghcr.io/windshiftapp/llm:latest
    container_name: windshift-llm
    restart: unless-stopped
    environment:
      - LLM_PORT=8081
      - LLM_CTX_SIZE=4096
      - LLM_THREADS=${LLM_THREADS:-4}
      - LLM_PARALLEL=2
      - LLM_BATCH_SIZE=512
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8081/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s
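Once the stack is up, you may want to wait until the service reports healthy before exercising AI features. A minimal polling sketch (the wait_for_health helper and its retry defaults are illustrative, not part of Windshift):

```shell
# Poll a health command until it succeeds or retries run out.
# Pass any command that exits 0 when healthy, e.g. the same curl
# invocation the compose healthcheck uses.
wait_for_health() {
  cmd="$1"; retries="${2:-10}"; interval="${3:-5}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if eval "$cmd" >/dev/null 2>&1; then
      echo healthy
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo unhealthy
  return 1
}

# Real usage would look like:
#   wait_for_health 'curl -fsS http://localhost:8081/health' 10 30
# Demo with a command that always succeeds:
wait_for_health true 3 0
```

With a 120-second model load, around 10 retries at 30-second intervals gives comfortable headroom.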

Environment Variables

Variable        Default  Description
LLM_PORT        8081     Server port
LLM_CTX_SIZE    4096     Context window size (tokens)
LLM_THREADS     4        CPU threads for inference
LLM_PARALLEL    2        Maximum concurrent requests
LLM_BATCH_SIZE  512      Batch size for prompt processing

Connecting to Windshift

Set LLM_ENDPOINT on your main Windshift service and declare a depends_on entry with condition: service_healthy, so Windshift waits for the model to finish loading:

windshift:
  environment:
    - LLM_ENDPOINT=http://llm:8081
  depends_on:
    llm:
      condition: service_healthy

See Environment Variables for the full configuration reference.

Resource Requirements

  • RAM: ~2 GB (1.25 GB model + KV cache + processing overhead)
  • CPU: Any modern x86_64 or ARM64 processor. More threads = faster inference.
  • GPU: Not required. The service runs entirely on CPU.
  • Startup: ~120 seconds while the model loads into memory. The healthcheck start_period accounts for this.

Adjust LLM_THREADS based on your available CPU cores. The default of 4 threads works well for most deployments.
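As a starting point, you can derive LLM_THREADS from the host's core count at deploy time. A sketch (the cap of 8 threads is an illustrative heuristic, not a Windshift recommendation):

```shell
# Detect the core count (nproc on Linux, sysctl on macOS) and cap the
# thread count so inference doesn't starve other containers.
CORES=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
if [ "$CORES" -gt 8 ]; then
  LLM_THREADS=8
else
  LLM_THREADS=$CORES
fi
export LLM_THREADS
echo "LLM_THREADS=$LLM_THREADS"
```

Because the compose file reads ${LLM_THREADS:-4}, exporting the variable before running docker compose up overrides the default.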

Bundled Model & Licensing

The default Docker image ships with LiquidAI's LFM2.5-1.2B-Instruct model, licensed under the LFM Open License v1.0.

Important: Organizations with $10M+ annual revenue cannot use this model commercially without a separate license from Liquid AI.

If this restriction applies to you, swap the model: the llama.cpp server supports any GGUF-format model. Mount a different .gguf file into the container and adjust LLM_CTX_SIZE to match the model's context window:

llm:
  image: ghcr.io/windshiftapp/llm:latest
  environment:
    - LLM_CTX_SIZE=8192
  volumes:
    - ./my-model.gguf:/models/model.gguf

External LLM Providers

Instead of running the local inference service, you can configure external LLM providers (OpenAI, Anthropic) through the admin UI. This is useful if you prefer cloud-hosted models or need more capable models for your workload.

Configure providers via the LLM_PROVIDERS_FILE environment variable or --llm-providers CLI flag. See Configuration Options for details.

Customizing AI Prompts

Windshift's AI features use embedded system prompts that work out of the box. You can override any prompt by placing a .txt file in the prompts directory; only the prompts you provide are replaced, and the rest keep their defaults.

Prompt Files

Filename                 Feature
plan_my_day.txt          Plan My Day
catch_me_up.txt          Catch Me Up
find_similar.txt         Find Similar
decompose.txt            Decompose
release_notes.txt        Release Notes
dependency_analysis.txt  Dependency Analysis
ai_chat.txt              AI Chat
daily_briefing.txt       Daily Briefing

Docker Setup

The Docker image sets AI_PROMPTS_DIR=/data/prompts by default. Mount a volume to that path with your override files:

services:
  windshift:
    image: ghcr.io/windshiftapp/windshift:latest
    volumes:
      - ./my-prompts:/data/prompts:ro
    environment:
      - LLM_ENDPOINT=http://llm:8081

Only the files you place in the directory are overridden. For example, to customize just the "Plan My Day" prompt, create a single plan_my_day.txt file — all other prompts use the built-in defaults.
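Seeding such an override directory takes only a couple of commands; a sketch (the prompt text itself is illustrative):

```shell
# Create the override directory with only the Plan My Day prompt;
# every other feature keeps its built-in default.
mkdir -p my-prompts
cat > my-prompts/plan_my_day.txt <<'EOF'
You are a scheduling assistant. Build a prioritized plan for today
from the user's assigned items, most urgent first.
EOF
ls my-prompts
```

Mount ./my-prompts at /data/prompts as shown above and restart the container to pick up the change.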

Outside Docker, pass the directory path with --ai-prompts-dir or AI_PROMPTS_DIR. See Configuration Options for details.

Format Placeholders in ai_chat.txt

The ai_chat.txt prompt contains four printf-style format placeholders (two %s and two %d) that are filled at runtime:

Today's date is %s.

The current user is %s (user ID: %d). When the user asks about
"my items", "assigned to me", or similar, use the list_items
tool with assignee_id=%d.

When customizing this file, you must preserve these %s and %d verbs in the correct order. Removing or reordering them will cause incorrect output.
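A quick way to confirm an edited file still carries the verbs in the expected order (%s, %s, %d, %d), sketched here against a sample file standing in for your customized ai_chat.txt:

```shell
# Write a sample override (stands in for your edited ai_chat.txt) and
# verify the format verbs appear in the original order: %s %s %d %d.
cat > ai_chat.txt <<'EOF'
Today's date is %s.
The current user is %s (user ID: %d). Use list_items with assignee_id=%d.
EOF

VERBS=$(grep -o '%[sd]' ai_chat.txt | tr -d '\n')
if [ "$VERBS" = "%s%s%d%d" ]; then
  echo "verb order ok"
else
  echo "verb order broken: $VERBS" >&2
fi
```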

Testing Prompts

Append ?preview=true to any AI feature endpoint to see the assembled prompts without calling the LLM:

curl http://localhost:8080/ai/plan-my-day?preview=true

The response includes system_prompt and prompt fields showing exactly what would be sent to the model. Use this to verify your override loaded correctly.

Checking Status

Verify the AI service is available:

curl http://localhost:8081/health

From within Windshift, the GET /ai/status endpoint reports whether AI features are active and which provider is in use.