# LLM Inference Service

Optional local AI inference service that powers Windshift's AI features. Runs a 1.2B-parameter language model via [llama.cpp](https://github.com/ggerganov/llama.cpp). No GPU is required: inference runs entirely on the CPU in ~2 GB of RAM.

## AI Features

The LLM service enables the following features in Windshift:

| Feature | Description |
|---------|-------------|
| Plan My Day | Generates a prioritized daily schedule from your assigned items |
| Catch Me Up | Summarizes recent activity and changes on a work item |
| Find Similar | Detects duplicate or related items across your workspace |
| Decompose | Breaks a work item into smaller sub-tasks |
| Release Notes | Generates release notes from a milestone's completed items |

## Docker Compose

Add the LLM service to your `docker-compose.yml`:

```yaml
services:
  windshift:
    image: ghcr.io/windshiftapp/windshift:latest
    environment:
      - BASE_URL=https://windshift.example.com
      - SSO_SECRET=${SSO_SECRET}
      - LLM_ENDPOINT=http://llm:8081
    depends_on:
      llm:
        condition: service_healthy

  llm:
    image: ghcr.io/windshiftapp/llm:latest
    container_name: windshift-llm
    restart: unless-stopped
    environment:
      - LLM_PORT=8081
      - LLM_CTX_SIZE=4096
      - LLM_THREADS=${LLM_THREADS:-4}
      - LLM_PARALLEL=2
      - LLM_BATCH_SIZE=512
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8081/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_PORT` | `8081` | Server port |
| `LLM_CTX_SIZE` | `4096` | Context window size |
| `LLM_THREADS` | `4` | CPU threads for inference |
| `LLM_PARALLEL` | `2` | Maximum concurrent requests |
| `LLM_BATCH_SIZE` | `512` | Batch size for prompt processing |

## Connecting to Windshift

Set `LLM_ENDPOINT` on your main Windshift service and add a `depends_on` with a healthcheck condition:

```yaml
windshift:
  environment:
    - LLM_ENDPOINT=http://llm:8081
  depends_on:
    llm:
      condition: service_healthy
```

See
[Environment Variables](/docs/03-configuration/03-environment-variables) for the full configuration reference.

## Resource Requirements

- **RAM:** ~2 GB (1.25 GB model + KV cache + processing overhead)
- **CPU:** Any modern x86_64 or ARM64 processor. More threads = faster inference.
- **GPU:** Not required. The service runs entirely on CPU.
- **Startup:** ~120 seconds while the model loads into memory. The healthcheck `start_period` accounts for this.

Adjust `LLM_THREADS` based on your available CPU cores. The default of 4 threads works well for most deployments.

## Bundled Model & Licensing

The default Docker image ships with **Liquid AI's LFM2.5-1.2B-Instruct** model, licensed under the [LFM Open License v1.0](https://www.liquid.ai/liquid-foundation-models-license).

> **Important:** Organizations with **$10M+ annual revenue** cannot use this model commercially without a separate license from Liquid AI.

If this restriction applies to you, swap the model: the llama.cpp server supports any GGUF-format model. Mount a different `.gguf` file into the container and adjust `LLM_CTX_SIZE` to match:

```yaml
llm:
  image: ghcr.io/windshiftapp/llm:latest
  environment:
    - LLM_CTX_SIZE=8192
  volumes:
    - ./my-model.gguf:/models/model.gguf
```

## External LLM Providers

Instead of running the local inference service, you can configure external LLM providers (OpenAI, Anthropic) through the admin UI. This is useful if you prefer cloud-hosted models or need more capable models for your workload.

Configure providers via the `LLM_PROVIDERS_FILE` environment variable or the `--llm-providers` CLI flag. See [Configuration Options](/docs/03-configuration/01-options) for details.

## Checking Status

Verify the AI service is available:

```bash
curl http://localhost:8081/health
```

From within Windshift, the `GET /ai/status` endpoint reports whether AI features are active and which provider is in use.
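Because the model takes ~120 seconds to load, automation that hits the service immediately after `docker compose up` may see failed requests before the healthcheck passes. A small wait-for-ready helper can bridge the gap. A minimal sketch in shell, assuming `curl` is installed and that `/health` returns HTTP 200 once the model is loaded; the function name, retry count, and delay are illustrative, not Windshift settings:

```shell
# wait_for_llm URL [RETRIES] [DELAY]
# Polls the health endpoint; returns 0 once it responds successfully,
# or 1 after all retries are exhausted.
wait_for_llm() {
  url="$1"
  retries="${2:-30}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -sf "$url" > /dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

With the defaults (30 attempts, 5 seconds apart) this covers the ~120-second startup window with some headroom, e.g. `wait_for_llm http://localhost:8081/health && ./run-smoke-tests.sh`.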