# LLM Inference Service

Optional local AI inference service that powers Windshift's AI features. Runs a 1.2B-parameter language model via [llama.cpp](https://github.com/ggerganov/llama.cpp). No GPU is required: inference runs entirely on the CPU in ~2 GB of RAM.

## AI Features

The LLM service enables the following features in Windshift:

| Feature | Description |
|---------|-------------|
| Plan My Day | Generates a prioritized daily schedule from your assigned items |
| Catch Me Up | Summarizes recent activity and changes on a work item |
| Find Similar | Detects duplicate or related items across your workspace |
| Decompose | Breaks a work item into smaller sub-tasks |
| Release Notes | Generates release notes from a milestone's completed items |

## Docker Compose

Add the LLM service to your `docker-compose.yml`:

```yaml
services:
  windshift:
    image: ghcr.io/windshiftapp/windshift:latest
    environment:
      - BASE_URL=https://windshift.example.com
      - SSO_SECRET=${SSO_SECRET}
      - LLM_ENDPOINT=http://llm:8081
    depends_on:
      llm:
        condition: service_healthy

  llm:
    image: ghcr.io/windshiftapp/llm:latest
    container_name: windshift-llm
    restart: unless-stopped
    environment:
      - LLM_PORT=8081
      - LLM_CTX_SIZE=4096
      - LLM_THREADS=${LLM_THREADS:-4}
      - LLM_PARALLEL=2
      - LLM_BATCH_SIZE=512
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8081/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_PORT` | `8081` | Server port |
| `LLM_CTX_SIZE` | `4096` | Context window size |
| `LLM_THREADS` | `4` | CPU threads for inference |
| `LLM_PARALLEL` | `2` | Maximum concurrent requests |
| `LLM_BATCH_SIZE` | `512` | Batch size for prompt processing |

## Connecting to Windshift

Set `LLM_ENDPOINT` on your main Windshift service and add a `depends_on` with a healthcheck condition:

```yaml
windshift:
  environment:
    - LLM_ENDPOINT=http://llm:8081
  depends_on:
    llm:
      condition: service_healthy
```

See
[Environment Variables](/docs/03-configuration/03-environment-variables) for the full configuration reference.

## Resource Requirements

- **RAM:** ~2 GB (1.25 GB model + KV cache + processing overhead)
- **CPU:** Any modern x86_64 or ARM64 processor. More threads = faster inference.
- **GPU:** Not required. The service runs entirely on CPU.
- **Startup:** ~120 seconds while the model loads into memory. The healthcheck `start_period` accounts for this.

Adjust `LLM_THREADS` based on your available CPU cores. The default of 4 threads works well for most deployments.

## Bundled Model & Licensing

The default Docker image ships with **Liquid AI's LFM2.5-1.2B-Instruct** model, licensed under the [LFM Open License v1.0](https://www.liquid.ai/liquid-foundation-models-license).

> **Important:** Organizations with **$10M+ annual revenue** cannot use this model commercially without a separate license from Liquid AI.

If this restriction applies to you, swap the model: the llama.cpp server supports any GGUF-format model. Mount a different `.gguf` file into the container and adjust `LLM_CTX_SIZE` to match:

```yaml
llm:
  image: ghcr.io/windshiftapp/llm:latest
  environment:
    - LLM_CTX_SIZE=8192
  volumes:
    - ./my-model.gguf:/models/model.gguf
```

## External LLM Providers

Instead of running the local inference service, you can configure external LLM providers (OpenAI, Anthropic) through the admin UI. This is useful if you prefer cloud-hosted models or need more capable models for your workload.

Configure providers via the `LLM_PROVIDERS_FILE` environment variable or the `--llm-providers` CLI flag. See [Configuration Options](/docs/03-configuration/01-options) for details.

## Checking Status

Verify the AI service is available:

```bash
curl http://localhost:8081/health
```

From within Windshift, the `GET /ai/status` endpoint reports whether AI features are active and which provider is in use.
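Because the model takes ~120 seconds to load, automation that hits the service immediately after `docker compose up` may see failed requests before the healthcheck passes. A small wait-for-ready helper can bridge the gap. A minimal sketch in shell, assuming `curl` is installed and that `/health` returns HTTP 200 once the model is loaded; the function name, retry count, and delay are illustrative, not Windshift settings:

```shell
# wait_for_llm URL [RETRIES] [DELAY]
# Polls the health endpoint; returns 0 once it responds successfully,
# or 1 after all retries are exhausted.
wait_for_llm() {
  url="$1"
  retries="${2:-30}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -sf "$url" > /dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

With the defaults (30 attempts, 5 seconds apart) this covers the ~120-second startup window with some headroom, e.g. `wait_for_llm http://localhost:8081/health && ./run-smoke-tests.sh`.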