# Local LLM Integration

Delegate coding tasks to local models like DeepSeek via Ollama.
> **Note:** This feature is currently in development. This page describes the planned functionality.
Workflow MCP can delegate mechanical coding tasks to a local LLM (like DeepSeek), preserving Claude's context window for reasoning and complex decisions.
## Overview
The Orchestrator/Executor pattern lets Claude focus on what it does best:
| Claude (Orchestrator) | Local LLM (Executor) |
|---|---|
| Understands requirements | Receives focused task |
| Plans approach | Generates code |
| Evaluates results | Returns result |
| Makes judgment calls | Stateless execution |
## Why Use Local LLM Delegation?

### Context Preservation

Agentic coding loops can consume 50-100K+ tokens. Offloading keeps Claude's window clear for reasoning.

### Cost Reduction

Local inference = $0 per token for bulk code generation.

### Privacy

Code never leaves your machine during delegated tasks.

### Hardware Utilization

Put your GPU to productive use (RTX 3090/4090/5090 recommended).
## Requirements

### Hardware
| GPU VRAM | Recommended Model |
|---|---|
| 8-12GB | DeepSeek-Coder-6.7B (Q4) |
| 16-24GB | DeepSeek-Coder-V2-Lite (16B MoE) |
| 24-32GB | DeepSeek-Coder-33B (Q4) |
| 32GB+ | DeepSeek-R1-Distill-Qwen-32B |
### Software

- Ollama installed and running
- A coding-optimized model pulled (e.g., `ollama pull deepseek-coder:33b`)
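You can check these prerequisites through the MCP server before enabling delegation. A minimal sketch using the `capability` action from the tool table below (the response fields shown are illustrative, not a documented schema):

```javascript
// Verify GPU, Ollama, and pulled models before enabling.
// The response shape below is an assumption for illustration.
mcp__local_llm({
  action: "capability"
})
// Returns something like:
// { ollama_running: true, models: ["deepseek-coder:33b"], can_enable: true }
```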
## Usage

### Enable for a Session
```bash
# Enable with default model
/workflow:implement my-feature llm=true

# Enable with specific model
/workflow:implement my-feature llm=deepseek-coder:33b

# Enable without auto-stop (keep model loaded after session)
/workflow:implement my-feature llm=deepseek-coder:33b:persist
```

### MCP Tool: `local_llm`
| Action | Description |
|---|---|
| `capability` | Check if the local LLM can be enabled (GPU, Ollama, models) |
| `enable` | Start Ollama, load the model, enable for the session |
| `disable` | Clear the session flag, optionally stop Ollama |
| `status` | Check current status (running, model loaded) |
| `analyze` | Score a task for delegation suitability |
| `delegate` | Execute a coding task on the local LLM |
| `stats` | View delegation success metrics |
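Outside the slash-command flow, enabling and checking the local LLM directly might look like this (the `model` parameter name is an assumption):

```javascript
// Start Ollama if needed, load the model, and flag the session
mcp__local_llm({
  action: "enable",
  model: "deepseek-coder:33b" // parameter name assumed for illustration
})

// Confirm the model is loaded before delegating work
mcp__local_llm({ action: "status" })
```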
### Example: Delegating a Task
```javascript
// Check if the task is suitable
mcp__local_llm({
  action: "analyze",
  task: "Generate Express CRUD routes for User model",
  files: ["src/types/user.ts"]
})
// Returns: { score: 0.78, recommendation: "delegate", reasons: [...] }

// Execute the delegation
mcp__local_llm({
  action: "delegate",
  task: "Generate Express CRUD routes with Joi validation",
  task_type: "generate",
  context_files: ["src/types/user.ts", "src/middleware/error.ts"],
  output_files: [{ path: "src/routes/users.ts", description: "CRUD routes" }],
  verify: "npm run typecheck"
})
```

## Task Suitability
### Good for Delegation
- Generating boilerplate/CRUD code
- Writing unit tests
- Adding documentation/docstrings
- Mechanical refactoring
- Code translation between languages
### Keep with Claude
- Architecture decisions
- Security-sensitive code
- Complex multi-step reasoning
- Ambiguous requirements needing clarification
- Tasks requiring deep project context
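Running `analyze` on a task from the second list should score low. A hypothetical example (the output shown is illustrative, not real tool output):

```javascript
mcp__local_llm({
  action: "analyze",
  task: "Design the authentication architecture for the API"
})
// Illustrative result:
// { score: 0.15, recommendation: "keep",
//   reasons: ["architecture decision", "security-related"] }
```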
## Task Scoring
The system scores tasks to recommend delegation:
| Signal | Impact | Detection |
|---|---|---|
| Pattern-based generation | +0.3 | Keywords: CRUD, boilerplate, scaffold |
| Single output file | +0.2 | `output_files.length === 1` |
| Test generation | +0.2 | Keywords: test, spec |
| Architecture decision | -0.4 | Keywords: design, architect |
| Security-related | -0.25 | Keywords: auth, security, encrypt |
| Multi-file coordination | -0.3 | `output_files.length > 2` |
Tasks scoring above 0.6 are recommended for delegation.
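As a rough illustration, the heuristic could be sketched like this. The weights and the 0.6 threshold come from the table above; the 0.5 base score, keyword matching, and clamping are assumptions:

```javascript
// Sketch of the scoring heuristic. Weights and threshold are from
// the table; the base score and clamping are assumptions.
function scoreTask(task, outputFiles = []) {
  const text = task.toLowerCase();
  const hit = (words) => words.some((w) => text.includes(w));
  let score = 0.5; // assumed neutral starting point

  if (hit(["crud", "boilerplate", "scaffold"])) score += 0.3;
  if (outputFiles.length === 1) score += 0.2;
  if (hit(["test", "spec"])) score += 0.2;
  if (hit(["design", "architect"])) score -= 0.4;
  if (hit(["auth", "security", "encrypt"])) score -= 0.25;
  if (outputFiles.length > 2) score -= 0.3;

  score = Math.max(0, Math.min(1, score));
  return { score, recommendation: score > 0.6 ? "delegate" : "keep" };
}
```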
## Lifecycle Management

### Auto-Start
When `llm=true` is specified, the system:

1. Checks if Ollama is running
2. Starts Ollama if needed
3. Loads the specified model
4. Marks the session as LLM-enabled
### Auto-Stop

By default, when the session ends:

- The model is unloaded (returning its VRAM)
- Ollama continues running (ready for the next session)

Use `llm=<model>:persist` to keep the model loaded after the session ends.
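The same lifecycle can be driven manually via the `disable` action; whether it accepts a flag to also stop Ollama is an assumption based on the action table above:

```javascript
// Clear the session flag and (optionally) stop Ollama entirely.
// The "stop_ollama" parameter name is assumed for illustration.
mcp__local_llm({
  action: "disable",
  stop_ollama: true
})
```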
## Metrics & Feedback

Delegation decisions are logged to `.claude/workflow/delegation-log.jsonl`:
```json
{
  "timestamp": "2026-01-21T15:30:00Z",
  "task": "Generate CRUD routes",
  "task_type": "generate",
  "score": 0.78,
  "decision": "delegate",
  "verification_passed": true,
  "tokens_generated": 847,
  "time_seconds": 42
}
```

Use `local_llm({ action: 'stats' })` to view aggregated metrics.
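Since each line of the log is standalone JSON, you can also aggregate it yourself. A sketch in Node, using the field names from the entry above:

```javascript
// Compute delegation counts and verification pass rate from the log.
import { readFileSync } from "node:fs";

const entries = readFileSync(".claude/workflow/delegation-log.jsonl", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));

const delegated = entries.filter((e) => e.decision === "delegate");
const passed = delegated.filter((e) => e.verification_passed).length;

console.log({
  delegated: delegated.length,
  verificationPassRate: delegated.length ? passed / delegated.length : 0
});
```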