Workflow MCP can delegate mechanical coding tasks to a local LLM (like DeepSeek), preserving Claude’s context window for reasoning and complex decisions.
The Orchestrator/Executor pattern lets Claude focus on what it does best:
| Claude (Orchestrator) | Local LLM (Executor) |
|---|---|
| Understands requirements | Receives focused task |
| Plans approach | Generates code |
| Evaluates results | Returns result |
| Makes judgment calls | Stateless execution |
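To make the division of labor concrete, here is a minimal sketch of that loop. The names (`Step`, `TaskResult`, `delegateToLocal`, `orchestrate`) are illustrative, not part of the MCP API:

```typescript
// Hypothetical shape of the Orchestrator/Executor loop.
// `delegateToLocal` stands in for an mcp__local_llm delegate call;
// all names and signatures here are assumptions for illustration.

interface Step { description: string; contextFiles: string[] }
interface TaskResult { code: string; verificationPassed: boolean }

// Executor: stateless — receives a focused task, returns a result.
async function delegateToLocal(step: Step): Promise<TaskResult> {
  // In the real system this would call the local model via MCP.
  return { code: "// generated code", verificationPassed: true };
}

// Orchestrator: plans, delegates mechanical steps, evaluates results.
async function orchestrate(steps: Step[]): Promise<TaskResult[]> {
  const results: TaskResult[] = [];
  for (const step of steps) {
    const result = await delegateToLocal(step);
    if (!result.verificationPassed) {
      // Judgment call: the orchestrator takes over instead of retrying blindly.
      throw new Error(`Delegation failed for: ${step.description}`);
    }
    results.push(result);
  }
  return results;
}
```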
**Context Preservation**

Agentic coding loops can consume 50-100K+ tokens. Offloading keeps Claude's window clear for reasoning.

**Cost Reduction**

Local inference is $0 per token for bulk code generation.

**Privacy**

Code never leaves your machine during delegated tasks.

**Hardware Utilization**

Put your GPU to productive use (RTX 3090/4090/5090 recommended).
| GPU VRAM | Recommended Model |
|---|---|
| 8-12GB | DeepSeek-Coder-6.7B (Q4) |
| 16-24GB | DeepSeek-Coder-V2-Lite (16B MoE) |
| 24-32GB | DeepSeek-Coder-33B (Q4) |
| 32GB+ | DeepSeek-R1-Distill-Qwen-32B |
First, pull a model with Ollama (e.g., `ollama pull deepseek-coder:33b`).

```
# Enable with default model
/workflow:implement my-feature llm=true

# Enable with specific model
/workflow:implement my-feature llm=deepseek-coder:33b

# Enable without auto-stop (keep model loaded after session)
/workflow:implement my-feature llm=deepseek-coder:33b:persist
```

The `local_llm` tool supports the following actions:

| Action | Description |
|---|---|
| `capability` | Check if local LLM can be enabled (GPU, Ollama, models) |
| `enable` | Start Ollama, load model, enable for session |
| `disable` | Clear session flag, optionally stop Ollama |
| `status` | Check current status (running, model loaded) |
| `analyze` | Score a task for delegation suitability |
| `delegate` | Execute a coding task on local LLM |
| `stats` | View delegation success metrics |
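Before delegating anything, you can probe and enable the local model. A sketch in the call style of the examples below; the `model` argument to `enable` and the commented response shape are assumptions, not the documented API:

```typescript
// Check whether a local LLM can be enabled on this machine
mcp__local_llm({ action: "capability" })
// Assumed response shape: { gpu: true, ollama: true, models: [...] }

// Start Ollama and load a model for this session
mcp__local_llm({ action: "enable", model: "deepseek-coder:33b" })
```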
```typescript
// Check if a task is suitable
mcp__local_llm({
  action: "analyze",
  task: "Generate Express CRUD routes for User model",
  files: ["src/types/user.ts"]
})
// Returns: { score: 0.78, recommendation: "delegate", reasons: [...] }
```

```typescript
// Execute delegation
mcp__local_llm({
  action: "delegate",
  task: "Generate Express CRUD routes with Joi validation",
  task_type: "generate",
  context_files: ["src/types/user.ts", "src/middleware/error.ts"],
  output_files: [{ path: "src/routes/users.ts", description: "CRUD routes" }],
  verify: "npm run typecheck"
})
```

The system scores tasks to recommend delegation:
| Signal | Impact | Detection |
|---|---|---|
| Pattern-based generation | +0.3 | Keywords: CRUD, boilerplate, scaffold |
| Single output file | +0.2 | `output_files.length === 1` |
| Test generation | +0.2 | Keywords: test, spec |
| Architecture decision | -0.4 | Keywords: design, architect |
| Security-related | -0.25 | Keywords: auth, security, encrypt |
| Multi-file coordination | -0.3 | `output_files.length > 2` |
Tasks scoring above 0.6 are recommended for delegation.
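A minimal sketch of this heuristic using the weights from the table; the function, the keyword lists, and the base score are illustrative assumptions, not the actual implementation:

```typescript
// Illustrative scoring heuristic based on the signal table above.
// Weights match the table; keyword matching and the 0.5 base are assumed.
interface TaskSpec { task: string; output_files: { path: string }[] }

function scoreTask({ task, output_files }: TaskSpec): number {
  const text = task.toLowerCase();
  const has = (words: string[]) => words.some((w) => text.includes(w));
  let score = 0.5; // assumed neutral starting point

  if (has(["crud", "boilerplate", "scaffold"])) score += 0.3; // pattern-based generation
  if (output_files.length === 1) score += 0.2;                // single output file
  if (has(["test", "spec"])) score += 0.2;                    // test generation
  if (has(["design", "architect"])) score -= 0.4;             // architecture decision
  if (has(["auth", "security", "encrypt"])) score -= 0.25;    // security-related
  if (output_files.length > 2) score -= 0.3;                  // multi-file coordination

  return score;
}

// scoreTask({ task: "Generate CRUD scaffold", output_files: [{ path: "a.ts" }] })
// => 0.5 + 0.3 + 0.2 = 1.0, well above the 0.6 delegation threshold
```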
When `llm=true` is specified, the system checks capability, starts Ollama if it is not already running, loads the model, and enables delegation for the session.

By default, when the session ends, the session flag is cleared and Ollama is stopped so the model no longer holds VRAM.

Use `llm=model:persist` to keep the model loaded after session end.
Delegation decisions are logged to `.claude/workflow/delegation-log.jsonl`:

```json
{
  "timestamp": "2026-01-21T15:30:00Z",
  "task": "Generate CRUD routes",
  "task_type": "generate",
  "score": 0.78,
  "decision": "delegate",
  "verification_passed": true,
  "tokens_generated": 847,
  "time_seconds": 42
}
```

Use `local_llm({ action: 'stats' })` to view aggregated metrics.