Local LLM Integration

Workflow MCP can delegate mechanical coding tasks to a local LLM (like DeepSeek), preserving Claude’s context window for reasoning and complex decisions.

The Orchestrator/Executor pattern lets Claude focus on what it does best:

| Claude (Orchestrator) | Local LLM (Executor) |
| --- | --- |
| Understands requirements | Receives focused task |
| Plans approach | Generates code |
| Evaluates results | Returns result |
| Makes judgment calls | Stateless execution |

Context Preservation

Agentic coding loops can consume 50-100K+ tokens. Offloading keeps Claude’s window clear for reasoning.

Cost Reduction

Local inference = $0 per token for bulk code generation.

Privacy

Code never leaves your machine during delegated tasks.

Hardware Utilization

Put your GPU to productive use (RTX 3090/4090/5090 recommended).

| GPU VRAM | Recommended Model |
| --- | --- |
| 8-12GB | DeepSeek-Coder-6.7B (Q4) |
| 16-24GB | DeepSeek-Coder-V2-Lite (16B MoE) |
| 24-32GB | DeepSeek-Coder-33B (Q4) |
| 32GB+ | DeepSeek-R1-Distill-Qwen-32B |
Prerequisites (a quick pre-flight check is sketched below):

  • Ollama installed and running
  • A coding-optimized model pulled (e.g., ollama pull deepseek-coder:33b)
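A minimal pre-flight check can be run against Ollama's local API. The GET /api/tags endpoint and default port 11434 are standard Ollama; the helper name, error messages, and default model are illustrative:

```ts
// check-ollama.ts: verify Ollama is reachable and a coding model is pulled.
// Assumes Ollama's default port (11434); the model name is just an example.
async function checkOllama(requiredModel = "deepseek-coder:33b"): Promise<void> {
  let res: Response;
  try {
    // GET /api/tags lists the models currently available to Ollama.
    res = await fetch("http://localhost:11434/api/tags");
  } catch {
    throw new Error("Ollama is not running (connection refused on :11434)");
  }
  const { models } = (await res.json()) as { models: { name: string }[] };
  if (!models.some((m) => m.name === requiredModel)) {
    throw new Error(`Model not pulled yet: run "ollama pull ${requiredModel}"`);
  }
  console.log(`Ollama is up and ${requiredModel} is available.`);
}

checkOllama().catch((err) => console.error(err.message));
```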
To enable delegation for a workflow session:
```sh
# Enable with default model
/workflow:implement my-feature llm=true

# Enable with specific model
/workflow:implement my-feature llm=deepseek-coder:33b

# Enable without auto-stop (keep model loaded after session)
/workflow:implement my-feature llm=deepseek-coder:33b:persist
```
The local_llm MCP tool exposes the following actions:

| Action | Description |
| --- | --- |
| capability | Check if a local LLM can be enabled (GPU, Ollama, models) |
| enable | Start Ollama, load model, enable for session |
| disable | Clear session flag, optionally stop Ollama |
| status | Check current status (running, model loaded) |
| analyze | Score a task for delegation suitability |
| delegate | Execute a coding task on the local LLM |
| stats | View delegation success metrics |
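As a sketch of how a session might drive these actions programmatically (the declare line stands in for however the MCP tool is actually bound, and the enable parameters and response shapes are assumptions):

```ts
// Illustrative session startup using the capability, enable, and status actions.
// The declaration is a placeholder for the real MCP tool binding.
declare function mcp__local_llm(args: Record<string, unknown>): Promise<Record<string, unknown>>;

async function startDelegationSession(model = "deepseek-coder:33b") {
  // Can this machine delegate at all? (GPU, Ollama, pulled models)
  const capability = await mcp__local_llm({ action: "capability" });
  console.log("capability:", capability);

  // Start Ollama if needed, load the model, and flag the session as LLM-enabled.
  // Passing a model here mirrors the llm=<model> flag but is an assumption.
  await mcp__local_llm({ action: "enable", model });

  // Confirm the model is loaded before delegating any work.
  console.log("status:", await mcp__local_llm({ action: "status" }));
}
```

The analyze and delegate actions shown next do the actual scoring and execution.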
```js
// Check if task is suitable
mcp__local_llm({
  action: "analyze",
  task: "Generate Express CRUD routes for User model",
  files: ["src/types/user.ts"]
})
// Returns: { score: 0.78, recommendation: "delegate", reasons: [...] }

// Execute delegation
mcp__local_llm({
  action: "delegate",
  task: "Generate Express CRUD routes with Joi validation",
  task_type: "generate",
  context_files: ["src/types/user.ts", "src/middleware/error.ts"],
  output_files: [{ path: "src/routes/users.ts", description: "CRUD routes" }],
  verify: "npm run typecheck"
})
```
Good fits for delegation:

  • Generating boilerplate/CRUD code
  • Writing unit tests
  • Adding documentation/docstrings
  • Mechanical refactoring
  • Code translation between languages

Keep with Claude:

  • Architecture decisions
  • Security-sensitive code
  • Complex multi-step reasoning
  • Ambiguous requirements needing clarification
  • Tasks requiring deep project context

The system scores tasks to recommend delegation:

| Signal | Impact | Detection |
| --- | --- | --- |
| Pattern-based generation | +0.3 | Keywords: CRUD, boilerplate, scaffold |
| Single output file | +0.2 | output_files.length === 1 |
| Test generation | +0.2 | Keywords: test, spec |
| Architecture decision | -0.4 | Keywords: design, architect |
| Security-related | -0.25 | Keywords: auth, security, encrypt |
| Multi-file coordination | -0.3 | output_files.length > 2 |

Tasks scoring above 0.6 are recommended for delegation.
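A minimal sketch of a scorer that combines these signals, assuming a neutral 0.5 baseline and the weights and detection rules from the table (the baseline, keyword matching, and clamping are illustrative, not the actual implementation):

```ts
// Illustrative scorer built from the signal table above. The 0.5 baseline,
// the keyword lists, and the clamp to [0, 1] are assumptions for this sketch.
interface TaskRequest {
  task: string;
  output_files: { path: string; description: string }[];
}

const DELEGATION_THRESHOLD = 0.6;

function scoreTask({ task, output_files }: TaskRequest): number {
  const text = task.toLowerCase();
  const has = (...words: string[]) => words.some((w) => text.includes(w));

  let score = 0.5; // assumed neutral starting point

  if (has("crud", "boilerplate", "scaffold")) score += 0.3; // pattern-based generation
  if (output_files.length === 1) score += 0.2;              // single output file
  if (has("test", "spec")) score += 0.2;                    // test generation
  if (has("design", "architect")) score -= 0.4;             // architecture decision
  if (has("auth", "security", "encrypt")) score -= 0.25;    // security-related
  if (output_files.length > 2) score -= 0.3;                // multi-file coordination

  return Math.min(1, Math.max(0, score));
}

function recommend(req: TaskRequest): "delegate" | "keep" {
  return scoreTask(req) > DELEGATION_THRESHOLD ? "delegate" : "keep";
}

// Example: a single-file CRUD task clears the 0.6 threshold.
console.log(recommend({
  task: "Generate Express CRUD routes with Joi validation",
  output_files: [{ path: "src/routes/users.ts", description: "CRUD routes" }],
})); // "delegate"
```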

When llm=true is specified, the system performs the following steps (sketched below):

  1. Checks if Ollama is running
  2. Starts Ollama if needed
  3. Loads the specified model
  4. Marks session as LLM-enabled
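In Ollama terms, loading a model is just a generate request with no prompt, and keep_alive controls how long it stays in VRAM. A sketch of the enable flow under those assumptions (the port is Ollama's default; the startup wait and session-flag step are illustrative):

```ts
// Sketch of the enable flow: check Ollama, start it if needed, load the model.
import { spawn } from "node:child_process";

const OLLAMA = "http://localhost:11434";

async function ollamaRunning(): Promise<boolean> {
  try {
    await fetch(`${OLLAMA}/api/tags`);
    return true;
  } catch {
    return false;
  }
}

async function enableLocalLlm(model: string): Promise<void> {
  // Steps 1-2: check whether Ollama is up; start the server if it is not.
  if (!(await ollamaRunning())) {
    spawn("ollama", ["serve"], { detached: true, stdio: "ignore" }).unref();
    await new Promise((resolve) => setTimeout(resolve, 2000)); // crude startup wait
  }

  // Step 3: a prompt-less generate call loads the model into VRAM;
  // keep_alive sets how long it stays resident (value here is illustrative).
  await fetch(`${OLLAMA}/api/generate`, {
    method: "POST",
    body: JSON.stringify({ model, keep_alive: "30m" }),
  });

  // Step 4: mark the session as LLM-enabled (how the flag is stored is not
  // specified here; e.g. a marker file under .claude/workflow/).
}
```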

By default, when the session ends:

  1. Model is unloaded (returns VRAM)
  2. Ollama continues running (ready for next session)

Use llm=<model>:persist to keep the model loaded after the session ends.
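The unload corresponds to Ollama's keep_alive: 0 convention, which evicts the model immediately while leaving the server running; a persistent session simply skips that call. A sketch under that assumption:

```ts
// Sketch of session teardown: free VRAM but leave the Ollama server running.
async function endSession(model: string, persist: boolean): Promise<void> {
  if (persist) return; // llm=<model>:persist, keep the model loaded

  // keep_alive: 0 tells Ollama to unload the model immediately.
  await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model, keep_alive: 0 }),
  });
  // Ollama itself stays up, ready for the next session.
}
```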

Delegation decisions are logged to .claude/workflow/delegation-log.jsonl:

```json
{
  "timestamp": "2026-01-21T15:30:00Z",
  "task": "Generate CRUD routes",
  "task_type": "generate",
  "score": 0.78,
  "decision": "delegate",
  "verification_passed": true,
  "tokens_generated": 847,
  "time_seconds": 42
}
```

Use local_llm({ action: 'stats' }) to view aggregated metrics.
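Because the log is plain JSONL, the numbers behind stats can also be recomputed directly from the file; the sketch below assumes only the fields shown in the example entry, and the particular metrics it reports are illustrative:

```ts
// Sketch: aggregate .claude/workflow/delegation-log.jsonl into simple metrics.
import { readFileSync } from "node:fs";

interface DelegationEntry {
  decision: string;              // e.g. "delegate"
  verification_passed?: boolean;
  time_seconds?: number;
}

function summarize(path = ".claude/workflow/delegation-log.jsonl") {
  const entries: DelegationEntry[] = readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));

  const delegated = entries.filter((e) => e.decision === "delegate");
  const passed = delegated.filter((e) => e.verification_passed).length;

  return {
    total_decisions: entries.length,
    delegated: delegated.length,
    verification_pass_rate: delegated.length ? passed / delegated.length : 0,
  };
}

console.log(summarize());
```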