Local LLM Integration

Delegate coding tasks to local models like DeepSeek via Ollama

Development Feature

This feature is currently in development. The documentation describes the planned functionality.

Workflow MCP can delegate mechanical coding tasks to a local LLM (like DeepSeek), preserving Claude's context window for reasoning and complex decisions.

Overview

The Orchestrator/Executor pattern lets Claude focus on what it does best:

Claude (Orchestrator)       Local LLM (Executor)
Understands requirements    Receives focused task
Plans approach              Generates code
Evaluates results           Returns result
Makes judgment calls        Stateless execution

Why Use Local LLM Delegation?

Context Preservation

Agentic coding loops can consume 50-100K+ tokens. Offloading bulk generation to a local model keeps Claude's window clear for reasoning.

Cost Reduction

Local inference = $0 per token for bulk code generation.

Privacy

Code never leaves your machine during delegated tasks.

Hardware Utilization

Put your GPU to productive use (RTX 3090/4090/5090 recommended).

Requirements

Hardware

GPU VRAM    Recommended Model
8-12GB      DeepSeek-Coder-6.7B (Q4)
16-24GB     DeepSeek-Coder-V2-Lite (16B MoE)
24-32GB     DeepSeek-Coder-33B (Q4)
32GB+       DeepSeek-R1-Distill-Qwen-32B

Software

  • Ollama installed and running
  • A coding-optimized model pulled (e.g., ollama pull deepseek-coder:33b)

Usage

Enable for a Session

# Enable with default model
/workflow:implement my-feature llm=true

# Enable with specific model
/workflow:implement my-feature llm=deepseek-coder:33b

# Enable without auto-stop (keep model loaded after session)
/workflow:implement my-feature llm=deepseek-coder:33b:persist

MCP Tool: local_llm

Action        Description
capability    Check if local LLM can be enabled (GPU, Ollama, models)
enable        Start Ollama, load model, enable for session
disable       Clear session flag, optionally stop Ollama
status        Check current status (running, model loaded)
analyze       Score a task for delegation suitability
delegate      Execute a coding task on local LLM
stats         View delegation success metrics
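
For example, you might probe the environment before enabling delegation. The call shapes below mirror the delegate example that follows, but the response fields and the model parameter on enable are illustrative assumptions, not a guaranteed schema:

// Check whether delegation is possible on this machine
mcp__local_llm({ action: "capability" })

// Illustrative response: { available: true, gpu: "RTX 4090", vram_gb: 24, models: ["deepseek-coder:33b"] }

// Start Ollama if needed, load the model, and mark the session LLM-enabled
mcp__local_llm({ action: "enable", model: "deepseek-coder:33b" })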

Example: Delegating a Task

// Check if task is suitable
mcp__local_llm({
  action: "analyze",
  task: "Generate Express CRUD routes for User model",
  files: ["src/types/user.ts"]
})

// Returns: { score: 0.78, recommendation: "delegate", reasons: [...] }

// Execute delegation
mcp__local_llm({
  action: "delegate",
  task: "Generate Express CRUD routes with Joi validation",
  task_type: "generate",
  context_files: ["src/types/user.ts", "src/middleware/error.ts"],
  output_files: [{ path: "src/routes/users.ts", description: "CRUD routes" }],
  verify: "npm run typecheck"
})

Task Suitability

Good for Delegation

  • Generating boilerplate/CRUD code
  • Writing unit tests
  • Adding documentation/docstrings
  • Mechanical refactoring
  • Code translation between languages

Keep with Claude

  • Architecture decisions
  • Security-sensitive code
  • Complex multi-step reasoning
  • Ambiguous requirements needing clarification
  • Tasks requiring deep project context

Task Scoring

The system scores tasks to recommend delegation:

Signal                      Impact    Detection
Pattern-based generation    +0.3      Keywords: CRUD, boilerplate, scaffold
Single output file          +0.2      output_files.length === 1
Test generation             +0.2      Keywords: test, spec
Architecture decision       -0.4      Keywords: design, architect
Security-related            -0.25     Keywords: auth, security, encrypt
Multi-file coordination     -0.3      output_files.length > 2

Tasks scoring above 0.6 are recommended for delegation.
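
A minimal sketch of how such a scorer could be assembled from the signal table above. The 0.5 baseline, the keyword regexes, and the names DelegationTask/scoreTask are assumptions for illustration; the real implementation's weights evidently differ (the analyze example above scored a single-file CRUD task at 0.78):

interface DelegationTask {
  task: string;
  output_files: { path: string; description: string }[];
}

// Hypothetical signal table mirroring the documentation above.
const SIGNALS: { weight: number; applies: (t: DelegationTask) => boolean }[] = [
  { weight: +0.3,  applies: t => /\b(crud|boilerplate|scaffold)\b/i.test(t.task) },
  { weight: +0.2,  applies: t => t.output_files.length === 1 },
  { weight: +0.2,  applies: t => /\b(test|spec)\b/i.test(t.task) },
  { weight: -0.4,  applies: t => /\b(design|architect)\b/i.test(t.task) },
  { weight: -0.25, applies: t => /\b(auth|security|encrypt)\b/i.test(t.task) },
  { weight: -0.3,  applies: t => t.output_files.length > 2 },
];

function scoreTask(t: DelegationTask): { score: number; recommendation: "delegate" | "keep" } {
  // Start from an assumed neutral baseline and add each matching signal's weight.
  const score = SIGNALS.reduce((s, sig) => (sig.applies(t) ? s + sig.weight : s), 0.5);
  return { score, recommendation: score > 0.6 ? "delegate" : "keep" };
}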

Lifecycle Management

Auto-Start

When llm=true is specified, the system:

  1. Checks if Ollama is running
  2. Starts Ollama if needed
  3. Loads the specified model
  4. Marks session as LLM-enabled
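
A sketch of how steps 1 and 3 map onto Ollama's HTTP API (GET /api/tags responds when the server is up; a generate request with an empty prompt loads a model into memory without generating). Spawning the Ollama process itself (step 2) is omitted, and ensureModelLoaded is a hypothetical name:

const OLLAMA = "http://localhost:11434";

async function ensureModelLoaded(model: string): Promise<void> {
  // Step 1: any successful response from /api/tags means the server is up.
  const running = await fetch(`${OLLAMA}/api/tags`).then(r => r.ok, () => false);
  if (!running) throw new Error("Ollama is not running (step 2 would start it here)");

  // Step 3: an empty prompt tells Ollama to load the model without generating;
  // keep_alive controls how long it stays resident after the request.
  await fetch(`${OLLAMA}/api/generate`, {
    method: "POST",
    body: JSON.stringify({ model, prompt: "", keep_alive: "30m" }),
  });
}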

Auto-Stop

By default, when the session ends:

  1. Model is unloaded (returns VRAM)
  2. Ollama continues running (ready for next session)

Use llm=model:persist to keep the model loaded after session end.
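
Both behaviors map naturally onto Ollama's keep_alive parameter: 0 unloads a model immediately, while -1 keeps it resident indefinitely. A sketch reusing the OLLAMA endpoint from above (setModelResidency is a hypothetical name):

async function setModelResidency(model: string, persist: boolean): Promise<void> {
  await fetch(`${OLLAMA}/api/generate`, {
    method: "POST",
    // keep_alive: 0 frees VRAM now; -1 pins the model until explicitly unloaded.
    body: JSON.stringify({ model, prompt: "", keep_alive: persist ? -1 : 0 }),
  });
}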

Metrics & Feedback

Delegation decisions are logged to .claude/workflow/delegation-log.jsonl:

{
  "timestamp": "2026-01-21T15:30:00Z",
  "task": "Generate CRUD routes",
  "task_type": "generate",
  "score": 0.78,
  "decision": "delegate",
  "verification_passed": true,
  "tokens_generated": 847,
  "time_seconds": 42
}

Use local_llm({ action: 'stats' }) to view aggregated metrics.
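
Since the log is newline-delimited JSON, you can also aggregate it directly. A sketch in Node.js, using the field names from the entry above:

import { readFileSync } from "node:fs";

interface LogEntry {
  decision: string;
  verification_passed?: boolean;
}

// Parse one JSON object per non-empty line of the delegation log.
const entries: LogEntry[] = readFileSync(".claude/workflow/delegation-log.jsonl", "utf8")
  .split("\n")
  .filter(line => line.trim().length > 0)
  .map(line => JSON.parse(line));

const delegated = entries.filter(e => e.decision === "delegate");
const passed = delegated.filter(e => e.verification_passed).length;
console.log(`delegated: ${delegated.length}, verification passed: ${passed}/${delegated.length}`);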
