Running Claude Code Fully Local — No API Costs, No Cloud
How to wire up the Claude Code CLI to a local Ollama instance on your home network, with a one-shot .bat installer for Windows.
Claude Code is Anthropic’s agentic CLI: it reads your codebase, writes diffs, runs commands, and reasons across multiple files at once. The catch: by default every prompt and every tool-use turn is sent to Anthropic’s API and billed per token, which adds up fast. If you’re running a decent GPU at home, you can avoid that cost for routine work.
This guide walks through pointing Claude Code at a local Ollama instance instead — completely offline, zero API costs — and wraps the whole thing into a single .bat script you can run once and forget.
Claude Code CLI on Windows CMD → Ollama on 192.168.1.10:11434 → RTX 4070 Ti (12 GB VRAM) + 64 GB RAM. No WSL, no Docker, no cloud.
Why this works
Claude Code speaks the Anthropic Messages API format. Ollama (v0.14+) natively understands that same format. So you can redirect Claude Code’s outbound requests to your local Ollama server by setting two environment variables:
    :: The key is required by the CLI but ignored by Ollama
    set ANTHROPIC_BASE_URL=http://192.168.1.10:11434
    set ANTHROPIC_API_KEY=ollama
That’s the entire trick. Everything else — the CLI, the file editing, the reasoning loop — stays identical. The model running under the hood is just different.
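To make the redirect concrete, here is a minimal Node.js sketch of the kind of request that ends up at Ollama. It assumes Ollama serves the Anthropic-compatible route at /v1/messages and that Node 18+ is available for the built-in fetch. buildMessagesRequest is a hypothetical helper name, and the model tag is simply the one used later in this guide.

```javascript
// Build an Anthropic Messages-format request aimed at a given base URL.
// ASSUMPTION: the server exposes the Anthropic-compatible route /v1/messages.
function buildMessagesRequest(baseUrl, apiKey, model, prompt) {
  return {
    url: new URL('/v1/messages', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': apiKey,               // must be present; Ollama ignores the value
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model,
        max_tokens: 64,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

const req = buildMessagesRequest(
  'http://192.168.1.10:11434', 'ollama', 'qwen2.5-coder:32b', 'Say hello'
);
console.log(req.url);

// To actually send it (Node 18+):
// fetch(req.url, req.options).then((r) => r.json()).then(console.log);
```

Only the base URL and the key differ between this and a cloud request, which is why nothing else in the CLI has to change.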
Hardware and model selection
With 12 GB of VRAM and 64 GB of system RAM, you have two practical options:
| Model | Quantization (size) | VRAM use | RAM offload | Speed |
|---|---|---|---|---|
| qwen2.5-coder:14b | Q4_K_M (~9 GB) | ~9 GB, fits fully | none | fast (~30–40 tok/s) |
| qwen2.5-coder:32b | Q4_K_M (~20 GB) | ~12 GB (full) | ~8 GB | moderate (~10–20 tok/s) |
The 32B model produces noticeably better multi-file diffs and follows complex instructions more reliably. The offloading to RAM adds latency but is fully functional with 64 GB available. For most homelab coding tasks, the 32B is worth it.
Claude Code needs a large context window: its system prompt, tool definitions, conversation history, and every file it reads all share the same budget. Configure Ollama for at least 64k tokens; with a smaller window the prompt is truncated mid-session and the output breaks.
You do this with a custom Modelfile (run in PowerShell on the Ollama machine):

    # Create the Modelfile
    @"
    FROM qwen2.5-coder:32b
    PARAMETER num_ctx 65536
    "@ | Out-File -Encoding utf8 Modelfile

    # Build the 64k-context variant
    ollama create qwen-coder-64k -f Modelfile
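Before committing to 64k, it helps to estimate what that context costs in memory. The sketch below is back-of-the-envelope arithmetic only. The layer count, KV-head count, and head dimension are assumptions taken from the published Qwen2.5-32B configuration, the cache is assumed to be fp16, and Ollama's actual allocation may differ (for example with a quantized KV cache). The function names are hypothetical.

```javascript
// Rough size of the key/value cache for a given context length, in GiB.
function kvCacheGiB({ layers, kvHeads, headDim, bytesPerValue, contextTokens }) {
  // Factor 2: one tensor for keys and one for values, in every layer.
  const bytesPerToken = 2 * layers * kvHeads * headDim * bytesPerValue;
  return (bytesPerToken * contextTokens) / 2 ** 30;
}

// Whatever does not fit in VRAM has to live in system RAM.
function offloadGiB(totalGiB, vramGiB) {
  return Math.max(0, totalGiB - vramGiB);
}

// ASSUMED architecture figures for qwen2.5-coder:32b; check the model card.
const kv = kvCacheGiB({
  layers: 64, kvHeads: 8, headDim: 128, bytesPerValue: 2, contextTokens: 65536,
});

const weights = 20; // ~20 GB Q4_K_M file, from the table above
const vram = 12;    // RTX 4070 Ti

console.log(`KV cache at 64k: ${kv} GiB`);
console.log(`Offload, weights only: ${offloadGiB(weights, vram)} GB`);
console.log(`Offload, weights + full cache: ${offloadGiB(weights + kv, vram)} GB`);
```

Under these assumptions a fully used 64k window adds roughly 16 GiB on top of the weights, so the offload figure in the table is better read as a floor than a ceiling. With 64 GB of system RAM that still fits, at some cost in speed.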
Step-by-step setup
Here is the manual version. If you want the automated script, skip to the next section.
- Expose Ollama on your network. By default Ollama only listens on localhost. Set OLLAMA_HOST=0.0.0.0:11434 as a Windows system environment variable on the machine running Ollama, then restart the Ollama service.
- Pull and configure the model. Run ollama pull qwen2.5-coder:32b on the Ollama machine, then create the 64k context variant with the Modelfile above.
- Install Claude Code CLI. Requires Node.js. From any CMD window: npm install -g @anthropic-ai/claude-code
- Set environment variables. Use setx to make them persistent across sessions. Also set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to prevent telemetry calls to Anthropic.
- Bypass the onboarding screen. Claude Code prompts you to log into Anthropic on first run. Skip it by creating %USERPROFILE%\.claude.json with "hasCompletedOnboarding": true.
- Pre-warm the model. Send a dummy request to Ollama before starting Claude Code. Ollama loads models lazily, so without pre-warming the first Claude Code request may time out while the model is still loading into VRAM.
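The pre-warm step can be sketched in Node.js as follows. This assumes Ollama's native /api/generate endpoint, which loads a model when called without a prompt, and its keep_alive field for holding the model in memory. buildPrewarmRequest and the 30-minute value are illustrative choices, and qwen-coder-64k is the variant created with the Modelfile earlier.

```javascript
// Build a request that asks Ollama to load a model without generating text.
// ASSUMPTION: /api/generate with no prompt only loads the model into memory.
function buildPrewarmRequest(baseUrl, model, keepAlive = '30m') {
  return {
    url: new URL('/api/generate', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      // keep_alive controls how long the model stays loaded after the call
      body: JSON.stringify({ model, keep_alive: keepAlive }),
    },
  };
}

const warm = buildPrewarmRequest('http://192.168.1.10:11434', 'qwen-coder-64k');
console.log(warm.url, warm.options.body);

// To send it (Node 18+); loading a 32B model can take a while:
// fetch(warm.url, warm.options).then((r) => console.log('loaded:', r.ok));
```

A longer keep_alive reduces how often the model has to be reloaded after idle periods, at the cost of keeping VRAM occupied.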
The automated .bat script
Rather than repeating these steps by hand on every new machine, you can run them from a single setup-claude-code.bat file. Run it once as Administrator.
What it does, in order:
- Pings Ollama on 192.168.1.10:11434 and warns if unreachable
- Lists available models and asks which one to use
- Installs @anthropic-ai/claude-code via npm
- Sets all required environment variables with setx (persistent, user-scope)
- Creates or merges %USERPROFILE%\.claude.json with the onboarding bypass
- Pre-warms the selected model via a test API call
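The first two steps (reachability check and model listing) could look roughly like this in Node.js. This is a sketch rather than the script's actual code: it assumes Ollama's /api/tags endpoint, which returns a models array with a name per entry, and Node 18+ for fetch and AbortSignal.timeout. modelNames and listModels are hypothetical names.

```javascript
// Extract sorted model names from an /api/tags response body.
function modelNames(tagsResponse) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  return models.map((m) => m.name).sort();
}

// Query Ollama; rejects if the host is unreachable or slow to answer.
async function listModels(baseUrl, timeoutMs = 3000) {
  const res = await fetch(new URL('/api/tags', baseUrl), {
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error(`Ollama answered HTTP ${res.status}`);
  return modelNames(await res.json());
}

// Offline demonstration with a hand-written sample response.
const sample = { models: [{ name: 'qwen2.5-coder:32b' }, { name: 'qwen-coder-64k:latest' }] };
console.log(modelNames(sample));

// Live check:
// listModels('http://192.168.1.10:11434')
//   .then((names) => console.log(names))
//   .catch((err) => console.warn('Ollama unreachable:', err.message));
```

Treating a timeout as a warning rather than a fatal error matches the script's described behaviour of warning when the server is unreachable.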
    :: Set all vars persistently
    setx ANTHROPIC_BASE_URL "http://192.168.1.10:11434"
    setx ANTHROPIC_AUTH_TOKEN "ollama"
    setx ANTHROPIC_API_KEY "ollama"
    setx CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC "1"
    setx CLAUDE_CODE_DEFAULT_MODEL "%MODEL%"

    :: Onboarding bypass via Node (merges into an existing .claude.json if present)
    node -e "const fs=require('fs'); const p=require('path').join(process.env.USERPROFILE,'.claude.json'); let d={}; try{d=JSON.parse(fs.readFileSync(p,'utf8'))}catch(e){} d.hasCompletedOnboarding=true; fs.writeFileSync(p,JSON.stringify(d,null,2))"
After running the script, restart CMD so the setx variables take effect, then simply type claude from any project folder.
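Because setx only affects processes started afterwards, a quick check from the new CMD window can confirm the variables arrived. A minimal sketch, assuming the variable names from the excerpt above; missingVars is a hypothetical helper.

```javascript
// Variables the setup script is expected to have set.
const REQUIRED = [
  'ANTHROPIC_BASE_URL',
  'ANTHROPIC_AUTH_TOKEN',
  'ANTHROPIC_API_KEY',
  'CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC',
];

// Return the names that are unset or empty in the given environment.
function missingVars(env, required = REQUIRED) {
  return required.filter((name) => !env[name]);
}

const missing = missingVars(process.env);
console.log(
  missing.length
    ? `Not set in this session: ${missing.join(', ')}`
    : 'All variables are visible in this session'
);
```

Saved as, say, check-env.js, it can be run with node check-env.js; any names it prints point to a CMD window opened before the script ran.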
Daily usage
Once set up, usage is identical to the cloud-backed version. Navigate to a project folder in CMD and run:
    cd C:\projects\my-app
    claude
Claude Code reads files from your project as it needs them, and you give it instructions as you normally would. Requests go to 192.168.1.10:11434 instead of Anthropic’s servers, so there are no usage costs and no rate limits, and no internet connection is needed once everything is installed.
Known limitations
Local models are not Claude. Qwen 2.5 Coder 32B is excellent for its size, but it will occasionally struggle with:
- Large multi-file refactors — the model may lose track of context across many files simultaneously
- Complex architectural decisions — reasoning depth is shallower than Claude Sonnet or Opus
- First response latency — even with pre-warming, the first response after a long idle period takes a few seconds while layers reload
For homelab work, personal projects, and routine coding tasks these limitations rarely matter. For large-scale production refactors where you need the best possible result, a hybrid setup — local for exploration, real Claude API for the final pass — is worth considering.
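One way to picture the hybrid setup is as two environments for the same CLI. The sketch below is an illustration under assumptions, not part of the setup script: envFor is a hypothetical helper, the key is a placeholder, and it assumes that removing ANTHROPIC_BASE_URL makes the CLI fall back to Anthropic's default endpoint.

```javascript
// Return a copy of the environment adjusted for one backend.
function envFor(mode, baseEnv, cloudKey) {
  const env = { ...baseEnv };
  if (mode === 'local') {
    env.ANTHROPIC_BASE_URL = 'http://192.168.1.10:11434';
    env.ANTHROPIC_AUTH_TOKEN = 'ollama';
    env.ANTHROPIC_API_KEY = 'ollama';
  } else if (mode === 'cloud') {
    // Drop the redirect and the local model override, then use a real key.
    delete env.ANTHROPIC_BASE_URL;
    delete env.ANTHROPIC_AUTH_TOKEN;
    delete env.CLAUDE_CODE_DEFAULT_MODEL;
    env.ANTHROPIC_API_KEY = cloudKey;
  } else {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return env;
}

const cloud = envFor('cloud', process.env, 'sk-ant-PLACEHOLDER');
console.log('redirect active:', 'ANTHROPIC_BASE_URL' in cloud);

// Launching the CLI with that environment (shell: true finds the npm shim on Windows):
// require('child_process').spawn('claude', [], { env: cloud, stdio: 'inherit', shell: true });
```

Building a fresh copy per launch leaves the persistent setx values untouched, so the local setup stays the default.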
The full setup-claude-code.bat script is available in the GitHub repository linked below. Change the Ollama IP on line 8 if yours differs from 192.168.1.10.
// tested on Windows 11 · Ollama 0.14+ · Claude Code CLI 1.x · qwen2.5-coder:32b Q4_K_M