Running Claude Code Fully Local — No API Costs, No Cloud

How to wire up the Claude Code CLI to a local Ollama instance on your home network, with a one-shot .bat installer for Windows.

May 2026 · 7 min read · Windows · Ollama · RTX 4070 Ti

Claude Code is Anthropic’s agentic CLI: it reads your codebase, writes diffs, runs commands, and reasons across multiple files at once. The catch: by default every prompt and every tool step is sent to Anthropic’s API, and the token costs add up fast. If you’re running a decent GPU at home, there’s no reason to pay for that.

This guide walks through pointing Claude Code at a local Ollama instance instead — completely offline, zero API costs — and wraps the whole thing into a single .bat script you can run once and forget.

Setup at a glance

Claude Code CLI on Windows CMD → Ollama on 192.168.1.10:11434 → RTX 4070 Ti (12 GB VRAM) + 64 GB RAM. No WSL, no Docker, no cloud.

Why this works

Claude Code speaks the Anthropic Messages API format. Ollama (v0.14+) natively understands that same format. So you can redirect Claude Code’s outbound requests to your local Ollama server by setting two environment variables:

cmd

set ANTHROPIC_BASE_URL=http://192.168.1.10:11434
rem The key is required by the CLI but ignored by Ollama
set ANTHROPIC_API_KEY=ollama

That’s the entire trick. Everything else — the CLI, the file editing, the reasoning loop — stays identical. The model running under the hood is just different.
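To make the redirect concrete, here is a minimal Node.js sketch of the kind of request that lands on Ollama once the base URL is changed. It assumes Ollama's Anthropic-compatible endpoint is /v1/messages and that Node 18+ (built-in fetch) is installed. The function name buildMessagesRequest and the max_tokens value are illustrative and not taken from either tool.

```javascript
// Sketch: build an Anthropic Messages API request aimed at a local Ollama server.
// Assumptions: endpoint path /v1/messages, Node 18+ for the global fetch().
function buildMessagesRequest(baseUrl, model, prompt) {
  return {
    url: new URL('/v1/messages', baseUrl).href,
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': 'ollama', // placeholder value; Ollama ignores it
        'anthropic-version': '2023-06-01',
      },
      body: JSON.stringify({
        model,
        max_tokens: 256, // illustrative limit
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage (needs a reachable Ollama instance):
// const { url, init } = buildMessagesRequest('http://192.168.1.10:11434', 'qwen2.5-coder:32b', 'Say hi');
// fetch(url, init).then((res) => res.json()).then(console.log);
```

Claude Code builds a much larger request than this (system prompt, tool definitions, history), but the envelope is the same, which is why swapping the base URL is enough.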

Hardware and model selection

With 12 GB of VRAM and 64 GB of system RAM, you have two practical options:

Model	Quantization	VRAM	RAM offload	Speed
qwen2.5-coder:14b	Q4_K_M (~9 GB)	fits fully	none	fast (~30–40 tok/s)
qwen2.5-coder:32b	Q4_K_M (~20 GB)	~12 GB	~8 GB to RAM	moderate (~10–20 tok/s)

The 32B model produces noticeably better multi-file diffs and follows complex instructions more reliably. The offloading to RAM adds latency but is fully functional with 64 GB available. For most homelab coding tasks, the 32B is worth it.

Context window matters

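Claude Code needs a large context window: its system prompt, tool definitions, the files it reads, and the conversation history all share it. Configure Ollama for at least 64k tokens; with a smaller window, Ollama truncates the prompt mid-session and the model produces broken output.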

You do this with a custom Modelfile:

powershell

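# Create the Modelfile. ASCII output avoids the byte-order mark that
# Windows PowerShell 5.1 writes with -Encoding utf8.
@"
FROM qwen2.5-coder:32b
PARAMETER num_ctx 65536
"@ | Out-File -Encoding ascii Modelfile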

ollama create qwen-coder-64k -f Modelfile
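Before committing to a 64k window it helps to estimate what it costs in memory, because the attention (KV) cache grows linearly with context length and sits on top of the model weights. The sketch below does that arithmetic in Node.js. The architecture numbers (64 layers, 8 key/value heads, head dimension 128) are my assumption for Qwen2.5 32B and should be checked against the model's published config. The 16-bit cache precision is also an assumption.

```javascript
// Rough KV-cache size: one K and one V tensor per layer, per KV head, per token.
function kvCacheBytes({ layers, kvHeads, headDim, bytesPerValue }, contextTokens) {
  return 2 * layers * kvHeads * headDim * bytesPerValue * contextTokens;
}

// Assumed Qwen2.5 32B shape (hypothetical until checked against the model config).
const qwen32b = { layers: 64, kvHeads: 8, headDim: 128, bytesPerValue: 2 };

const bytes = kvCacheBytes(qwen32b, 65536);
console.log((bytes / 2 ** 30).toFixed(1) + ' GiB'); // 16.0 GiB under these assumptions
```

Under these assumptions a full 64k window needs roughly 16 GiB in addition to the ~20 GB of weights, so the real RAM footprint is larger than a weights-only estimate suggests. With 64 GB of system RAM that still fits.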

Step-by-step setup

Here is the manual version. If you want the automated script, skip to the next section.

1. Expose Ollama on your network. By default Ollama only listens on localhost. Set OLLAMA_HOST=0.0.0.0:11434 as a Windows system environment variable on the machine running Ollama, then restart Ollama so it picks up the change.
2. Pull and configure the model. Run ollama pull qwen2.5-coder:32b on the Ollama machine, then create the 64k-context variant with the Modelfile above.
3. Install the Claude Code CLI. It requires Node.js 18 or newer. From any CMD window: npm install -g @anthropic-ai/claude-code
4. Set the environment variables. Use setx to make them persist across sessions. Also set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to prevent telemetry calls to Anthropic.
5. Bypass the onboarding screen. Claude Code prompts you to log in to Anthropic on first run. Skip it by creating %USERPROFILE%\.claude.json containing {"hasCompletedOnboarding": true}.
6. Pre-warm the model. Send a dummy request to Ollama before starting Claude Code. Ollama loads models lazily, so without pre-warming the first Claude Code request may time out while the model loads into VRAM.
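The pre-warm step can be sketched in Node.js as follows. It assumes Ollama's native /api/generate endpoint, which loads a model into memory when called with a model name and no prompt, and Node 18+ for fetch. The helper names and the 30-minute keep_alive value are my own choices, not something the CLI or the script prescribes.

```javascript
// Sketch: ask Ollama to load a model before Claude Code sends its first request.
function prewarmRequest(baseUrl, model, keepAlive = '30m') {
  return {
    url: new URL('/api/generate', baseUrl).href,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      // No prompt: Ollama only loads the model and keeps it resident for keep_alive.
      body: JSON.stringify({ model, keep_alive: keepAlive, stream: false }),
    },
  };
}

async function prewarm(baseUrl, model) {
  const { url, init } = prewarmRequest(baseUrl, model);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama answered HTTP ${res.status}`);
  return res.json();
}

// Usage (needs a reachable Ollama instance):
// prewarm('http://192.168.1.10:11434', 'qwen-coder-64k').then(() => console.log('model loaded'));
```

A longer keep_alive also reduces how often the model is unloaded between sessions, at the price of holding VRAM while idle.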

The automated .bat script

Rather than repeating all of this by hand on every new machine, you can run the same steps from a single setup-claude-code.bat file. Run it once as Administrator.

What it does, in order:

1. Checks that Ollama responds at 192.168.1.10:11434 and warns if it is unreachable
2. Lists the available models and asks which one to use
3. Installs @anthropic-ai/claude-code via npm
4. Sets all required environment variables with setx (persistent, user scope)
5. Creates or merges %USERPROFILE%\.claude.json with the onboarding bypass
6. Pre-warms the selected model via a test API call
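The first two items (reachability check and model listing) can be approximated with a single call. This Node.js sketch is not the .bat script's actual code. It assumes Ollama's /api/tags endpoint returns an object with a models array whose entries carry a name field, and it needs Node 18+ for fetch and AbortSignal.timeout. The function names and the 3-second timeout are hypothetical.

```javascript
// Sketch: check that Ollama is reachable and list the models it has pulled.
function modelNames(tagsResponse) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  return models.map((m) => m.name).sort();
}

async function listModels(baseUrl, timeoutMs = 3000) {
  // AbortSignal.timeout turns an unreachable host into a quick, catchable error.
  const res = await fetch(new URL('/api/tags', baseUrl), {
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error(`Ollama answered HTTP ${res.status}`);
  return modelNames(await res.json());
}

// Usage (needs a reachable Ollama instance):
// listModels('http://192.168.1.10:11434')
//   .then((names) => console.log(names.join('\n')))
//   .catch((err) => console.warn('Ollama unreachable:', err.message));
```

An HTTP check like this is more telling than an ICMP ping, because it confirms that Ollama itself is listening on the port, not just that the machine is up.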

bat — excerpt

:: Set all vars persistently
setx ANTHROPIC_BASE_URL "http://192.168.1.10:11434"
setx ANTHROPIC_AUTH_TOKEN "ollama"
setx ANTHROPIC_API_KEY "ollama"
setx CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC "1"
:: ANTHROPIC_MODEL is the variable Claude Code reads for the model name
setx ANTHROPIC_MODEL "%MODEL%"

:: Onboarding bypass via Node
node -e "const fs=require('fs'),p=require('path').join(require('os').homedir(),'.claude.json'); let d={}; try{d=JSON.parse(fs.readFileSync(p,'utf8'))}catch(e){} d.hasCompletedOnboarding=true; fs.writeFileSync(p,JSON.stringify(d,null,2))"

After running the script, restart CMD so the setx variables take effect, then simply type claude from any project folder.
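The one-line onboarding bypass is dense, so here is the same idea spelled out as a small Node.js script. It is a sketch, not the script's literal code: the function names are mine, and it assumes the file lives at .claude.json in the user's home directory, as described above.

```javascript
// Sketch: set hasCompletedOnboarding while keeping whatever else is in the file.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

function withOnboardingDone(jsonText) {
  let config = {};
  try {
    const parsed = JSON.parse(jsonText);
    // Only reuse the parsed value if it is a plain object.
    if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) config = parsed;
  } catch {
    // Missing or invalid JSON: start from an empty config.
  }
  return { ...config, hasCompletedOnboarding: true };
}

function patchClaudeConfig(file = path.join(os.homedir(), '.claude.json')) {
  const current = fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
  fs.writeFileSync(file, JSON.stringify(withOnboardingDone(current), null, 2));
}

// Usage: patchClaudeConfig();
```

Merging instead of overwriting matters on machines where Claude Code has already run, since the file may hold other settings.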

Daily usage

Once set up, usage is identical to the cloud-backed version. Navigate to a project folder in CMD and run:

cmd

cd C:\projects\my-app
claude

Claude Code reads files from your project as it needs them, and you can give it instructions as you normally would. The requests go to 192.168.1.10:11434 instead of Anthropic’s servers. No internet required, no usage costs, no rate limits.
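Because setx only affects CMD windows opened afterwards, it is easy to launch claude in a session that still talks to the cloud. A quick sanity check in Node.js can catch that. This is a minimal sketch: the function name is hypothetical, and the private-address test only covers loopback and the common IPv4 LAN ranges.

```javascript
// Sketch: confirm the environment points Claude Code at a LAN or loopback address.
function checkBaseUrl(env) {
  const raw = env.ANTHROPIC_BASE_URL;
  if (!raw) return 'ANTHROPIC_BASE_URL is not set; requests would go to Anthropic';
  let host;
  try {
    host = new URL(raw).hostname;
  } catch {
    return `ANTHROPIC_BASE_URL is not a valid URL: ${raw}`;
  }
  // Only treat the host as an address if it is dotted-quad IPv4.
  const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  const o = isIPv4 ? host.split('.').map(Number) : [];
  const isPrivate =
    host === 'localhost' ||
    o[0] === 127 ||
    o[0] === 10 ||
    (o[0] === 192 && o[1] === 168) ||
    (o[0] === 172 && o[1] >= 16 && o[1] <= 31);
  return isPrivate ? null : `ANTHROPIC_BASE_URL points at a non-private host: ${host}`;
}

console.log(checkBaseUrl(process.env) ?? 'OK: requests stay on the local network');
```

Run it in the same CMD window you intend to start claude from, since that is the environment the CLI will inherit.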

Known limitations

Local models are not Claude. Qwen 2.5 Coder 32B is excellent for its size, but it will occasionally struggle with:

Large multi-file refactors — the model may lose track of context across many files simultaneously
Complex architectural decisions — reasoning depth is shallower than Claude Sonnet or Opus
First response latency — even with pre-warming, the first response after a long idle period takes a few seconds while layers reload

For homelab work, personal projects, and routine coding tasks these limitations rarely matter. For large-scale production refactors where you need the best possible result, a hybrid setup — local for exploration, real Claude API for the final pass — is worth considering.
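One way to arrange the hybrid setup is to leave the persistent variables alone and derive a per-session environment when launching the CLI. The Node.js sketch below shows the idea under stated assumptions: envFor and LOCAL_URL are hypothetical names, the cloud branch expects you to supply your own Anthropic API key, and model selection is not handled here.

```javascript
// Sketch: derive the environment for a "local" or "cloud" Claude Code session.
const LOCAL_URL = 'http://192.168.1.10:11434';

function envFor(mode, baseEnv, cloudApiKey) {
  const env = { ...baseEnv }; // copy, so the caller's environment is untouched
  if (mode === 'local') {
    env.ANTHROPIC_BASE_URL = LOCAL_URL;
    env.ANTHROPIC_AUTH_TOKEN = 'ollama';
    env.ANTHROPIC_API_KEY = 'ollama';
    return env;
  }
  // Cloud: drop the local overrides so the CLI falls back to Anthropic's API.
  delete env.ANTHROPIC_BASE_URL;
  delete env.ANTHROPIC_AUTH_TOKEN;
  if (cloudApiKey) env.ANTHROPIC_API_KEY = cloudApiKey;
  else delete env.ANTHROPIC_API_KEY;
  return env;
}

// Usage (shell: true lets Windows resolve the npm-installed claude.cmd shim):
// const { spawn } = require('node:child_process');
// spawn('claude', [], { env: envFor('local', process.env), stdio: 'inherit', shell: true });
```

Keeping the switch in a launcher means the final cloud pass never has to touch the setx values that the local setup relies on.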

Download

The full setup-claude-code.bat script is available in the GitHub repository linked below. Change the Ollama IP on line 8 if yours differs from 192.168.1.10.

// tested on Windows 11 · Ollama 0.14+ · Claude Code CLI 1.x · qwen2.5-coder:32b Q4_K_M