
Running LLMs Locally: The Complete Developer Guide

Ollama, llama.cpp, and vLLM. Everything you need to run powerful language models on your own hardware for development and testing.

Leanne Thuong · Jan 7, 2026 · 14 min read

Running LLMs locally gives you privacy, zero API costs, and offline access. Here's how to set it up properly.

Why Run Locally?

No API keys, no rate limits, no data leaving your machine. Perfect for development, testing, and sensitive codebases.

Ollama Setup

Ollama is the easiest way to get started. Install it, pull a model, and you're running in minutes.
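
If you'd rather drive Ollama from code than from the CLI, it also serves a local HTTP API on port 11434. The sketch below is a minimal, non-streaming example in Python; it assumes the Ollama server is already running and that you've pulled a model named llama3.1 (swap in whichever model you actually pulled).

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port
MODEL = "llama3.1"  # assumption: replace with any model you've pulled

def generate(prompt: str) -> str:
    """Send one non-streaming generation request to the local Ollama server."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # With stream=False, the full completion comes back in the "response" field.
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Write a one-line docstring for a function that reverses a string."))
```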

Hardware Requirements

  • 7B models: 8GB RAM minimum
  • 13B models: 16GB RAM
  • 70B models: 64GB RAM or GPU with 48GB VRAM
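
Those figures assume quantized weights. As a rough rule of thumb, you can estimate memory from the parameter count and the bits per weight of the quantization, plus some overhead for the KV cache and runtime buffers. The helper below is a back-of-the-envelope sketch, not a precise calculator; the guidance above leaves extra headroom for the OS and longer contexts.

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: float = 4.0,
                    overhead: float = 1.2) -> float:
    """Back-of-the-envelope memory estimate for a quantized model.

    params_billion:  model size in billions of parameters (e.g. 7, 13, 70)
    bits_per_weight: quantization level (4-bit is typical for local inference)
    overhead:        rough multiplier for KV cache, activations, and buffers
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for size in (7, 13, 34, 70):
    print(f"{size}B @ 4-bit: about {estimate_ram_gb(size):.0f} GB for the model itself")
```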

Best Local Models for Coding

  1. DeepSeek Coder V3 (33B) -- best overall
  2. CodeLlama (34B) -- great for completions
  3. Qwen2.5 Coder (32B) -- excellent instruction following

Integration with Cursor

You can point Cursor at your local Ollama instance for completely private AI-assisted coding: Ollama exposes an OpenAI-compatible API, so Cursor can talk to it the same way it talks to a hosted provider.
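
Before changing Cursor's settings, it's worth sanity-checking that endpoint directly. The sketch below uses the standard OpenAI Python client against Ollama's OpenAI-compatible /v1 endpoint; the base URL, placeholder API key, and model name are assumptions about a default local setup.

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint; the API key is ignored but must be non-empty.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # assumption: use whichever model you've pulled
    messages=[{"role": "user", "content": "Explain Python list comprehensions in one sentence."}],
)
print(response.choices[0].message.content)
```

If this prints a sensible answer, the same base URL and model name should work wherever an OpenAI-style endpoint can be configured.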