Running LLMs Locally: The Complete Developer Guide
Ollama, llama.cpp, and vLLM. Everything you need to run powerful language models on your own hardware for development and testing.
Leanne Thuong · Jan 7, 2026 · 14 min read
Running LLMs locally gives you privacy, zero API costs, and offline access. Here's how to set it up properly.
Why Run Local?
No API keys, no rate limits, no data leaving your machine. Perfect for development, testing, and sensitive codebases.
Ollama Setup
Ollama is the easiest way to get started. Install it, pull a model, and you're running in minutes.
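Once the server is running, a quick way to confirm everything works is to hit Ollama's local REST API (it listens on http://localhost:11434 by default). Here's a minimal Python sketch; the model tag llama3.1:8b is just an example and assumes you've already run `ollama pull llama3.1:8b`.

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address

# List the models already pulled to this machine (same info as `ollama list`)
tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5).json()
print("Local models:", [m["name"] for m in tags.get("models", [])])

# Send a one-off prompt; assumes the model tag below has been pulled
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "llama3.1:8b",   # example tag -- swap in whatever you pulled
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,          # return one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```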
Hardware Requirements
7B models: 8GB RAM minimum
13B models: 16GB RAM
70B models: 64GB RAM, or a GPU with 48GB VRAM
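As a rough sanity check, you can estimate memory needs yourself: the weights take roughly parameters × bytes-per-weight, plus overhead for the KV cache and runtime. The sketch below assumes 4-bit quantization and about 20% overhead; both are assumptions that vary by backend, quantization level, and context length.

```python
def estimated_memory_gb(params_billions: float,
                        bits_per_weight: int = 4,
                        overhead: float = 1.2) -> float:
    """Rough memory estimate for running a quantized model locally.

    params_billions -- model size, e.g. 7, 13, 70
    bits_per_weight -- quantization level (4 for Q4, 8 for Q8, 16 for fp16)
    overhead        -- assumed ~20% extra for KV cache, activations, runtime
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9  # decimal GB

for size in (7, 13, 34, 70):
    print(f"{size}B @ 4-bit ~ {estimated_memory_gb(size):.1f} GB")
```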
Best Local Models for Coding
1. DeepSeek Coder V3 (33B) -- best overall
2. CodeLlama (34B) -- great for completions
3. Qwen2.5 Coder (32B) -- excellent instruction following
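If you want to try several of these, pulling them through Ollama is one command per model. The tags below are assumptions based on the Ollama model library and may not match the exact versions named above, so check the library page or `ollama list` for the current names.

```python
import subprocess

# Example Ollama library tags for coding models -- tag names are assumptions,
# confirm them against the Ollama model library before pulling.
MODELS = [
    "deepseek-coder:33b",
    "codellama:34b",
    "qwen2.5-coder:32b",
]

for tag in MODELS:
    # Equivalent to running `ollama pull <tag>` in a terminal
    subprocess.run(["ollama", "pull", tag], check=True)
```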
Integration with Cursor
You can point Cursor at your local Ollama instance for completely private AI-assisted coding.
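Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1, and overriding the OpenAI base URL in Cursor's model settings points it there. The sketch below makes the same kind of call with the official openai Python client so you can verify the endpoint works; the model tag is assumed to be one you've pulled locally, and the API key can be any placeholder since Ollama doesn't check it.

```python
from openai import OpenAI

# Point an OpenAI-style client at the local Ollama server.
# Cursor's base-URL override works against the same endpoint.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # placeholder -- Ollama ignores the key
)

completion = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # assumes this tag has been pulled locally
    messages=[
        {"role": "user", "content": "Refactor this loop into a list comprehension: ..."},
    ],
)
print(completion.choices[0].message.content)
```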