
Running LLMs Locally: The Complete Developer Guide

Ollama, llama.cpp, and vLLM. Everything you need to run powerful language models on your own hardware for development and testing.

Leanne Thuong · Jan 7, 2026 · 14 min read

Running LLMs locally gives you privacy, zero API costs, and offline access. Here's how to set it up properly.

Why Run Locally?

No API keys, no rate limits, no data leaving your machine. Perfect for development, testing, and sensitive codebases.

Ollama Setup

Ollama is the easiest way to get started. Install it, pull a model, and you're running in minutes.
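
If you'd rather drive Ollama from code than from the CLI, it also serves a local HTTP API on port 11434. The sketch below is a minimal, non-streaming example in Python; it assumes the Ollama server is already running and that you've pulled a model named llama3.1 (swap in whichever model you actually pulled).

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port
MODEL = "llama3.1"  # assumption: replace with any model you've pulled

def generate(prompt: str) -> str:
    """Send one non-streaming generation request to the local Ollama server."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # With stream=False, the full completion comes back in the "response" field.
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Write a one-line docstring for a function that reverses a string."))
```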

Hardware Requirements

  • 7B models: 8GB RAM minimum
  • 13B models: 16GB RAM
  • 70B models: 64GB RAM or GPU with 48GB VRAM
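
Those figures assume quantized weights. As a rough rule of thumb, you can estimate memory from the parameter count and the bits per weight of the quantization, plus some overhead for the KV cache and runtime buffers. The helper below is a back-of-the-envelope sketch, not a precise calculator; the guidance above leaves extra headroom for the OS and longer contexts.

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: float = 4.0,
                    overhead: float = 1.2) -> float:
    """Back-of-the-envelope memory estimate for a quantized model.

    params_billion:  model size in billions of parameters (e.g. 7, 13, 70)
    bits_per_weight: quantization level (4-bit is typical for local inference)
    overhead:        rough multiplier for KV cache, activations, and buffers
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for size in (7, 13, 34, 70):
    print(f"{size}B @ 4-bit: about {estimate_ram_gb(size):.0f} GB for the model itself")
```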

Best Local Models for Coding

  1. DeepSeek Coder V3 (33B) -- best overall
  2. CodeLlama (34B) -- great for completions
  3. Qwen2.5 Coder (32B) -- excellent instruction following

Integration with Cursor

You can point Cursor at your local Ollama instance for completely private AI-assisted coding: Ollama exposes an OpenAI-compatible API, so Cursor can talk to it the same way it talks to a hosted provider.
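
Before changing Cursor's settings, it's worth sanity-checking that endpoint directly. The sketch below uses the standard OpenAI Python client against Ollama's OpenAI-compatible /v1 endpoint; the base URL, placeholder API key, and model name are assumptions about a default local setup.

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint; the API key is ignored but must be non-empty.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # assumption: use whichever model you've pulled
    messages=[{"role": "user", "content": "Explain Python list comprehensions in one sentence."}],
)
print(response.choices[0].message.content)
```

If this prints a sensible answer, the same base URL and model name should work wherever an OpenAI-style endpoint can be configured.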