Running DeepSeek R1 Locally with Ollama and LM Studio

Set up DeepSeek R1 locally using Ollama and LM Studio. Covers all model sizes from 1.5B to 671B, distilled variants, hardware requirements, and use cases.

DeepSeek R1 arrived in early 2025 and made waves by matching or exceeding OpenAI’s o1 on reasoning benchmarks — at a fraction of the training cost and with open weights. Running it locally is straightforward, and the distilled variants bring serious reasoning capability to consumer hardware.

The DeepSeek R1 Model Family

DeepSeek released the R1 series in multiple configurations:

| Model | Parameters | VRAM Required (Q4) | Notes |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~2GB | Runs on integrated graphics |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~5GB | Good quality, any modern GPU |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~6GB | Llama 3 architecture base |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~10GB | Best distill for 12GB VRAM |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~22GB | High quality, 24GB VRAM |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~45GB | Near-full R1 quality |
| DeepSeek-R1 (full) | 671B | ~400GB+ | Requires data center hardware |
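
The Q4 figures above can be roughly reproduced with a back-of-the-envelope calculation. The sketch below assumes about 4.5 bits per weight for a Q4_K_M-style quantization plus a fixed 1.5GB allowance for KV cache and runtime buffers. Both numbers are illustrative assumptions rather than measurements, and the function name is made up for this example.

```python
def estimate_vram_gb(params_billions: float,
                     bits_per_weight: float = 4.5,   # assumed average for a Q4-style quant
                     overhead_gb: float = 1.5) -> float:  # assumed cache/buffer allowance
    """Rough VRAM estimate: quantized weight size plus a fixed overhead."""
    weights_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits is about 1 GB
    return round(weights_gb + overhead_gb, 1)


for name, size in [("1.5B", 1.5), ("7B", 7), ("14B", 14), ("32B", 32), ("70B", 70)]:
    print(f"{name}: ~{estimate_vram_gb(size)} GB")
```

The results land in the same ballpark as the table. Longer context windows increase the cache portion, so treat the estimate as a lower bound for long conversations.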

Distilled models are existing Qwen and Llama models fine-tuned on reasoning traces generated by DeepSeek R1. The R1-Distill-Qwen-14B in particular punches far above its weight, delivering reasoning quality that would have required a 70B model just a year earlier.

R1 vs R1-Zero

  • DeepSeek-R1-Zero: trained with large-scale reinforcement learning applied directly to the base model, with no supervised fine-tuning step first. Reasoning behavior emerged during training, but outputs can be hard to read and sometimes mix languages.
  • DeepSeek-R1: adds a small set of curated "cold-start" supervised data before reinforcement learning, followed by further fine-tuning stages. Cleaner outputs, better instruction following.

For practical use, always prefer R1 over R1-Zero. R1-Zero is primarily interesting for research into emergent reasoning.

Running with Ollama

Ollama is the easiest path to local DeepSeek R1. Install from ollama.com if you haven’t:

# Linux install script
curl -fsSL https://ollama.com/install.sh | sh

# macOS and Windows: download the installer from ollama.com

Pull and run DeepSeek R1 distilled models:

# 8B model — good for most hardware (6GB+ VRAM)
ollama pull deepseek-r1:8b
ollama run deepseek-r1:8b

# 14B model — excellent quality/size ratio
ollama pull deepseek-r1:14b
ollama run deepseek-r1:14b

# 32B model — for 24GB VRAM GPUs
ollama pull deepseek-r1:32b
ollama run deepseek-r1:32b

# 70B model — requires 48GB+ VRAM or multi-GPU
ollama pull deepseek-r1:70b

Ollama uses GPU acceleration automatically when a supported GPU is available and falls back to CPU inference otherwise.
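
After pulling, it can be handy to confirm from a script which models are available locally. The sketch below queries Ollama's /api/tags endpoint on the default port using only the standard library. The helper names are hypothetical, and the code assumes the server is running on the same machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def model_names(tags_response: dict) -> list:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list:
    """Ask a running Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_local_models())
    except OSError as exc:  # connection errors are OSError subclasses
        print(f"Ollama does not appear to be running: {exc}")
```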

Querying via Ollama API

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:14b",
  "messages": [
    {
      "role": "user",
      "content": "Solve this step by step: A train leaves Chicago at 60mph. Another leaves New York at 80mph. Cities are 800 miles apart. When do they meet?"
    }
  ]
}'

The same kind of request from Python, using the ollama package (pip install ollama):

import ollama

response = ollama.chat(
    model='deepseek-r1:14b',
    messages=[{
        'role': 'user',
        'content': 'Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes'
    }]
)
print(response['message']['content'])
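
One detail worth knowing: /api/chat streams its reply as newline-delimited JSON unless the request sets "stream" to false. For scripts that just want the whole answer, a non-streaming call is simpler. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the URL and model tag are taken from the curl example above.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # False: one JSON object instead of a line-delimited stream
    }


def chat(prompt: str, model: str = "deepseek-r1:14b",
         url: str = "http://localhost:11434/api/chat") -> str:
    """Send one prompt to a local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


if __name__ == "__main__":
    try:
        print(chat("What is 17 * 23? Answer briefly."))
    except OSError as exc:  # server not running, or model not pulled
        print(f"Could not reach Ollama: {exc}")
```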

Understanding the Reasoning Chain Output

DeepSeek R1’s signature feature is its chain-of-thought reasoning, enclosed in <think> tags. The model reasons step-by-step before giving its final answer:

<think>
Let me work through this problem systematically.

The trains are traveling toward each other, so their speeds add: 60 + 80 = 140 mph combined closing speed.
Distance = 800 miles.
Time = Distance / Speed = 800 / 140 = 5.71 hours.
That's 5 hours and approximately 43 minutes.

Let me verify: in 5.71 hours, train 1 travels 60 × 5.71 = 342.6 miles. Train 2 travels 80 × 5.71 = 456.8 miles. Total: 342.6 + 456.8 = 799.4 ≈ 800 miles. ✓
</think>

The trains will meet approximately **5 hours and 43 minutes** after departing.

This reasoning chain is what makes R1 excel at math, coding, and logic. The model literally shows its work.

Some tools (like Open WebUI) can collapse the <think> block by default so you only see the final answer, with an option to expand the reasoning.
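
A script that consumes the raw output can do something similar by separating the reasoning from the final answer. Here is a minimal sketch, assuming the response contains at most one <think>...</think> block as in the example above; the function name is hypothetical.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer) from a response that may contain a <think> block."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()  # no reasoning block: everything is the answer
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = (
    "<think>\n60 + 80 = 140 mph, so 800 / 140 = 5.71 hours.\n</think>\n\n"
    "The trains meet after about 5 hours and 43 minutes."
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

Keeping the reasoning around (for logging or debugging) while showing only the answer to users tends to work well, since the <think> section is often several times longer than the answer itself.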

Running in LM Studio

LM Studio provides a GUI for running GGUF models locally:

  1. Download LM Studio from lmstudio.ai
  2. Open the Discover tab
  3. Search for deepseek-r1
  4. Select your preferred quantization:
    • Q4_K_M: best balance of quality and VRAM
    • Q5_K_M: higher quality, somewhat more VRAM
    • Q8_0: near-lossless, roughly double the VRAM of Q4_K_M
  5. Click Download and wait for the download to finish
  6. Switch to the Chat tab and load the model

LM Studio also provides a local server with OpenAI-compatible endpoints at http://localhost:1234/v1.
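
Because the server speaks the OpenAI chat-completions format, a client only needs to POST to /v1/chat/completions under that base URL. The sketch below uses the standard library; the model identifier is an assumption (use whatever name LM Studio shows for the model you loaded), and the helper names are hypothetical.

```python
import json
import urllib.request

LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"


def extract_reply(completion: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return completion["choices"][0]["message"]["content"]


def ask_lm_studio(prompt: str, model: str = "deepseek-r1-distill-qwen-14b") -> str:
    """Send one prompt to LM Studio's local server and return the reply text."""
    # The default model identifier above is a placeholder, not a guaranteed name.
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(LM_STUDIO_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

With the server started in LM Studio and a model loaded, ask_lm_studio("Is 91 prime?") should return the model's reply as a string.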

Manual GGUF Download

For fine-grained control over quantization, download directly from Hugging Face:

# bartowski's quantizations are community-recommended
huggingface-cli download bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF \
  DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf \
  --local-dir ./models

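# Run with llama.cpp
# R1 distills use DeepSeek's own chat template (<｜User｜> / <｜Assistant｜>), not ChatML.
# Some llama.cpp builds apply the GGUF's built-in template automatically in
# conversation mode; in that case pass only the plain question to -p.
./llama-cli \
  -m ./models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf \
  -ngl 999 \
  -p "<｜User｜>Explain the P vs NP problem<｜Assistant｜>" \
  -n 1000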

Hardware Requirements Per Model Size

| Model | Minimum VRAM | Recommended Setup | Tokens/Sec (approx) |
| --- | --- | --- | --- |
| R1-Distill 1.5B | 2GB | Any GPU | 60-100 tok/s |
| R1-Distill 7B | 6GB | RTX 3060 12GB | 30-60 tok/s |
| R1-Distill 8B | 6GB | RTX 3060 12GB | 25-50 tok/s |
| R1-Distill 14B | 10GB | RTX 4070 12GB | 20-40 tok/s |
| R1-Distill 32B | 22GB | RTX 3090/4090 | 10-25 tok/s |
| R1-Distill 70B | 45GB | 2x RTX 4090 | 5-12 tok/s |
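
To turn the table into a quick decision, a small lookup can pick the largest distill that fits a given amount of VRAM. This is only a sketch: the minimums are copied from the table above, the Ollama tags are assumed to follow the deepseek-r1:<size> pattern used earlier, and the function name is hypothetical.

```python
from typing import Optional

# Minimum VRAM in GB per distilled model, copied from the table above.
# Ordered from smallest to largest model.
MIN_VRAM_GB = {
    "deepseek-r1:1.5b": 2,
    "deepseek-r1:7b": 6,
    "deepseek-r1:8b": 6,
    "deepseek-r1:14b": 10,
    "deepseek-r1:32b": 22,
    "deepseek-r1:70b": 45,
}


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest model tag whose minimum VRAM fits, or None if none fit."""
    best = None
    for tag, need in MIN_VRAM_GB.items():  # dicts keep insertion order
        if need <= vram_gb:
            best = tag
    return best


for vram in (4, 12, 24, 48):
    print(f"{vram} GB -> {largest_model_for(vram)}")
```

The minimums leave little headroom, so on a card that only just meets the requirement it may be worth stepping down one size for longer contexts.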

Best Use Cases for DeepSeek R1

DeepSeek R1’s reasoning-first training makes it exceptional at:

Coding tasks:

  • Debugging complex logic errors
  • Writing algorithms with correctness requirements
  • Code review and refactoring suggestions

Mathematics:

  • Step-by-step problem solving
  • Proof verification
  • Statistical analysis and formula derivation

Logic and reasoning:

  • Multi-step deduction problems
  • Constraint satisfaction
  • Argument analysis and critique

Compared to Llama 3 and Mistral, R1 distills consistently outperform on MATH, HumanEval, and GPQA benchmarks at equivalent parameter counts. For creative writing or casual chat, the difference is minimal. For anything requiring careful reasoning, R1 is the better choice.
