Jan (jan.ai) is an open-source desktop application for running large language models locally on your PC. Once a model is downloaded, it needs no internet connection, costs nothing in API fees, and keeps your data private. It's simpler than Ollama for non-technical users while still offering model downloads, conversation management, and an OpenAI-compatible API.

Why Run LLMs Locally?

Privacy: your conversations never leave your machine
No cost: no API fees or subscription after the model downloads
Offline: works without internet
No service-level filtering: there is no provider-side moderation layer; responses reflect only the model's own training and any fine-tuning
Customization: run fine-tuned models for specific tasks

System Requirements

Jan runs on Windows 10/11, macOS (Apple Silicon or Intel), and Linux.

Minimum (CPU-only):

8GB RAM — for 3B–7B parameter models
16GB RAM — comfortable for 7B models
Any modern CPU

Recommended (GPU acceleration):

NVIDIA GPU with 6GB+ VRAM — runs 7B models smoothly
12GB+ VRAM — runs 13B models comfortably
NVIDIA RTX 3080 / AMD RX 7800 XT or better for real-time responses

Apple Silicon (M1/M2/M3/M4):

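Unified memory architecture: the GPU shares system RAM, so total memory, rather than a separate VRAM pool, determines which models fit
Macs with 16GB+ of unified memory handle 7B–13B models comfortably; Pro/Max/Ultra chips respond faster thanks to their higher memory bandwidth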

Installation

Download from jan.ai — available for Windows (.exe), macOS (.dmg), Linux (.AppImage)
Install and launch
Jan opens with a chat interface and model browser

No command line required.

Downloading Models

Jan’s built-in Explore Models section offers curated, pre-quantized models ready to download:

Recommended models by VRAM:

VRAM	Recommended Model	Notes
6GB	Llama 3.2 3B Q8	Fast, capable
8GB	Mistral 7B Q6_K	Strong general performance
12GB	Llama 3.1 8B Q8	Best quality at this size
16GB	Phi-4 14B Q4	Microsoft’s efficient model
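24GB	Llama 3.3 70B Q3	Near-frontier performance, but the file (about 30GB or more) exceeds 24GB, so part of the model runs from system RAM and responses are slower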

Quantization suffixes:

Q8: 8-bit — best quality, larger file
Q6_K, Q5_K_M: good balance of quality and size
Q4_K_M: good for VRAM-constrained systems
Q2_K: most compressed, noticeably lower quality
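As a rough rule of thumb, a quantized model's file size is parameter count × bits per weight ÷ 8. The sketch below applies that rule. The bits-per-weight figures are approximate assumptions for llama.cpp-style GGUF quantizations (K-quant variants mix precisions across tensors), not values published by Jan, so treat the results as ballpark numbers.

```python
# Approximate bits per weight for common GGUF quantization types.
# These are rough assumptions: K-quants mix precisions across tensors,
# so real file sizes vary by a few percent.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K": 3.0,
}


def estimate_model_size_gb(params_billions: float, quant: str) -> float:
    """Estimate a quantized model's file size in gigabytes (10^9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    total_bytes = params_billions * 1e9 * bits / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for params, quant in [(3, "Q8_0"), (7, "Q6_K"), (8, "Q8_0"), (14, "Q4_K_M"), (70, "Q3_K_M")]:
        print(f"{params}B {quant}: ~{estimate_model_size_gb(params, quant):.1f} GB")
```

Under these assumptions, a 70B model at Q3_K_M comes to roughly 34 GB, so it cannot sit entirely in 24GB of VRAM. Leave extra room beyond the file size for the context cache.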

Configuration

After downloading a model, click Settings → Model Settings:

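Context length: the maximum number of tokens (prompt plus conversation history) the model can consider at once. Supported maximums vary by model, e.g. 4K for some older models and 128K for Llama 3.1. A larger context uses more VRAM.
Temperature: controls randomness; 0.1 = focused and nearly deterministic, 1.0 = more varied and creative
GPU layers: how many of the model's layers are offloaded to the GPU. Jan sets this automatically; increase it to put more of the model in VRAM (faster, as long as it fits), or set it to 0 for CPU-only inference.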
System prompt: set a persistent persona or instruction for all conversations
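Temperature works by dividing the model's raw scores (logits) by T before the softmax that turns them into next-token probabilities. A low T sharpens the distribution toward the top token; a high T flattens it. The sketch below uses made-up logits for three candidate tokens purely for illustration; it shows the general technique, not Jan's internal sampler.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw logits to probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical logits for three candidate next tokens.
    logits = [2.0, 1.0, 0.5]
    for t in (0.1, 1.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

With these logits, the top token gets nearly all of the probability mass at T = 0.1 but only about 63% at T = 1.0, which is why higher temperatures produce more varied output.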

Jan’s Key Features

OpenAI-Compatible API

Once you start it from Jan's settings, Jan runs a local API server at http://localhost:1337 that's compatible with OpenAI's API format:

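from openai import OpenAI  # pip install openai

# Point the official OpenAI client at Jan's local server instead of api.openai.com.
# No real OpenAI key is needed; use a placeholder (or the key you configured in Jan, if any).
client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

response = client.chat.completions.create(
    # Must match the ID of a model you have downloaded in Jan.
    model="llama3.1-8b-instruct",
    messages=[{"role": "user", "content": "Explain quantum computing simply"}],
)
print(response.choices[0].message.content)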

This means any app built for the OpenAI API that lets you change its base URL can point to Jan instead, keeping everything local and free.
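Under the hood, that SDK call is a plain HTTP POST of JSON to /v1/chat/completions, so any language or tool that can send HTTP requests works too. The sketch below builds the same request with only Python's standard library. The model ID is a placeholder (use the ID Jan shows for a model you have downloaded), and the Authorization header is included only because OpenAI-style clients send one; adjust it if your Jan server is configured with a key.

```python
import json
import urllib.request

# Base URL of Jan's local server (default port 1337).
JAN_BASE_URL = "http://localhost:1337/v1"


def build_chat_request(model: str, prompt: str, api_key: str = "not-needed") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{JAN_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # "llama3.1-8b-instruct" is a placeholder model ID.
    req = build_chat_request("llama3.1-8b-instruct", "Explain quantum computing simply")
    try:
        print(send_chat_request(req))
    except OSError as exc:  # urllib.error.URLError subclasses OSError
        print(f"Could not reach Jan's server: {exc}")
```

Separating request construction from sending makes it easy to inspect the exact JSON an OpenAI-compatible client would send before Jan ever sees it.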

Conversation Management

Jan saves all conversations locally as JSON files in its data folder (older versions used ~/jan/; the current location is shown in Jan's settings). You can:

Browse past conversations
Search conversation history
Export conversations
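Because the history is stored as plain files, you can also search it outside the app. The sketch below assumes only that conversations live as .json or .jsonl files somewhere under a folder you point it at. It does a case-insensitive text search rather than parsing Jan's schema, since the exact file layout differs between Jan versions. The folder path in the usage example is a placeholder.

```python
from pathlib import Path


def search_conversations(data_dir: str, keyword: str) -> list[Path]:
    """Return conversation files under data_dir whose text contains keyword.

    Assumes conversations are stored as .json or .jsonl files; the search is
    a plain case-insensitive substring match on the raw file contents.
    """
    root = Path(data_dir).expanduser()
    if not root.is_dir():
        return []
    needle = keyword.lower()
    matches = []
    for path in sorted(root.rglob("*")):
        if path.suffix not in {".json", ".jsonl"} or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Placeholder path: replace with the data folder shown in Jan's settings.
    for hit in search_conversations("~/jan", "quantum"):
        print(hit)
```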

Extensions

Jan supports extensions for additional functionality:

Remote model providers: connect Claude, Gemini, or GPT-4 as alternatives to local models (these need an API key and send your prompts to the provider)
Retrieval-Augmented Generation (RAG): talk to your documents (PDF, text files)
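Conceptually, RAG splits your documents into chunks, finds the chunks most relevant to the question, and pastes them into the prompt so the model can answer from them. The sketch below illustrates that flow with a deliberately simple stand-in: it ranks chunks by word-overlap cosine similarity, whereas production RAG pipelines typically use learned embedding vectors and a vector index. The function names and prompt wording are illustrative only.

```python
import math
import re
from collections import Counter


def _word_counts(text: str) -> Counter:
    """Lowercased bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = _word_counts(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(_word_counts(c), q), reverse=True)
    return ranked[:k]


def build_rag_prompt(chunks: list[str], question: str, k: int = 2) -> str:
    """Prepend the retrieved chunks to the question as context."""
    context = "\n\n".join(retrieve(chunks, question, k))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "The lease runs for twelve months starting in March.",
        "Rent is due on the first day of each month.",
        "Pets are allowed with a refundable deposit.",
    ]
    print(build_rag_prompt(docs, "When is rent due each month?", k=1))
```

Only the retrieved chunk reaches the model, which is what lets a small local model answer questions about documents far larger than its context window.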

Jan vs. Ollama vs. LM Studio

Feature	Jan	Ollama	LM Studio
GUI	Yes	No (CLI only)	Yes
Model browser	Yes	Yes (via CLI)	Yes
API server	Yes	Yes	Yes
RAG/documents	Extension	No built-in	Yes
Ease of use	Easiest	Moderate	Easy
Open source	Yes	Yes	No
Cross-platform	Yes	Yes	Yes

Jan and LM Studio are the two most user-friendly options. Jan is fully open-source; LM Studio is free but closed-source with more advanced GPU layer controls.

Getting the Best Performance

Use GPU layers: ensure Jan detects your GPU under Settings → GPU
Right-size your model: even at Q4, a 7B model needs more than 4GB of VRAM once context is included, so choose a size and quantization that fit your hardware
Close other apps: free up VRAM before loading large models
Enable Flash Attention: in model settings, if your GPU supports it
Restart Jan between model switches if memory doesn't free up: this ensures VRAM held by the previous model is released
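To right-size a model, remember that VRAM must hold both the weights and the key/value (KV) cache for the context, which grows linearly with context length. The sketch below estimates the KV cache for a transformer with grouped-query attention. The Llama 3.1 8B figures (32 layers, 8 KV heads, head dimension 128) match that model's published architecture; the 16-bit cache, the ~4.9 GB weight size, and the 1 GB runtime overhead are assumptions for illustration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache: one K and one V vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits_in_vram(weights_gb: float, kv_bytes: int, vram_gb: float,
                 overhead_gb: float = 1.0) -> bool:
    """Rough check: weights + KV cache + assumed runtime overhead must fit in VRAM."""
    needed_gb = weights_gb + kv_bytes / 1e9 + overhead_gb
    return needed_gb <= vram_gb


if __name__ == "__main__":
    # Llama 3.1 8B: 32 layers, 8 KV heads (grouped-query attention), head dim 128.
    for ctx in (8192, 32768):
        kv = kv_cache_bytes(32, 8, 128, ctx)
        # ~4.9 GB is an assumed size for a Q4_K_M file of this model.
        print(ctx, f"KV cache {kv / 2**30:.1f} GiB", "fits in 8 GB:", fits_in_vram(4.9, kv, 8.0))
```

Under these assumptions, an 8K context adds exactly 1 GiB of cache for Llama 3.1 8B and a 32K context adds 4 GiB, enough to push a Q4 model past an 8GB card.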

Example Use Cases

Offline coding assistant: set system prompt to “You are a helpful Python developer”
Private document analysis: upload sensitive docs to Jan’s RAG feature without cloud exposure
Writing assistant: no API costs for long-form content generation
Learning tool: ask questions about sensitive security topics without API monitoring

Jan is the lowest-friction entry point to local LLM use — download, pick a model, and start chatting in under 10 minutes.