Jan.ai is a fully open-source desktop application for running AI models locally. Built on top of llama.cpp, it offers a clean ChatGPT-like interface while keeping all your data on your machine. Unlike LM Studio, whose desktop app is closed source, Jan is MIT-licensed and community-driven. It also ships with a built-in model hub, an OpenAI-compatible API server, and experimental multi-engine support.
Why Jan.ai?
Jan stands out in the crowded local LLM space for a few reasons:
- Fully open-source: MIT licensed, with the full source on GitHub at janhq/jan
- Cross-platform: macOS, Windows, and Linux with native GPU support
- Extensions system: plugin architecture for adding new engines, themes, and tools
- Remote model support: connect to OpenAI, Anthropic, or Groq alongside local models
- Jan Assistant: built-in AI assistant with persistent memory and tool use
System Requirements
Jan runs on modest hardware, but more is always better:
| Hardware | Minimum | For good performance |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| GPU VRAM | 4 GB (optional) | 8 GB+ |
| Storage | 5 GB + models | 50 GB+ |
| CPU | Any x64 or ARM | Recent multi-core |
Apple Silicon (M1/M2/M3/M4) users get excellent Metal acceleration. NVIDIA and AMD GPUs are supported on Windows and Linux.
Installation
Download the latest release from jan.ai or the GitHub releases page:
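# Linux: download the .deb or the .AppImage
# Release asset names may include a version number; if a link returns 404,
# copy the exact filename from the GitHub releases page.
# For Debian/Ubuntu (apt resolves dependencies, unlike a plain dpkg -i):
wget https://github.com/janhq/jan/releases/latest/download/jan-linux-amd64.deb
sudo apt install ./jan-linux-amd64.deb
# AppImage alternative
wget https://github.com/janhq/jan/releases/latest/download/jan-linux-x86_64.AppImage
chmod +x jan-linux-x86_64.AppImage
./jan-linux-x86_64.AppImage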
On Windows, run the .exe installer. On macOS, mount the .dmg and drag Jan to Applications. Apple Silicon and Intel Macs each have separate installers — pick the right one.
Downloading and Running Models
Jan includes a built-in Hub for discovering models. Click the Hub icon in the left sidebar to browse:
Recommended Models in Jan Hub
- Llama 3.2 3B Instruct — Great default for everyday tasks, 2 GB
- Llama 3.1 8B Instruct — Stronger reasoning, 5 GB
- Mistral 7B Instruct — Fast and instruction-tuned, 4 GB
- Gemma 2 9B IT — Google’s instruction model, 5 GB
- DeepSeek-R1 7B — Reasoning and math tasks, 4 GB
- Qwen 2.5 Coder 7B — Excellent for code generation
Click Download next to any model. Jan shows the file size, quantization level, and VRAM estimate before you commit.
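The size figures in the Hub follow from simple arithmetic: a GGUF file holds roughly (number of parameters) × (bits per weight) ÷ 8 bytes. The sketch below is a back-of-the-envelope estimator, not Jan's own formula. The bits-per-weight values are approximate averages for common llama.cpp quantization types (an assumption; K-quant mixes vary by model), and real files add a little overhead for metadata and tensors kept at higher precision.

```python
# Approximate average bits per weight for common GGUF quantization types.
# Rough figures (assumption): K-quants mix block types, so real files vary.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def estimate_gguf_size_gb(params_billion: float, quant: str) -> float:
    """Rough GGUF file size in GB (10**9 bytes).

    params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 for GB,
    simplifies to params_billion * bits / 8.
    """
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8


for quant in ("Q4_K_M", "Q8_0", "F16"):
    print(f"8B model at {quant}: ~{estimate_gguf_size_gb(8, quant):.1f} GB")
```

An 8B model at Q4_K_M lands just under 5 GB, which matches the Hub listing for Llama 3.1 8B. Expect actual VRAM use to be somewhat higher than the file size, because the context (KV cache) and runtime buffers also need memory.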
Adding Custom GGUF Models
Jan can load any GGUF model file you already have on disk:
- Open Jan and go to Settings > Models
- Click Import Model
- Select your .gguf file
- Jan registers it and makes it available in the model picker
You can also drop GGUF files directly into Jan’s model folder:
- macOS/Linux: ~/jan/models/
- Windows: C:\Users\<username>\jan\models\
Create a subfolder for each model and include a model.json metadata file, or let Jan auto-detect it.
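If you add models by hand often, the folder layout can be scripted. This is a minimal sketch: it copies a GGUF file into its own subfolder of the models directory and writes a model.json beside it. The function name `install_gguf` and every field in the generated model.json are hypothetical placeholders, not Jan's documented schema; check Jan's docs for the fields your version expects, or skip the metadata and rely on auto-detection.

```python
import json
import shutil
from pathlib import Path


def install_gguf(gguf_path: Path, models_dir: Path, model_id: str) -> Path:
    """Copy a GGUF file into models_dir/<model_id>/ and write a model.json.

    Hypothetical helper: the metadata layout below is an assumption,
    not Jan's official model.json schema.
    """
    target_dir = models_dir / model_id
    target_dir.mkdir(parents=True, exist_ok=True)

    # copy2 also preserves timestamps, which helps when comparing files later
    shutil.copy2(gguf_path, target_dir / gguf_path.name)

    # Hypothetical minimal metadata (assumed field names)
    metadata = {
        "id": model_id,
        "name": model_id,
        "format": "gguf",
        "file": gguf_path.name,
    }
    (target_dir / "model.json").write_text(json.dumps(metadata, indent=2))
    return target_dir
```

For example, `install_gguf(Path("mistral-7b-instruct.Q4_K_M.gguf"), Path.home() / "jan" / "models", "mistral-7b-custom")` would create `~/jan/models/mistral-7b-custom/` containing the GGUF file and its metadata.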
Using the Chat Interface
Jan’s chat interface is intentionally minimal:
- Click New Thread (pencil icon)
- Select a model from the dropdown at the top
- Type your message and press Enter
The right panel shows model settings you can adjust per-conversation:
- System Prompt — Define the AI’s role and behavior
- Temperature — Response randomness (0–1)
- Max Tokens — Response length limit
- Context Length — How much conversation history the model sees
- Top P / Top K — Sampling parameters
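When you call the same model through the API instead of the chat window, most of these panel settings correspond to fields of an OpenAI-style chat completions request. The sketch below builds such a request body; the function name and default values are illustrative assumptions. Context Length is left out because it is typically fixed when the model is loaded rather than sent per request, and Top K is not part of the OpenAI schema, so whether Jan's server honors it is an assumption.

```python
import json
from typing import Optional


def build_chat_body(model: str, system_prompt: str, user_message: str,
                    temperature: float = 0.7, max_tokens: int = 512,
                    top_p: float = 0.95, top_k: Optional[int] = None) -> dict:
    """Build a JSON body for POST /v1/chat/completions from per-thread settings."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},  # System Prompt
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,  # Temperature: higher means more random
        "max_tokens": max_tokens,    # Max Tokens: cap on the reply length
        "top_p": top_p,              # Top P: nucleus sampling cutoff
    }
    if top_k is not None:
        # Top K is a non-standard extension that llama.cpp-based servers often
        # accept; support in Jan's server is an assumption here.
        body["top_k"] = top_k
    return body


print(json.dumps(build_chat_body("llama3.2-3b-instruct", "You are concise.",
                                 "Explain DNS in one sentence.",
                                 temperature=0.2), indent=2))
```

With the `openai` Python client, the standard fields become keyword arguments to `client.chat.completions.create`, and a non-standard field such as `top_k` can be passed through `extra_body={"top_k": 40}`.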
Creating Assistants
Jan’s Assistants feature lets you save custom configurations:
- Go to Assistants in the left sidebar
- Click Create Assistant
- Give it a name, system prompt, and default model
- Save — it appears in your assistant list for quick access
You can build specialized assistants for coding, writing, security research, or any other domain, each with their own system prompt and model settings.
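An assistant is essentially a saved bundle of system prompt, model, and default settings. If you want the same idea in your own scripts that talk to Jan's API, a small preset object is enough. This is a hypothetical sketch: `AssistantPreset` and the model ID in the example are illustrative, not part of Jan.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantPreset:
    """Hypothetical stand-in for a Jan assistant: prompt, model, defaults."""
    name: str
    system_prompt: str
    model: str
    params: dict = field(default_factory=dict)

    def request(self, user_message: str, **overrides) -> dict:
        """Return chat-completions keyword arguments for one user message."""
        return {
            "model": self.model,
            "messages": [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": user_message},
            ],
            **self.params,
            **overrides,  # per-call settings take precedence over preset defaults
        }


code_reviewer = AssistantPreset(
    name="Code Reviewer",
    system_prompt="You are a strict senior reviewer. Point out bugs first.",
    model="qwen2.5-coder-7b-instruct",  # hypothetical model ID
    params={"temperature": 0.2},
)
req = code_reviewer.request("Review: def add(a, b): return a - b")
```

With the OpenAI-compatible client from the API section, `client.chat.completions.create(**req)` would send the preset's prompt, model, and settings in one call.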
Jan’s Local API Server
Jan runs an OpenAI-compatible API server that your applications can use:
- Go to Settings > Advanced
- Enable Jan API Server
- The server starts on http://localhost:1337 by default
Connecting to Jan’s API
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:1337/v1",
    api_key="jan-local",  # placeholder value; the OpenAI client requires one
)
response = client.chat.completions.create(
    model="llama3.2-3b-instruct",  # use a model ID listed by GET /v1/models
    messages=[
        {"role": "user", "content": "What is the difference between TCP and UDP?"}
    ],
)
print(response.choices[0].message.content)
List available models via the API:
curl http://localhost:1337/v1/models
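The same check can be done from Python without extra dependencies, which is handy for picking a valid model ID before sending requests. This sketch uses only the standard library and assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the optional bearer token is an assumption for setups where the server is configured to require a key.

```python
import json
import urllib.request
from typing import List, Optional

JAN_BASE_URL = "http://localhost:1337/v1"  # default address of Jan's API server


def model_ids(models_response: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style GET /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str = JAN_BASE_URL,
                    api_key: Optional[str] = None,
                    timeout: float = 5.0) -> List[str]:
    """Query the running server and return the model IDs it reports."""
    request = urllib.request.Request(f"{base_url}/models")
    if api_key:
        # Only needed if your server is set up to require a key (assumption)
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the server enabled, `print(fetch_model_ids())` prints the IDs you can pass as `model=` in the chat example; if the server is not running, `urlopen` raises `URLError`.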
Connecting Remote AI Providers
Jan is not just for local models. You can configure remote providers alongside your local ones:
- Go to Settings > Model Providers
- Add your OpenAI API key for GPT-4o access
- Add Anthropic key for Claude access
- Add Groq key for fast cloud inference
This lets you switch between a local Llama model for private tasks and GPT-4o for tasks requiring more power — all from the same interface.
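If your own code calls models directly rather than through Jan's interface, the same split can be written as a small routing rule. This is a hypothetical sketch: the `Endpoint` type, the model IDs, and the decision rule are illustrative assumptions. Because both endpoints speak the OpenAI chat completions format, the same `OpenAI(base_url=...)` client works with either.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    base_url: str
    model: str
    api_key_env: Optional[str]  # environment variable holding the key, if any


# Illustrative endpoints (assumed model IDs)
LOCAL_JAN = Endpoint("http://localhost:1337/v1", "llama3.2-3b-instruct", None)
OPENAI = Endpoint("https://api.openai.com/v1", "gpt-4o", "OPENAI_API_KEY")


def choose_endpoint(contains_private_data: bool, needs_more_capability: bool) -> Endpoint:
    """Keep private work local; send only non-sensitive, demanding tasks to the cloud."""
    if contains_private_data:
        return LOCAL_JAN  # private data never leaves the machine
    return OPENAI if needs_more_capability else LOCAL_JAN
```

Checking privacy first means a task that is both sensitive and demanding still stays local, which mirrors the reason for running Jan in the first place.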
Extensions and Customization
Jan’s extension system is one of its differentiating features. Access it via Settings > Extensions:
- TensorRT-LLM Engine — NVIDIA-optimized inference engine for RTX GPU owners (faster than llama.cpp)
- Nitro Engine — Jan’s own optimized inference backend
- Themes — Switch between light and dark themes
The extension API is documented at jan.ai/docs, and the community has built additional engines, tools, and integrations.
File Storage and Privacy
All Jan data lives in ~/jan/ on your system:
~/jan/
├── models/ # Downloaded GGUF files
├── threads/ # Chat history (JSON files)
├── assistants/ # Saved assistant configs
├── extensions/ # Installed extensions
└── settings.json # App configuration
Nothing is sent externally when using local models. Chat history, prompts, and responses are stored in plain JSON in ~/jan/threads/. You can back these up, move them between machines, or delete them freely.
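Because threads are ordinary files, a backup is just an archive of the folder. A minimal sketch using the standard library's `shutil.make_archive`, assuming the `~/jan/threads/` layout shown above; the function name is hypothetical, and it is safest to run it while Jan is closed so no thread is mid-write.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_threads(jan_dir: Path, dest_dir: Path) -> Path:
    """Zip jan_dir/threads into a timestamped archive and return its path."""
    threads = jan_dir / "threads"
    if not threads.is_dir():
        raise FileNotFoundError(f"no threads folder at {threads}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = dest_dir / f"jan-threads-{stamp}"
    # make_archive appends ".zip"; entries are stored as threads/... relative to jan_dir
    archive = shutil.make_archive(str(base_name), "zip",
                                  root_dir=jan_dir, base_dir="threads")
    return Path(archive)
```

For example, `backup_threads(Path.home() / "jan", Path.home() / "jan-backups")` produces a zip you can copy to another machine and extract into its `~/jan/` folder.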
Jan vs. LM Studio vs. Ollama
| Feature | Jan | LM Studio | Ollama |
|---|---|---|---|
| Open source | Yes (MIT) | Partial | Yes (MIT) |
| GUI | Yes | Yes | No (needs add-on) |
| API server | Yes | Yes | Yes |
| Remote providers | Yes | Limited | No |
| Extensions | Yes | No | No |
| Best for | Privacy + flexibility | Beginners | Developers |
Performance Tips
- Enable GPU acceleration: check Settings > GPU Settings. Jan shows how many layers are GPU-accelerated.
- Reduce context length: if responses are slow or memory is tight, lower the context window, for example from 4096 to 2048 tokens.
- Use smaller quantizations for speed: Q4_K_M is the sweet spot for most users.
- Unload unused models: each loaded model occupies VRAM until it is unloaded, so stop any model you're not using.
- Use the TensorRT-LLM extension on NVIDIA: on RTX 3000/4000 series cards it can be 2–3x faster than the default engine, depending on the model.
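The context-length tip works because the attention KV cache grows linearly with the number of tokens: it stores a key and a value vector for every layer and every KV head at every position. The sketch below computes that size from a model's shape. It assumes an FP16 cache (2 bytes per element, llama.cpp's default) and uses Llama 3.1 8B's published configuration of 32 layers, 8 KV heads (grouped-query attention), and a head dimension of 128.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Memory for the KV cache: keys + values for every layer and token."""
    # The leading 2 counts the separate key and value tensors
    return 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_elem


# Llama 3.1 8B shape: 32 layers, 8 KV heads, head dimension 128
for ctx in (2048, 4096, 8192):
    mib = kv_cache_bytes(32, 8, 128, ctx) / 2**20
    print(f"context {ctx:>5}: {mib:.0f} MiB of KV cache at FP16")
```

That is 128 KiB per token, so dropping from 4096 to 2048 tokens frees 256 MiB (512 MiB down to 256 MiB) on top of the model weights, which can be the difference between fitting fully on the GPU or spilling layers to the CPU.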
Jan.ai delivers a complete, privacy-first AI assistant experience with no telemetry, no subscriptions, and no vendor lock-in. For users who want the full ChatGPT experience but entirely on their own hardware, Jan is among the best options available in 2026.