LM Studio is one of the most polished graphical interfaces for running large language models on your own hardware. Where Ollama is built around a CLI, LM Studio gives you a full desktop app with a model browser, a chat interface, and a built-in OpenAI-compatible API server. If you want to run Llama, Mistral, Phi, or DeepSeek locally without touching a terminal, LM Studio is a strong choice.
What LM Studio Offers
LM Studio handles the entire local LLM workflow in a single application:
- Model discovery — Browse and download models directly from Hugging Face inside the app
- Chat interface — A clean chat UI similar to ChatGPT, with system prompt controls
- Local API server — An OpenAI-compatible endpoint your apps can connect to
- Hardware detection — Auto-configures GPU layers for NVIDIA, AMD, and Apple Silicon
- Model comparison — Run two models side-by-side
LM Studio is free for personal use and runs on macOS, Windows, and Linux.
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| VRAM (GPU) | 4 GB | 8 GB+ |
| Storage | 10 GB free | 50 GB+ |
| OS | Windows 10, macOS 12, Ubuntu 22.04 | Latest versions |
Apple Silicon Macs (M1 and later) are exceptionally well-supported — the Metal backend gives you fast inference even on unified memory.
Installation
Download the installer from lmstudio.ai. There are no dependencies to install manually — the package bundles everything including CUDA libraries for NVIDIA users.
- Windows: Run the .exe installer, then launch from the Start menu
- macOS: Mount the .dmg and drag the app to Applications
- Linux: Download the .AppImage, make it executable, and run it
# Linux AppImage setup
chmod +x LM_Studio-0.3.x.AppImage
./LM_Studio-0.3.x.AppImage
Downloading Models
LM Studio has a built-in model search powered by Hugging Face. Click the Discover tab (magnifying glass icon) and search for a model.
Recommended Models for 2026
For everyday chat:
- lmstudio-community/Meta-Llama-3.2-3B-Instruct-GGUF: Fast, small, 2 GB
- bartowski/mistral-7b-instruct-v0.3-GGUF: Well-rounded, 4 GB
For coding:
- lmstudio-community/Qwen2.5-Coder-7B-Instruct-GGUF: Excellent code model
- TheBloke/CodeLlama-13B-Instruct-GGUF: Strong for larger codebases
For reasoning:
- lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF: Chain-of-thought reasoning
- bartowski/Phi-4-GGUF: Microsoft’s efficient reasoning model
Choosing the Right Quantization
When you click a model, LM Studio shows multiple quantization options. The file size is a good first approximation of the VRAM needed to load the model fully onto the GPU; the context window adds more on top of that:
| Quantization | Quality | Size (7B model) |
|---|---|---|
| Q2_K | Low | ~2.8 GB |
| Q4_K_M | Good | ~4.1 GB |
| Q5_K_M | Better | ~4.8 GB |
| Q6_K | High | ~5.5 GB |
| Q8_0 | Near-lossless | ~7.2 GB |
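Q4_K_M is the recommended default. A 7B model at this level needs about 4.1 GB for its weights, so it fits on a 6–8 GB card with room for context (a 4 GB card needs some layers kept on the CPU), while preserving most of the model’s quality.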
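As a rough sketch of how the table scales to other model sizes, the snippet below assumes file size grows linearly with parameter count and adds a 1.5 GB allowance for context and runtime overhead. The function names and the overhead figure are illustrative assumptions, not values LM Studio reports.

```python
# Approximate GGUF file sizes for a 7B model, taken from the table above (GB).
SIZE_7B_GB = {
    "Q2_K": 2.8,
    "Q4_K_M": 4.1,
    "Q5_K_M": 4.8,
    "Q6_K": 5.5,
    "Q8_0": 7.2,
}


def estimate_size_gb(params_billion: float, quant: str) -> float:
    """Scale the 7B file size linearly with parameter count (an approximation)."""
    return SIZE_7B_GB[quant] * params_billion / 7.0


def best_quant_for(params_billion: float, vram_gb: float, overhead_gb: float = 1.5):
    """Return the largest quantization whose estimate plus overhead fits in VRAM.

    overhead_gb is an assumed allowance for the KV cache and runtime buffers.
    Returns None when nothing fits.
    """
    fitting = [
        quant
        for quant in SIZE_7B_GB
        if estimate_size_gb(params_billion, quant) + overhead_gb <= vram_gb
    ]
    # Pick the highest-quality (largest) option among those that fit.
    return max(fitting, key=lambda q: SIZE_7B_GB[q], default=None)


if __name__ == "__main__":
    for vram in (4, 8, 12):
        print(f"{vram} GB VRAM, 7B model -> {best_quant_for(7, vram)}")
```

Treat the result as a starting point: actual memory use depends on the context length you configure and on the model architecture.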
Chatting with a Model
- Click the Chat tab (speech bubble icon)
- Click Select a model to load at the top
- Choose your downloaded model from the list
- LM Studio loads the model and shows GPU layer allocation
The System Prompt field at the top of the chat lets you define the AI’s persona and behavior:
You are a helpful programming assistant specializing in Python and JavaScript.
Answer concisely with working code examples. Always explain what the code does.
Chat Settings Panel
The right panel exposes key inference parameters:
- Temperature — Creativity (0.0 = deterministic, 1.0 = creative). Set 0.3–0.5 for factual tasks.
- Context Length — How many tokens of history the model can see. Higher uses more VRAM.
- Max Tokens — Maximum response length.
- Top P / Top K — Sampling controls; defaults are usually fine.
- Repeat Penalty — Prevents repetitive output. 1.1 is a good default.
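To build intuition for what Temperature and Top P do, here is a minimal textbook sketch of both operations on a made-up set of token scores. It illustrates the general technique only; it is not LM Studio's own sampling code, and the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the top logit.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize the surviving tokens; everything else gets zero.
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
    for t in (0.0, 0.3, 1.0):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, which is why values around 0.3–0.5 suit factual tasks.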
Using the Local API Server
LM Studio’s most powerful feature is its OpenAI-compatible API server. Any tool built for the OpenAI API can point at LM Studio instead.
Starting the Server
- Click the Local Server tab (server icon in the left sidebar)
- Select a model to load for the server
- Click Start Server
The server runs on http://localhost:1234 by default. You can change the port in the settings.
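A quick way to confirm the server is up is to request the model list from the OpenAI-style /v1/models endpoint. The sketch below uses only the standard library and assumes the default port; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def parse_model_ids(payload: dict) -> list:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:1234/v1", timeout: float = 3.0) -> list:
    """Return model ids from the server, or an empty list if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Server not started, wrong port, or an unexpected response body.
        return []
```

Calling list_models() while the server is running should return the identifiers you can pass as the model field in later requests. An empty list usually means the server is not started or is listening on a different port.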
API Usage
from openai import OpenAI

# Point the OpenAI client at LM Studio
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any string works
)

response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.2-3B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain TCP/IP in simple terms."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
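If you would rather not install the openai package, the same request can be sent as plain HTTP. This is a minimal sketch using the standard library, assuming the default server address; build_chat_payload and chat are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_payload(model, user_text, system_text="You are a helpful assistant.",
                       temperature=0.7):
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "temperature": temperature,
    }


def chat(payload, base_url="http://localhost:1234/v1", timeout=120.0):
    """Send the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, chat(build_chat_payload("lmstudio-community/Meta-Llama-3.2-3B-Instruct-GGUF", "Explain TCP/IP in simple terms.")) returns the reply as a string. The generous timeout is a guess to allow for slow first-token latency on large models.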
Embeddings API
LM Studio also supports the embeddings endpoint, useful for RAG applications:
response = client.embeddings.create(
model="nomic-ai/nomic-embed-text-v1.5-GGUF",
input="The quick brown fox"
)
embedding = response.data[0].embedding
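In a RAG pipeline, embeddings like the one above are typically compared with cosine similarity to find the documents closest to a query. The following is a small standard-library sketch of that step; the function names are illustrative, and a real application would normally use a vector database or NumPy instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

Embed each document once, store the vectors, then embed the user's question and pass both to rank_by_similarity to pick the passages to include in the prompt.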
GPU Configuration
LM Studio shows how many model layers are offloaded to your GPU. More GPU layers = faster inference. If you’re running out of VRAM, reduce the GPU Layers slider under Model Configuration.
For NVIDIA users: LM Studio uses llama.cpp under the hood with CUDA. Make sure your NVIDIA drivers are up to date (CUDA 12 requires driver version 525 or newer).
For AMD users (Linux): LM Studio uses the ROCm/Vulkan backend. Performance is good on RX 6000 series and newer.
For Apple Silicon: LM Studio uses Metal acceleration automatically. Unified memory means an M2 Pro with 32 GB RAM can run 13B models at full GPU speed.
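When a model does not fit entirely in VRAM, a back-of-the-envelope estimate can suggest a starting value for the GPU Layers slider. The sketch below assumes every layer is the same size and reserves a guessed 1.5 GB for the KV cache and runtime buffers; both are simplifications, and layers_that_fit is a hypothetical helper, not something LM Studio exposes.

```python
def layers_that_fit(model_size_gb, total_layers, vram_gb, reserve_gb=1.5):
    """Estimate how many layers fit in VRAM, assuming equally sized layers.

    reserve_gb is an assumed allowance for the KV cache and runtime buffers.
    """
    if total_layers <= 0 or model_size_gb <= 0:
        raise ValueError("model_size_gb and total_layers must be positive")
    per_layer_gb = model_size_gb / total_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(total_layers, int(usable_gb // per_layer_gb))
```

For example, layers_that_fit(7.2, 32, 4) estimates the offload for a 7.2 GB model with 32 layers on a 4 GB card. Start from that number and lower it if loading fails or the system becomes unstable.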
Model Management Tips
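- Model folder: LM Studio stores models in ~/LM Studio/models/ by default. You can change this in Settings > Storage.
- Sharing with Ollama: Both tools run GGUF models, but Ollama keeps them in its own blob store rather than as plain .gguf files, so pointing both at one folder does not work directly. To avoid downloading a model twice, import the GGUF file LM Studio downloaded into Ollama with a Modelfile whose FROM line points at that file.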
- Duplicate detection: LM Studio warns you if you already have a model before downloading.
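To see what is taking up disk space, you can scan the models folder yourself. This sketch walks a directory for .gguf files and flags repeated file names; the helper names are hypothetical, and the folder path is passed in rather than assumed.

```python
from pathlib import Path


def find_gguf_files(models_dir):
    """Return (path, size in GB) for every .gguf file under models_dir, largest first."""
    root = Path(models_dir).expanduser()
    found = [
        (path, path.stat().st_size / 1e9)
        for path in root.rglob("*.gguf")
        if path.is_file()
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)


def find_duplicates(files):
    """Group paths that share a file name, a cheap hint that a model exists twice."""
    by_name = {}
    for path, _size in files:
        by_name.setdefault(path.name, []).append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Pass the folder shown in Settings > Storage to find_gguf_files, then feed the result to find_duplicates. Matching names are only a hint; compare file sizes or checksums before deleting anything.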
Integrating with Other Tools
Because LM Studio exposes an OpenAI-compatible API, it works out of the box with:
- Continue.dev: AI code assistant for VS Code and JetBrains. Set the base URL to http://localhost:1234/v1.
- LangChain: Use ChatOpenAI(base_url="http://localhost:1234/v1", api_key="x").
- Obsidian Copilot plugin: Set the OpenAI endpoint to your LM Studio server.
- Cursor IDE: Can proxy through LM Studio with custom API endpoint settings.
LM Studio vs. Ollama
| Feature | LM Studio | Ollama |
|---|---|---|
| GUI | Yes | No (needs Open WebUI) |
| CLI | Limited (lms companion tool) | Yes |
| Model browser | Built-in | Via ollama.com |
| API server | Yes | Yes |
| Multimodal | Yes | Yes |
| Linux | Yes (AppImage) | Yes (native) |
| Best for | Beginners, GUI users | Developers, scripting |
Both tools use llama.cpp under the hood and support GGUF models, so a GGUF file downloaded for one can be imported into the other.
LM Studio removes most of the barriers to running local AI. Download the app, search for a model, and you’re chatting with a local LLM in under ten minutes, with no command line required.