LM Studio 2026 Guide: Run Local LLMs with a GUI

LM Studio lets you download and run local LLMs with a polished GUI. This 2026 guide covers setup, model management, and the built-in API server.

LM Studio is one of the most polished graphical interfaces for running large language models on your own hardware. Where Ollama is built around a CLI, LM Studio gives you a full desktop app with a model browser, chat interface, and a built-in OpenAI-compatible API server. If you want to run Llama, Mistral, Phi, or DeepSeek locally without touching a terminal, LM Studio is a strong choice.

What LM Studio Offers

LM Studio handles the entire local LLM workflow in a single application:

  • Model discovery — Browse and download models directly from Hugging Face inside the app
  • Chat interface — A clean chat UI similar to ChatGPT, with system prompt controls
  • Local API server — An OpenAI-compatible endpoint your apps can connect to
  • Hardware detection — Auto-configures GPU layers for NVIDIA, AMD, and Apple Silicon
  • Model comparison — Run two models side-by-side

LM Studio is free to use (since mid-2025 this covers use at work as well as personal use) and runs on macOS, Windows, and Linux.

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| RAM | 8 GB | 16 GB+ |
| VRAM (GPU) | 4 GB | 8 GB+ |
| Storage | 10 GB free | 50 GB+ |
| OS | Windows 10, macOS 12, Ubuntu 22.04 | Latest versions |
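
As a rough illustration, the minimums in the table can be turned into a quick self-check. This is only a sketch: the function names are made up for this example, the RAM and VRAM figures have to be supplied by hand, and only free disk space is measured automatically.

```python
import shutil

# Minimums copied from the requirements table, in GB.
MINIMUM_GB = {"ram": 8, "vram": 4, "disk": 10}


def free_disk_gb(path="."):
    """Return free space on the volume holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def unmet_requirements(ram_gb, vram_gb, disk_gb, minimum=MINIMUM_GB):
    """Return the names of the requirements that fall below the minimum."""
    have = {"ram": ram_gb, "vram": vram_gb, "disk": disk_gb}
    return [name for name, needed in minimum.items() if have[name] < needed]


# Example: 16 GB RAM, 6 GB VRAM, and whatever is free on the current volume.
print(unmet_requirements(ram_gb=16, vram_gb=6, disk_gb=free_disk_gb()))
```

An empty list means the machine clears every minimum; any names in the list point at the component to upgrade or free up.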

Apple Silicon Macs (M1 and later) are exceptionally well-supported — the Metal backend gives you fast inference even on unified memory.

Installation

Download the installer from lmstudio.ai. There are no dependencies to install manually — the package bundles everything including CUDA libraries for NVIDIA users.

  • Windows: Run the .exe installer, launch from Start menu
  • macOS: Mount the .dmg, drag to Applications
  • Linux: Download the .AppImage, make it executable, and run it

# Linux AppImage setup
chmod +x LM_Studio-0.3.x.AppImage
./LM_Studio-0.3.x.AppImage

Downloading Models

LM Studio has a built-in model search powered by Hugging Face. Click the Discover tab (magnifying glass icon) and search for a model.

For everyday chat:

  • lmstudio-community/Llama-3.2-3B-Instruct-GGUF: Fast, small, about 2 GB
  • bartowski/mistral-7b-instruct-v0.3-GGUF: Well-rounded, about 4 GB

For coding:

  • lmstudio-community/Qwen2.5-Coder-7B-Instruct-GGUF: Excellent code model
  • TheBloke/CodeLlama-13B-Instruct-GGUF: Strong for larger codebases

For reasoning:

  • lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF: Chain-of-thought reasoning
  • bartowski/Phi-4-GGUF: Microsoft’s efficient reasoning model

Choosing the Right Quantization

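When you click a model, LM Studio shows multiple quantization options. The file size is a good first estimate of the memory the weights need; budget roughly another 1 to 2 GB for the context (KV cache) and runtime buffers, and more at long context lengths: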

| Quantization | Quality | Size (7B model) |
| --- | --- | --- |
| Q2_K | Low | ~2.8 GB |
| Q4_K_M | Good | ~4.1 GB |
| Q5_K_M | Better | ~4.8 GB |
| Q6_K | High | ~5.5 GB |
| Q8_0 | Near-lossless | ~7.2 GB |

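Q4_K_M is the recommended default. A 7B model at this level is about 4.1 GB, so it fits comfortably in 6 to 8 GB of VRAM while preserving most of the model’s quality. On a 4 GB card, offload only part of the model to the GPU and let the rest run on the CPU.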
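
To get a feel for how these sizes scale to other models, here is a minimal sketch that extrapolates from the 7B figures in the table. It assumes file size grows linearly with parameter count, and the 1.5 GB overhead for context and buffers is an illustrative guess rather than a measured value. The function names are hypothetical.

```python
# Approximate file sizes (GB) for a 7B model, taken from the quantization table.
SIZE_7B_GB = {"Q2_K": 2.8, "Q4_K_M": 4.1, "Q5_K_M": 4.8, "Q6_K": 5.5, "Q8_0": 7.2}


def bits_per_weight(quant):
    """Effective bits per weight implied by the 7B file size."""
    return SIZE_7B_GB[quant] * 8 / 7.0


def estimate_size_gb(params_billion, quant):
    """Scale the 7B file size linearly to another parameter count."""
    return SIZE_7B_GB[quant] * params_billion / 7.0


def largest_quant_that_fits(params_billion, vram_gb, overhead_gb=1.5):
    """Pick the highest-quality quant whose weights plus overhead fit in VRAM.

    overhead_gb is an assumed allowance for the KV cache and buffers.
    """
    fitting = [
        quant for quant in SIZE_7B_GB
        if estimate_size_gb(params_billion, quant) + overhead_gb <= vram_gb
    ]
    # The dict is ordered from lowest to highest quality, so take the last hit.
    return fitting[-1] if fitting else None


print(round(estimate_size_gb(13, "Q4_K_M"), 1))  # 7.6
print(largest_quant_that_fits(7, vram_gb=6))     # Q4_K_M
```

Real files deviate from a linear estimate because architectures differ, so treat the result as a starting point and check the size LM Studio actually lists.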

Chatting with a Model

  1. Click the Chat tab (speech bubble icon)
  2. Click Select a model to load at the top
  3. Choose your downloaded model from the list
  4. LM Studio loads the model and shows GPU layer allocation

The System Prompt field at the top of the chat lets you define the AI’s persona and behavior:

You are a helpful programming assistant specializing in Python and JavaScript.
Answer concisely with working code examples. Always explain what the code does.

Chat Settings Panel

The right panel exposes key inference parameters:

  • Temperature — Creativity (0.0 = deterministic, 1.0 = creative). Set 0.3–0.5 for factual tasks.
  • Context Length — How many tokens of history the model can see. Higher uses more VRAM.
  • Max Tokens — Maximum response length.
  • Top P / Top K — Sampling controls; defaults are usually fine.
  • Repeat Penalty — Prevents repetitive output. 1.1 is a good default.
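
The same settings can usually be sent per request when you call the local server instead of using the chat UI. The sketch below only builds the request body, so it runs without a server. The helper name and the default values for top_p and top_k are illustrative assumptions, and top_k and repeat_penalty are not part of the standard OpenAI schema, so treat their acceptance by the server as something to verify. Context Length is left out because it is typically chosen when the model is loaded rather than per request.

```python
import json


def build_chat_payload(model, user_text, *, temperature=0.4, max_tokens=512,
                       top_p=0.95, top_k=40, repeat_penalty=1.1):
    """Build a /v1/chat/completions request body with explicit sampling settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,  # 0.3-0.5 suits factual tasks
        "max_tokens": max_tokens,    # cap on response length
        "top_p": top_p,
        # Not in the OpenAI schema; assumed to be accepted as extra fields.
        "top_k": top_k,
        "repeat_penalty": repeat_penalty,
    }


# "your-model-identifier" is a placeholder for whatever model you have loaded.
payload = build_chat_payload("your-model-identifier", "What is a GGUF file?")
print(json.dumps(payload, indent=2))
```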

Using the Local API Server

LM Studio’s most powerful feature is its OpenAI-compatible API server. Any tool built for the OpenAI API can point at LM Studio instead.

Starting the Server

  1. Click the Developer tab in the left sidebar (labeled Local Server in older versions)
  2. Select a model to load for the server
  3. Click Start Server

The server runs on http://localhost:1234 by default. You can change the port in the settings.
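
A quick way to confirm the server is reachable is to ask it which models it exposes. The sketch below uses only the standard library and assumes the server implements the OpenAI-style GET /v1/models route on the default port; the function names are made up for this example, and the sample response is hand-written rather than captured from a real server.

```python
import json
import urllib.request


def parse_model_ids(body):
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in json.loads(body).get("data", [])]


def list_models(base_url="http://localhost:1234/v1", timeout=5):
    """Return the model identifiers the server reports; raises if unreachable."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))


# Offline demonstration with a hand-written sample response.
sample = '{"object": "list", "data": [{"id": "example-model", "object": "model"}]}'
print(parse_model_ids(sample))  # ['example-model']
```

With the server running, calling list_models() should return the identifiers you can pass as the model argument in later requests.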

API Usage

from openai import OpenAI

# Point the OpenAI client at LM Studio
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # any string works
)

response = client.chat.completions.create(
    model="lmstudio-community/Llama-3.2-3B-Instruct-GGUF",  # use the identifier LM Studio shows for the loaded model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain TCP/IP in simple terms."}
    ],
    temperature=0.7
)

print(response.choices[0].message.content)
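
For chat-style apps you will often want tokens as they are generated instead of waiting for the full reply. The following is a sketch of streaming with the OpenAI Python client, assuming the local server supports stream=True as the OpenAI API does. The helper names are hypothetical, and the stand-in chunk objects exist only so the joining logic can be run without a server.

```python
from types import SimpleNamespace


def join_stream(chunks):
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and final chunks have no content
            parts.append(delta)
    return "".join(parts)


def stream_reply(client, model, prompt):
    """Request a streamed completion and return the full text.

    `client` is an openai.OpenAI instance pointed at the local server.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return join_stream(stream)


def fake_chunk(text):
    """Stand-in object shaped like one streamed chunk, for offline testing."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


print(join_stream([fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]))  # Hello
```

In a real UI you would print or render each delta inside the loop rather than collecting them first.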

Embeddings API

LM Studio also supports the embeddings endpoint, useful for RAG applications:

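from openai import OpenAI

# Same local endpoint as the chat example; an embedding model must be loaded
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5-GGUF",
    input="The quick brown fox"
)
embedding = response.data[0].embedding  # a list of floats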
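
In a RAG pipeline, the embedding vectors are typically compared with cosine similarity to find the documents closest to a query. Here is a minimal pure-Python sketch of that step; the toy three-dimensional vectors stand in for real embeddings, and the function names are chosen for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors; real embeddings have hundreds of dimensions.
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
print(rank_documents([1.0, 0.1, 0.0], docs))  # [0, 1, 2]
```

For more than a few thousand documents, a vector database or NumPy would be a better fit than plain lists.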

GPU Configuration

LM Studio shows how many model layers are offloaded to your GPU. More GPU layers = faster inference. If you’re running out of VRAM, reduce the GPU Layers slider under Model Configuration.

For NVIDIA users: LM Studio uses llama.cpp under the hood with CUDA. Make sure your NVIDIA drivers are up to date (driver version 525 or newer for CUDA 12).

For AMD users (Linux): LM Studio uses the ROCm/Vulkan backend. Performance is good on RX 6000 series and newer.

For Apple Silicon: LM Studio uses Metal acceleration automatically. Unified memory means an M2 Pro with 32 GB RAM can run 13B models at full GPU speed.
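
If you need to lower the GPU Layers slider by hand, a back-of-the-envelope estimate can suggest a starting value. The sketch below assumes all layers are the same size and reserves 1 GB of VRAM for the context and buffers; both are simplifications, the function name is hypothetical, and the 32-layer figure is an assumption that holds for common 7B architectures.

```python
def layers_that_fit(model_size_gb, n_layers, vram_gb, reserve_gb=1.0):
    """Estimate how many layers can be offloaded, assuming equal-sized layers."""
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))


# A ~4.1 GB Q4_K_M 7B model with 32 layers on a 4 GB card:
print(layers_that_fit(4.1, 32, 4.0))  # 23
# The same model on an 8 GB card fits entirely:
print(layers_that_fit(4.1, 32, 8.0))  # 32
```

Start from the estimate and adjust down if loading fails or the system starts swapping.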

Model Management Tips

  • Model folder: Recent LM Studio versions store models in ~/.lmstudio/models/ by default (older releases used ~/.cache/lm-studio/models/). You can change the location from the My Models view.
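  • Sharing with Ollama: Both tools run GGUF models, but they store them differently. Ollama keeps weights as hash-named blobs with separate manifests, so pointing both tools at one folder does not work. To reuse an LM Studio download in Ollama, import the .gguf file through a Modelfile (FROM /path/to/model.gguf) and ollama create; note that this typically copies the file into Ollama's own store.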
  • Duplicate detection: LM Studio warns you if you already have a model before downloading.
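
To see what is taking up space, you can scan the model folder for GGUF files. This is a small sketch using only the standard library; the function name is made up, and the directory in the example is a placeholder that you should replace with the path shown in your LM Studio installation.

```python
from pathlib import Path


def list_gguf_models(models_dir):
    """Return (relative path, size in GB) for every .gguf file under models_dir."""
    root = Path(models_dir).expanduser()
    if not root.is_dir():
        return []  # nothing to scan
    found = []
    for path in sorted(root.rglob("*.gguf")):
        size_gb = path.stat().st_size / 1e9
        found.append((str(path.relative_to(root)), round(size_gb, 2)))
    return found


# Placeholder path; substitute your actual models directory.
for name, size_gb in list_gguf_models("~/path/to/lm-studio/models"):
    print(f"{size_gb:6.2f} GB  {name}")
```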

Integrating with Other Tools

Because LM Studio exposes an OpenAI-compatible API, it works out of the box with:

  • Continue.dev: AI code assistant for VS Code and JetBrains. Set the base URL to http://localhost:1234/v1.
  • LangChain: Use ChatOpenAI(base_url="http://localhost:1234/v1", api_key="x").
  • Obsidian Copilot plugin: Set the OpenAI endpoint to your LM Studio server.
  • Cursor IDE: Lets you override the OpenAI base URL, but requests are generally sent from Cursor's servers, so a localhost address is usually unreachable. Expect to expose LM Studio through a tunnel if you go this route.
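
Many tools built on the official OpenAI SDKs read their endpoint and key from environment variables, which gives you one place to redirect them. The sketch below sets OPENAI_BASE_URL and OPENAI_API_KEY, which the official OpenAI Python client reads; other tools may use different variable names, so check their documentation. The helper name is hypothetical.

```python
import os


def local_openai_env(port=1234, api_key="lm-studio"):
    """Environment variables that OpenAI-SDK-based tools commonly read."""
    return {
        "OPENAI_BASE_URL": f"http://localhost:{port}/v1",
        "OPENAI_API_KEY": api_key,  # placeholder; any string works locally
    }


# Apply to the current process so SDK clients created afterwards pick them up.
os.environ.update(local_openai_env())
print(os.environ["OPENAI_BASE_URL"])  # http://localhost:1234/v1
```

The same dictionary can be passed as the env argument of subprocess.run to launch another tool against the local server without changing your shell profile.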

LM Studio vs. Ollama

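| Feature | LM Studio | Ollama |
| --- | --- | --- |
| GUI | Yes | Basic desktop app (Open WebUI for more) |
| CLI | Yes (lms) | Yes |
| Model browser | Built-in | Via ollama.com |
| API server | Yes | Yes |
| Multimodal | Yes | Yes |
| Linux | Yes (AppImage) | Yes (native) |
| Best for | Beginners, GUI users | Developers, scripting |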

Both tools build on llama.cpp and support GGUF models. They store model files differently, though, so reusing a download across the two takes an import step or symlinks rather than a shared folder.

LM Studio removes most of the friction from running local AI. Download the app, search for a model, and you can be chatting with a local LLM in under ten minutes, with no command line required.
