Build a Private AI Knowledge Base with AnythingLLM
AnythingLLM is a full-stack, open-source RAG (Retrieval-Augmented Generation) platform that lets you chat with your documents using a local or cloud LLM backend. It supports Ollama, LM Studio, GPT4All, OpenAI, Anthropic, and more, which makes it one of the more flexible self-hosted AI assistants available in 2026.
This guide covers installation, document ingestion, LLM backend configuration, and advanced workspace management.
What Is RAG and Why Does It Matter?
RAG (Retrieval-Augmented Generation) grounds AI responses in your actual documents rather than relying solely on the model’s training data. The process works like this:
- Your documents are chunked and converted to vector embeddings
- When you ask a question, relevant chunks are retrieved
- Those chunks are injected into the LLM prompt as context
- The LLM answers using both its training and your documents
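The four steps above can be sketched in a few lines of Python. This is a toy illustration, not how AnythingLLM implements retrieval: word-count vectors stand in for a real embedding model, and the function names (chunk, embed, cosine, retrieve, build_prompt) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Step 1a: split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Step 1b: toy 'embedding' as a bag-of-words count vector.

    A real system would call a neural embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 2: return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: inject the retrieved chunks into the prompt sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )
```

Step 4 is the LLM call itself, which takes the string returned by build_prompt. Swapping embed for a real embedding model changes the quality of retrieval, not the shape of the pipeline.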
The result is a model that “knows” your internal documentation, research papers, codebases, or any text you feed it — without fine-tuning.
Installation Options
Desktop App (Easiest)
Download the AnythingLLM desktop installer from anythingllm.com (formerly useanything.com). It is available for Windows, macOS, and Linux, and bundles everything it needs, including a built-in vector database.
Docker (Recommended for Teams)
# Pull and run AnythingLLM
docker pull mintplexlabs/anythingllm:latest
docker run -d \
  -p 3001:3001 \
  --name anythingllm \
  -v ~/.anythingllm:/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm:latest
Access the UI at http://localhost:3001.
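To confirm from a script that the container is reachable, you can poll it over HTTP. The sketch below assumes the server exposes a GET /api/ping route that returns {"online": true}. That route is an assumption about your AnythingLLM version, and is_online is a hypothetical helper name, so check your instance's API documentation if the call fails.

```python
import json
import urllib.request


def is_online(base_url="http://localhost:3001", timeout=5.0,
              opener=urllib.request.urlopen) -> bool:
    """Return True if the server answers the (assumed) /api/ping route with online=true.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(f"{base_url}/api/ping", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body
        return False
    return isinstance(payload, dict) and payload.get("online") is True
```

Calling is_online() in a short retry loop after docker run is usually enough to wait for startup before making further API calls.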
Docker Compose
# docker-compose.yml
version: '3.8'
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
      - JWT_SECRET=your-secret-key-here
    restart: unless-stopped
Save the file, then start the stack from the same directory:
docker compose up -d
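The JWT_SECRET placeholder in the compose file should be replaced with a long random value before you expose the service to other users. One way to generate such a value is Python's standard secrets module; the helper name make_jwt_secret and the 32-byte default are illustrative choices, not requirements of AnythingLLM.

```python
import secrets


def make_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string (2 characters per byte) for use as a signing secret."""
    return secrets.token_hex(num_bytes)


# Print a line that can be pasted into the environment section or a .env file
print(f"JWT_SECRET={make_jwt_secret()}")
```

Keep the generated value out of version control, for example by loading it from an .env file that is not committed.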
First-Time Configuration
When you first open AnythingLLM you’ll be guided through:
- LLM Provider — choose your backend
- Embedding Model — choose how documents are vectorized
- Vector Database — where embeddings are stored
Configuring an LLM Backend
Ollama (Recommended for Local)
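First, make sure the Ollama server is running, then pull a model. The pull command talks to the running server, so start the server first:
# Start the server (skip this if Ollama already runs as a background service)
ollama serve
# In a second terminal, pull a model
ollama pull llama3.2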
In AnythingLLM settings:
- LLM Provider: Ollama
- Base URL: http://localhost:11434
- Model: llama3.2 (or whichever you pulled)
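Before pointing AnythingLLM at Ollama, it can help to verify that the model name you plan to enter is actually available. The sketch below queries Ollama's /api/tags endpoint, which lists locally pulled models. The helper names are hypothetical, and the matching rule (an untagged name matches any tag of that model) is a convenience assumption of this sketch.

```python
import json
import urllib.request


def pulled_models(tags_payload: dict) -> set[str]:
    """Extract model names (e.g. 'llama3.2:latest') from an /api/tags response."""
    return {m["name"] for m in tags_payload.get("models", [])}


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Check whether `wanted` is pulled; an untagged name matches any tag."""
    names = pulled_models(tags_payload)
    if ":" in wanted:
        return wanted in names
    return any(name.split(":", 1)[0] == wanted for name in names)


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, has_model(fetch_tags(), "llama3.2") tells you whether the model is ready to select in the settings screen.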
OpenAI
- LLM Provider: OpenAI
- API Key: your key from platform.openai.com
- Model: gpt-4o or gpt-4o-mini
LM Studio
- LLM Provider: LM Studio
- Base URL: http://localhost:1234
- Model: whatever you have loaded in LM Studio
Note: if AnythingLLM itself runs in Docker, localhost in these base URLs refers to the container, not your machine. On Docker Desktop, use http://host.docker.internal:11434 (Ollama) or http://host.docker.internal:1234 (LM Studio) instead. On Linux, add --add-host=host.docker.internal:host-gateway to the docker run command so that hostname resolves.
Embedding Models
The embedding model converts your documents into vectors. AnythingLLM supports:
| Provider | Model | Notes |
|---|---|---|
| Built-in (Nomic) | nomic-embed-text-v1.5 | Best local option |
| Ollama | nomic-embed-text | Requires ollama pull nomic-embed-text |
| OpenAI | text-embedding-3-small | Fast and cheap, needs API key |
| Cohere | embed-english-v3.0 | High quality, free tier available |
For a fully offline setup, use the built-in Nomic embedder or pull it through Ollama:
ollama pull nomic-embed-text
Vector Databases
AnythingLLM supports multiple vector databases:
| Database | Best For |
|---|---|
| LanceDB (built-in) | Desktop app, getting started |
| Chroma | Local Docker deployments |
| Pinecone | Cloud-scale production |
| Weaviate | Self-hosted at scale |
| Milvus | High-performance, large datasets |
For most users the built-in LanceDB is fine. To switch to Chroma:
# Run Chroma alongside AnythingLLM
docker run -d -p 8000:8000 chromadb/chroma
Then in AnythingLLM settings, set vector database to Chroma with URL http://localhost:8000.
Creating Workspaces
Workspaces are isolated RAG environments — each has its own document collection, chat history, and settings. Create separate workspaces for:
- Company documentation
- Research papers
- Personal notes
- Codebase documentation
Creating a Workspace
- Click + New Workspace in the sidebar
- Name it (e.g., “Security Research”)
- Click Settings to configure the LLM and embedding settings per workspace
- Upload documents
Each workspace maintains independent vector embeddings, so documents don’t bleed between them.
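Workspaces can also be created from a script rather than the sidebar. The sketch below assumes the developer API exposes POST /api/v1/workspace/new, accepts a JSON body with a name field, and returns the new workspace (including its slug) under a workspace key. Those details are assumptions to verify against your instance's API documentation, and the function names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/v1"


def new_workspace_request(name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the request that creates a workspace."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/workspace/new",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_workspace(name: str, token: str, timeout: float = 30.0) -> str:
    """Send the request and return the new workspace's slug (assumed response shape)."""
    with urllib.request.urlopen(new_workspace_request(name, token), timeout=timeout) as resp:
        return json.load(resp)["workspace"]["slug"]
```

The slug is the identifier used in other workspace endpoints, so it is worth storing alongside the human-readable name.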
Uploading and Managing Documents
Supported Formats
- PDF, DOCX, TXT, MD
- CSV, JSON, XML
- YouTube video transcripts (paste URL)
- Web pages (paste URL)
- GitHub repositories (connect via URL)
- Confluence and Notion pages (via integrations)
Uploading via the UI
- Open a workspace
- Click the Upload button (paperclip icon)
- Drag and drop files or browse
- Wait for chunking and embedding to complete (progress shown per file)
- Toggle documents on to include them in chat context
Uploading via API
# Upload a document via the REST API
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@/path/to/document.pdf"
Get your API token from Settings → API Keys.
Chat with Documents
Once documents are uploaded and pinned to a workspace, start chatting:
- Chat mode: Standard conversation with document context injected
- Query mode: Strict document retrieval — only answers if the answer is in your documents
Query mode is useful for compliance or support use cases where hallucination must be minimized.
Citing Sources
AnythingLLM automatically shows which document chunks were used to generate each response. Click the Source button beneath any AI reply to see exact passages and file names.
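The same two modes and source citations are available to scripts. The sketch below assumes a POST /api/v1/workspace/{slug}/chat endpoint that takes message and mode fields and returns textResponse plus a sources list whose entries carry a title. Those field names are assumptions to check against your instance's API documentation, and the function names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/v1"
VALID_MODES = {"chat", "query"}


def chat_request(slug: str, message: str, token: str,
                 mode: str = "query") -> urllib.request.Request:
    """Build (but do not send) a chat request for the given workspace."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    body = json.dumps({"message": message, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/workspace/{slug}/chat",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def summarize_reply(payload: dict) -> tuple[str, list[str]]:
    """Return the answer text and the de-duplicated titles of cited documents."""
    answer = payload.get("textResponse") or ""
    titles = [s.get("title", "unknown") for s in payload.get("sources") or []]
    return answer, list(dict.fromkeys(titles))  # dict keeps first-seen order


def ask(slug: str, message: str, token: str, mode: str = "query",
        timeout: float = 120.0) -> tuple[str, list[str]]:
    """Send the question and return (answer, cited document titles)."""
    request = chat_request(slug, message, token, mode)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return summarize_reply(json.load(resp))
```

Defaulting to query mode here mirrors the compliance use case: an empty source list is a signal that the answer was not grounded in your documents.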
Advanced: Custom System Prompts
Every workspace can have a custom system prompt that shapes how the AI behaves:
You are a cybersecurity documentation assistant for HackingPC.com.
Only answer questions based on the provided documentation.
When you don't know something, say "I don't have that information in the current documentation."
Format code examples in markdown code blocks.
Set this in Workspace Settings → Prompt.
Multi-User Setup
AnythingLLM supports user accounts with role-based access:
| Role | Capabilities |
|---|---|
| Admin | Full access, settings, user management |
| Manager | Create workspaces, manage documents |
| Default | Chat only in assigned workspaces |
Enable multi-user mode in Settings → Multi-User Mode. Users log in with email and password — no OAuth required unless you configure it.
Automating Document Sync
Use the AnythingLLM API to automate document ingestion from external sources:
import os

import requests

API_URL = "http://localhost:3001/api/v1"
TOKEN = os.environ["ANYTHINGLLM_TOKEN"]
WORKSPACE_SLUG = "security-research"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def upload_document(filepath: str) -> None:
    # Upload the file so AnythingLLM can parse and store it
    with open(filepath, "rb") as f:
        resp = requests.post(
            f"{API_URL}/document/upload",
            headers=HEADERS,
            files={"file": f},
            timeout=300,
        )
    resp.raise_for_status()
    # The embedding endpoint identifies documents by their stored "location" path
    doc_location = resp.json()["documents"][0]["location"]
    # Embed the document into the workspace
    embed_resp = requests.post(
        f"{API_URL}/workspace/{WORKSPACE_SLUG}/update-embeddings",
        headers=HEADERS,
        json={"adds": [doc_location]},
        timeout=300,
    )
    embed_resp.raise_for_status()
    print(f"Uploaded and embedded: {filepath}")


upload_document("/home/user/reports/pentest-report-q1-2026.pdf")
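To turn a one-off upload into a recurring sync, a script needs to know which files have changed since the last run. The sketch below keeps a JSON manifest of SHA-256 digests and only calls the uploader for new or modified files. The names sync_directory, file_digest, and the manifest file are hypothetical, and the upload callback is meant to be a function like upload_document from the script above.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_directory(
    folder: Path,
    manifest_path: Path,
    upload: Callable[[str], None],
    patterns: tuple[str, ...] = ("*.pdf", "*.md", "*.txt"),
) -> list[str]:
    """Upload files under `folder` that are new or changed since the last run.

    The manifest maps file paths to the digest that was last uploaded.
    Returns the list of paths uploaded during this run.
    """
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    uploaded = []
    for pattern in patterns:
        for path in sorted(folder.rglob(pattern)):
            digest = file_digest(path)
            if manifest.get(str(path)) == digest:
                continue  # unchanged since the last successful upload
            upload(str(path))
            manifest[str(path)] = digest
            uploaded.append(str(path))
    # Written only after every upload succeeded; a failed run is retried in full
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return uploaded
```

Running this from cron or a systemd timer, for example sync_directory(Path("/home/user/reports"), Path("manifest.json"), upload_document), keeps a workspace current without re-embedding unchanged files. Note that this sketch does not remove documents that were deleted from the folder.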
AnythingLLM vs. Alternatives
| Tool | RAG Quality | Setup Ease | Multi-User | Local Only |
|---|---|---|---|---|
| AnythingLLM | Excellent | Easy | Yes | Yes |
| GPT4All LocalDocs | Good | Very Easy | No | Yes |
| Open WebUI RAG | Good | Medium | Yes | Yes |
| PrivateGPT | Good | Hard | No | Yes |
Final Thoughts
AnythingLLM is one of the most complete self-hosted RAG solutions available today. It handles everything from document ingestion to multi-user access control, works with a wide range of LLM backends, and keeps your data on your own infrastructure when you pair it with a local model and embedder.
Start with the desktop app to get comfortable, then migrate to Docker when you’re ready to share it with a team.