Build a Private AI Knowledge Base with AnythingLLM

AnythingLLM is a full-stack, open-source RAG (Retrieval-Augmented Generation) platform that lets you chat with your documents using a local or cloud LLM backend. It supports Ollama, LM Studio, GPT4All, OpenAI, Anthropic, and more, which makes it one of the more flexible self-hosted AI assistants available in 2026.

This guide covers installation, document ingestion, LLM backend configuration, and advanced workspace management.

What Is RAG and Why Does It Matter?

RAG (Retrieval-Augmented Generation) grounds AI responses in your actual documents rather than relying solely on the model’s training data. The process works like this:

Your documents are chunked and converted to vector embeddings
When you ask a question, relevant chunks are retrieved
Those chunks are injected into the LLM prompt as context
The LLM answers using both its training and your documents
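
The four steps above can be sketched in a few lines of Python. This is a toy illustration, not how AnythingLLM implements retrieval: the embed function below is a simple word-count stand-in for a real embedding model, and every function name here is made up for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Step 1a: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Step 1b: toy embedding (lowercase word counts), a stand-in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Step 2: rank chunks by similarity to the question and keep the best ones."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: inject the retrieved chunks into the prompt sent to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


docs = [
    "The VPN gateway rotates its certificates every 90 days.",
    "Lunch is served in the cafeteria from noon until two.",
    "Firewall rules are reviewed by the security team each quarter.",
]
chunks = [c for d in docs for c in chunk_text(d)]
question = "How often are VPN certificates rotated?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)  # Step 4 would send this prompt to the LLM
print(prompt)
```

A real pipeline swaps the word-count vectors for dense embeddings and stores them in a vector database, but the control flow stays the same.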

The result is a model that “knows” your internal documentation, research papers, codebases, or any text you feed it — without fine-tuning.

Installation Options

Desktop App (Easiest)

Download the AnythingLLM desktop installer from anythingllm.com (formerly useanything.com). It is available for Windows, macOS, and Linux, and bundles everything you need, including a built-in vector database.

Docker (Recommended for Teams)

# Pull and run AnythingLLM
docker pull mintplexlabs/anythingllm:latest

docker run -d \
  -p 3001:3001 \
  --name anythingllm \
  -v ~/.anythingllm:/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm:latest

Access the UI at http://localhost:3001.
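
If you start the container from a script, it helps to wait until the server actually responds before doing anything else. Here is a minimal sketch that polls the instance. It assumes the server exposes a GET /api/ping route that returns {"online": true}; treat that path and response shape as assumptions and verify them against your version. The helper names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(url: str, timeout: float = 3.0) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_online(base_url: str, fetch=fetch_json, attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll the (assumed) /api/ping route until it reports online or attempts run out."""
    for _ in range(attempts):
        try:
            if fetch(f"{base_url}/api/ping").get("online") is True:
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not accepting connections yet, or body was not JSON
        time.sleep(delay)
    return False
```

Call wait_until_online("http://localhost:3001") right after docker run. The fetch parameter exists only so the polling logic can be exercised without a live server.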

Docker Compose

# docker-compose.yml
version: '3.8'
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
      - JWT_SECRET=your-secret-key-here
    restart: unless-stopped

docker compose up -d
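
The JWT_SECRET value in the compose file above is only a placeholder and should be replaced with a long random string. One way to generate one, sketched with Python's standard secrets module (the function names are illustrative):

```python
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


def env_line(secret: str) -> str:
    """Format the secret as a line for the compose environment section or a .env file."""
    return f"JWT_SECRET={secret}"


print(env_line(generate_jwt_secret()))
```

Paste the printed line into your compose file or a .env file that stays out of version control.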

First-Time Configuration

When you first open AnythingLLM you’ll be guided through:

LLM Provider — choose your backend
Embedding Model — choose how documents are vectorized
Vector Database — where embeddings are stored

Configuring an LLM Backend

Ollama (Recommended for Local)

First, make sure Ollama is running and you have a model pulled:

ollama pull llama3.2
ollama serve

In AnythingLLM settings:

LLM Provider: Ollama
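Base URL: http://localhost:11434 (if AnythingLLM runs in Docker, use http://host.docker.internal:11434 instead, because localhost inside the container refers to the container itself)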
Model: llama3.2 (or whichever you pulled)
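
Before pointing AnythingLLM at Ollama, you can confirm that the model is actually installed. The sketch below relies on Ollama's GET /api/tags endpoint, which lists local models. The helper names are made up, and the rule that a bare name means the :latest tag is a simplifying assumption of this example.

```python
import json
import urllib.request


def installed_models(tags_response: dict) -> set[str]:
    """Collect model names from an Ollama /api/tags response."""
    return {m["name"] for m in tags_response.get("models", [])}


def has_model(tags_response: dict, wanted: str) -> bool:
    """True if wanted is installed; a name without a tag is treated as ':latest'."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in installed_models(tags_response)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, has_model(fetch_tags(), "llama3.2") tells you whether the pull succeeded.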

OpenAI

LLM Provider: OpenAI
API Key: your key from platform.openai.com
Model: gpt-4o or gpt-4o-mini

LM Studio

LLM Provider: LM Studio
Base URL: http://localhost:1234/v1 (LM Studio's OpenAI-compatible server; start it from LM Studio before connecting)
Model: whatever you have loaded in LM Studio

Embedding Models

The embedding model converts your documents into vectors. AnythingLLM supports:

Provider	Model	Notes
Built-in (Nomic)	nomic-embed-text-v1.5	Best local option
Ollama	nomic-embed-text	Requires `ollama pull nomic-embed-text`
OpenAI	text-embedding-3-small	Fast and cheap, needs API key
Cohere	embed-english-v3.0	High quality, free tier available

For a fully offline setup, use the built-in Nomic embedder or pull it through Ollama:

ollama pull nomic-embed-text
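
To see what an embedding looks like, you can request one from Ollama directly. This sketch uses Ollama's POST /api/embed endpoint, which returns an "embeddings" list. The function names are hypothetical, and nothing here is something AnythingLLM requires you to run; it handles embedding for you.

```python
import json
import urllib.request


def build_embed_request(text: str, model: str = "nomic-embed-text",
                        base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) an embedding request for a single piece of text."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_vector(response: dict) -> list[float]:
    """Extract the vector for the first (here, only) input."""
    return response["embeddings"][0]


def embed_text(text: str) -> list[float]:
    """Send the request to a running Ollama server and return the vector."""
    with urllib.request.urlopen(build_embed_request(text), timeout=30) as resp:
        return first_vector(json.load(resp))
```

Calling len(embed_text("hello")) against a running server shows the dimensionality of the vectors your vector database will store.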

Vector Databases

AnythingLLM supports multiple vector databases:

Database	Best For
LanceDB (built-in)	Desktop app, getting started
Chroma	Local Docker deployments
Pinecone	Cloud-scale production
Weaviate	Self-hosted at scale
Milvus	High-performance, large datasets

For most users the built-in LanceDB is fine. To switch to Chroma:

# Run Chroma alongside AnythingLLM
docker run -d -p 8000:8000 chromadb/chroma

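Then, in AnythingLLM settings, set the vector database to Chroma with the URL http://localhost:8000. If AnythingLLM itself runs in a container, localhost will not reach Chroma; use http://host.docker.internal:8000 or put both containers on a shared Docker network.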

Creating Workspaces

Workspaces are isolated RAG environments — each has its own document collection, chat history, and settings. Create separate workspaces for:

Company documentation
Research papers
Personal notes
Codebase documentation

Creating a Workspace

Click + New Workspace in the sidebar
Name it (e.g., “Security Research”)
Open the workspace settings to choose its LLM and tune chat and retrieval options
Upload documents

Each workspace maintains independent vector embeddings, so documents don’t bleed between them.
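
Workspaces can also be created through the developer API instead of the UI. A minimal sketch, assuming the POST /workspace/new route under /api/v1 and a response that nests the new workspace (including its slug) under a "workspace" key. Check both against your instance's API documentation. The function names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/v1"  # assumed local instance address


def build_new_workspace_request(name: str, token: str,
                                api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a request that creates a workspace called name."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url}/workspace/new",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def workspace_slug(response: dict) -> str:
    """Pull the slug out of the (assumed) response shape; later API calls use it."""
    return response["workspace"]["slug"]


def create_workspace(name: str, token: str) -> str:
    """Send the request to a running instance and return the new workspace's slug."""
    with urllib.request.urlopen(build_new_workspace_request(name, token), timeout=30) as resp:
        return workspace_slug(json.load(resp))
```

The slug returned here is the value that other workspace endpoints expect in their URL path.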

Uploading and Managing Documents

Supported Formats

PDF, DOCX, TXT, MD
CSV, JSON, XML
YouTube video transcripts (paste URL)
Web pages (paste URL)
GitHub repositories (connect via URL)
Confluence and Notion pages (via integrations)

Uploading via the UI

Open a workspace
Click the Upload button (paperclip icon)
Drag and drop files or browse
Wait for chunking and embedding to complete (progress shown per file)
Toggle documents on to include them in chat context

Uploading via API

# Upload a document via the REST API
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@/path/to/document.pdf"

Get your API token from Settings → API Keys.

Chat with Documents

Once documents are uploaded and embedded in a workspace, start chatting. Each workspace offers two modes:

Chat mode: Standard conversation with document context injected
Query mode: Strict document retrieval — only answers if the answer is in your documents

Query mode is useful for compliance or support use cases where hallucination must be minimized.
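
The same two modes are available when you chat through the developer API. The sketch below assumes the POST /workspace/{slug}/chat route with a JSON body containing "message" and "mode", and a "textResponse" field in the reply; confirm these against your instance's API documentation. The helper names are illustrative.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/v1"  # assumed local instance address
VALID_MODES = ("chat", "query")


def build_chat_request(slug: str, message: str, token: str, mode: str = "query",
                       api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a chat request for one workspace."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    body = json.dumps({"message": message, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url}/workspace/{slug}/chat",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def ask(slug: str, message: str, token: str, mode: str = "query") -> str:
    """Send the question to a running instance and return the answer text."""
    with urllib.request.urlopen(build_chat_request(slug, message, token, mode), timeout=120) as resp:
        return json.load(resp)["textResponse"]
```

Defaulting to query mode in scripts is a deliberate choice here: automated callers usually want an answer grounded in the documents or no answer at all.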

Citing Sources

AnythingLLM automatically shows which document chunks were used to generate each response. Click the Source button beneath any AI reply to see exact passages and file names.
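
If you use the API, the chat response carries the same source information. Here is a small sketch that turns it into one line per source. It assumes each entry in "sources" has a "title" and the passage under either "text" or "chunk"; these field names are assumptions, so inspect a real response from your instance before relying on them.

```python
def summarize_sources(chat_response: dict, max_chars: int = 80) -> list[str]:
    """Return 'file name: snippet' lines for each cited source in a chat response."""
    lines = []
    for src in chat_response.get("sources", []):
        title = src.get("title", "unknown")
        passage = src.get("text") or src.get("chunk") or ""
        snippet = " ".join(passage.split())[:max_chars]  # collapse whitespace, then truncate
        lines.append(f"{title}: {snippet}")
    return lines


example = {
    "textResponse": "Keys are rotated every 90 days.",
    "sources": [{"title": "policy.pdf", "text": "Rotate   keys\nevery 90 days."}],
}
for line in summarize_sources(example):
    print(line)
```

Logging these lines next to each answer gives you an audit trail of which files backed which response.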

Advanced: Custom System Prompts

Every workspace can have a custom system prompt that shapes how the AI behaves:

You are a cybersecurity documentation assistant for HackingPC.com.
Only answer questions based on the provided documentation.
When you don't know something, say "I don't have that information in the current documentation."
Format code examples in markdown code blocks.

Set this in Workspace Settings → Prompt.
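
The prompt can also be set from a script, which is handy when you manage several workspaces. This is a sketch under assumptions: it targets the POST /workspace/{slug}/update route and sends the prompt in an "openAiPrompt" field, which is how I understand the workspace update schema. Verify the field name against your instance's API documentation. The function name is hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/v1"  # assumed local instance address


def build_prompt_update_request(slug: str, system_prompt: str, token: str,
                                api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a request that replaces a workspace's system prompt."""
    # "openAiPrompt" is the assumed field name for the workspace system prompt
    body = json.dumps({"openAiPrompt": system_prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url}/workspace/{slug}/update",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Pass the returned request to urllib.request.urlopen to apply it. Keeping prompts in version-controlled files and pushing them this way makes changes reviewable.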

Multi-User Setup

AnythingLLM supports user accounts with role-based access:

Role	Capabilities
Admin	Full access, settings, user management
Manager	Create workspaces, manage documents
Default	Chat only in assigned workspaces

Enable multi-user mode in Settings → Multi-User Mode. Users log in with a username and password; no external identity provider is required.

Automating Document Sync

Use the AnythingLLM API to automate document ingestion from external sources:

import os

import requests

API_URL = "http://localhost:3001/api/v1"
TOKEN = os.environ["ANYTHINGLLM_TOKEN"]
WORKSPACE_SLUG = "security-research"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def upload_document(filepath: str) -> None:
    # Step 1: upload the file so AnythingLLM can parse and store it
    with open(filepath, "rb") as f:
        resp = requests.post(
            f"{API_URL}/document/upload",
            headers=HEADERS,
            files={"file": f},
            timeout=120,
        )
    resp.raise_for_status()
    # update-embeddings expects the stored document's location path, not its id
    doc_location = resp.json()["documents"][0]["location"]

    # Step 2: embed the uploaded document into the workspace
    embed_resp = requests.post(
        f"{API_URL}/workspace/{WORKSPACE_SLUG}/update-embeddings",
        headers=HEADERS,
        json={"adds": [doc_location]},
        timeout=300,
    )
    embed_resp.raise_for_status()
    print(f"Uploaded and embedded: {filepath}")

upload_document("/home/user/reports/pentest-report-q1-2026.pdf")
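
For a recurring sync you usually want to upload only files that are new or have changed. One way to decide that is to keep a manifest of content hashes, sketched below. The manifest format, the suffix list, and the function names are all assumptions made for this example and are not part of AnythingLLM.

```python
import hashlib
from pathlib import Path

SUFFIXES = (".pdf", ".docx", ".txt", ".md")  # illustrative subset of supported formats


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_upload(folder: Path, manifest: dict[str, str]) -> list[Path]:
    """Return files under folder whose content differs from what the manifest recorded."""
    changed = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUFFIXES:
            if manifest.get(str(path)) != file_digest(path):
                changed.append(path)
    return changed


def record_uploaded(manifest: dict[str, str], path: Path) -> None:
    """Remember the hash of a file after it has been uploaded successfully."""
    manifest[str(path)] = file_digest(path)
```

In a scheduled job, load the manifest from a JSON file, call your upload function for each path returned by files_to_upload, call record_uploaded after each success, and write the manifest back to disk.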

AnythingLLM vs. Alternatives

Tool	RAG Quality	Setup Ease	Multi-User	Can Run Fully Local
AnythingLLM	Excellent	Easy	Yes	Yes
GPT4All LocalDocs	Good	Very Easy	No	Yes
Open WebUI RAG	Good	Medium	Yes	Yes
PrivateGPT	Good	Hard	No	Yes

Final Thoughts

AnythingLLM is one of the most complete self-hosted RAG solutions available today. It handles everything from document ingestion to multi-user access control, works with a wide range of LLM backends, and, when paired with a local model and embedder, keeps your data entirely on your infrastructure.

Start with the desktop app to get comfortable, then migrate to Docker when you’re ready to share it with a team.