AnythingLLM RAG Setup Guide 2026

Set up AnythingLLM for local RAG with your own documents. Full guide to workspaces, embeddings, LLM backends, and Docker deployment.

Build a Private AI Knowledge Base with AnythingLLM

AnythingLLM is a full-stack, open-source RAG (Retrieval-Augmented Generation) platform that lets you chat with your documents using a local or cloud LLM backend. It supports Ollama, LM Studio, GPT4All, OpenAI, Anthropic, and more, which makes it one of the more flexible self-hosted AI assistants available in 2026.

This guide covers installation, document ingestion, LLM backend configuration, and advanced workspace management.


What Is RAG and Why Does It Matter?

RAG (Retrieval-Augmented Generation) grounds AI responses in your actual documents rather than relying solely on the model’s training data. The process works like this:

  1. Your documents are chunked and converted to vector embeddings
  2. When you ask a question, relevant chunks are retrieved
  3. Those chunks are injected into the LLM prompt as context
  4. The LLM answers using both its training and your documents

The result is a model that “knows” your internal documentation, research papers, codebases, or any text you feed it — without fine-tuning.
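The four steps above can be sketched in a few lines of Python. This is a toy illustration, not how AnythingLLM is implemented: word counts stand in for a real embedding model, the sample documents are invented, and the function names (`chunk`, `embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 20) -> list[str]:
    """Step 1a: split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Step 1b: toy 'embedding' made of word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Step 2: return the chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: inject the retrieved chunks into the prompt sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


# Invented sample documents, one chunk each at this chunk size
docs = [
    "The VPN gateway is restarted every Sunday at 02:00 UTC.",
    "Expense reports are due on the fifth business day of each month.",
]
chunks = [c for d in docs for c in chunk(d)]
question = "When is the VPN gateway restarted?"
context = retrieve(question, chunks)
prompt = build_prompt(question, context)  # Step 4 would send this to the LLM
print(prompt)
```

A real pipeline swaps `embed` for a neural embedding model and stores the vectors in a vector database, but the retrieve-then-inject flow is the same.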


Installation Options

Desktop App (Easiest)

Download the AnythingLLM desktop installer from anythingllm.com (formerly useanything.com). It is available for Windows, macOS, and Linux and bundles everything you need, including a built-in vector database.

Docker

# Pull and run AnythingLLM
docker pull mintplexlabs/anythingllm:latest

docker run -d \
  -p 3001:3001 \
  --name anythingllm \
  -v ~/.anythingllm:/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm:latest

Access the UI at http://localhost:3001.
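The container can take a little while to start, so scripts that depend on it may want to wait until the UI responds. The sketch below polls the URL from this guide with the standard library. The helper names (`http_ok`, `wait_until_up`) and the retry values are illustrative assumptions, not part of AnythingLLM.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0, probe=http_ok) -> bool:
    """Probe `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_up("http://localhost:3001")` returns `True` once the server answers, or `False` after the attempts run out.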

Docker Compose

# docker-compose.yml
version: '3.8'
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
      - JWT_SECRET=your-secret-key-here
    restart: unless-stopped

Start the stack from the directory that contains the file:

docker compose up -d
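The `JWT_SECRET` value in the compose file is a placeholder and should be replaced with a long random string before the instance is shared. One way to generate one is with Python's standard `secrets` module. The helper name and the 32-byte length are illustrative choices, not an AnythingLLM requirement.

```python
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


# Print a line that can be pasted into the `environment` section
print(f"JWT_SECRET={generate_jwt_secret()}")
```

Keep the generated value out of version control, for example by loading it from an `.env` file that Docker Compose reads.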

First-Time Configuration

When you first open AnythingLLM you’ll be guided through:

  1. LLM Provider — choose your backend
  2. Embedding Model — choose how documents are vectorized
  3. Vector Database — where embeddings are stored

Configuring an LLM Backend

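Ollama

First, make sure Ollama is running and that you have pulled a model. The pull command talks to the Ollama server, so the server has to be up first:

# Terminal 1: start the server (skip this if Ollama already runs as a background service)
ollama serve

# Terminal 2: download the model
ollama pull llama3.2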

In AnythingLLM settings:

  • LLM Provider: Ollama
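  • Base URL: http://localhost:11434 (if AnythingLLM runs in Docker, localhost refers to the container itself, so use an address that reaches the host, such as http://host.docker.internal:11434 on Docker Desktop)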
  • Model: llama3.2 (or whichever you pulled)
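Before pointing AnythingLLM at Ollama, it can help to confirm that the model name you plan to enter is actually installed. The sketch below reads Ollama's `/api/tags` endpoint, which lists local models. The helper names are hypothetical, and the default base URL is the one from the settings above.

```python
import json
import urllib.request


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names such as 'llama3.2:latest' from an /api/tags response."""
    return {m["name"] for m in tags_payload.get("models", [])}


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Check for a model; a name without a tag is treated as ':latest'."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, `has_model(fetch_tags(), "llama3.2")` tells you whether the pull succeeded.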

OpenAI

  • LLM Provider: OpenAI
  • API Key: your key from platform.openai.com
  • Model: gpt-4o or gpt-4o-mini

LM Studio

  • LLM Provider: LM Studio
  • Base URL: http://localhost:1234/v1 (LM Studio's local server exposes an OpenAI-compatible API under /v1)
  • Model: whatever you have loaded in LM Studio

Embedding Models

The embedding model converts your documents into vectors. AnythingLLM supports:

| Provider | Model | Notes |
|---|---|---|
| Built-in (Nomic) | nomic-embed-text-v1.5 | Best local option |
| Ollama | nomic-embed-text | Requires ollama pull nomic-embed-text |
| OpenAI | text-embedding-3-small | Fast and cheap, needs API key |
| Cohere | embed-english-v3.0 | High quality, free tier available |

For a fully offline setup, use the built-in Nomic embedder or pull it through Ollama:

ollama pull nomic-embed-text
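The embedding model also determines how large each stored vector is, which matters once a workspace holds many chunks. The arithmetic below is a rough sketch that assumes vectors are stored as 32-bit floats and ignores metadata and index overhead. The dimensions are the models' default output sizes, and the 10,000-chunk figure is an arbitrary example.

```python
# Default output dimensions of the embedding models discussed above
DIMENSIONS = {
    "nomic-embed-text": 768,
    "text-embedding-3-small": 1536,
    "embed-english-v3.0": 1024,
}
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats


def vector_bytes(num_chunks: int, dimensions: int) -> int:
    """Raw size of the vectors alone, without metadata or index overhead."""
    return num_chunks * dimensions * BYTES_PER_FLOAT32


for model, dims in DIMENSIONS.items():
    mib = vector_bytes(10_000, dims) / (1024 ** 2)
    print(f"{model}: {mib:.1f} MiB for 10,000 chunks")
```

For nomic-embed-text this works out to 10,000 × 768 × 4 = 30,720,000 bytes, or roughly 29 MiB, so raw vector storage is rarely a constraint for a personal knowledge base.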

Vector Databases

AnythingLLM supports multiple vector databases:

| Database | Best For |
|---|---|
| LanceDB (built-in) | Desktop app, getting started |
| Chroma | Local Docker deployments |
| Pinecone | Cloud-scale production |
| Weaviate | Self-hosted at scale |
| Milvus | High-performance, large datasets |

For most users the built-in LanceDB is fine. To switch to Chroma:

# Run Chroma alongside AnythingLLM
docker run -d -p 8000:8000 chromadb/chroma

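Then in AnythingLLM settings, set the vector database to Chroma with the URL http://localhost:8000. If AnythingLLM itself runs in a container, localhost points at that container, so use an address that reaches Chroma instead (for example http://host.docker.internal:8000 on Docker Desktop).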


Creating Workspaces

Workspaces are isolated RAG environments — each has its own document collection, chat history, and settings. Create separate workspaces for:

  • Company documentation
  • Research papers
  • Personal notes
  • Codebase documentation

Creating a Workspace

  1. Click + New Workspace in the sidebar
  2. Name it (e.g., “Security Research”)
  3. Open the workspace settings to configure the LLM and retrieval options for that workspace (the embedding model is set once for the whole instance)
  4. Upload documents

Each workspace maintains independent vector embeddings, so documents don’t bleed between them.
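Workspaces can also be created from a script. The sketch below assumes the developer API's `POST /api/v1/workspace/new` endpoint and a response that carries the new workspace's slug under `workspace.slug`. Check both against the API docs served by your instance (at /api/docs) before relying on them. The helper names are hypothetical, and the token comes from Settings → API Keys.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/v1"


def new_workspace_request(name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the request that creates a workspace."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/workspace/new",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_workspace(name: str, token: str) -> str:
    """Send the request and return the slug (assumed response shape)."""
    with urllib.request.urlopen(new_workspace_request(name, token), timeout=30) as resp:
        return json.load(resp)["workspace"]["slug"]
```

The returned slug is what other API calls use to address the workspace.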


Uploading and Managing Documents

Supported Formats

  • PDF, DOCX, TXT, MD
  • CSV, JSON, XML
  • YouTube video transcripts (paste URL)
  • Web pages (paste URL)
  • GitHub repositories (connect via URL)
  • Confluence and Notion pages (via integrations)

Uploading via the UI

  1. Open a workspace
  2. Click the Upload button (paperclip icon)
  3. Drag and drop files or browse
  4. Wait for chunking and embedding to complete (progress shown per file)
  5. Toggle documents on to include them in chat context

Uploading via API

# Upload a document via the REST API
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@/path/to/document.pdf"

Get your API token from Settings → API Keys.


Chat with Documents

Once documents are uploaded and embedded in a workspace, start chatting:

  • Chat mode: Standard conversation with document context injected
  • Query mode: Strict document retrieval — only answers if the answer is in your documents

Query mode is useful for compliance or support use cases where hallucination must be minimized.
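The same two modes can be selected when chatting through the API. The sketch below assumes the developer API's `POST /api/v1/workspace/{slug}/chat` endpoint with a JSON body containing `message` and `mode`. Confirm the request and response fields against the API docs served by your instance (at /api/docs). The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/v1"
VALID_MODES = ("chat", "query")


def chat_payload(message: str, mode: str = "query") -> dict:
    """Build the request body; reject anything other than the two documented modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {"message": message, "mode": mode}


def ask(slug: str, message: str, token: str, mode: str = "query") -> dict:
    """Send a question to a workspace and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_URL}/workspace/{slug}/chat",
        data=json.dumps(chat_payload(message, mode)).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

Defaulting to query mode keeps scripted answers tied to the workspace's documents. Pass `mode="chat"` for open-ended conversation.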

Citing Sources

AnythingLLM automatically shows which document chunks were used to generate each response. Click the Source button beneath any AI reply to see exact passages and file names.


Advanced: Custom System Prompts

Every workspace can have a custom system prompt that shapes how the AI behaves:

You are a cybersecurity documentation assistant for HackingPC.com.
Only answer questions based on the provided documentation.
When you don't know something, say "I don't have that information in the current documentation."
Format code examples in markdown code blocks.

Set this in Workspace Settings → Prompt.


Multi-User Setup

AnythingLLM supports user accounts with role-based access:

| Role | Capabilities |
|---|---|
| Admin | Full access, settings, user management |
| Manager | Create workspaces, manage documents |
| Default | Chat only in assigned workspaces |

Enable multi-user mode in Settings → Multi-User Mode. Users log in with a username and password. No OAuth is required unless you configure it.


Automating Document Sync

Use the AnythingLLM API to automate document ingestion from external sources:

import os

import requests

API_URL = "http://localhost:3001/api/v1"
TOKEN = os.environ["ANYTHINGLLM_TOKEN"]
WORKSPACE_SLUG = "security-research"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def upload_document(filepath: str) -> None:
    # Upload the file; AnythingLLM parses it and stores it as a document
    with open(filepath, "rb") as f:
        resp = requests.post(
            f"{API_URL}/document/upload",
            headers=HEADERS,
            files={"file": f},
            timeout=300,
        )
    resp.raise_for_status()
    # update-embeddings expects the stored document's location, not its id
    doc_location = resp.json()["documents"][0]["location"]

    # Embed the document into the workspace so it is used for retrieval
    resp = requests.post(
        f"{API_URL}/workspace/{WORKSPACE_SLUG}/update-embeddings",
        headers=HEADERS,
        json={"adds": [doc_location]},
        timeout=300,
    )
    resp.raise_for_status()
    print(f"Uploaded and embedded: {filepath}")


upload_document("/home/user/reports/pentest-report-q1-2026.pdf")
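To turn a one-off upload into a recurring sync, a script needs to know which files changed since the last run. The sketch below is one possible approach, not an AnythingLLM feature: it keeps SHA-256 digests in a JSON state file and hands only new or modified files to an `upload` callable, such as the `upload_document` function above. The function names and state-file layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(folder: Path, state: dict[str, str]) -> list[Path]:
    """Return files under `folder` whose digest differs from the recorded one."""
    changed = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and state.get(str(path)) != file_digest(path):
            changed.append(path)
    return changed


def sync(folder: Path, state_file: Path, upload) -> int:
    """Upload new or modified files and record their digests; return the count."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    pending = changed_files(folder, state)
    for path in pending:
        upload(str(path))  # e.g. upload_document from the script above
        state[str(path)] = file_digest(path)
    state_file.write_text(json.dumps(state, indent=2))
    return len(pending)
```

Keep the state file outside the synced folder so it is not uploaded itself, and run `sync` on a schedule (cron, a systemd timer, or similar). This sketch does not remove documents that were deleted from the folder.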

AnythingLLM vs. Alternatives

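| Tool | RAG Quality | Setup Ease | Multi-User | Local Only |
|---|---|---|---|---|
| AnythingLLM | Excellent | Easy | Yes | Yes |
| GPT4All LocalDocs | Good | Very Easy | No | Yes |
| Open WebUI RAG | Good | Medium | Yes | Yes |
| PrivateGPT | Good | Hard | No | Yes |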

Final Thoughts

AnythingLLM is one of the most complete self-hosted RAG solutions available today. It handles everything from document ingestion to multi-user access control, works with a wide range of local and cloud LLM backends, and keeps your data on your own infrastructure when you pair it with a local model and embedder.

Start with the desktop app to get comfortable, then migrate to Docker when you’re ready to share it with a team.
