LangChain is the most widely used framework for building applications powered by large language models (LLMs). It provides abstractions for chaining prompts, connecting LLMs to external data sources, building agents that use tools, and creating retrieval-augmented generation (RAG) pipelines. In 2026, LangChain v0.3+ has matured significantly with a cleaner API and better async support. This guide gets you from zero to a working AI application.
Installation
LangChain is a Python library (with a JavaScript/TypeScript version available as LangChain.js). Install the core package and the OpenAI integration:
pip install langchain langchain-openai langchain-community
For a specific LLM provider, install the corresponding package:
pip install langchain-anthropic # Claude
pip install langchain-google-genai # Gemini
pip install langchain-ollama # Local models via Ollama
Basic Concepts
LangChain is built around a few core primitives:
Models (LLMs and Chat Models): Wrappers around AI model APIs. ChatOpenAI, ChatAnthropic, ChatOllama.
Prompts: Templates for structuring input to models. ChatPromptTemplate, PromptTemplate.
Chains: Sequences of components connected via LCEL (LangChain Expression Language) using the | pipe operator.
Agents: LLMs that decide which tools to call and in what order to complete a task.
Retrievers: Components that fetch relevant documents from a vector store or search engine.
Your First Chain
A simple chain connects a prompt template to an LLM:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Set your API key
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
# Define the model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
# Create a prompt template
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful technical writer. Explain concepts clearly and concisely."),
("human", "Explain {concept} in simple terms for a beginner.")
])
# Build the chain with LCEL
chain = prompt | llm | StrOutputParser()
# Run the chain
result = chain.invoke({"concept": "Docker containers"})
print(result)
LCEL’s pipe operator (|) connects components: prompt renders the template, passes it to the LLM, and StrOutputParser extracts the text response.
Building a RAG Pipeline
RAG (Retrieval-Augmented Generation) lets your LLM answer questions about your own documents by retrieving relevant chunks and including them in the prompt context.
Step 1: Load and Split Documents
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
loader = PyPDFLoader("your_document.pdf")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
Step 2: Create a Vector Store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
Step 3: Build the RAG Chain
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
rag_prompt = ChatPromptTemplate.from_messages([
("system", """Answer the question based on the following context.
If the answer isn't in the context, say you don't know.
Context: {context}"""),
("human", "{question}")
])
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| llm
| StrOutputParser()
)
answer = rag_chain.invoke("What are the main conclusions of this document?")
print(answer)
Building an Agent with Tools
Agents use LLMs to decide which tools to call. Here’s an agent with a web search tool:
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# Define tools
search = DuckDuckGoSearchRun()
tools = [search]
# Agent prompt requires a {agent_scratchpad} placeholder
agent_prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful research assistant. Use tools to find accurate information."),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
# Create the agent
agent = create_tool_calling_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({
"input": "What are the latest developments in quantum computing in 2026?"
})
print(result["output"])
The verbose=True flag shows the agent’s reasoning — what tool it chose and why — making it easy to debug.
Using Local Models with Ollama
Replace OpenAI with a local Ollama model for privacy or cost savings:
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1:8b", temperature=0.7)
# Use in any chain the same way
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"concept": "neural networks"})
Ensure Ollama is running (ollama serve) and the model is downloaded (ollama pull llama3.1:8b) before using it.
Streaming Responses
For chat applications, stream responses token by token for a better user experience:
for chunk in chain.stream({"concept": "machine learning"}):
print(chunk, end="", flush=True)
print() # newline after streaming completes
Conversation Memory
To maintain conversation history across turns:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
store = {}
def get_session_history(session_id: str):
if session_id not in store:
store[session_id] = InMemoryChatMessageHistory()
return store[session_id]
conversational_chain = RunnableWithMessageHistory(
chain,
get_session_history,
input_messages_key="input",
history_messages_key="chat_history",
)
# Same session_id maintains conversation
response1 = conversational_chain.invoke(
{"input": "My name is Alex."},
config={"configurable": {"session_id": "user_123"}}
)
response2 = conversational_chain.invoke(
{"input": "What's my name?"},
config={"configurable": {"session_id": "user_123"}}
)
print(response2) # Should remember "Alex"
LangSmith: Debugging and Monitoring
LangSmith is LangChain’s observability platform. Enable it by adding two environment variables:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your-langsmith-api-key
Every chain invocation is now traced in the LangSmith dashboard at smith.langchain.com. You can see exact prompts sent, tokens used, latency, and errors — invaluable for debugging complex chains.
Next Steps
With these foundations, you can build:
- Customer support chatbots with RAG over your documentation
- Research agents that search the web and synthesize information
- Code review tools that analyze repositories
- Document processing pipelines that extract structured data
LangChain’s ecosystem of integrations — 70+ vector stores, 50+ LLM providers, hundreds of document loaders — means you can adapt these patterns to nearly any data source or model.