The OpenAI Assistants API is OpenAI’s framework for building stateful AI applications — ones that remember conversation history, search through uploaded files, and call your functions automatically. It sits a layer above the raw Chat Completions API and handles the infrastructure complexity that would otherwise land on your server.

What the Assistants API Actually Does

The Assistants API introduces four core primitives:

Assistants: a configured AI persona with a model, instructions, and enabled tools
Threads: persistent conversation histories stored server-side (no token limit headaches)
Messages: individual user or assistant turns within a thread
Runs: an execution of an assistant against a thread that produces a response

The major advantage over direct completions: you don’t manage conversation history yourself. OpenAI stores it, truncates intelligently when context limits approach, and keeps your application code stateless.

Built-in tools available in 2026:

File Search (formerly Retrieval): semantic search over uploaded documents
Code Interpreter: sandboxed Python execution with file I/O
Function Calling: structured calls to your own APIs

Setup: Install the Python SDK

pip install openai

Set your API key as an environment variable:

export OPENAI_API_KEY="sk-proj-..."

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from environment

Creating an Assistant

assistant = client.beta.assistants.create(
    name="Technical Support Bot",
    instructions="""You are a technical support specialist. 
    Answer questions based on the provided documentation. 
    Be concise and include code examples where relevant.""",
    model="gpt-4o",
    tools=[
        {"type": "file_search"},
        {"type": "code_interpreter"}
    ]
)

print(f"Assistant ID: {assistant.id}")
# Save this ID — you can reuse it across sessions

Models available in 2026: gpt-4o, gpt-4o-mini, gpt-4-turbo, o3, o4-mini. Use gpt-4o-mini for cost-sensitive applications.

Uploading Files for Retrieval

File Search lets your assistant answer questions grounded in your documents — PDFs, text files, code, markdown, and more.

# Upload a file
with open("technical_docs.pdf", "rb") as f:
    file = client.files.create(
        file=f,
        purpose="assistants"
    )

# Create a vector store and attach the file
vector_store = client.beta.vector_stores.create(
    name="Technical Docs",
    file_ids=[file.id]
)

# Update the assistant to use this vector store
assistant = client.beta.assistants.update(
    assistant.id,
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

OpenAI automatically chunks and embeds your files. Queries to the assistant will semantically search across all attached documents.

Creating Threads and Messages

Threads are free to create and persist indefinitely. Each user session should typically get its own thread:

# Create a new thread for this user session
thread = client.beta.threads.create()

# Add a user message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What does the rate limiting section say about retry strategies?"
)

You can attach files at the message level too, for one-off document questions:

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarize this report",
    attachments=[{
        "file_id": file.id,
        "tools": [{"type": "file_search"}]
    }]
)

Running and Polling

A Run triggers the assistant to process the thread and generate a response:

import time

# Create a run
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll until completion
while run.status in ["queued", "in_progress"]:
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

# Retrieve the assistant's response
messages = client.beta.threads.messages.list(
    thread_id=thread.id,
    order="desc",
    limit=1
)

print(messages.data[0].content[0].text.value)

Using the Streaming API (Recommended)

Polling adds latency. The streaming approach delivers tokens as they’re generated:

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for text in stream.text_deltas:
        print(text, end="", flush=True)

Function Calling Integration

import json

# Define the assistant with a function tool
assistant = client.beta.assistants.create(
    name="Weather Assistant",
    model="gpt-4o",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }]
)

# Handle tool calls during a run
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

while run.status == "requires_action":
    tool_calls = run.required_action.submit_tool_outputs.tool_calls
    tool_outputs = []
    
    for tc in tool_calls:
        args = json.loads(tc.function.arguments)
        # Call your actual function here
        result = {"temperature": 22, "condition": "sunny"}
        tool_outputs.append({
            "tool_call_id": tc.id,
            "output": json.dumps(result)
        })
    
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=tool_outputs
    )
    
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

Cost Considerations

Assistants API pricing in 2026 includes:

Cost Component	Details
Input/output tokens	Same as Chat Completions API rates
File Search	$0.10 per GB/day for vector storage
Code Interpreter	$0.03 per session
File storage	$0.20 per GB/day

Cost-saving tips:

Use gpt-4o-mini for most tasks — it handles retrieval and function calling well
Delete threads and files when a session ends
Set max_prompt_tokens and max_completion_tokens on runs to cap costs
Use truncation_strategy to control how long threads are summarized

Assistants API vs. Chat Completions API

Scenario	Use Assistants API	Use Chat Completions
Multi-turn conversations	Yes	Manage history yourself
Document Q&A	Yes (File Search)	Use embeddings + RAG
Code execution sandbox	Yes	No
Simple one-shot queries	Overkill	Yes
Streaming responses	Supported	Yes, simpler
Full control over context	No	Yes
Lower latency	No	Yes

The Assistants API shines for user-facing chatbots where you want persistent history, document grounding, and tool use without building the scaffolding yourself. For batch processing, RAG pipelines you control, or latency-sensitive applications, direct completions remain the better choice.