AI Tools #openai#assistants-api#python

OpenAI Assistants API: Complete Guide for 2026

Build persistent AI assistants with threads, file search, and function calling using the OpenAI Assistants API. Setup, code examples, and cost tips.

7 min read

The OpenAI Assistants API is OpenAI’s framework for building stateful AI applications — ones that remember conversation history, search through uploaded files, and call your functions automatically. It sits a layer above the raw Chat Completions API and handles the infrastructure complexity that would otherwise land on your server.

What the Assistants API Actually Does

The Assistants API introduces four core primitives:

  • Assistants: a configured AI persona with a model, instructions, and enabled tools
  • Threads: persistent conversation histories stored server-side (no token limit headaches)
  • Messages: individual user or assistant turns within a thread
  • Runs: an execution of an assistant against a thread that produces a response

The major advantage over direct completions: you don’t manage conversation history yourself. OpenAI stores it, truncates intelligently when context limits approach, and keeps your application code stateless.

Built-in tools available in 2026:

  • File Search (formerly Retrieval): semantic search over uploaded documents
  • Code Interpreter: sandboxed Python execution with file I/O
  • Function Calling: structured calls to your own APIs

Setup: Install the Python SDK

pip install openai

Set your API key as an environment variable:

export OPENAI_API_KEY="sk-proj-..."
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from environment

Creating an Assistant

assistant = client.beta.assistants.create(
    name="Technical Support Bot",
    instructions="""You are a technical support specialist. 
    Answer questions based on the provided documentation. 
    Be concise and include code examples where relevant.""",
    model="gpt-4o",
    tools=[
        {"type": "file_search"},
        {"type": "code_interpreter"}
    ]
)

print(f"Assistant ID: {assistant.id}")
# Save this ID — you can reuse it across sessions

Models available in 2026: gpt-4o, gpt-4o-mini, gpt-4-turbo, o3, o4-mini. Use gpt-4o-mini for cost-sensitive applications.

Uploading Files for Retrieval

File Search lets your assistant answer questions grounded in your documents — PDFs, text files, code, markdown, and more.

# Upload a file
with open("technical_docs.pdf", "rb") as f:
    file = client.files.create(
        file=f,
        purpose="assistants"
    )

# Create a vector store and attach the file
vector_store = client.beta.vector_stores.create(
    name="Technical Docs",
    file_ids=[file.id]
)

# Update the assistant to use this vector store
assistant = client.beta.assistants.update(
    assistant.id,
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

OpenAI automatically chunks and embeds your files. Queries to the assistant will semantically search across all attached documents.

Creating Threads and Messages

Threads are free to create and persist indefinitely. Each user session should typically get its own thread:

# Create a new thread for this user session
thread = client.beta.threads.create()

# Add a user message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What does the rate limiting section say about retry strategies?"
)

You can attach files at the message level too, for one-off document questions:

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarize this report",
    attachments=[{
        "file_id": file.id,
        "tools": [{"type": "file_search"}]
    }]
)

Running and Polling

A Run triggers the assistant to process the thread and generate a response:

import time

# Create a run
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll until completion
while run.status in ["queued", "in_progress"]:
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

# Retrieve the assistant's response
messages = client.beta.threads.messages.list(
    thread_id=thread.id,
    order="desc",
    limit=1
)

print(messages.data[0].content[0].text.value)

Polling adds latency. The streaming approach delivers tokens as they’re generated:

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for text in stream.text_deltas:
        print(text, end="", flush=True)

Function Calling Integration

Register your custom functions as tools, and the assistant will call them when appropriate:

import json

# Define the assistant with a function tool
assistant = client.beta.assistants.create(
    name="Weather Assistant",
    model="gpt-4o",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }]
)

# Handle tool calls during a run
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

while run.status == "requires_action":
    tool_calls = run.required_action.submit_tool_outputs.tool_calls
    tool_outputs = []
    
    for tc in tool_calls:
        args = json.loads(tc.function.arguments)
        # Call your actual function here
        result = {"temperature": 22, "condition": "sunny"}
        tool_outputs.append({
            "tool_call_id": tc.id,
            "output": json.dumps(result)
        })
    
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=tool_outputs
    )
    
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

Cost Considerations

Assistants API pricing in 2026 includes:

Cost ComponentDetails
Input/output tokensSame as Chat Completions API rates
File Search$0.10 per GB/day for vector storage
Code Interpreter$0.03 per session
File storage$0.20 per GB/day

Cost-saving tips:

  • Use gpt-4o-mini for most tasks — it handles retrieval and function calling well
  • Delete threads and files when a session ends
  • Set max_prompt_tokens and max_completion_tokens on runs to cap costs
  • Use truncation_strategy to control how long threads are summarized

Assistants API vs. Chat Completions API

ScenarioUse Assistants APIUse Chat Completions
Multi-turn conversationsYesManage history yourself
Document Q&AYes (File Search)Use embeddings + RAG
Code execution sandboxYesNo
Simple one-shot queriesOverkillYes
Streaming responsesSupportedYes, simpler
Full control over contextNoYes
Lower latencyNoYes

The Assistants API shines for user-facing chatbots where you want persistent history, document grounding, and tool use without building the scaffolding yourself. For batch processing, RAG pipelines you control, or latency-sensitive applications, direct completions remain the better choice.

#ai-development #llm-api #python #assistants-api #openai