Microsoft AutoGen: Build Multi-Agent AI Systems
AutoGen is Microsoft’s open-source framework for building systems where multiple AI agents collaborate to solve complex problems. Instead of relying on a single LLM call, AutoGen lets you design networks of specialized agents that can converse with each other, call tools, write and execute code, and iteratively refine their work. This guide walks through installation, core concepts, and practical patterns with real code examples.
Why Multi-Agent Systems?
Single LLM calls hit their limits with complex tasks that require:
- Planning and decomposition: Breaking a large problem into steps
- Verification: Having one agent check another’s work
- Specialization: Using different models for different subtasks
- Iteration: Refining outputs over multiple turns
AutoGen orchestrates these patterns without requiring you to build the conversation loop infrastructure from scratch.
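To make the verification and iteration patterns above concrete, here is a minimal, framework-free sketch of the kind of loop a multi-agent framework runs for you. Nothing in it is AutoGen API: refine_until_accepted, toy_writer, and toy_reviewer are hypothetical names, and the two toy functions stand in for LLM calls.

```python
from typing import Callable, Optional, Tuple


def refine_until_accepted(
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    task: str,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate between a generator and a verifier until the draft passes.

    generate(task, feedback) returns a new draft.
    verify(draft) returns None when the draft is acceptable, else feedback text.
    Returns the last draft and the number of rounds used.
    """
    feedback: Optional[str] = None
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        feedback = verify(draft)
        if feedback is None:
            return draft, round_number
    # Round limit reached: return the last draft even though it was not accepted.
    return draft, max_rounds


# Hypothetical stand-ins for LLM calls: the "writer" only adds a docstring
# once the "reviewer" has asked for one.
def toy_writer(task: str, feedback: Optional[str]) -> str:
    body = "def add(a, b):\n    return a + b"
    if feedback:
        return '"""Add two numbers."""\n' + body
    return body


def toy_reviewer(draft: str) -> Optional[str]:
    return None if '"""' in draft else "Please add a docstring."


final_draft, rounds = refine_until_accepted(toy_writer, toy_reviewer, "write add()")
print(rounds)
```

The max_rounds cap plays the same role as the reply and round limits that appear later in this guide: without it, a generator and verifier that never agree would loop forever.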
Installation
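The code in this guide uses the classic "import autogen" API (the 0.2 generation), which is published on PyPI as pyautogen. Pinning below 0.3 is a conservative choice that keeps the examples on the API generation they were written for:
pip install "pyautogen<0.3"
Optional extras add specific capabilities such as teachable agents, multimodal models, retrieval-augmented chat, and math problem solving:
pip install "pyautogen[teachable,lmm,retrievechat,mathchat]<0.3"
AutoGen v0.4+ (the current generation) is a redesign with a different API, built around the agentchat module. It is installed separately:
pip install autogen-agentchat "autogen-ext[openai]"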
Set your API key:
export OPENAI_API_KEY="sk-..."
# Or for Azure OpenAI:
export AZURE_OPENAI_API_KEY="..."
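Once the key is exported, the Python side can read it from the environment rather than hardcoding it. The sketch below builds the list of model-configuration dictionaries that the agents in this guide receive. The helper name build_config_list is an assumption for illustration, not part of AutoGen.

```python
import os
from typing import Dict, List, Mapping


def build_config_list(
    env: Mapping[str, str] = os.environ,
    model: str = "gpt-4o",
) -> List[Dict[str, str]]:
    """Build a one-entry config list from OPENAI_API_KEY, failing early if unset."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return [{"model": model, "api_key": api_key}]


# Usage with an explicit mapping so the example runs without a real key.
example = build_config_list({"OPENAI_API_KEY": "sk-test"})
print(example)
```

Failing early with a clear message is usually easier to debug than an authentication error surfacing several agent turns into a conversation.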
Core Concepts
AutoGen’s two fundamental agent types are:
AssistantAgent
An LLM-backed agent that reasons, plans, and generates responses. It cannot execute code directly — it produces code and text for other agents to act on.
UserProxyAgent
A “human-in-the-loop” style agent that can execute code, run tools, and relay results. It acts as the bridge between the LLM’s outputs and the real world. The “user proxy” name reflects that it can either wait for a real human response or automatically handle certain actions.
Your First AutoGen Program
Here’s a minimal two-agent setup:
import os
import autogen

config_list = [
    {
        "model": "gpt-4o",
        "api_key": os.environ["OPENAI_API_KEY"]  # exported in the Installation step
    }
]
llm_config = {
    "config_list": config_list,
    "temperature": 0.1,
    "timeout": 120
}

# Create the assistant agent
assistant = autogen.AssistantAgent(
    name="Assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant. When solving coding problems, write clean, well-commented Python code."
)

# Create a proxy that auto-executes code
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",  # Fully automated
    max_consecutive_auto_reply=10,
    code_execution_config={
        "work_dir": "coding_workspace",
        "use_docker": False  # Runs generated code directly on the host
    }
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that fetches the current Bitcoin price from a public API and displays it formatted."
)
The assistant writes the code, the user proxy executes it, the output goes back to the assistant, and the loop continues until the task is complete.
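The execute-and-report step of that loop can be sketched with the standard library alone. This is an illustration of the idea under simplifying assumptions, not AutoGen's actual executor: extract_python_blocks and run_block are hypothetical helpers, and the assistant reply is hardcoded.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List

FENCE = "`" * 3  # three backticks, built programmatically to keep this listing tidy
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(message: str) -> List[str]:
    """Return the body of every fenced python block in a chat message."""
    return CODE_BLOCK.findall(message)


def run_block(code: str, timeout: int = 30) -> str:
    """Run one block in a fresh interpreter and return its exit code and output."""
    with tempfile.TemporaryDirectory() as work_dir:
        script = Path(work_dir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=work_dir,
        )
    return f"exitcode: {result.returncode}\n{result.stdout}{result.stderr}"


# A hardcoded stand-in for what an assistant agent might send back.
assistant_reply = (
    "Here is the script:\n"
    f"{FENCE}python\nprint(6 * 7)\n{FENCE}\n"
    "Run it and report back."
)
blocks = extract_python_blocks(assistant_reply)
feedback = run_block(blocks[0])
print(feedback)
```

The feedback string is what would be sent back to the assistant as the next message, which is how a failed run (non-zero exit code plus a traceback) gives the model the information it needs to correct its code.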
Using Claude with AutoGen
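AutoGen supports multiple LLM providers. To use Anthropic’s Claude, install the provider's client library (in the 0.2 line this is typically the anthropic extra, for example pip install "pyautogen[anthropic]<0.3") and set api_type in the config entry: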
config_list = [
    {
        "model": "claude-opus-4-5",
        "api_key": "sk-ant-api03-...",
        "api_type": "anthropic"
    }
]
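Or mix models by giving each agent its own configuration, for example Claude for planning and GPT-4o for execution. The Claude entry still needs api_type so the right client is used:
planner_config = {"config_list": [{"model": "claude-opus-4-5", "api_key": "...", "api_type": "anthropic"}]}
executor_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}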
Tool Calling in AutoGen
AutoGen agents can call tools (functions) as part of their workflow. Register tools on the appropriate agent:
import autogen
from datetime import datetime
import requests

def get_current_time() -> str:
    """Get the current date and time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

def fetch_url(url: str) -> str:
    """Fetch the content of a URL."""
    response = requests.get(url, timeout=10)
    return response.text[:2000]  # Return first 2000 chars

# Create the agents (llm_config as defined in the first example)
assistant = autogen.AssistantAgent(
    name="Assistant",
    llm_config=llm_config
)
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config=False  # Disable code exec, use tools instead
)

# Register tools using the decorator pattern:
# the assistant proposes the call, the user proxy executes it
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Get the current date and time")
def get_time() -> str:
    return get_current_time()

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Fetch content from a URL")
def fetch_webpage(url: str) -> str:
    return fetch_url(url)

user_proxy.initiate_chat(
    assistant,
    message="What time is it right now, and what's on the front page of news.ycombinator.com?"
)
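Type hints matter in the tool functions above because the model only ever sees a description of each tool, not its code. As a rough illustration of how a signature can be turned into such a description, here is a simplified sketch that emits an OpenAI-style function schema. It is not AutoGen's implementation: describe_tool is a hypothetical helper, it handles only a few primitive types, and the max_chars parameter is added purely to show an optional argument.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Only a handful of primitive types are mapped; anything else falls back to "string".
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any], description: str) -> Dict[str, Any]:
    """Build a function-calling schema from a function's signature and type hints."""
    hints = get_type_hints(func)
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def fetch_webpage(url: str, max_chars: int = 2000) -> str:
    """Placeholder body; only the signature matters for this sketch."""
    return ""


schema = describe_tool(fetch_webpage, "Fetch content from a URL")
print(schema["function"]["parameters"])
```

A function with missing or vague annotations produces a vague schema, which tends to show up as malformed tool calls from the model.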
GroupChat: Multiple Agents Collaborating
For complex tasks, you can create a group of specialized agents that discuss the problem together:
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}

# Specialized agents
planner = autogen.AssistantAgent(
    name="Planner",
    system_message="""You are a technical project planner.
    Break down complex tasks into clear, executable steps.
    Delegate to the appropriate specialist.""",
    llm_config=llm_config
)
coder = autogen.AssistantAgent(
    name="Coder",
    system_message="""You are an expert Python developer.
    Write clean, tested code. Always include error handling.""",
    llm_config=llm_config
)
reviewer = autogen.AssistantAgent(
    name="Reviewer",
    system_message="""You are a code reviewer.
    Check for bugs, security issues, performance problems,
    and suggest improvements.""",
    llm_config=llm_config
)
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "workspace", "use_docker": False}
)

# Create the group chat
groupchat = autogen.GroupChat(
    agents=[user_proxy, planner, coder, reviewer],
    messages=[],
    max_round=20,
    speaker_selection_method="auto"  # LLM decides who speaks next
)
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)

user_proxy.initiate_chat(
    manager,
    message="Build a simple URL shortener service in Python with Flask. Include rate limiting and a redirect endpoint."
)
The manager agent uses the LLM to decide which agent should respond next based on the conversation context.
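LLM-driven selection is flexible but not always predictable. When the workflow is known in advance, a deterministic rule can be easier to reason about. The sketch below shows such a rule using stand-in classes so it runs without AutoGen. StubAgent, StubGroupChat, pick_next_speaker, and the APPROVED marker are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StubAgent:
    """Minimal stand-in for an agent: only the name is needed here."""
    name: str


@dataclass
class StubGroupChat:
    """Minimal stand-in for a group chat: the agent list plus message history."""
    agents: List[StubAgent]
    messages: List[Dict[str, str]] = field(default_factory=list)


def pick_next_speaker(last_speaker: StubAgent, groupchat: StubGroupChat) -> StubAgent:
    """Fixed workflow: User -> Planner -> Coder -> Reviewer, looping back to
    Coder until the reviewer's message contains APPROVED."""
    by_name = {agent.name: agent for agent in groupchat.agents}
    last_text = groupchat.messages[-1]["content"] if groupchat.messages else ""
    if last_speaker.name == "User":
        return by_name["Planner"]
    if last_speaker.name == "Planner":
        return by_name["Coder"]
    if last_speaker.name == "Coder":
        return by_name["Reviewer"]
    if last_speaker.name == "Reviewer":
        return by_name["User"] if "APPROVED" in last_text else by_name["Coder"]
    return by_name["Planner"]


agents = [StubAgent(n) for n in ("User", "Planner", "Coder", "Reviewer")]
chat = StubGroupChat(agents=agents, messages=[{"content": "Needs error handling."}])
print(pick_next_speaker(agents[3], chat).name)
```

In the 0.2-style API, GroupChat's speaker_selection_method can also take built-in strategies such as "round_robin", and recent 0.2 releases accept a callable with a (last_speaker, groupchat) signature like the one above. Check the documentation for your installed version before relying on either.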
Sequential vs Nested Conversations
Sequential Pattern
Chain agents where each builds on the previous output:
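# Assumes researcher and writer are AssistantAgent instances
# like those in the practical example later in this guide.
# Step 1: Research
research_result = user_proxy.initiate_chat(
    researcher,
    message="Research the top 5 Python web frameworks in 2026"
)

# Step 2: Write based on research
# initiate_chat returns a ChatResult; with the default summary method,
# .summary holds the content of the last message in the chat.
summary = research_result.summary
user_proxy.initiate_chat(
    writer,
    message=f"Write a blog post based on this research: {summary}"
)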
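Nested Conversations
AutoGen also lets one agent spawn a sub-conversation before it replies. The example below uses the same classic ConversableAgent API as the rest of this guide, and assumes the reviewer and user_proxy agents defined earlier:
from autogen import ConversableAgent

outer_agent = ConversableAgent(
    name="Orchestrator",
    llm_config=llm_config,
    human_input_mode="NEVER"
)
# When a message arrives from user_proxy, run a one-turn sub-conversation with
# the reviewer and answer with its last message (a sketch; verify the arguments
# against your installed version).
outer_agent.register_nested_chats(
    [{"recipient": reviewer, "summary_method": "last_msg", "max_turns": 1}],
    trigger=user_proxy
)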
Termination Conditions
Control when conversations end:
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,  # Stop after 5 auto-replies
    # content can be None (for example on tool-call messages), so fall back to ""
    is_termination_msg=lambda msg: "TASK_COMPLETE" in (msg.get("content") or ""),
    code_execution_config={"work_dir": "workspace"}
)
The is_termination_msg function runs on each message the agent receives. When it returns True, the agent stops replying and the conversation ends cleanly.
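A simple substring check has two weak spots: a message's content may be None rather than a string, and the marker can appear mid-message when an agent merely mentions it. A slightly stricter predicate, sketched below, only fires when the marker ends the message. The factory name make_termination_check is hypothetical.

```python
from typing import Any, Callable, Dict


def make_termination_check(marker: str = "TASK_COMPLETE") -> Callable[[Dict[str, Any]], bool]:
    """Return a predicate that is True only when a message ends with the marker."""

    def is_termination_msg(msg: Dict[str, Any]) -> bool:
        content = msg.get("content")
        # Messages without text content (None or a missing key) never terminate.
        if not isinstance(content, str):
            return False
        return content.rstrip().endswith(marker)

    return is_termination_msg


check = make_termination_check()
print(check({"content": "Report finished.\nTASK_COMPLETE"}))
print(check({"content": "Say TASK_COMPLETE when finished."}))
```

The returned function can be passed as is_termination_msg when constructing an agent. For this to work reliably, the system message should tell the agent to put the marker at the very end of its final reply.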
Practical Example: Research + Report Agent
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}

researcher = autogen.AssistantAgent(
    name="Researcher",
    system_message="""Research the given topic thoroughly.
    Identify key facts, statistics, and recent developments.
    When done, say RESEARCH_COMPLETE and summarize your findings.""",
    llm_config=llm_config
)
writer = autogen.AssistantAgent(
    name="Writer",
    system_message="""Take research findings and write a clear,
    well-structured report. Use markdown formatting.
    When done, say TASK_COMPLETE.""",
    llm_config=llm_config
)

def is_done(msg):
    # Stop on either agent's completion marker; content may be None
    content = msg.get("content") or ""
    return "RESEARCH_COMPLETE" in content or "TASK_COMPLETE" in content

user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=15,
    is_termination_msg=is_done,
    code_execution_config=False
)

# Step 1: the researcher gathers findings and stops at RESEARCH_COMPLETE
research = user_proxy.initiate_chat(
    researcher,
    message="Research the current state of quantum computing in 2026, focusing on practical applications."
)

# Step 2: hand the findings to the writer, which stops at TASK_COMPLETE
user_proxy.initiate_chat(
    writer,
    message=f"Write a report based on these findings:\n{research.summary}"
)
Best Practices
- Keep system messages focused: Each agent should have a clear, narrow role
- Use human_input_mode="TERMINATE" during development to intervene when needed
- Enable Docker for code execution in production to sandbox potentially dangerous code
- Log conversations: AutoGen supports logging for debugging and auditing
- Set reasonable max_round limits to prevent runaway conversations
AutoGen is particularly powerful for tasks that benefit from multiple perspectives, iterative refinement, or specialized expertise — research pipelines, code generation with review, and data analysis workflows are all excellent use cases.