Building Autonomous AI Agents with AutoGen

Mar 22, 2025

When you're building AI applications, you'll quickly hit a wall with single-prompt workflows.

Complex tasks require multiple steps. Each step needs different expertise. Decisions require back-and-forth reasoning. Code needs debugging. Content needs revision.

A single LLM call can't handle this complexity well.

AutoGen, Microsoft's multi-agent framework, solves this by letting you create conversational agents that collaborate to accomplish tasks. Think of it as giving your AI application a team of specialists that talk to each other.

In production systems, AutoGen excels at tasks requiring iteration, verification, and multi-step reasoning. Let me show you how to build autonomous agents that actually work.

Why AutoGen Over Simple LLM Calls

Before diving into AutoGen, understand what problems it solves.

The Single-Prompt Problem

What you want:

"Build a Python script that processes CSV files, handles errors, and includes tests."

What you get with a single prompt:

python

import pandas as pd

def process_csv(file_path):
    df = pd.read_csv(file_path)
    return df

# No error handling
# No edge cases
# No tests
# Breaks immediately

The problem: One prompt can't capture all requirements, edge cases, and quality standards.

The AutoGen Solution

python

from autogen import AssistantAgent, UserProxyAgent

# Coder agent
coder = AssistantAgent(
    name="Coder",
    system_message="You write Python code. Focus on functionality."
)

# Code reviewer agent
reviewer = AssistantAgent(
    name="Reviewer",
    system_message="You review code for errors, edge cases, and best practices."
)

# Executor agent
executor = UserProxyAgent(
    name="Executor",
    code_execution_config={"work_dir": "coding"}
)

# Agents collaborate:
# 1. Coder writes initial code
# 2. Reviewer critiques it
# 3. Coder revises based on feedback
# 4. Executor tests it
# 5. Repeat until it works

The difference: Iterative refinement through conversation produces better results.
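The refinement loop itself is simple. Here is a framework-free sketch of it, where `draft` and `review` are hypothetical stand-ins for the LLM calls:

```python
from typing import Optional

def draft(task: str, feedback: Optional[str] = None) -> str:
    """Stand-in for the coder LLM: folds reviewer feedback into the next revision."""
    return f"{task} [revised: {feedback}]" if feedback else f"{task} [v1]"

def review(code: str) -> Optional[str]:
    """Stand-in for the reviewer LLM: demands error handling once, then approves."""
    return None if "error handling" in code else "add error handling"

def refine(task: str, max_rounds: int = 5) -> str:
    """Iterate draft -> review -> revise until the reviewer approves."""
    output = draft(task)
    for _ in range(max_rounds):
        feedback = review(output)
        if feedback is None:            # reviewer approved
            return output
        output = draft(task, feedback)  # revise with the feedback
    return output

print(refine("process CSV"))  # → process CSV [revised: add error handling]
```

AutoGen's value is running this loop with real LLM calls, message history, and code execution instead of stubs.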

Core AutoGen Concepts

Understanding the building blocks.

1. Agents

Two main types:

AssistantAgent: Uses LLM, doesn't execute code

python

from autogen import AssistantAgent

agent = AssistantAgent(
    name="DataAnalyst",
    system_message="You are a data analyst who provides insights.",
    llm_config={"model": "gpt-4"}
)

UserProxyAgent: Can execute code, optionally uses LLM

python

from autogen import UserProxyAgent

proxy = UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",  # Fully automated
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False
    }
)

2. Conversations

Agents communicate through messages.

python

# Initiate conversation
proxy.initiate_chat(
    agent,
    message="Analyze this dataset and find trends."
)

# Agent responds
# Proxy can execute code if agent provides it
# Conversation continues until termination condition
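Under the hood, a chat is just a loop over role/content message dicts. A framework-free sketch (the `reply_fns` are hypothetical stand-ins for the agents' LLM calls):

```python
def run_chat(reply_fns, opening: str, max_rounds: int = 6) -> list:
    """Alternate between agents until one emits TERMINATE or rounds run out."""
    messages = [{"role": "user", "content": opening}]
    for i in range(max_rounds):
        speaker = reply_fns[i % len(reply_fns)]       # round-robin speakers
        content = speaker(messages)                   # agent sees full history
        messages.append({"role": "assistant", "content": content})
        if "TERMINATE" in content:                    # termination condition
            break
    return messages

# Toy agents: one asks, one answers then terminates.
agent_a = lambda msgs: "Please analyze the data."
agent_b = lambda msgs: "Analysis done. TERMINATE"

log = run_chat([agent_a, agent_b], "Start")
print(log[-1]["content"])  # → Analysis done. TERMINATE
```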

3. Termination

Define when conversation should stop.

python

def is_termination_msg(msg):
    """Check if conversation should end"""
    content = msg.get("content", "")
    return "TERMINATE" in content or "DONE" in content

proxy = UserProxyAgent(
    name="Executor",
    is_termination_msg=is_termination_msg,
    max_consecutive_auto_reply=10
)
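Because the termination check is a plain predicate over message dicts, you can unit-test it before wiring it into an agent:

```python
def is_termination_msg(msg):
    """Check if conversation should end"""
    content = msg.get("content", "")
    return "TERMINATE" in content or "DONE" in content

# Quick sanity checks, no LLM needed
assert is_termination_msg({"content": "All set. TERMINATE"})
assert is_termination_msg({"content": "DONE with analysis"})
assert not is_termination_msg({"content": "still working"})
assert not is_termination_msg({})  # missing content is handled safely
```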

Basic Pattern: Two-Agent Collaboration

Start with the simplest useful pattern.

Code Generation with Review

python

from autogen import AssistantAgent, UserProxyAgent
import os

# Set API key
os.environ["OPENAI_API_KEY"] = "your-api-key"

# LLM configuration
llm_config = {
    "model": "gpt-4",
    "temperature": 0.7
}

# Code writer agent
coder = AssistantAgent(
    name="PythonDeveloper",
    system_message="""You are an expert Python developer.
    
    When given a task:
    1. Write clean, well-documented code
    2. Include error handling
    3. Add type hints
    4. Make it production-ready
    
    When you finish, say TERMINATE.""",
    llm_config=llm_config
)

# Code executor agent
executor = UserProxyAgent(
    name="CodeExecutor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.get("content", "").find("TERMINATE") >= 0,
    code_execution_config={
        "work_dir": "generated_code",
        "use_docker": False
    }
)

# Task
task = """
Create a Python function that:
1. Reads a CSV file
2. Calculates summary statistics (mean, median, std)
3. Handles missing values
4. Returns results as a dictionary

Include error handling and tests.
"""

# Start collaboration
executor.initiate_chat(
    coder,
    message=task
)

# Conversation flow:
# 1. Coder writes initial code
# 2. Executor runs it
# 3. If errors, executor reports them
# 4. Coder fixes and iterates
# 5. Continues until working
**What happens:**
```
Executor → Coder: "Create a function to process CSV..."

Coder → Executor: "Here's the code:

    import pandas as pd
    def process_csv(file_path):
        df = pd.read_csv(file_path)
        return {
            'mean': df.mean(),
            'median': df.median()
        }
"

Executor: [Runs code]
Executor → Coder: "Error: file_path doesn't exist"

Coder → Executor: "Updated code with error handling:

    import pandas as pd
    from pathlib import Path

    def process_csv(file_path):
        if not Path(file_path).exists():
            raise FileNotFoundError(f"File not found: {file_path}")
        ...
"

Executor: [Runs code]
Executor → Coder: "Works! TERMINATE"
```

Advanced Pattern: Group Chat

Multiple agents collaborating.

Research Paper Writing System

python

from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

# LLM config
llm_config = {"model": "gpt-4"}

# Researcher agent
researcher = AssistantAgent(
    name="Researcher",
    system_message="""You are a research specialist.
    
    Your job:
    - Find relevant information on topics
    - Cite sources
    - Provide comprehensive summaries
    
    Focus on accuracy and depth.""",
    llm_config=llm_config
)

# Writer agent
writer = AssistantAgent(
    name="Writer",
    system_message="""You are a technical writer.
    
    Your job:
    - Create clear, well-structured documents
    - Use proper formatting
    - Make complex topics accessible
    - Maintain professional tone""",
    llm_config=llm_config
)

# Editor agent
editor = AssistantAgent(
    name="Editor",
    system_message="""You are a senior editor.
    
    Your job:
    - Review content for clarity
    - Check logic and flow
    - Identify gaps or errors
    - Suggest improvements
    
    Be constructive but critical.""",
    llm_config=llm_config
)

# Fact-checker agent
fact_checker = AssistantAgent(
    name="FactChecker",
    system_message="""You are a fact-checker.
    
    Your job:
    - Verify claims and statistics
    - Check for logical inconsistencies
    - Flag unsupported assertions
    - Request citations for claims""",
    llm_config=llm_config
)

# User proxy
user = UserProxyAgent(
    name="User",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
    is_termination_msg=lambda x: "APPROVED" in x.get("content", "")
)

# Create group chat
groupchat = GroupChat(
    agents=[user, researcher, writer, editor, fact_checker],
    messages=[],
    max_round=20,
    speaker_selection_method="round_robin"  # Each agent speaks in turn
)

# Manager to coordinate
manager = GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)

# Task
task = """
Write a 500-word article on "The Impact of AI on Data Engineering"

Requirements:
- Include current trends
- Cite specific examples
- Maintain technical accuracy
- Professional tone
"""

# Start collaborative writing
user.initiate_chat(
    manager,
    message=task
)
**Conversation flow:**
```
User → Manager: "Write article on AI in data engineering"

Manager → Researcher: "Research AI trends in data engineering"
Researcher: "Key trends: AutoML, MLOps automation, AI-assisted query optimization..."

Manager → Writer: "Draft article based on research"
Writer: "Draft: AI is transforming data engineering in three ways..."

Manager → FactChecker: "Verify claims in draft"
FactChecker: "Claim about 40% productivity gain needs citation"

Manager → Writer: "Revise with proper citations"
Writer: "Updated draft with sources..."

Manager → Editor: "Review final draft"
Editor: "Looks good. Structure is clear. APPROVED"
```

Real-World Use Case: Data Analysis Assistant

Build an agent that analyzes datasets autonomously.

Complete Implementation

python

from autogen import AssistantAgent, UserProxyAgent
import pandas as pd

# Configuration
llm_config = {
    "model": "gpt-4",
    "temperature": 0
}

# Data analyst agent
analyst = AssistantAgent(
    name="DataAnalyst",
    system_message="""You are an expert data analyst.

    When analyzing data:
    1. Load and inspect the dataset
    2. Check for data quality issues
    3. Calculate relevant statistics
    4. Identify patterns and trends
    5. Create visualizations
    6. Provide actionable insights
    
    Always write Python code to analyze data.
    Use pandas, matplotlib, seaborn for analysis.
    
    When analysis is complete, say TERMINATE.""",
    llm_config=llm_config
)

# Code executor
executor = UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=15,
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", ""),
    code_execution_config={
        "work_dir": "analysis",
        "use_docker": False
    }
)

# Analysis task
task = """
Analyze the file 'sales_data.csv' and provide:

1. Summary statistics
2. Sales trends over time
3. Top performing products
4. Customer segmentation insights
5. Recommendations for improvement

Create visualizations where appropriate.
"""

# Run analysis
executor.initiate_chat(
    analyst,
    message=task
)

What the analyst does:

python

# Step 1: Load and inspect
"""
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('sales_data.csv')
print(df.head())
print(df.info())
print(df.describe())
"""

# Step 2: Check data quality
"""
# Check for missing values
print(df.isnull().sum())

# Check for duplicates
print(f"Duplicates: {df.duplicated().sum()}")
"""

# Step 3: Analyze trends
"""
# Sales over time
df['date'] = pd.to_datetime(df['date'])
daily_sales = df.groupby('date')['revenue'].sum()

plt.figure(figsize=(12, 6))
daily_sales.plot()
plt.title('Daily Sales Trend')
plt.xlabel('Date')
plt.ylabel('Revenue')
plt.savefig('sales_trend.png')
plt.close()

print("Saved sales_trend.png")
"""

# Step 4: Top products
"""
top_products = df.groupby('product')['revenue'].sum().sort_values(ascending=False).head(10)
print("Top 10 Products by Revenue:")
print(top_products)
"""

# Step 5: Customer segments
"""
# Segment by purchase frequency
customer_purchases = df.groupby('customer_id').agg({
    'order_id': 'count',
    'revenue': 'sum'
}).rename(columns={'order_id': 'purchase_count'})

# Define segments
def segment_customer(row):
    if row['purchase_count'] > 10:
        return 'High Frequency'
    elif row['purchase_count'] > 5:
        return 'Medium Frequency'
    else:
        return 'Low Frequency'

customer_purchases['segment'] = customer_purchases.apply(segment_customer, axis=1)
print(customer_purchases['segment'].value_counts())
"""

# Step 6: Recommendations
"""
# Analysis complete. Key findings:
# 1. Revenue trending upward (15% MoM growth)
# 2. Top 3 products account for 45% of revenue
# 3. 20% of customers are high-frequency buyers
# 
# Recommendations:
# 1. Focus marketing on top products
# 2. Create loyalty program for high-frequency buyers
# 3. Investigate why 60% are low-frequency buyers
#
# TERMINATE
"""

Advanced Features

1. Human-in-the-Loop

Let humans intervene when needed.

python

# Agent asks for human input on important decisions
executor = UserProxyAgent(
    name="Executor",
    human_input_mode="TERMINATE",  # Ask human when agent wants to stop
    max_consecutive_auto_reply=5
)

# Or ask for input on every message
executor = UserProxyAgent(
    name="Executor",
    human_input_mode="ALWAYS"  # Human reviews every response
)

2. Function Calling

Agents can call specific functions.

python

def search_database(query: str) -> str:
    """Search database for information"""
    # Your database search logic
    return f"Results for: {query}"

def send_email(to: str, subject: str, body: str) -> str:
    """Send an email"""
    # Your email sending logic
    return f"Email sent to {to}"

# Agent with function access
agent = AssistantAgent(
    name="Assistant",
    llm_config={
        "model": "gpt-4",
        "functions": [
            {
                "name": "search_database",
                "description": "Search the database",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    },
                    "required": ["query"]
                }
            },
            {
                "name": "send_email",
                "description": "Send an email",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "to": {"type": "string"},
                        "subject": {"type": "string"},
                        "body": {"type": "string"}
                    },
                    "required": ["to", "subject", "body"]
                }
            }
        ]
    }
)

# Register functions with executor
executor = UserProxyAgent(
    name="Executor",
    function_map={
        "search_database": search_database,
        "send_email": send_email
    }
)

# Agent can now call these functions
executor.initiate_chat(
    agent,
    message="Search the database for customer orders and email the results to admin@company.com"
)

3. Nested Chats

Agents can spawn sub-conversations.

python

# Main task agent
main_agent = AssistantAgent(name="MainAgent", llm_config=llm_config)

# Specialized sub-agents
research_agent = AssistantAgent(name="Researcher", llm_config=llm_config)
writing_agent = AssistantAgent(name="Writer", llm_config=llm_config)

def nested_workflow(task):
    """Complex workflow with nested conversations"""
    
    # Step 1: Research sub-task (initiate_chat returns a ChatResult)
    research_proxy = UserProxyAgent(name="ResearchProxy", human_input_mode="NEVER")
    research_chat = research_proxy.initiate_chat(
        research_agent,
        message=f"Research: {task}"
    )
    
    # Step 2: Writing sub-task using the research summary
    writing_proxy = UserProxyAgent(name="WritingProxy", human_input_mode="NEVER")
    writing_chat = writing_proxy.initiate_chat(
        writing_agent,
        message=f"Write based on: {research_chat.summary}"
    )
    
    return writing_chat.summary

# Main conversation can spawn these nested chats

4. State Management

Track conversation state across messages.

python

from autogen import ConversableAgent

class StatefulAgent(ConversableAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.state = {
            "tasks_completed": [],
            "current_step": 0,
            "data": {}
        }
    
    def update_state(self, key, value):
        """Update agent state"""
        self.state[key] = value
    
    def get_state(self, key):
        """Get state value"""
        return self.state.get(key)

# Use stateful agent
agent = StatefulAgent(
    name="StatefulAgent",
    llm_config=llm_config
)

# State persists across conversation

Production Patterns

Patterns you'll use in real systems.

Pattern 1: Retry with Escalation

python

def create_retry_workflow(task):
    """Agent tries task, escalates if fails"""
    
    # Junior agent tries first
    junior = AssistantAgent(
        name="Junior",
        system_message="You are a junior developer. Try to solve problems, but ask for help if stuck.",
        llm_config=llm_config
    )
    
    # Senior agent provides guidance
    senior = AssistantAgent(
        name="Senior",
        system_message="You are a senior developer. Provide guidance when junior developers are stuck.",
        llm_config=llm_config
    )
    
    executor = UserProxyAgent(
        name="Executor",
        human_input_mode="NEVER",
        max_consecutive_auto_reply=3
    )
    
    # Junior tries first; initiate_chat returns a ChatResult
    result = executor.initiate_chat(junior, message=task).summary or ""
    
    # If junior failed, escalate to senior
    if "ERROR" in result or "STUCK" in result:
        result = executor.initiate_chat(
            senior,
            message=f"Junior developer stuck on: {task}\n\nAttempt: {result}"
        ).summary or ""
    
    return result
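The escalation logic itself needs no LLM to verify. A framework-free sketch, where `solve_junior` and `solve_senior` are hypothetical stand-ins for the two agent chats:

```python
def solve_junior(task: str) -> str:
    """Stand-in junior: gets stuck on anything labeled 'hard'."""
    return "STUCK: need guidance" if "hard" in task else f"done: {task}"

def solve_senior(task: str, attempt: str) -> str:
    """Stand-in senior: always resolves the task given the failed attempt."""
    return f"done with guidance: {task}"

def retry_with_escalation(task: str) -> str:
    result = solve_junior(task)
    if "ERROR" in result or "STUCK" in result:   # same failure markers as above
        result = solve_senior(task, result)
    return result

print(retry_with_escalation("easy fix"))  # → done: easy fix
print(retry_with_escalation("hard bug"))  # → done with guidance: hard bug
```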

Pattern 2: Consensus Building

python

def multi_agent_consensus(task, agents):
    """Get consensus from multiple agents"""
    
    responses = []
    executor = UserProxyAgent(name="Executor")
    
    # Get each agent's answer (ChatResult.summary holds the final reply)
    for agent in agents:
        chat = executor.initiate_chat(agent, message=task)
        responses.append(chat.summary)
    
    # Synthesizer combines responses
    synthesizer = AssistantAgent(
        name="Synthesizer",
        system_message="Synthesize multiple viewpoints into one coherent answer.",
        llm_config=llm_config
    )
    
    synthesis_task = f"""
    Multiple agents provided these responses:
    {responses}
    
    Provide a synthesized answer incorporating the best ideas.
    """
    
    final_chat = executor.initiate_chat(synthesizer, message=synthesis_task)
    
    return final_chat.summary
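When the agents' answers are short, discrete options, a plain majority vote can stand in for the synthesizer agent. A minimal sketch:

```python
from collections import Counter

def majority_vote(responses):
    """Return the most common response; cheaper than an LLM synthesis call."""
    return Counter(responses).most_common(1)[0][0]

votes = ["option A", "option B", "option A"]
print(majority_vote(votes))  # → option A
```

Use LLM synthesis when answers are free-form prose; use voting when they are categorical.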

Pattern 3: Pipeline with Validation

python

def validated_pipeline(task):
    """Multi-step pipeline with validation at each stage"""
    
    # Step 1: Generate
    generator = AssistantAgent(name="Generator", llm_config=llm_config)
    validator = AssistantAgent(name="Validator", llm_config=llm_config)
    executor = UserProxyAgent(name="Executor")
    
    # Generate initial output (ChatResult.summary holds the final reply)
    output = executor.initiate_chat(generator, message=task).summary
    
    # Validate. Use "APPROVED" as the marker: checking for "VALID" is a
    # substring trap, since "INVALID" also contains it.
    validation_task = f"Validate this output: {output}\n\nIf correct, reply APPROVED. If not, explain what's wrong."
    validation = executor.initiate_chat(validator, message=validation_task).summary
    
    # Iterate until valid
    max_iterations = 3
    for i in range(max_iterations):
        if "APPROVED" in validation:
            break
        
        # Regenerate with feedback
        regenerate_task = f"Improve based on feedback: {validation}"
        output = executor.initiate_chat(generator, message=regenerate_task).summary
        
        # Revalidate
        validation = executor.initiate_chat(validator, message=f"Validate: {output}").summary
    
    return output

Performance and Cost Optimization

AutoGen can get expensive quickly.

1. Limit Conversation Rounds

python

executor = UserProxyAgent(
    name="Executor",
    max_consecutive_auto_reply=5  # Stop auto-replying after 5 rounds
)

# Cap the total conversation length per chat
# (max_turns is an argument to initiate_chat, not to the agent constructor)
executor.initiate_chat(agent, message=task, max_turns=10)

2. Use Cheaper Models for Simple Tasks

python

# Expensive: GPT-4 for everything
llm_config = {"model": "gpt-4"}

# Better: GPT-3.5 for simple tasks
simple_agent = AssistantAgent(
    name="SimpleAgent",
    llm_config={"model": "gpt-3.5-turbo"}  # Much cheaper
)

complex_agent = AssistantAgent(
    name="ComplexAgent",
    llm_config={"model": "gpt-4"}  # Only when needed
)

3. Cache Responses

python

from functools import lru_cache

@lru_cache(maxsize=100)
def cached_agent_call(message: str) -> str:
    """Cache repeated queries (only safe for stateless, deterministic prompts)"""
    chat = executor.initiate_chat(agent, message=message)
    return chat.summary

# Repeated calls are cached
result1 = cached_agent_call("What is Python?")  # LLM call
result2 = cached_agent_call("What is Python?")  # Cached, no LLM call

4. Monitor Token Usage

python

import tiktoken

def count_tokens(text, model="gpt-4"):
    """Count tokens in text"""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Track cost
total_tokens = 0

def track_conversation(messages):
    global total_tokens
    for msg in messages:
        total_tokens += count_tokens(msg["content"])
    
    estimated_cost = total_tokens / 1000 * 0.03  # $0.03 per 1K tokens (GPT-4)
    print(f"Tokens used: {total_tokens}, Estimated cost: ${estimated_cost:.2f}")

Common Pitfalls

Pitfall #1: Infinite Loops

Problem:

python

# No termination condition
agent1 = AssistantAgent(name="Agent1", llm_config=llm_config)
agent2 = AssistantAgent(name="Agent2", llm_config=llm_config)

# They talk forever!
agent1.initiate_chat(agent2, message="Discuss AI ethics")

Solution:

python

# Always set max rounds
executor = UserProxyAgent(
    name="Executor",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", "")
)

Pitfall #2: Vague System Messages

Problem:

python

agent = AssistantAgent(
    name="Agent",
    system_message="You are helpful."  # Too vague!
)

Solution:

python

agent = AssistantAgent(
    name="Agent",
    system_message="""You are a Python developer.

    Your specific responsibilities:
    1. Write clean, documented code
    2. Include error handling
    3. Add type hints
    4. Write unit tests
    
    When you complete a task, say TERMINATE."""
)

Pitfall #3: Not Handling Code Execution Errors

Problem:

python

# Code execution enabled, but no error handling
executor = UserProxyAgent(
    name="Executor",
    code_execution_config={"work_dir": "code"}
)
# Agent writes code with syntax errors -> crashes

Solution:

python

executor = UserProxyAgent(
    name="Executor",
    code_execution_config={
        "work_dir": "code",
        "use_docker": True,  # Isolate execution
        "timeout": 60,  # Timeout after 60 seconds
        "last_n_messages": 3  # Only execute recent code
    }
)

Pitfall #4: Cost Explosion

Problem:

python

# Group chat with 10 agents, unlimited rounds
groupchat = GroupChat(
    agents=[agent1, agent2, agent3, ..., agent10],
    max_round=100  # 10 agents * 100 rounds = $$$$
)

Solution:

python

# Limit rounds and use cheaper models
groupchat = GroupChat(
    agents=[agent1, agent2, agent3],  # Fewer agents
    max_round=10  # Fewer rounds
)

# Mix expensive and cheap models
expensive_agent = AssistantAgent(name="Planner", llm_config={"model": "gpt-4"})
cheap_agent = AssistantAgent(name="Summarizer", llm_config={"model": "gpt-3.5-turbo"})

When to Use AutoGen

Good use cases:

✅ Code generation with debugging
✅ Research and analysis tasks
✅ Content creation with review cycles
✅ Multi-step reasoning problems
✅ Tasks requiring iteration

Not ideal for:

❌ Simple, single-step tasks (just use one LLM call)
❌ Real-time applications (conversations take time)
❌ Cost-sensitive applications (can get expensive)
❌ Deterministic workflows (use CrewAI or LangGraph)

Conclusion

AutoGen excels at building AI applications that require iterative refinement and multi-perspective collaboration.

Key principles:

  1. Clear agent roles - Specific responsibilities

  2. Termination conditions - Prevent infinite loops

  3. Cost management - Monitor token usage

  4. Error handling - Graceful failures

  5. Human oversight - For critical decisions

When you need:

  • Iteration and refinement

  • Multiple perspectives

  • Code generation with testing

  • Research and synthesis

  • Flexible conversation flow

AutoGen is the right choice.

Example implementation:

I've built multi-agent systems using AutoGen for automated code review and data analysis. Check out my projects:

  • GitHub: github.com/Shodexco

Questions? Let's connect via the links below.

Now go build autonomous agents. Let them do the iterative work.

About the Author

Jonathan Sodeke is a Data Engineer and ML Engineer who builds production AI systems with multi-agent frameworks. He specializes in AutoGen, CrewAI, and LangGraph for creating autonomous AI workflows.

When he's not debugging agent conversations at 2am, he's building AI systems and teaching others to orchestrate multiple AI agents effectively.

Portfolio: jonathansodeke.framer.website
GitHub: github.com/Shodexco
LinkedIn: www.linkedin.com/in/jonathan-sodeke

Sign Up To My Newsletter

Get notified when a new article is posted.

© Jonathan Sodeke 2025
