Steps
Steps capture an AI agent's internal processing: its reasoning (thinking), the tools it calls, the results those tools return, and its outputs. Steps are optional, but they let evaluators see how the agent reached its answer, not just the answer itself.
Structure
{
  "steps": [
    {
      "model_name": "gpt-4",
      "agent_name": "order_assistant",
      "thinking": "User wants order status. I should look it up.",
      "tool_call": {
        "name": "lookup_order",
        "arguments": { "order_id": "ORD-123" }
      }
    },
    {
      "model_name": "gpt-4",
      "agent_name": "order_assistant",
      "tool_result": {
        "status": "shipped",
        "tracking": "1Z999..."
      }
    },
    {
      "model_name": "gpt-4",
      "agent_name": "order_assistant",
      "thinking": "Order found. Let me tell the customer.",
      "output_content": "Your order has shipped! Tracking: 1Z999..."
    }
  ]
}
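If you build traces in code, the same structure can be assembled from ordinary dictionaries and serialized with Python's standard `json` module. This is a minimal sketch; how the payload is actually submitted depends on your client and is not shown.

```python
import json

# The three steps from the "Structure" example, as plain Python dicts.
steps = [
    {
        "model_name": "gpt-4",
        "agent_name": "order_assistant",
        "thinking": "User wants order status. I should look it up.",
        "tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}},
    },
    {
        "model_name": "gpt-4",
        "agent_name": "order_assistant",
        "tool_result": {"status": "shipped", "tracking": "1Z999..."},
    },
    {
        "model_name": "gpt-4",
        "agent_name": "order_assistant",
        "thinking": "Order found. Let me tell the customer.",
        "output_content": "Your order has shipped! Tracking: 1Z999...",
    },
]

# Wrap the list under the top-level "steps" key and serialize it.
payload = json.dumps({"steps": steps}, indent=2)
print(payload)
```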
Note: The order of steps is automatically inferred from their position in the steps array (0-indexed). You don’t need to specify step_order.
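Conceptually, the inferred order is just each step's array index. The hypothetical helper `with_inferred_order` below makes that explicit for local inspection; it is an illustration of the rule in the note, not part of any official client, and you do not need to send `step_order` yourself.

```python
from typing import List


def with_inferred_order(steps: List[dict]) -> List[dict]:
    """Return copies of the steps annotated with their 0-based array position.

    Illustration only (hypothetical helper): the original dicts are not modified.
    """
    return [{**step, "step_order": index} for index, step in enumerate(steps)]


steps = [
    {"thinking": "Need to check order first"},
    {"tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}}},
    {"output_content": "Your order has shipped! Tracking: 1Z999..."},
]

for step in with_inferred_order(steps):
    print(step["step_order"], [key for key in step if key != "step_order"])
```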
Fields
model_name (optional)
The model used for this step. Defaults to "unknown".
"model_name": "gpt-4-turbo"
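If you post-process traces locally and want to mirror the documented default, a single dictionary lookup is enough. `resolve_model_name` is a hypothetical helper; whether an explicit `null` is also treated as missing is not stated above, so this sketch only handles an absent key.

```python
DEFAULT_MODEL_NAME = "unknown"  # the documented default for model_name


def resolve_model_name(step: dict) -> str:
    """Return the step's model_name, or "unknown" when the key is absent."""
    return step.get("model_name", DEFAULT_MODEL_NAME)


print(resolve_model_name({"model_name": "gpt-4-turbo", "thinking": "..."}))
print(resolve_model_name({"thinking": "Need to check order first"}))
```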
agent_name (optional)
The name of the agent that executed this step. In multi-agent systems, this identifies which agent acted and, therefore, which of that agent's tools the step refers to.
"agent_name": "order_assistant"
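In a multi-agent trace, `agent_name` makes it easy to see which agent used which tools. The sketch below groups tool calls by agent; the `router` agent and its `classify_intent` tool are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def tools_by_agent(steps: List[dict]) -> Dict[Optional[str], List[str]]:
    """Map each agent_name to the tool names it called, in step order.

    Steps without an agent_name are grouped under None.
    """
    result: Dict[Optional[str], List[str]] = defaultdict(list)
    for step in steps:
        call = step.get("tool_call")
        if call is not None:
            result[step.get("agent_name")].append(call["name"])
    return dict(result)


steps = [
    # Hypothetical routing agent, for illustration only.
    {"agent_name": "router", "tool_call": {"name": "classify_intent", "arguments": {}}},
    {"agent_name": "order_assistant",
     "tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}}},
    {"agent_name": "order_assistant", "tool_result": {"status": "shipped"}},
    {"agent_name": "order_assistant",
     "tool_call": {"name": "process_refund", "arguments": {"order_id": "ORD-123", "amount": 49.99}}},
]

print(tools_by_agent(steps))
```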
thinking (optional)
The model’s internal reasoning or chain-of-thought.
"thinking": "The user is asking about their order. I should use the lookup_order tool to find the current status before responding."
tool_call (optional)
Information about a tool invocation: the tool's name and the arguments passed to it.
"tool_call": {
  "name": "search_products",
  "arguments": {
    "query": "wireless mouse",
    "category": "electronics"
  }
}
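A `tool_call` object is just a tool name plus an `arguments` object, so it can be built directly from a function name and keyword arguments. `make_tool_call` is a hypothetical helper, not an SDK function; `name` is positional-only so that a tool may itself take an argument called `name`.

```python
from typing import Any


def make_tool_call(name: str, /, **arguments: Any) -> dict:
    """Build a tool_call object from a tool name and its keyword arguments."""
    return {"name": name, "arguments": dict(arguments)}


step = {
    "thinking": "The user wants a wireless mouse. I'll search the catalog.",
    "tool_call": make_tool_call("search_products", query="wireless mouse", category="electronics"),
}
print(step["tool_call"])
```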
tool_result (optional)
The result returned by a tool execution. Can be an object or a string.
"tool_result": {
  "products": [
    { "id": "MOUSE-001", "name": "Pro Wireless Mouse", "price": 79.99 }
  ]
}
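Because `tool_result` may be either an object or a string, code that displays or compares results should handle both. A minimal sketch: `tool_result_as_text` is a hypothetical helper that passes strings through and renders objects as canonical JSON; the string result shown is invented for illustration.

```python
import json
from typing import Union

ToolResult = Union[dict, str]


def tool_result_as_text(result: ToolResult) -> str:
    """Render a tool_result as text: strings pass through, objects become JSON."""
    if isinstance(result, str):
        return result
    # sort_keys gives a stable rendering regardless of key order.
    return json.dumps(result, sort_keys=True)


object_result = {"products": [{"id": "MOUSE-001", "name": "Pro Wireless Mouse", "price": 79.99}]}
string_result = "No products matched 'wireless mouse'."  # hypothetical string result

print(tool_result_as_text(object_result))
print(tool_result_as_text(string_result))
```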
output_structured (optional)
Structured output data (JSON object).
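No example is given for `output_structured`, so here is a hypothetical one: a step whose output is machine-readable data rather than text. The keys inside `output_structured` are illustrative, not a required schema; the only property checked is that the value is a JSON object that survives a round trip.

```python
import json

# Hypothetical step: the keys inside output_structured are illustrative only.
step = {
    "model_name": "gpt-4",
    "output_structured": {"order_id": "ORD-123", "status": "shipped", "notify_customer": True},
}

# output_structured is described as a JSON object, so it should be a dict
# that serializes and deserializes unchanged.
assert isinstance(step["output_structured"], dict)
assert json.loads(json.dumps(step["output_structured"])) == step["output_structured"]
print(json.dumps(step, indent=2))
```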
output_content (optional)
The final text output from this step.
"output_content": "I found a great option - the Pro Wireless Mouse for $79.99!"
Content Requirement
Each step must have at least one content field: thinking, tool_call, tool_result, output_structured, or output_content.
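You may want to check this requirement before sending a trace. The sketch below is a hypothetical pre-flight check, not part of any official SDK. It treats a field as present when its key exists with a non-null value; how empty strings or `null` are handled is not specified above, so that choice is an assumption.

```python
from typing import List

CONTENT_FIELDS = ("thinking", "tool_call", "tool_result", "output_structured", "output_content")


def steps_missing_content(steps: List[dict]) -> List[int]:
    """Return the 0-based indices of steps that have none of the content fields.

    Assumption: a field counts as present if its key exists and is not None.
    """
    return [
        index
        for index, step in enumerate(steps)
        if not any(step.get(field) is not None for field in CONTENT_FIELDS)
    ]


steps = [
    {"model_name": "gpt-4", "thinking": "User is greeting me. I should respond warmly."},
    {"model_name": "gpt-4", "agent_name": "order_assistant"},  # metadata only: invalid
    {"tool_result": "OK"},  # a string tool_result is valid content
]
print(steps_missing_content(steps))  # → [1]
```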
Common Patterns
Simple response (reasoning plus a direct reply):
{
  "steps": [
    {
      "model_name": "gpt-4",
      "thinking": "User is greeting me. I should respond warmly.",
      "output_content": "Hello! How can I help you today?"
    }
  ]
}
Tool call and its result in one step, followed by a reply:
{
  "steps": [
    {
      "model_name": "gpt-4",
      "thinking": "I need to look up the order status.",
      "tool_call": {
        "name": "lookup_order",
        "arguments": { "order_id": "ORD-123" }
      },
      "tool_result": {
        "status": "delivered",
        "delivered_at": "2024-01-19"
      }
    },
    {
      "model_name": "gpt-4",
      "thinking": "Order was delivered. I'll let the customer know.",
      "output_content": "Great news! Your order was delivered on January 19th."
    }
  ]
}
Multi-step workflow (look up an order, then process a refund). These steps omit model_name, so each defaults to "unknown":
{
  "steps": [
    {
      "thinking": "Need to check order first",
      "tool_call": { "name": "lookup_order", "arguments": { "order_id": "ORD-123" } },
      "tool_result": { "status": "pending", "amount": 49.99 }
    },
    {
      "thinking": "Order qualifies for refund, processing now",
      "tool_call": { "name": "process_refund", "arguments": { "order_id": "ORD-123", "amount": 49.99 } },
      "tool_result": { "refund_id": "REF-789", "status": "processed" }
    },
    {
      "thinking": "Refund processed successfully",
      "output_content": "I've processed your refund of $49.99. Reference: REF-789"
    }
  ]
}
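The patterns above place `tool_call` and `tool_result` in the same step, while the "Structure" example puts the result in its own step. When analyzing traces, you may want to pair each call with its result regardless of layout. The pairing rule in this sketch is an assumption, not part of the documented format: a result belongs to the call in the same step or, failing that, to the most recent call if that call has no result yet.

```python
from typing import Any, List, Optional, Tuple


def tool_calls_with_results(steps: List[dict]) -> List[Tuple[str, Any]]:
    """Pair each tool_call's name with its tool_result (None if none found).

    Assumed rule: a tool_result attaches to the tool_call in the same step or,
    failing that, to the most recent tool_call if that call has no result yet.
    Any other tool_result is ignored.
    """
    pairs: List[Tuple[str, Any]] = []
    pending: Optional[int] = None  # index in `pairs` still waiting for a result
    for step in steps:
        call = step.get("tool_call")
        if call is not None:
            pairs.append((call["name"], None))
            pending = len(pairs) - 1
        if "tool_result" in step and pending is not None:
            name, _ = pairs[pending]
            pairs[pending] = (name, step["tool_result"])
            pending = None
    return pairs


# Combined layout, from the refund pattern.
combined = [
    {"tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}},
     "tool_result": {"status": "pending", "amount": 49.99}},
    {"tool_call": {"name": "process_refund", "arguments": {"order_id": "ORD-123", "amount": 49.99}},
     "tool_result": {"refund_id": "REF-789", "status": "processed"}},
    {"output_content": "I've processed your refund of $49.99. Reference: REF-789"},
]
# Split layout, from the "Structure" example.
split = [
    {"tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}}},
    {"tool_result": {"status": "shipped", "tracking": "1Z999..."}},
    {"output_content": "Your order has shipped! Tracking: 1Z999..."},
]
print(tool_calls_with_results(combined))
print(tool_calls_with_results(split))
```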
Why Use Steps?
Steps enable more detailed evaluation:
- Reasoning Quality: Evaluate whether the agent's thinking is sound
- Tool Selection: Check whether the agent used the right tools
- Error Handling: See how the agent responds to tool failures
- Efficiency: Identify unnecessary steps or redundant tool calls
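As a rough illustration of the efficiency check, the sketch below flags tool calls repeated with identical arguments. This is only one crude signal and not how any particular evaluator scores efficiency: a repeated call can be legitimate (for example, re-checking a status after an update). `redundant_tool_calls` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import List, Tuple


def redundant_tool_calls(steps: List[dict]) -> List[Tuple[str, str, int]]:
    """List (tool name, canonical arguments JSON, count) for calls made more than once."""
    counts: Counter = Counter()
    for step in steps:
        call = step.get("tool_call")
        if call is not None:
            # Canonical JSON makes the arguments hashable and key-order independent.
            key = (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))
            counts[key] += 1
    return [(name, args, n) for (name, args), n in counts.items() if n > 1]


steps = [
    {"tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}}},
    {"tool_result": {"status": "pending", "amount": 49.99}},
    {"tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}}},
    {"tool_call": {"name": "process_refund", "arguments": {"order_id": "ORD-123", "amount": 49.99}}},
]
print(redundant_tool_calls(steps))
```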