Advanced Metrics

Advanced metrics use template variables and structured outputs to create context-aware, powerful evaluations. This guide covers all advanced features.

Basic vs Advanced Metrics

Basic Metrics

Simple prompts without template variables:

Prompt: "Is this response helpful? Answer yes or no."

Direct evaluation
No context resolution
Faster execution
Limited context awareness

Advanced Metrics

Prompts with template variables:

Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG"

Context-aware evaluation
Template variable resolution
Richer context
More accurate evaluations

Use advanced metrics for better results - Template variables provide crucial context for accurate evaluation.

Template Variables

Template variables inject conversation data into your prompts. They’re written as @VARIABLE_NAME.

Quick Reference

Variable	Level	Type	Description
`@HISTORY`	All	String	Full conversation history (rolling summary if long)
`@GOAL`	All	String	User’s overall goal/intent
`@LIST_AGENT`	All	String	Available agents with tools
`@MESSAGES`	Conversation	String	All messages formatted
`@USER_MESSAGES`	Conversation	String	All user messages only
`@ASSISTANT_MESSAGES`	Conversation	String	All assistant messages only
`@FIRST_USER_MSG`	Conversation	String	First user message
`@LAST_USER_MSG`	Conversation	String	Last user message
`@LAST_ASSISTANT_MSG`	Conversation	String	Last assistant message
`@PREVIOUS_USER_MSG`	Message, Step	String	Previous user message
`@PREVIOUS_ASSISTANT_MSG`	Message, Step	String	Previous assistant message
`@CURRENT_MESSAGE`	Message, Step	Object	Current message (use `.output` or `.role`)
`@CURRENT_STEPS`	Message	String	All steps in current message
`@CURRENT_STEPS_COUNT`	Message	String	Number of steps in message
`@PREVIOUS_STEP`	Step	Object	Previous step (use `.thinking`, `.tool_call`, `.tool_result`)
`@CURRENT_STEP`	Step	Object	Current step (use `.thinking`, `.tool_call`, `.tool_result`, `.output_content`, `.output_structured`)
`@STEP_NUMBER`	Step	String	Current step position (1-indexed)
`@METRIC_PREVIOUS_RESULT`	Message, Step	String	Previous evaluation result (sequential mode only)

Levels:

All = Available at conversation, message, and step levels
Conversation = Only available at conversation level
Message = Available at message and step levels
Step = Only available at step level

Conversation-Level Variables

Available for conversation-level evaluations:

@HISTORY

Full conversation history (or rolling summary if long):

Prompt: "Evaluate conversation quality: @HISTORY"

Resolves to formatted conversation:

User: Hello, I need help
Assistant: Hi! How can I help?
User: I want to cancel my subscription
Assistant: I can help with that. Can you confirm your account email?
User: john@example.com
Assistant: I've cancelled your subscription. You'll receive a confirmation email shortly.

Note: For long conversations, TurnWise uses rolling summaries to keep context manageable while preserving important information.

@GOAL

User’s overall goal/intent (extracted from conversation):

Prompt: "Did the conversation achieve @GOAL?"

Resolves to extracted goal:

"Cancel subscription"

Note: Goals are automatically extracted from user messages using intent classification. The goal is cached per conversation for efficiency.

@LIST_AGENT

Available agents and their tools:

Prompt: "Evaluate tool usage given available tools: @LIST_AGENT"

Resolves to formatted list:

AVAILABLE TOOLS/AGENTS FOR THIS CONVERSATION:

## Agent: Support Agent
Description: Customer support agent
Tools:
  - lookup_order: Look up order details
    Parameters:
    - order_id (string, required): Order identifier
  - process_refund: Process refund
    Parameters:
    - order_id (string, required): Order identifier
    - amount (number, required): Refund amount

@MESSAGES

All messages formatted:

Prompt: "Review all messages: @MESSAGES"

Resolves to:

[system]: You are a helpful customer service agent.
[user]: I need to cancel my subscription
[assistant]: I can help with that. Can you confirm your account email?
[user]: john@example.com
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.

@USER_MESSAGES

User messages only:

Prompt: "What did the user ask for? @USER_MESSAGES"

Resolves to:

[user]: I need to cancel my subscription
[user]: john@example.com

@ASSISTANT_MESSAGES

Assistant messages only:

Prompt: "Review assistant responses: @ASSISTANT_MESSAGES"

Resolves to:

[assistant]: I can help with that. Can you confirm your account email?
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.

@FIRST_USER_MSG

First user message:

Prompt: "Original request: @FIRST_USER_MSG"

Resolves to:

"I need to cancel my subscription"

@LAST_USER_MSG

Last user message:

Prompt: "Latest user message: @LAST_USER_MSG"

Resolves to:

"john@example.com"

@LAST_ASSISTANT_MSG

Last assistant message:

Prompt: "Latest response: @LAST_ASSISTANT_MSG"

Resolves to:

"I've cancelled your subscription. You'll receive a confirmation email shortly."

Message-Level Variables

Includes all conversation-level variables plus:

@PREVIOUS_USER_MSG

Previous user message:

Prompt: "Evaluate @CURRENT_MESSAGE.output given @PREVIOUS_USER_MSG"

@PREVIOUS_ASSISTANT_MSG

Previous assistant message:

Prompt: "Compare @CURRENT_MESSAGE.output to @PREVIOUS_ASSISTANT_MSG"

@CURRENT_MESSAGE

Current message being evaluated. Use with nested properties:

Prompt: "Current message: @CURRENT_MESSAGE"

Resolves to formatted message:

[assistant]: I can help you track your order. What's your order ID?

@CURRENT_MESSAGE.output

Current message content:

Prompt: "Evaluate: @CURRENT_MESSAGE.output"

Resolves to:

"I can help you track your order. What's your order ID?"

@CURRENT_MESSAGE.role

Current message role:

Prompt: "Message role: @CURRENT_MESSAGE.role"

Resolves to:

"assistant"  (or "user", "system", "tool")

@CURRENT_STEPS

All steps in current message:

Prompt: "Review steps: @CURRENT_STEPS"

Resolves to formatted steps:

--- Step 1 ---
Thinking: User wants order status. I should look it up.
Tool Call: {
  "name": "lookup_order",
  "arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}

--- Step 2 ---
Thinking: Order found. Let me tell the customer.
Output: Your order has shipped! Tracking: 1Z999...

@CURRENT_STEPS_COUNT

Number of steps in message:

Prompt: "Message has @CURRENT_STEPS_COUNT steps"

Resolves to:

"2"  (or "0", "1", "3", etc.)

Step-Level Variables

Includes all message-level variables plus:

@PREVIOUS_STEP

Previous step in the same message. Use with nested properties:

Prompt: "Previous step: @PREVIOUS_STEP"

Resolves to formatted step:

Thinking: Need to check order status first
Tool Call: {
  "name": "lookup_order",
  "arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}

@PREVIOUS_STEP.thinking

Previous step’s reasoning:

Prompt: "Given @PREVIOUS_STEP.thinking, evaluate @CURRENT_STEP.tool_call"

Resolves to:

"Need to check order status first"

@PREVIOUS_STEP.tool_call

Previous step’s tool call (JSON):

Prompt: "Previous tool: @PREVIOUS_STEP.tool_call"

Resolves to:

{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}

@PREVIOUS_STEP.tool_result

Previous step’s tool result (JSON):

Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call correct?"

Resolves to:

{
  "status": "shipped",
  "tracking": "1Z999AA10123456784"
}

@CURRENT_STEP

Current step being evaluated. Use with nested properties:

Prompt: "Current step: @CURRENT_STEP"

Resolves to formatted step:

Thinking: Order found. Let me tell the customer.
Tool Call: {
  "name": "send_notification",
  "arguments": {"message": "Your order shipped!"}
}
Output: Your order has shipped! Tracking: 1Z999...

@CURRENT_STEP.thinking

Current step’s reasoning:

Prompt: "Evaluate reasoning: @CURRENT_STEP.thinking"

Resolves to:

"Order found. Let me tell the customer."

@CURRENT_STEP.tool_call

Current step’s tool call (JSON):

Prompt: "Tool called: @CURRENT_STEP.tool_call"

Resolves to:

{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}

@CURRENT_STEP.tool_result

Current step’s tool result (JSON):

Prompt: "Tool result: @CURRENT_STEP.tool_result"

Resolves to:

{
  "status": "shipped",
  "tracking": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}

@CURRENT_STEP.output_content

Current step’s output text:

Prompt: "Step output: @CURRENT_STEP.output_content"

Resolves to:

"Your order has shipped! Tracking number: 1Z999AA10123456784"

@CURRENT_STEP.output_structured

Current step’s structured output (JSON):

Prompt: "Structured output: @CURRENT_STEP.output_structured"

Resolves to:

{
  "order_status": "shipped",
  "tracking_number": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}

@STEP_NUMBER

Step position (1-indexed):

Prompt: "Step @STEP_NUMBER: Evaluate @CURRENT_STEP.tool_call"

Resolves to:

"1"  (for first step), "2" (for second step), etc.

Sequential Mode Variables

Available when using sequential execution mode (pipeline nodes with execution_mode: "sequential"):

@METRIC_PREVIOUS_RESULT

Previous evaluation result from the same pipeline execution. Only available in sequential mode:

Prompt: "Given previous result: @METRIC_PREVIOUS_RESULT, evaluate @CURRENT_MESSAGE.output"

Resolves to JSON of previous metric’s output:

{
  "score": 0.85,
  "reasoning": "Response was helpful but could be more concise"
}

Note: This variable is only available when:

Pipeline node has execution_mode: "sequential"
There is a previous metric result in the same execution
Evaluation level is message or step (not conversation)

Using Template Variables

Single Variable

Prompt: "Evaluate @CURRENT_MESSAGE.output"

Multiple Variables

Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG and @HISTORY"

Nested Context

Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call the correct next step? Consider @LIST_AGENT"

JSON Schema for Structured Outputs

When using output_type: "json", define a JSON schema:

Basic Schema

{
  "type": "object",
  "properties": {
    "score": {
      "type": "number",
      "description": "Quality score from 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Explanation of the score"
    }
  },
  "required": ["score", "reasoning"]
}

Advanced Schema

{
  "type": "object",
  "properties": {
    "helpfulness": {
      "type": "number",
      "description": "Helpfulness score 0-1"
    },
    "accuracy": {
      "type": "number",
      "description": "Accuracy score 0-1"
    },
    "tone": {
      "type": "string",
      "enum": ["polite", "neutral", "rude"],
      "description": "Tone of the response"
    },
    "completeness": {
      "type": "number",
      "description": "Completeness score 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Detailed explanation"
    }
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}

Schema Best Practices

Include Score Field: Always have a primary metric
Add Reasoning: Include explanation field
Use Enums: For categorical values
Keep Simple: 2-5 fields typically sufficient
Describe Fields: Clear descriptions help LLM

Context Resolution

TurnWise resolves template variables in this order:

Fetch Conversation Data: Load from database
Extract Goals: If @GOAL needed, extract user goals
Create/Update Summary: If @HISTORY needed, manage rolling summary
Resolve Variables: Replace @VARIABLE_NAME with actual data
Build Prompt: Combine resolved variables with prompt text
Execute: Send to LLM

Example Advanced Metrics

Example 1: Context-Aware Helpfulness

Name: Context-Aware Helpfulness
Level: Message
Prompt: |
  Evaluate @CURRENT_MESSAGE.output for helpfulness.
  
  Context:
  - User asked: @PREVIOUS_USER_MSG
  - Conversation history: @HISTORY
  
  Consider:
  - Does it address the user's question?
  - Is it accurate?
  - Is it complete?
  
  Provide a score from 0-1.
Output Type: Progress
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG, @HISTORY

Example 2: Tool Chain Evaluation

Name: Tool Chain Correctness
Level: Step
Prompt: |
  Evaluate if @CURRENT_STEP.tool_call is the correct next step.
  
  Context:
  - Previous tool result: @PREVIOUS_STEP.tool_result
  - Available tools: @LIST_AGENT
  - Step reasoning: @CURRENT_STEP.thinking
  
  Answer yes or no.
Output Type: Checkbox
Template Variables: @CURRENT_STEP.tool_call, @PREVIOUS_STEP.tool_result, @LIST_AGENT, @CURRENT_STEP.thinking

Example 3: Multi-Dimensional Analysis

Name: Comprehensive Quality Analysis
Level: Message
Prompt: |
  Analyze @CURRENT_MESSAGE.output across multiple dimensions:
  - Helpfulness: Does it help the user?
  - Accuracy: Is the information correct?
  - Tone: Is the tone appropriate?
  - Completeness: Does it fully address the question?
  
  Context: @PREVIOUS_USER_MSG
Output Type: JSON
Schema: {
  "type": "object",
  "properties": {
    "helpfulness": {"type": "number", "description": "0-1"},
    "accuracy": {"type": "number", "description": "0-1"},
    "tone": {"type": "string", "enum": ["polite", "neutral", "rude"]},
    "completeness": {"type": "number", "description": "0-1"},
    "reasoning": {"type": "string"}
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG

Performance Considerations

Variable Resolution Cost

Some variables require additional processing:

@GOAL: Requires goal extraction (cached per conversation)
@HISTORY: May require summary creation/update
@LIST_AGENT: Requires agent data loading

Optimization Tips

Reuse Variables: Multiple variables in one prompt = one resolution
Cache Goals: Goals are cached per conversation
Reuse Summaries: Summaries are reused across evaluations
Choose Right Level: Step-level is most granular (and most expensive)

Best Practices

Use Template Variables

Always use variables for context-aware evaluation

Be Specific

Specify what to evaluate and how

Provide Context

Include relevant context variables

Test Schemas

Test JSON schemas before running on all data

Next Steps

Creating Metrics

Learn the basics of metric creation

Running Evaluations

Run your advanced metrics

Getting Started

Data Format

Datasets

Metrics

Evaluation

Examples

Python SDK

​Advanced Metrics

​Basic vs Advanced Metrics

​Basic Metrics

​Advanced Metrics

​Template Variables

​Quick Reference

​Conversation-Level Variables

​@HISTORY

​@GOAL

​@LIST_AGENT

​@MESSAGES

​@USER_MESSAGES

​@ASSISTANT_MESSAGES

​@FIRST_USER_MSG

​@LAST_USER_MSG

​@LAST_ASSISTANT_MSG

​Message-Level Variables

​@PREVIOUS_USER_MSG

​@PREVIOUS_ASSISTANT_MSG

​@CURRENT_MESSAGE

​@CURRENT_MESSAGE.output

​@CURRENT_MESSAGE.role

​@CURRENT_STEPS

​@CURRENT_STEPS_COUNT

​Step-Level Variables

​@PREVIOUS_STEP

​@PREVIOUS_STEP.thinking

​@PREVIOUS_STEP.tool_call

​@PREVIOUS_STEP.tool_result

​@CURRENT_STEP

​@CURRENT_STEP.thinking

​@CURRENT_STEP.tool_call

​@CURRENT_STEP.tool_result

​@CURRENT_STEP.output_content

​@CURRENT_STEP.output_structured

​@STEP_NUMBER

​Sequential Mode Variables

​@METRIC_PREVIOUS_RESULT

​Using Template Variables

​Single Variable

​Multiple Variables

​Nested Context

​JSON Schema for Structured Outputs

​Basic Schema

​Advanced Schema

​Schema Best Practices

​Context Resolution

​Example Advanced Metrics

​Example 1: Context-Aware Helpfulness

​Example 2: Tool Chain Evaluation

​Example 3: Multi-Dimensional Analysis

​Performance Considerations

​Variable Resolution Cost

​Optimization Tips

​Best Practices

Use Template Variables

Be Specific

Provide Context

Test Schemas

​Next Steps

Creating Metrics

Running Evaluations

Advanced Metrics

Basic vs Advanced Metrics

Basic Metrics

Advanced Metrics

Template Variables

Quick Reference

Conversation-Level Variables

@HISTORY

@GOAL

@LIST_AGENT

@MESSAGES

@USER_MESSAGES

@ASSISTANT_MESSAGES

@FIRST_USER_MSG

@LAST_USER_MSG

@LAST_ASSISTANT_MSG

Message-Level Variables

@PREVIOUS_USER_MSG

@PREVIOUS_ASSISTANT_MSG

@CURRENT_MESSAGE

@CURRENT_MESSAGE.output

@CURRENT_MESSAGE.role

@CURRENT_STEPS

@CURRENT_STEPS_COUNT

Step-Level Variables

@PREVIOUS_STEP

@PREVIOUS_STEP.thinking

@PREVIOUS_STEP.tool_call

@PREVIOUS_STEP.tool_result

@CURRENT_STEP

@CURRENT_STEP.thinking

@CURRENT_STEP.tool_call

@CURRENT_STEP.tool_result

@CURRENT_STEP.output_content

@CURRENT_STEP.output_structured

@STEP_NUMBER

Sequential Mode Variables

@METRIC_PREVIOUS_RESULT

Using Template Variables

Single Variable

Multiple Variables

Nested Context

JSON Schema for Structured Outputs

Basic Schema

Advanced Schema

Schema Best Practices

Context Resolution

Example Advanced Metrics

Example 1: Context-Aware Helpfulness

Example 2: Tool Chain Evaluation

Example 3: Multi-Dimensional Analysis

Performance Considerations

Variable Resolution Cost

Optimization Tips

Best Practices

Next Steps