Advanced Metrics
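Advanced metrics combine template variables with structured outputs to produce context-aware evaluations. This guide covers template variables at every evaluation level, sequential mode, JSON schemas for structured outputs, context resolution, and resolution cost.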
Basic vs Advanced Metrics
Basic Metrics
Simple prompts without template variables:
Prompt: "Is this response helpful? Answer yes or no."
Direct evaluation
No context resolution
Faster execution
Limited context awareness
Advanced Metrics
Prompts with template variables:
Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG"
Context-aware evaluation
Template variable resolution
Richer context
More accurate evaluations
Use advanced metrics for better results: template variables give the evaluator the conversation context it needs to judge a response accurately.
Template Variables
Template variables inject conversation data into your prompts. They’re written as @VARIABLE_NAME.
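To make the @VARIABLE_NAME syntax concrete, here is a minimal sketch that lists the variable references in a prompt. The regular expression and the function name `find_template_variables` are assumptions made for illustration; TurnWise's own parser may recognize variables differently.

```python
import re

# Assumed pattern: "@" plus an upper-case name, with an optional lower-case
# property such as ".output" or ".tool_call". Not taken from TurnWise itself.
VARIABLE_PATTERN = re.compile(r"@([A-Z][A-Z_]*)(?:\.([a-z_]+))?")


def find_template_variables(prompt: str) -> list[str]:
    """Return each distinct variable reference in order of first appearance."""
    seen: list[str] = []
    for match in VARIABLE_PATTERN.finditer(prompt):
        reference = match.group(0)
        if reference not in seen:
            seen.append(reference)
    return seen


prompt = "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG"
print(find_template_variables(prompt))
```

Because the pattern requires an upper-case letter after the @ sign, an email address such as john@example.com is not mistaken for a variable.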
Quick Reference
| Variable | Level | Type | Description |
| --- | --- | --- | --- |
| @HISTORY | All | String | Full conversation history (rolling summary if long) |
| @GOAL | All | String | User’s overall goal/intent |
| @LIST_AGENT | All | String | Available agents with tools |
| @MESSAGES | Conversation | String | All messages formatted |
| @USER_MESSAGES | Conversation | String | All user messages only |
| @ASSISTANT_MESSAGES | Conversation | String | All assistant messages only |
| @FIRST_USER_MSG | Conversation | String | First user message |
| @LAST_USER_MSG | Conversation | String | Last user message |
| @LAST_ASSISTANT_MSG | Conversation | String | Last assistant message |
| @PREVIOUS_USER_MSG | Message, Step | String | Previous user message |
| @PREVIOUS_ASSISTANT_MSG | Message, Step | String | Previous assistant message |
| @CURRENT_MESSAGE | Message, Step | Object | Current message (use .output or .role) |
| @CURRENT_STEPS | Message | String | All steps in current message |
| @CURRENT_STEPS_COUNT | Message | String | Number of steps in message |
| @PREVIOUS_STEP | Step | Object | Previous step (use .thinking, .tool_call, .tool_result) |
| @CURRENT_STEP | Step | Object | Current step (use .thinking, .tool_call, .tool_result, .output_content, .output_structured) |
| @STEP_NUMBER | Step | String | Current step position (1-indexed) |
| @METRIC_PREVIOUS_RESULT | Message, Step | String | Previous evaluation result (sequential mode only) |
Levels:
All = Available at conversation, message, and step levels
Conversation = Only available at conversation level
Message = Available at message and step levels
Step = Only available at step level
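The legend can be read as a lookup from level label to the evaluation levels it covers. The sketch below follows the legend literally and includes only a handful of rows from the Quick Reference table; the names `LABEL_TO_LEVELS`, `VARIABLE_LABELS`, and `is_available` are hypothetical and not part of TurnWise.

```python
# Evaluation levels covered by each label, following the legend as written.
LABEL_TO_LEVELS = {
    "All": {"conversation", "message", "step"},
    "Conversation": {"conversation"},
    "Message": {"message", "step"},
    "Step": {"step"},
}

# A few rows from the Quick Reference table (variable name -> level label).
VARIABLE_LABELS = {
    "HISTORY": "All",
    "GOAL": "All",
    "MESSAGES": "Conversation",
    "CURRENT_STEPS": "Message",
    "CURRENT_STEP": "Step",
}


def is_available(variable: str, level: str) -> bool:
    """Check whether a variable can be used at the given evaluation level."""
    label = VARIABLE_LABELS[variable]
    return level in LABEL_TO_LEVELS[label]


print(is_available("HISTORY", "step"))
print(is_available("CURRENT_STEP", "message"))
```

A check like this could run before an evaluation starts, so that a step-only variable in a message-level prompt is reported early rather than left unresolved.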
Conversation-Level Variables
Available for conversation-level evaluations:
@HISTORY
Full conversation history (or rolling summary if long):
Prompt: "Evaluate conversation quality: @HISTORY"
Resolves to formatted conversation:
User: Hello, I need help
Assistant: Hi! How can I help?
User: I want to cancel my subscription
Assistant: I can help with that. Can you confirm your account email?
User: john@example.com
Assistant: I've cancelled your subscription. You'll receive a confirmation email shortly.
Note: For long conversations, TurnWise uses rolling summaries to keep context manageable while preserving important information.
@GOAL
User’s overall goal/intent (extracted from conversation):
Prompt: "Did the conversation achieve @GOAL?"
Resolves to extracted goal:
Note: Goals are automatically extracted from user messages using intent classification. The goal is cached per conversation for efficiency.
@LIST_AGENT
Available agents and their tools:
Prompt: "Evaluate tool usage given available tools: @LIST_AGENT"
Resolves to formatted list:
AVAILABLE TOOLS/AGENTS FOR THIS CONVERSATION:
## Agent: Support Agent
Description: Customer support agent
Tools:
- lookup_order: Look up order details
Parameters:
- order_id (string, required): Order identifier
- process_refund: Process refund
Parameters:
- order_id (string, required): Order identifier
- amount (number, required): Refund amount
@MESSAGES
All messages formatted:
Prompt: "Review all messages: @MESSAGES"
Resolves to:
[system]: You are a helpful customer service agent.
[user]: I need to cancel my subscription
[assistant]: I can help with that. Can you confirm your account email?
[user]: john@example.com
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.
@USER_MESSAGES
User messages only:
Prompt: "What did the user ask for? @USER_MESSAGES"
Resolves to:
[user]: I need to cancel my subscription
[user]: john@example.com
@ASSISTANT_MESSAGES
Assistant messages only:
Prompt: "Review assistant responses: @ASSISTANT_MESSAGES"
Resolves to:
[assistant]: I can help with that. Can you confirm your account email?
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.
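The `[role]: content` layout shown for @MESSAGES, @USER_MESSAGES, and @ASSISTANT_MESSAGES can be approximated with a small formatter. This is only an illustrative sketch: the `Message` class and the `format_messages` function are hypothetical names, and the real formatting code may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a stored conversation message."""
    role: str
    content: str


def format_messages(messages: list[Message], role: Optional[str] = None) -> str:
    """Render messages as "[role]: content" lines, optionally keeping one role."""
    selected = [m for m in messages if role is None or m.role == role]
    return "\n".join(f"[{m.role}]: {m.content}" for m in selected)


conversation = [
    Message("user", "I need to cancel my subscription"),
    Message("assistant", "I can help with that. Can you confirm your account email?"),
    Message("user", "john@example.com"),
]

print(format_messages(conversation, role="user"))
```

Passing no role reproduces the @MESSAGES view, while `role="user"` or `role="assistant"` gives the filtered views.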
@FIRST_USER_MSG
First user message:
Prompt: "Original request: @FIRST_USER_MSG"
Resolves to:
"I need to cancel my subscription"
@LAST_USER_MSG
Last user message:
Prompt: "Latest user message: @LAST_USER_MSG"
Resolves to:
"john@example.com"
@LAST_ASSISTANT_MSG
Last assistant message:
Prompt: "Latest response: @LAST_ASSISTANT_MSG"
Resolves to:
"I've cancelled your subscription. You'll receive a confirmation email shortly."
Message-Level Variables
Includes all conversation-level variables plus:
@PREVIOUS_USER_MSG
Previous user message:
Prompt: "Evaluate @CURRENT_MESSAGE.output given @PREVIOUS_USER_MSG"
@PREVIOUS_ASSISTANT_MSG
Previous assistant message:
Prompt: "Compare @CURRENT_MESSAGE.output to @PREVIOUS_ASSISTANT_MSG"
@CURRENT_MESSAGE
Current message being evaluated. Use with nested properties:
Prompt: "Current message: @CURRENT_MESSAGE"
Resolves to formatted message:
[assistant]: I can help you track your order. What's your order ID?
@CURRENT_MESSAGE.output
Current message content:
Prompt: "Evaluate: @CURRENT_MESSAGE.output"
Resolves to:
"I can help you track your order. What's your order ID?"
@CURRENT_MESSAGE.role
Current message role:
Prompt: "Message role: @CURRENT_MESSAGE.role"
Resolves to:
"assistant" (or "user", "system", "tool")
@CURRENT_STEPS
All steps in current message:
Prompt: "Review steps: @CURRENT_STEPS"
Resolves to formatted steps:
--- Step 1 ---
Thinking: User wants order status. I should look it up.
Tool Call: {
"name": "lookup_order",
"arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}
--- Step 2 ---
Thinking: Order found. Let me tell the customer.
Output: Your order has shipped! Tracking: 1Z999...
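As a rough illustration of how a step list might be turned into the layout above, the sketch below numbers each step and prints only the fields that are present. The dictionary keys mirror the step properties named in this guide; the function name `format_steps` and the compact JSON rendering are assumptions.

```python
import json


def format_steps(steps: list[dict]) -> str:
    """Render steps in the "--- Step N ---" layout, skipping absent fields."""
    labels = [
        ("thinking", "Thinking"),
        ("tool_call", "Tool Call"),
        ("tool_result", "Tool Result"),
        ("output_content", "Output"),
    ]
    lines = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"--- Step {number} ---")
        for key, label in labels:
            value = step.get(key)
            if value is None:
                continue
            # Structured values are serialized as JSON; text is used as-is.
            text = value if isinstance(value, str) else json.dumps(value)
            lines.append(f"{label}: {text}")
    return "\n".join(lines)


steps = [
    {
        "thinking": "User wants order status. I should look it up.",
        "tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}},
        "tool_result": {"status": "shipped"},
    },
    {
        "thinking": "Order found. Let me tell the customer.",
        "output_content": "Your order has shipped! Tracking: 1Z999...",
    },
]

print(format_steps(steps))
```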
@CURRENT_STEPS_COUNT
Number of steps in message:
Prompt: "Message has @CURRENT_STEPS_COUNT steps"
Resolves to:
"2" (or "0", "1", "3", etc.)
Step-Level Variables
Includes all message-level variables plus:
@PREVIOUS_STEP
Previous step in the same message. Use with nested properties:
Prompt: "Previous step: @PREVIOUS_STEP"
Resolves to formatted step:
Thinking: Need to check order status first
Tool Call: {
"name": "lookup_order",
"arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}
@PREVIOUS_STEP.thinking
Previous step’s reasoning:
Prompt: "Given @PREVIOUS_STEP.thinking, evaluate @CURRENT_STEP.tool_call"
Resolves to:
"Need to check order status first"
@PREVIOUS_STEP.tool_call
Previous step’s tool call (JSON):
Prompt: "Previous tool: @PREVIOUS_STEP.tool_call"
Resolves to:
{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}
@PREVIOUS_STEP.tool_result
Previous step’s tool result (JSON):
Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call correct?"
Resolves to:
{
  "status": "shipped",
  "tracking": "1Z999AA10123456784"
}
@CURRENT_STEP
Current step being evaluated. Use with nested properties:
Prompt: "Current step: @CURRENT_STEP"
Resolves to formatted step:
Thinking: Order found. Let me tell the customer.
Tool Call: {
"name": "send_notification",
"arguments": {"message": "Your order shipped!"}
}
Output: Your order has shipped! Tracking: 1Z999...
@CURRENT_STEP.thinking
Current step’s reasoning:
Prompt: "Evaluate reasoning: @CURRENT_STEP.thinking"
Resolves to:
"Order found. Let me tell the customer."
@CURRENT_STEP.tool_call
Current step’s tool call (JSON):
Prompt: "Tool called: @CURRENT_STEP.tool_call"
Resolves to:
{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}
@CURRENT_STEP.tool_result
Current step’s tool result (JSON):
Prompt: "Tool result: @CURRENT_STEP.tool_result"
Resolves to:
{
  "status": "shipped",
  "tracking": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}
@CURRENT_STEP.output_content
Current step’s output text:
Prompt: "Step output: @CURRENT_STEP.output_content"
Resolves to:
"Your order has shipped! Tracking number: 1Z999AA10123456784"
@CURRENT_STEP.output_structured
Current step’s structured output (JSON):
Prompt: "Structured output: @CURRENT_STEP.output_structured"
Resolves to:
{
  "order_status": "shipped",
  "tracking_number": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}
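The dotted properties on @PREVIOUS_STEP and @CURRENT_STEP behave like lookups into the step object: text properties resolve to plain text and structured ones to JSON. A minimal sketch of that idea follows, assuming a plain dictionary as the context. `resolve_reference` is a hypothetical helper, and it does not reproduce the formatted output that a bare @CURRENT_STEP produces.

```python
import json


def resolve_reference(reference: str, context: dict) -> str:
    """Resolve "@NAME" or "@NAME.property" against a context dictionary."""
    name, _, prop = reference.removeprefix("@").partition(".")
    value = context[name]
    if prop:
        value = value[prop]
    # Text is returned unchanged; structured values are rendered as JSON.
    return value if isinstance(value, str) else json.dumps(value, indent=2)


context = {
    "CURRENT_STEP": {
        "thinking": "Order found. Let me tell the customer.",
        "tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}},
    },
}

print(resolve_reference("@CURRENT_STEP.thinking", context))
print(resolve_reference("@CURRENT_STEP.tool_call", context))
```

In this sketch an unknown variable or property raises `KeyError`; how TurnWise treats a missing property is not stated in this guide.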
@STEP_NUMBER
Step position (1-indexed):
Prompt: "Step @STEP_NUMBER: Evaluate @CURRENT_STEP.tool_call"
Resolves to:
"1" (for first step), "2" (for second step), etc.
Sequential Mode Variables
Available when using sequential execution mode (pipeline nodes with execution_mode: "sequential"):
@METRIC_PREVIOUS_RESULT
Previous evaluation result from the same pipeline execution. Only available in sequential mode:
Prompt: "Given previous result: @METRIC_PREVIOUS_RESULT, evaluate @CURRENT_MESSAGE.output"
Resolves to JSON of previous metric’s output:
{
  "score": 0.85,
  "reasoning": "Response was helpful but could be more concise"
}
Note: This variable is only available when:
Pipeline node has execution_mode: "sequential"
There is a previous metric result in the same execution
Evaluation level is message or step (not conversation)
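The conditions above amount to running metrics one after another and handing each result to the next prompt. The sketch below illustrates that flow with stub evaluators in place of LLM calls. `run_sequential` and both evaluator functions are hypothetical, and leaving the placeholder untouched for the first metric is an assumption.

```python
import json
from typing import Callable


def run_sequential(metrics: list[tuple[str, Callable[[str], dict]]]) -> list[dict]:
    """Run metrics in order, exposing each result to the next prompt."""
    results: list[dict] = []
    previous = None
    for template, evaluate in metrics:
        prompt = template
        if previous is not None:
            # Only metrics after the first have a previous result to inject.
            prompt = template.replace("@METRIC_PREVIOUS_RESULT", json.dumps(previous))
        previous = evaluate(prompt)
        results.append(previous)
    return results


# Stub evaluators standing in for LLM calls.
def score_helpfulness(prompt: str) -> dict:
    return {"score": 0.85, "reasoning": "Response was helpful but could be more concise"}


def check_conciseness(prompt: str) -> dict:
    # Report whether the previous metric's score reached this prompt.
    return {"saw_previous": '"score": 0.85' in prompt}


results = run_sequential([
    ("Evaluate the response for helpfulness", score_helpfulness),
    ("Given previous result: @METRIC_PREVIOUS_RESULT, evaluate conciseness", check_conciseness),
])
print(results[1])
```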
Using Template Variables
Single Variable
Prompt: "Evaluate @CURRENT_MESSAGE.output"
Multiple Variables
Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG and @HISTORY"
Nested Context
Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call the correct next step? Consider @LIST_AGENT"
JSON Schema for Structured Outputs
When using output_type: "json", define a JSON schema:
Basic Schema
{
  "type": "object",
  "properties": {
    "score": {
      "type": "number",
      "description": "Quality score from 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Explanation of the score"
    }
  },
  "required": ["score", "reasoning"]
}
Advanced Schema
{
  "type": "object",
  "properties": {
    "helpfulness": {
      "type": "number",
      "description": "Helpfulness score 0-1"
    },
    "accuracy": {
      "type": "number",
      "description": "Accuracy score 0-1"
    },
    "tone": {
      "type": "string",
      "enum": ["polite", "neutral", "rude"],
      "description": "Tone of the response"
    },
    "completeness": {
      "type": "number",
      "description": "Completeness score 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Detailed explanation"
    }
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}
Schema Best Practices
Include Score Field: Always have a primary metric
Add Reasoning: Include an explanation field
Use Enums: For categorical values
Keep Simple: 2-5 fields are typically sufficient
Describe Fields: Clear descriptions help the LLM
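One way to test a schema before running it on real data is to check a few hand-written outputs against it. The sketch below uses only the standard library and covers just the keywords that appear in this guide (`required`, `type`, and `enum`). It is not a full JSON Schema validator, and `validate_output` is a hypothetical helper.

```python
def validate_output(output: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    python_types = {"number": (int, float), "string": (str,)}
    errors = []
    for name in schema.get("required", []):
        if name not in output:
            errors.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in output:
            continue
        value = output[name]
        expected = python_types.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            errors.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} is not one of {spec['enum']}")
    return errors


schema = {
    "type": "object",
    "properties": {
        "score": {"type": "number"},
        "tone": {"type": "string", "enum": ["polite", "neutral", "rude"]},
        "reasoning": {"type": "string"},
    },
    "required": ["score", "reasoning"],
}

print(validate_output({"score": 0.9, "tone": "polite", "reasoning": "Clear and correct"}, schema))
print(validate_output({"score": "high", "tone": "angry"}, schema))
```

The first call returns an empty list; the second reports the missing reasoning field, the non-numeric score, and the tone value outside the enum.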
Context Resolution
TurnWise resolves template variables in this order:
1. Fetch Conversation Data: Load from database
2. Extract Goals: If @GOAL is needed, extract user goals
3. Create/Update Summary: If @HISTORY is needed, manage the rolling summary
4. Resolve Variables: Replace @VARIABLE_NAME with actual data
5. Build Prompt: Combine resolved variables with prompt text
6. Execute: Send to LLM
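The conditional steps in this order ("if @GOAL needed", "if @HISTORY needed") suggest that expensive work runs only for variables the prompt actually references. A minimal sketch of that idea follows, assuming each variable has a resolver function. The names `build_prompt`, `extract_goal`, and `summarize_history`, as well as the resolved values, are made up for illustration, and dotted properties are not handled here.

```python
import re
from typing import Callable

VARIABLE = re.compile(r"@([A-Z][A-Z_]*)")


def build_prompt(template: str, resolvers: dict[str, Callable[[], str]]) -> str:
    """Resolve only the variables the template references, then substitute them."""
    needed = set(VARIABLE.findall(template))
    values = {name: resolvers[name]() for name in needed if name in resolvers}
    # Unknown variables are left in place rather than replaced.
    return VARIABLE.sub(lambda m: values.get(m.group(1), m.group(0)), template)


calls: list[str] = []


def extract_goal() -> str:
    calls.append("GOAL")
    return "cancel the subscription"  # hypothetical extracted goal


def summarize_history() -> str:
    calls.append("HISTORY")
    return "User asked to cancel; assistant confirmed the email."


resolvers = {"GOAL": extract_goal, "HISTORY": summarize_history}
prompt = build_prompt("Did the conversation achieve @GOAL?", resolvers)
print(prompt)
print(calls)
```

Because the prompt never mentions @HISTORY, the summary resolver is not called at all.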
Example Advanced Metrics
Example 1: Context-Aware Helpfulness
Name: Context-Aware Helpfulness
Level: Message
Prompt: |
  Evaluate @CURRENT_MESSAGE.output for helpfulness.
  Context:
  - User asked: @PREVIOUS_USER_MSG
  - Conversation history: @HISTORY
  Consider:
  - Does it address the user's question?
  - Is it accurate?
  - Is it complete?
  Provide a score from 0-1.
Output Type: Progress
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG, @HISTORY
Example 2: Tool Chain Correctness
Name: Tool Chain Correctness
Level: Step
Prompt: |
  Evaluate if @CURRENT_STEP.tool_call is the correct next step.
  Context:
  - Previous tool result: @PREVIOUS_STEP.tool_result
  - Available tools: @LIST_AGENT
  - Step reasoning: @CURRENT_STEP.thinking
  Answer yes or no.
Output Type: Checkbox
Template Variables: @CURRENT_STEP.tool_call, @PREVIOUS_STEP.tool_result, @LIST_AGENT, @CURRENT_STEP.thinking
Example 3: Multi-Dimensional Analysis
Name: Comprehensive Quality Analysis
Level: Message
Prompt: |
  Analyze @CURRENT_MESSAGE.output across multiple dimensions:
  - Helpfulness: Does it help the user?
  - Accuracy: Is the information correct?
  - Tone: Is the tone appropriate?
  - Completeness: Does it fully address the question?
  Context: @PREVIOUS_USER_MSG
Output Type: JSON
Schema: {
  "type": "object",
  "properties": {
    "helpfulness": {"type": "number", "description": "0-1"},
    "accuracy": {"type": "number", "description": "0-1"},
    "tone": {"type": "string", "enum": ["polite", "neutral", "rude"]},
    "completeness": {"type": "number", "description": "0-1"},
    "reasoning": {"type": "string"}
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG
Variable Resolution Cost
Some variables require additional processing:
@GOAL: Requires goal extraction (cached per conversation)
@HISTORY: May require summary creation/update
@LIST_AGENT: Requires agent data loading
Optimization Tips
Reuse Variables: Multiple variables in one prompt are resolved in a single pass
Cache Goals: Goals are cached per conversation
Reuse Summaries: Summaries are reused across evaluations
Choose Right Level: Step-level is the most granular (and the most expensive)
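To illustrate why per-conversation caching keeps @GOAL cheap, the sketch below wraps an extraction function so that it runs at most once per conversation. `GoalCache` and `fake_extract` are hypothetical stand-ins; the guide does not describe how TurnWise implements its cache.

```python
from typing import Callable


class GoalCache:
    """Per-conversation cache so goal extraction runs at most once each."""

    def __init__(self, extract: Callable[[str], str]) -> None:
        self._extract = extract
        self._goals: dict[str, str] = {}

    def get(self, conversation_id: str) -> str:
        if conversation_id not in self._goals:
            # First request for this conversation: pay the extraction cost once.
            self._goals[conversation_id] = self._extract(conversation_id)
        return self._goals[conversation_id]


extraction_calls: list[str] = []


def fake_extract(conversation_id: str) -> str:
    """Stand-in for the real (expensive) goal extraction."""
    extraction_calls.append(conversation_id)
    return f"goal for {conversation_id}"


cache = GoalCache(fake_extract)
for _ in range(3):
    cache.get("conv-1")
cache.get("conv-2")
print(len(extraction_calls))  # 2: one extraction per conversation
```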
Best Practices
Use Template Variables: Always use variables for context-aware evaluation
Be Specific: Specify what to evaluate and how
Provide Context: Include relevant context variables
Test Schemas: Test JSON schemas before running on all data