Documentation Index Fetch the complete documentation index at: https://turnwise.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Advanced Metrics
Advanced metrics use template variables and structured outputs to create context-aware, powerful evaluations. This guide covers all advanced features.
Basic vs Advanced Metrics
Basic Metrics
Simple prompts without template variables:
Prompt: "Is this response helpful? Answer yes or no."
Direct evaluation
No context resolution
Faster execution
Limited context awareness
Advanced Metrics
Prompts with template variables:
Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG"
Context-aware evaluation
Template variable resolution
Richer context
More accurate evaluations
Use advanced metrics for better results - Template variables provide crucial context for accurate evaluation.
Template Variables
Template variables inject conversation data into your prompts. They’re written as @VARIABLE_NAME.
Quick Reference
Variable Level Type Description @HISTORYAll String Full conversation history (rolling summary if long) @GOALAll String User’s overall goal/intent @LIST_AGENTAll String Available agents with tools @MESSAGESConversation String All messages formatted @USER_MESSAGESConversation String All user messages only @ASSISTANT_MESSAGESConversation String All assistant messages only @FIRST_USER_MSGConversation String First user message @LAST_USER_MSGConversation String Last user message @LAST_ASSISTANT_MSGConversation String Last assistant message @PREVIOUS_USER_MSGMessage, Step String Previous user message @PREVIOUS_ASSISTANT_MSGMessage, Step String Previous assistant message @CURRENT_MESSAGEMessage, Step Object Current message (use .output or .role) @CURRENT_STEPSMessage String All steps in current message @CURRENT_STEPS_COUNTMessage String Number of steps in message @PREVIOUS_STEPStep Object Previous step (use .thinking, .tool_call, .tool_result) @CURRENT_STEPStep Object Current step (use .thinking, .tool_call, .tool_result, .output_content, .output_structured) @STEP_NUMBERStep String Current step position (1-indexed) @METRIC_PREVIOUS_RESULTMessage, Step String Previous evaluation result (sequential mode only)
Levels :
All = Available at conversation, message, and step levels
Conversation = Only available at conversation level
Message = Available at message and step levels
Step = Only available at step level
Conversation-Level Variables
Available for conversation-level evaluations:
@HISTORY
Full conversation history (or rolling summary if long):
Prompt: "Evaluate conversation quality: @HISTORY"
Resolves to formatted conversation:
User: Hello, I need help
Assistant: Hi! How can I help?
User: I want to cancel my subscription
Assistant: I can help with that. Can you confirm your account email?
User: john@example.com
Assistant: I've cancelled your subscription. You'll receive a confirmation email shortly.
Note : For long conversations, TurnWise uses rolling summaries to keep context manageable while preserving important information.
@GOAL
User’s overall goal/intent (extracted from conversation):
Prompt: "Did the conversation achieve @GOAL?"
Resolves to extracted goal:
Note : Goals are automatically extracted from user messages using intent classification. The goal is cached per conversation for efficiency.
@LIST_AGENT
Available agents and their tools:
Prompt: "Evaluate tool usage given available tools: @LIST_AGENT"
Resolves to formatted list:
AVAILABLE TOOLS/AGENTS FOR THIS CONVERSATION:
## Agent: Support Agent
Description: Customer support agent
Tools:
- lookup_order: Look up order details
Parameters:
- order_id (string, required): Order identifier
- process_refund: Process refund
Parameters:
- order_id (string, required): Order identifier
- amount (number, required): Refund amount
@MESSAGES
All messages formatted:
Prompt: "Review all messages: @MESSAGES"
Resolves to:
[system]: You are a helpful customer service agent.
[user]: I need to cancel my subscription
[assistant]: I can help with that. Can you confirm your account email?
[user]: john@example.com
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.
@USER_MESSAGES
User messages only:
Prompt: "What did the user ask for? @USER_MESSAGES"
Resolves to:
[user]: I need to cancel my subscription
[user]: john@example.com
@ASSISTANT_MESSAGES
Assistant messages only:
Prompt: "Review assistant responses: @ASSISTANT_MESSAGES"
Resolves to:
[assistant]: I can help with that. Can you confirm your account email?
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.
@FIRST_USER_MSG
First user message:
Prompt: "Original request: @FIRST_USER_MSG"
Resolves to:
"I need to cancel my subscription"
@LAST_USER_MSG
Last user message:
Prompt: "Latest user message: @LAST_USER_MSG"
Resolves to:
@LAST_ASSISTANT_MSG
Last assistant message:
Prompt: "Latest response: @LAST_ASSISTANT_MSG"
Resolves to:
"I've cancelled your subscription. You'll receive a confirmation email shortly."
Message-Level Variables
Includes all conversation-level variables plus:
@PREVIOUS_USER_MSG
Previous user message:
Prompt: "Evaluate @CURRENT_MESSAGE.output given @PREVIOUS_USER_MSG"
@PREVIOUS_ASSISTANT_MSG
Previous assistant message:
Prompt: "Compare @CURRENT_MESSAGE.output to @PREVIOUS_ASSISTANT_MSG"
@CURRENT_MESSAGE
Current message being evaluated. Use with nested properties:
Prompt: "Current message: @CURRENT_MESSAGE"
Resolves to formatted message:
[assistant]: I can help you track your order. What's your order ID?
@CURRENT_MESSAGE.output
Current message content:
Prompt: "Evaluate: @CURRENT_MESSAGE.output"
Resolves to:
"I can help you track your order. What's your order ID?"
@CURRENT_MESSAGE.role
Current message role:
Prompt: "Message role: @CURRENT_MESSAGE.role"
Resolves to:
"assistant" (or "user", "system", "tool")
@CURRENT_STEPS
All steps in current message:
Prompt: "Review steps: @CURRENT_STEPS"
Resolves to formatted steps:
--- Step 1 ---
Thinking: User wants order status. I should look it up.
Tool Call: {
"name": "lookup_order",
"arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}
--- Step 2 ---
Thinking: Order found. Let me tell the customer.
Output: Your order has shipped! Tracking: 1Z999...
@CURRENT_STEPS_COUNT
Number of steps in message:
Prompt: "Message has @CURRENT_STEPS_COUNT steps"
Resolves to:
"2" (or "0", "1", "3", etc.)
Step-Level Variables
Includes all message-level variables plus:
@PREVIOUS_STEP
Previous step in the same message. Use with nested properties:
Prompt: "Previous step: @PREVIOUS_STEP"
Resolves to formatted step:
Thinking: Need to check order status first
Tool Call: {
"name": "lookup_order",
"arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}
@PREVIOUS_STEP.thinking
Previous step’s reasoning:
Prompt: "Given @PREVIOUS_STEP.thinking, evaluate @CURRENT_STEP.tool_call"
Resolves to:
"Need to check order status first"
Previous step’s tool call (JSON):
Prompt: "Previous tool: @PREVIOUS_STEP.tool_call"
Resolves to:
{
"name" : "lookup_order" ,
"arguments" : {
"order_id" : "ORD-123"
}
}
Previous step’s tool result (JSON):
Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call correct?"
Resolves to:
{
"status" : "shipped" ,
"tracking" : "1Z999AA10123456784"
}
@CURRENT_STEP
Current step being evaluated. Use with nested properties:
Prompt: "Current step: @CURRENT_STEP"
Resolves to formatted step:
Thinking: Order found. Let me tell the customer.
Tool Call: {
"name": "send_notification",
"arguments": {"message": "Your order shipped!"}
}
Output: Your order has shipped! Tracking: 1Z999...
@CURRENT_STEP.thinking
Current step’s reasoning:
Prompt: "Evaluate reasoning: @CURRENT_STEP.thinking"
Resolves to:
"Order found. Let me tell the customer."
Current step’s tool call (JSON):
Prompt: "Tool called: @CURRENT_STEP.tool_call"
Resolves to:
{
"name" : "lookup_order" ,
"arguments" : {
"order_id" : "ORD-123"
}
}
Current step’s tool result (JSON):
Prompt: "Tool result: @CURRENT_STEP.tool_result"
Resolves to:
{
"status" : "shipped" ,
"tracking" : "1Z999AA10123456784" ,
"estimated_delivery" : "2024-01-25"
}
@CURRENT_STEP.output_content
Current step’s output text:
Prompt: "Step output: @CURRENT_STEP.output_content"
Resolves to:
"Your order has shipped! Tracking number: 1Z999AA10123456784"
@CURRENT_STEP.output_structured
Current step’s structured output (JSON):
Prompt: "Structured output: @CURRENT_STEP.output_structured"
Resolves to:
{
"order_status" : "shipped" ,
"tracking_number" : "1Z999AA10123456784" ,
"estimated_delivery" : "2024-01-25"
}
@STEP_NUMBER
Step position (1-indexed):
Prompt: "Step @STEP_NUMBER: Evaluate @CURRENT_STEP.tool_call"
Resolves to:
"1" (for first step), "2" (for second step), etc.
Sequential Mode Variables
Available when using sequential execution mode (pipeline nodes with execution_mode: "sequential"):
@METRIC_PREVIOUS_RESULT
Previous evaluation result from the same pipeline execution. Only available in sequential mode:
Prompt: "Given previous result: @METRIC_PREVIOUS_RESULT, evaluate @CURRENT_MESSAGE.output"
Resolves to JSON of previous metric’s output:
{
"score" : 0.85 ,
"reasoning" : "Response was helpful but could be more concise"
}
Note : This variable is only available when:
Pipeline node has execution_mode: "sequential"
There is a previous metric result in the same execution
Evaluation level is message or step (not conversation)
Using Template Variables
Single Variable
Prompt: "Evaluate @CURRENT_MESSAGE.output"
Multiple Variables
Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG and @HISTORY"
Nested Context
Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call the correct next step? Consider @LIST_AGENT"
JSON Schema for Structured Outputs
When using output_type: "json", define a JSON schema:
Basic Schema
{
"type" : "object" ,
"properties" : {
"score" : {
"type" : "number" ,
"description" : "Quality score from 0-1"
},
"reasoning" : {
"type" : "string" ,
"description" : "Explanation of the score"
}
},
"required" : [ "score" , "reasoning" ]
}
Advanced Schema
{
"type" : "object" ,
"properties" : {
"helpfulness" : {
"type" : "number" ,
"description" : "Helpfulness score 0-1"
},
"accuracy" : {
"type" : "number" ,
"description" : "Accuracy score 0-1"
},
"tone" : {
"type" : "string" ,
"enum" : [ "polite" , "neutral" , "rude" ],
"description" : "Tone of the response"
},
"completeness" : {
"type" : "number" ,
"description" : "Completeness score 0-1"
},
"reasoning" : {
"type" : "string" ,
"description" : "Detailed explanation"
}
},
"required" : [ "helpfulness" , "accuracy" , "tone" , "completeness" , "reasoning" ]
}
Schema Best Practices
Include Score Field : Always have a primary metric
Add Reasoning : Include explanation field
Use Enums : For categorical values
Keep Simple : 2-5 fields typically sufficient
Describe Fields : Clear descriptions help LLM
Context Resolution
TurnWise resolves template variables in this order:
Fetch Conversation Data : Load from database
Extract Goals : If @GOAL needed, extract user goals
Create/Update Summary : If @HISTORY needed, manage rolling summary
Resolve Variables : Replace @VARIABLE_NAME with actual data
Build Prompt : Combine resolved variables with prompt text
Execute : Send to LLM
Example Advanced Metrics
Example 1: Context-Aware Helpfulness
Name: Context-Aware Helpfulness
Level: Message
Prompt: |
Evaluate @CURRENT_MESSAGE.output for helpfulness.
Context:
- User asked: @PREVIOUS_USER_MSG
- Conversation history: @HISTORY
Consider:
- Does it address the user's question?
- Is it accurate?
- Is it complete?
Provide a score from 0-1.
Output Type: Progress
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG, @HISTORY
Name: Tool Chain Correctness
Level: Step
Prompt: |
Evaluate if @CURRENT_STEP.tool_call is the correct next step.
Context:
- Previous tool result: @PREVIOUS_STEP.tool_result
- Available tools: @LIST_AGENT
- Step reasoning: @CURRENT_STEP.thinking
Answer yes or no.
Output Type: Checkbox
Template Variables: @CURRENT_STEP.tool_call, @PREVIOUS_STEP.tool_result, @LIST_AGENT, @CURRENT_STEP.thinking
Example 3: Multi-Dimensional Analysis
Name: Comprehensive Quality Analysis
Level: Message
Prompt: |
Analyze @CURRENT_MESSAGE.output across multiple dimensions:
- Helpfulness: Does it help the user?
- Accuracy: Is the information correct?
- Tone: Is the tone appropriate?
- Completeness: Does it fully address the question?
Context: @PREVIOUS_USER_MSG
Output Type: JSON
Schema: {
"type": "object",
"properties": {
"helpfulness": {"type": "number", "description": "0-1"},
"accuracy": {"type": "number", "description": "0-1"},
"tone": {"type": "string", "enum": ["polite", "neutral", "rude"]},
"completeness": {"type": "number", "description": "0-1"},
"reasoning": {"type": "string"}
},
"required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG
Variable Resolution Cost
Some variables require additional processing:
@GOAL : Requires goal extraction (cached per conversation)
@HISTORY : May require summary creation/update
@LIST_AGENT : Requires agent data loading
Optimization Tips
Reuse Variables : Multiple variables in one prompt = one resolution
Cache Goals : Goals are cached per conversation
Reuse Summaries : Summaries are reused across evaluations
Choose Right Level : Step-level is most granular (and most expensive)
Best Practices
Use Template Variables Always use variables for context-aware evaluation
Be Specific Specify what to evaluate and how
Provide Context Include relevant context variables
Test Schemas Test JSON schemas before running on all data
Next Steps
Creating Metrics Learn the basics of metric creation
Running Evaluations Run your advanced metrics