
Advanced Metrics

Advanced metrics combine template variables and structured outputs to produce context-aware evaluations. This guide covers template variables, JSON schemas for structured outputs, context resolution, and performance considerations.

Basic vs Advanced Metrics

Basic Metrics

Simple prompts without template variables:
Prompt: "Is this response helpful? Answer yes or no."
  • Direct evaluation
  • No context resolution
  • Faster execution
  • Limited context awareness

Advanced Metrics

Prompts with template variables:
Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG"
  • Context-aware evaluation
  • Template variable resolution
  • Richer context
  • More accurate evaluations
Use advanced metrics for better results: template variables give the evaluator the conversation context it needs to judge a response accurately.

Template Variables

Template variables inject conversation data into your prompts. They’re written as @VARIABLE_NAME.
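As a minimal sketch of the @VARIABLE_NAME syntax, the snippet below lists the variables a prompt refers to. The regular expression and the function name find_template_variables are illustrative assumptions, not TurnWise's actual parser.

```python
import re

# Assumed pattern: "@", an upper-case name, then an optional ".property" suffix.
VARIABLE_PATTERN = re.compile(r"@([A-Z][A-Z_]*)(?:\.([a-z_]+))?")


def find_template_variables(prompt: str) -> list[str]:
    """Return each distinct variable reference in order of first appearance."""
    found: list[str] = []
    for match in VARIABLE_PATTERN.finditer(prompt):
        reference = match.group(0)
        if reference not in found:
            found.append(reference)
    return found


prompt = "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG"
print(find_template_variables(prompt))
# prints ['@CURRENT_MESSAGE.output', '@PREVIOUS_USER_MSG']
```

Because the pattern requires an upper-case letter right after the @ sign, ordinary text such as an email address is not picked up as a variable.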

Quick Reference

Variable | Level | Type | Description
@HISTORY | All | String | Full conversation history (rolling summary if long)
@GOAL | All | String | User’s overall goal/intent
@LIST_AGENT | All | String | Available agents with tools
@MESSAGES | Conversation | String | All messages formatted
@USER_MESSAGES | Conversation | String | All user messages only
@ASSISTANT_MESSAGES | Conversation | String | All assistant messages only
@FIRST_USER_MSG | Conversation | String | First user message
@LAST_USER_MSG | Conversation | String | Last user message
@LAST_ASSISTANT_MSG | Conversation | String | Last assistant message
@PREVIOUS_USER_MSG | Message, Step | String | Previous user message
@PREVIOUS_ASSISTANT_MSG | Message, Step | String | Previous assistant message
@CURRENT_MESSAGE | Message, Step | Object | Current message (use .output or .role)
@CURRENT_STEPS | Message | String | All steps in current message
@CURRENT_STEPS_COUNT | Message | String | Number of steps in message
@PREVIOUS_STEP | Step | Object | Previous step (use .thinking, .tool_call, .tool_result)
@CURRENT_STEP | Step | Object | Current step (use .thinking, .tool_call, .tool_result, .output_content, .output_structured)
@STEP_NUMBER | Step | String | Current step position (1-indexed)
@METRIC_PREVIOUS_RESULT | Message, Step | String | Previous evaluation result (sequential mode only)
Levels:
  • All = Available at conversation, message, and step levels
  • Conversation = Only available at conversation level
  • Message = Available at message and step levels
  • Step = Only available at step level

Conversation-Level Variables

Available for conversation-level evaluations:

@HISTORY

Full conversation history (or rolling summary if long):
Prompt: "Evaluate conversation quality: @HISTORY"
Resolves to formatted conversation:
User: Hello, I need help
Assistant: Hi! How can I help?
User: I want to cancel my subscription
Assistant: I can help with that. Can you confirm your account email?
User: john@example.com
Assistant: I've cancelled your subscription. You'll receive a confirmation email shortly.
Note: For long conversations, TurnWise uses rolling summaries to keep context manageable while preserving important information.

@GOAL

User’s overall goal/intent (extracted from conversation):
Prompt: "Did the conversation achieve @GOAL?"
Resolves to extracted goal:
"Cancel subscription"
Note: Goals are automatically extracted from user messages using intent classification. The goal is cached per conversation for efficiency.
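The per-conversation caching described in the note can be sketched as follows. GoalCache, its get method, and fake_extractor are hypothetical names used for illustration. The real intent classifier is not shown in this guide, so a stand-in function takes its place.

```python
from typing import Callable


class GoalCache:
    """Caches one extracted goal per conversation ID (illustrative sketch)."""

    def __init__(self, extract_goal: Callable[[list[str]], str]) -> None:
        self._extract_goal = extract_goal
        self._goals: dict[str, str] = {}

    def get(self, conversation_id: str, user_messages: list[str]) -> str:
        # Run the extractor only on the first request for this conversation.
        if conversation_id not in self._goals:
            self._goals[conversation_id] = self._extract_goal(user_messages)
        return self._goals[conversation_id]


calls: list[list[str]] = []


def fake_extractor(user_messages: list[str]) -> str:
    # Stand-in for the intent classifier; it only records that it was called.
    calls.append(user_messages)
    return "Cancel subscription"


cache = GoalCache(fake_extractor)
first = cache.get("conv-1", ["I need to cancel my subscription"])
second = cache.get("conv-1", ["I need to cancel my subscription", "john@example.com"])
print(first, second, len(calls))
```

The second lookup returns the cached goal without calling the extractor again, which is why several metrics can use @GOAL on one conversation without repeating the extraction.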

@LIST_AGENT

Available agents and their tools:
Prompt: "Evaluate tool usage given available tools: @LIST_AGENT"
Resolves to formatted list:
AVAILABLE TOOLS/AGENTS FOR THIS CONVERSATION:

## Agent: Support Agent
Description: Customer support agent
Tools:
  - lookup_order: Look up order details
    Parameters:
    - order_id (string, required): Order identifier
  - process_refund: Process refund
    Parameters:
    - order_id (string, required): Order identifier
    - amount (number, required): Refund amount

@MESSAGES

All messages formatted:
Prompt: "Review all messages: @MESSAGES"
Resolves to:
[system]: You are a helpful customer service agent.
[user]: I need to cancel my subscription
[assistant]: I can help with that. Can you confirm your account email?
[user]: john@example.com
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.

@USER_MESSAGES

User messages only:
Prompt: "What did the user ask for? @USER_MESSAGES"
Resolves to:
[user]: I need to cancel my subscription
[user]: john@example.com

@ASSISTANT_MESSAGES

Assistant messages only:
Prompt: "Review assistant responses: @ASSISTANT_MESSAGES"
Resolves to:
[assistant]: I can help with that. Can you confirm your account email?
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.

@FIRST_USER_MSG

First user message:
Prompt: "Original request: @FIRST_USER_MSG"
Resolves to:
"I need to cancel my subscription"

@LAST_USER_MSG

Last user message:
Prompt: "Latest user message: @LAST_USER_MSG"
Resolves to:
"john@example.com"

@LAST_ASSISTANT_MSG

Last assistant message:
Prompt: "Latest response: @LAST_ASSISTANT_MSG"
Resolves to:
"I've cancelled your subscription. You'll receive a confirmation email shortly."

Message-Level Variables

Includes all conversation-level variables plus:

@PREVIOUS_USER_MSG

Previous user message:
Prompt: "Evaluate @CURRENT_MESSAGE.output given @PREVIOUS_USER_MSG"

@PREVIOUS_ASSISTANT_MSG

Previous assistant message:
Prompt: "Compare @CURRENT_MESSAGE.output to @PREVIOUS_ASSISTANT_MSG"

@CURRENT_MESSAGE

Current message being evaluated. On its own it resolves to the formatted message; append .output or .role to get a single field:
Prompt: "Current message: @CURRENT_MESSAGE"
Resolves to formatted message:
[assistant]: I can help you track your order. What's your order ID?

@CURRENT_MESSAGE.output

Current message content:
Prompt: "Evaluate: @CURRENT_MESSAGE.output"
Resolves to:
"I can help you track your order. What's your order ID?"

@CURRENT_MESSAGE.role

Current message role:
Prompt: "Message role: @CURRENT_MESSAGE.role"
Resolves to:
"assistant"  (or "user", "system", "tool")

@CURRENT_STEPS

All steps in current message:
Prompt: "Review steps: @CURRENT_STEPS"
Resolves to formatted steps:
--- Step 1 ---
Thinking: User wants order status. I should look it up.
Tool Call: {
  "name": "lookup_order",
  "arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}

--- Step 2 ---
Thinking: Order found. Let me tell the customer.
Output: Your order has shipped! Tracking: 1Z999...
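A formatter that produces output in roughly this shape might look like the sketch below. The step dictionary keys mirror the nested property names used in this guide (thinking, tool_call, tool_result, output_content). The function name format_steps and the exact JSON whitespace are assumptions.

```python
import json


def format_steps(steps: list[dict]) -> str:
    """Render steps as numbered blocks, skipping fields a step does not have."""
    blocks = []
    for number, step in enumerate(steps, start=1):
        lines = [f"--- Step {number} ---"]
        if step.get("thinking"):
            lines.append(f"Thinking: {step['thinking']}")
        if step.get("tool_call"):
            lines.append(f"Tool Call: {json.dumps(step['tool_call'], indent=2)}")
        if step.get("tool_result"):
            lines.append(f"Tool Result: {json.dumps(step['tool_result'])}")
        if step.get("output_content"):
            lines.append(f"Output: {step['output_content']}")
        blocks.append("\n".join(lines))
    # Separate steps with a blank line, as in the example above.
    return "\n\n".join(blocks)


steps = [
    {
        "thinking": "User wants order status. I should look it up.",
        "tool_call": {"name": "lookup_order", "arguments": {"order_id": "ORD-123"}},
        "tool_result": {"status": "shipped"},
    },
    {
        "thinking": "Order found. Let me tell the customer.",
        "output_content": "Your order has shipped! Tracking: 1Z999...",
    },
]
print(format_steps(steps))
```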

@CURRENT_STEPS_COUNT

Number of steps in message:
Prompt: "Message has @CURRENT_STEPS_COUNT steps"
Resolves to:
"2"  (or "0", "1", "3", etc.)

Step-Level Variables

Includes all message-level variables plus:

@PREVIOUS_STEP

Previous step in the same message. On its own it resolves to the formatted step; append .thinking, .tool_call, or .tool_result to get a single field:
Prompt: "Previous step: @PREVIOUS_STEP"
Resolves to formatted step:
Thinking: Need to check order status first
Tool Call: {
  "name": "lookup_order",
  "arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}

@PREVIOUS_STEP.thinking

Previous step’s reasoning:
Prompt: "Given @PREVIOUS_STEP.thinking, evaluate @CURRENT_STEP.tool_call"
Resolves to:
"Need to check order status first"

@PREVIOUS_STEP.tool_call

Previous step’s tool call (JSON):
Prompt: "Previous tool: @PREVIOUS_STEP.tool_call"
Resolves to:
{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}

@PREVIOUS_STEP.tool_result

Previous step’s tool result (JSON):
Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call correct?"
Resolves to:
{
  "status": "shipped",
  "tracking": "1Z999AA10123456784"
}

@CURRENT_STEP

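Current step being evaluated. On its own it resolves to the formatted step; append .thinking, .tool_call, .tool_result, .output_content, or .output_structured to get a single field: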
Prompt: "Current step: @CURRENT_STEP"
Resolves to formatted step:
Thinking: Order found. Let me tell the customer.
Tool Call: {
  "name": "send_notification",
  "arguments": {"message": "Your order shipped!"}
}
Output: Your order has shipped! Tracking: 1Z999...

@CURRENT_STEP.thinking

Current step’s reasoning:
Prompt: "Evaluate reasoning: @CURRENT_STEP.thinking"
Resolves to:
"Order found. Let me tell the customer."

@CURRENT_STEP.tool_call

Current step’s tool call (JSON):
Prompt: "Tool called: @CURRENT_STEP.tool_call"
Resolves to:
{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}

@CURRENT_STEP.tool_result

Current step’s tool result (JSON):
Prompt: "Tool result: @CURRENT_STEP.tool_result"
Resolves to:
{
  "status": "shipped",
  "tracking": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}

@CURRENT_STEP.output_content

Current step’s output text:
Prompt: "Step output: @CURRENT_STEP.output_content"
Resolves to:
"Your order has shipped! Tracking number: 1Z999AA10123456784"

@CURRENT_STEP.output_structured

Current step’s structured output (JSON):
Prompt: "Structured output: @CURRENT_STEP.output_structured"
Resolves to:
{
  "order_status": "shipped",
  "tracking_number": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}

@STEP_NUMBER

Step position (1-indexed):
Prompt: "Step @STEP_NUMBER: Evaluate @CURRENT_STEP.tool_call"
Resolves to:
"1"  (for first step), "2" (for second step), etc.

Sequential Mode Variables

Available when using sequential execution mode (pipeline nodes with execution_mode: "sequential"):

@METRIC_PREVIOUS_RESULT

Previous evaluation result from the same pipeline execution. Only available in sequential mode:
Prompt: "Given previous result: @METRIC_PREVIOUS_RESULT, evaluate @CURRENT_MESSAGE.output"
Resolves to JSON of previous metric’s output:
{
  "score": 0.85,
  "reasoning": "Response was helpful but could be more concise"
}
Note: This variable is only available when:
  • Pipeline node has execution_mode: "sequential"
  • There is a previous metric result in the same execution
  • Evaluation level is message or step (not conversation)
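The chaining behavior can be sketched as a loop that feeds each metric's result into the next prompt. run_sequential and fake_evaluate are hypothetical names, and injecting the previous result as JSON text is an assumption based on the example above, not a description of TurnWise internals.

```python
import json
from typing import Callable


def run_sequential(prompts: list[str], evaluate: Callable[[str], dict]) -> list[dict]:
    """Evaluate prompts in order, exposing each result to the next prompt."""
    results: list[dict] = []
    previous_result = None
    for prompt in prompts:
        if previous_result is not None:
            # Assumed behavior: the previous result is injected as JSON text.
            prompt = prompt.replace("@METRIC_PREVIOUS_RESULT", json.dumps(previous_result))
        previous_result = evaluate(prompt)
        results.append(previous_result)
    return results


seen_prompts: list[str] = []


def fake_evaluate(prompt: str) -> dict:
    # Stand-in for the LLM call; records the prompt it was given.
    seen_prompts.append(prompt)
    return {"score": 0.85, "reasoning": "Response was helpful but could be more concise"}


run_sequential(
    [
        "Evaluate @CURRENT_MESSAGE.output",
        "Given previous result: @METRIC_PREVIOUS_RESULT, evaluate @CURRENT_MESSAGE.output",
    ],
    fake_evaluate,
)
print(seen_prompts[1])
```

The first metric has no previous result, so the sketch leaves its prompt untouched, matching the condition that a previous metric result must exist in the same execution.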

Using Template Variables

Single Variable

Prompt: "Evaluate @CURRENT_MESSAGE.output"

Multiple Variables

Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG and @HISTORY"

Nested Context

Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call the correct next step? Consider @LIST_AGENT"
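To make the substitution concrete, here is a minimal sketch of resolving plain and nested variables against a context dictionary. resolve_prompt, the context layout, and the choice to leave unknown variables untouched are all assumptions made for illustration.

```python
import json
import re

# Assumed pattern: "@", an upper-case name, then an optional ".property" suffix.
VARIABLE_PATTERN = re.compile(r"@([A-Z][A-Z_]*)(?:\.([a-z_]+))?")


def resolve_prompt(prompt: str, context: dict) -> str:
    """Replace @NAME and @NAME.property with values from the context."""

    def substitute(match: re.Match) -> str:
        name, prop = match.group(1), match.group(2)
        if name not in context:
            return match.group(0)  # unknown variable: leave as written
        value = context[name]
        if prop is not None:
            if not isinstance(value, dict) or prop not in value:
                return match.group(0)  # unknown property: leave as written
            value = value[prop]
        # Strings are inserted as-is; objects are rendered as JSON.
        return value if isinstance(value, str) else json.dumps(value)

    return VARIABLE_PATTERN.sub(substitute, prompt)


context = {
    "PREVIOUS_STEP": {"tool_result": {"status": "shipped"}},
    "CURRENT_STEP": {"tool_call": {"name": "send_notification"}},
}
print(resolve_prompt(
    "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call the correct next step?",
    context,
))
```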

JSON Schema for Structured Outputs

When using output_type: "json", define a JSON schema:

Basic Schema

{
  "type": "object",
  "properties": {
    "score": {
      "type": "number",
      "description": "Quality score from 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Explanation of the score"
    }
  },
  "required": ["score", "reasoning"]
}

Advanced Schema

{
  "type": "object",
  "properties": {
    "helpfulness": {
      "type": "number",
      "description": "Helpfulness score 0-1"
    },
    "accuracy": {
      "type": "number",
      "description": "Accuracy score 0-1"
    },
    "tone": {
      "type": "string",
      "enum": ["polite", "neutral", "rude"],
      "description": "Tone of the response"
    },
    "completeness": {
      "type": "number",
      "description": "Completeness score 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Detailed explanation"
    }
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}
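Before running a metric on all data, it can help to check a sample output against the schema. The sketch below is a hand-rolled check covering only the three keywords used in this guide (required, type, and enum). It is not a full JSON Schema validator, and validate_output is a hypothetical helper name.

```python
def validate_output(output: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    errors = []
    for field in schema.get("required", []):
        if field not in output:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field not in output:
            continue
        value = output[field]
        expected = rules.get("type")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if expected == "number" and (isinstance(value, bool) or not isinstance(value, (int, float))):
            errors.append(f"{field}: expected a number")
        elif expected == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected a string")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: must be one of {rules['enum']}")
    return errors


schema = {
    "type": "object",
    "properties": {
        "helpfulness": {"type": "number"},
        "tone": {"type": "string", "enum": ["polite", "neutral", "rude"]},
        "reasoning": {"type": "string"},
    },
    "required": ["helpfulness", "tone", "reasoning"],
}

print(validate_output({"helpfulness": 0.9, "tone": "polite", "reasoning": "Clear answer"}, schema))
print(validate_output({"helpfulness": 0.9, "tone": "sarcastic"}, schema))
```

The first call returns an empty list. The second reports the missing reasoning field and the tone value that falls outside the enum.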

Schema Best Practices

  1. Include a Score Field: Give every schema one primary numeric metric
  2. Add Reasoning: Include an explanation field alongside the score
  3. Use Enums: Restrict categorical values to a fixed set of options
  4. Keep It Simple: 2-5 fields are typically sufficient
  5. Describe Fields: Clear descriptions help the LLM fill in each field correctly

Context Resolution

TurnWise resolves template variables in this order:
  1. Fetch Conversation Data: Load from database
  2. Extract Goals: If @GOAL needed, extract user goals
  3. Create/Update Summary: If @HISTORY needed, manage rolling summary
  4. Resolve Variables: Replace @VARIABLE_NAME with actual data
  5. Build Prompt: Combine resolved variables with prompt text
  6. Execute: Send to LLM
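The order above can be sketched as a single function in which the expensive steps run only when the prompt needs them. build_prompt and the three callables passed to it are hypothetical stand-ins for the database load, goal extraction, and summary management. The sketch stops at the built prompt and leaves the LLM call to the caller.

```python
from typing import Callable


def build_prompt(
    prompt: str,
    conversation_id: str,
    load_conversation: Callable[[str], list[str]],
    extract_goal: Callable[[list[str]], str],
    summarize: Callable[[list[str]], str],
) -> str:
    conversation = load_conversation(conversation_id)  # 1. fetch conversation data
    values: dict[str, str] = {}
    if "@GOAL" in prompt:  # 2. extract goals only if @GOAL is used
        values["@GOAL"] = extract_goal(conversation)
    if "@HISTORY" in prompt:  # 3. manage the summary only if @HISTORY is used
        values["@HISTORY"] = summarize(conversation)
    for name, value in values.items():  # 4-5. resolve variables and build the prompt
        prompt = prompt.replace(name, value)
    return prompt  # 6. the caller sends this to the LLM


calls: list[str] = []


def fake_load(conversation_id: str) -> list[str]:
    calls.append("load")
    return ["User: I want to cancel my subscription"]


def fake_goal(conversation: list[str]) -> str:
    calls.append("goal")
    return "Cancel subscription"


def fake_summary(conversation: list[str]) -> str:
    calls.append("summary")
    return "\n".join(conversation)


result = build_prompt("Did the conversation achieve @GOAL?", "conv-1", fake_load, fake_goal, fake_summary)
print(result)
print(calls)
```

Because this prompt does not mention @HISTORY, the summary step never runs, which reflects the resolution costs discussed under Performance Considerations.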

Example Advanced Metrics

Example 1: Context-Aware Helpfulness

Name: Context-Aware Helpfulness
Level: Message
Prompt: |
  Evaluate @CURRENT_MESSAGE.output for helpfulness.
  
  Context:
  - User asked: @PREVIOUS_USER_MSG
  - Conversation history: @HISTORY
  
  Consider:
  - Does it address the user's question?
  - Is it accurate?
  - Is it complete?
  
  Provide a score from 0-1.
Output Type: Progress
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG, @HISTORY

Example 2: Tool Chain Evaluation

Name: Tool Chain Correctness
Level: Step
Prompt: |
  Evaluate if @CURRENT_STEP.tool_call is the correct next step.
  
  Context:
  - Previous tool result: @PREVIOUS_STEP.tool_result
  - Available tools: @LIST_AGENT
  - Step reasoning: @CURRENT_STEP.thinking
  
  Answer yes or no.
Output Type: Checkbox
Template Variables: @CURRENT_STEP.tool_call, @PREVIOUS_STEP.tool_result, @LIST_AGENT, @CURRENT_STEP.thinking

Example 3: Multi-Dimensional Analysis

Name: Comprehensive Quality Analysis
Level: Message
Prompt: |
  Analyze @CURRENT_MESSAGE.output across multiple dimensions:
  - Helpfulness: Does it help the user?
  - Accuracy: Is the information correct?
  - Tone: Is the tone appropriate?
  - Completeness: Does it fully address the question?
  
  Context: @PREVIOUS_USER_MSG
Output Type: JSON
Schema: {
  "type": "object",
  "properties": {
    "helpfulness": {"type": "number", "description": "0-1"},
    "accuracy": {"type": "number", "description": "0-1"},
    "tone": {"type": "string", "enum": ["polite", "neutral", "rude"]},
    "completeness": {"type": "number", "description": "0-1"},
    "reasoning": {"type": "string"}
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG

Performance Considerations

Variable Resolution Cost

Some variables require additional processing:
  • @GOAL: Requires goal extraction (cached per conversation)
  • @HISTORY: May require summary creation/update
  • @LIST_AGENT: Requires agent data loading

Optimization Tips

  1. Reuse Variables: Multiple variables in one prompt share a single resolution pass
  2. Cache Goals: Goals are extracted once and cached per conversation
  3. Reuse Summaries: Rolling summaries are reused across evaluations
  4. Choose the Right Level: Step-level evaluation is the most granular and the most expensive

Best Practices

Use Template Variables

Always use variables for context-aware evaluation

Be Specific

Specify what to evaluate and how

Provide Context

Include relevant context variables

Test Schemas

Test JSON schemas before running on all data
