Skip to main content

Documentation Index

Fetch the complete documentation index at: https://turnwise.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Advanced Metrics

Advanced metrics use template variables and structured outputs to create context-aware, powerful evaluations. This guide covers all advanced features.

Basic vs Advanced Metrics

Basic Metrics

Simple prompts without template variables:
Prompt: "Is this response helpful? Answer yes or no."
  • Direct evaluation
  • No context resolution
  • Faster execution
  • Limited context awareness

Advanced Metrics

Prompts with template variables:
Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG"
  • Context-aware evaluation
  • Template variable resolution
  • Richer context
  • More accurate evaluations
Use advanced metrics for better results - Template variables provide crucial context for accurate evaluation.

Template Variables

Template variables inject conversation data into your prompts. They’re written as @VARIABLE_NAME.

Quick Reference

VariableLevelTypeDescription
@HISTORYAllStringFull conversation history (rolling summary if long)
@GOALAllStringUser’s overall goal/intent
@LIST_AGENTAllStringAvailable agents with tools
@MESSAGESConversationStringAll messages formatted
@USER_MESSAGESConversationStringAll user messages only
@ASSISTANT_MESSAGESConversationStringAll assistant messages only
@FIRST_USER_MSGConversationStringFirst user message
@LAST_USER_MSGConversationStringLast user message
@LAST_ASSISTANT_MSGConversationStringLast assistant message
@PREVIOUS_USER_MSGMessage, StepStringPrevious user message
@PREVIOUS_ASSISTANT_MSGMessage, StepStringPrevious assistant message
@CURRENT_MESSAGEMessage, StepObjectCurrent message (use .output or .role)
@CURRENT_STEPSMessageStringAll steps in current message
@CURRENT_STEPS_COUNTMessageStringNumber of steps in message
@PREVIOUS_STEPStepObjectPrevious step (use .thinking, .tool_call, .tool_result)
@CURRENT_STEPStepObjectCurrent step (use .thinking, .tool_call, .tool_result, .output_content, .output_structured)
@STEP_NUMBERStepStringCurrent step position (1-indexed)
@METRIC_PREVIOUS_RESULTMessage, StepStringPrevious evaluation result (sequential mode only)
Levels:
  • All = Available at conversation, message, and step levels
  • Conversation = Only available at conversation level
  • Message = Available at message and step levels
  • Step = Only available at step level

Conversation-Level Variables

Available for conversation-level evaluations:

@HISTORY

Full conversation history (or rolling summary if long):
Prompt: "Evaluate conversation quality: @HISTORY"
Resolves to formatted conversation:
User: Hello, I need help
Assistant: Hi! How can I help?
User: I want to cancel my subscription
Assistant: I can help with that. Can you confirm your account email?
User: john@example.com
Assistant: I've cancelled your subscription. You'll receive a confirmation email shortly.
Note: For long conversations, TurnWise uses rolling summaries to keep context manageable while preserving important information.

@GOAL

User’s overall goal/intent (extracted from conversation):
Prompt: "Did the conversation achieve @GOAL?"
Resolves to extracted goal:
"Cancel subscription"
Note: Goals are automatically extracted from user messages using intent classification. The goal is cached per conversation for efficiency.

@LIST_AGENT

Available agents and their tools:
Prompt: "Evaluate tool usage given available tools: @LIST_AGENT"
Resolves to formatted list:
AVAILABLE TOOLS/AGENTS FOR THIS CONVERSATION:

## Agent: Support Agent
Description: Customer support agent
Tools:
  - lookup_order: Look up order details
    Parameters:
    - order_id (string, required): Order identifier
  - process_refund: Process refund
    Parameters:
    - order_id (string, required): Order identifier
    - amount (number, required): Refund amount

@MESSAGES

All messages formatted:
Prompt: "Review all messages: @MESSAGES"
Resolves to:
[system]: You are a helpful customer service agent.
[user]: I need to cancel my subscription
[assistant]: I can help with that. Can you confirm your account email?
[user]: john@example.com
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.

@USER_MESSAGES

User messages only:
Prompt: "What did the user ask for? @USER_MESSAGES"
Resolves to:
[user]: I need to cancel my subscription
[user]: john@example.com

@ASSISTANT_MESSAGES

Assistant messages only:
Prompt: "Review assistant responses: @ASSISTANT_MESSAGES"
Resolves to:
[assistant]: I can help with that. Can you confirm your account email?
[assistant]: I've cancelled your subscription. You'll receive a confirmation email shortly.

@FIRST_USER_MSG

First user message:
Prompt: "Original request: @FIRST_USER_MSG"
Resolves to:
"I need to cancel my subscription"

@LAST_USER_MSG

Last user message:
Prompt: "Latest user message: @LAST_USER_MSG"
Resolves to:
"john@example.com"

@LAST_ASSISTANT_MSG

Last assistant message:
Prompt: "Latest response: @LAST_ASSISTANT_MSG"
Resolves to:
"I've cancelled your subscription. You'll receive a confirmation email shortly."

Message-Level Variables

Includes all conversation-level variables plus:

@PREVIOUS_USER_MSG

Previous user message:
Prompt: "Evaluate @CURRENT_MESSAGE.output given @PREVIOUS_USER_MSG"

@PREVIOUS_ASSISTANT_MSG

Previous assistant message:
Prompt: "Compare @CURRENT_MESSAGE.output to @PREVIOUS_ASSISTANT_MSG"

@CURRENT_MESSAGE

Current message being evaluated. Use with nested properties:
Prompt: "Current message: @CURRENT_MESSAGE"
Resolves to formatted message:
[assistant]: I can help you track your order. What's your order ID?

@CURRENT_MESSAGE.output

Current message content:
Prompt: "Evaluate: @CURRENT_MESSAGE.output"
Resolves to:
"I can help you track your order. What's your order ID?"

@CURRENT_MESSAGE.role

Current message role:
Prompt: "Message role: @CURRENT_MESSAGE.role"
Resolves to:
"assistant"  (or "user", "system", "tool")

@CURRENT_STEPS

All steps in current message:
Prompt: "Review steps: @CURRENT_STEPS"
Resolves to formatted steps:
--- Step 1 ---
Thinking: User wants order status. I should look it up.
Tool Call: {
  "name": "lookup_order",
  "arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}

--- Step 2 ---
Thinking: Order found. Let me tell the customer.
Output: Your order has shipped! Tracking: 1Z999...

@CURRENT_STEPS_COUNT

Number of steps in message:
Prompt: "Message has @CURRENT_STEPS_COUNT steps"
Resolves to:
"2"  (or "0", "1", "3", etc.)

Step-Level Variables

Includes all message-level variables plus:

@PREVIOUS_STEP

Previous step in the same message. Use with nested properties:
Prompt: "Previous step: @PREVIOUS_STEP"
Resolves to formatted step:
Thinking: Need to check order status first
Tool Call: {
  "name": "lookup_order",
  "arguments": {"order_id": "ORD-123"}
}
Tool Result: {"status": "shipped"}

@PREVIOUS_STEP.thinking

Previous step’s reasoning:
Prompt: "Given @PREVIOUS_STEP.thinking, evaluate @CURRENT_STEP.tool_call"
Resolves to:
"Need to check order status first"

@PREVIOUS_STEP.tool_call

Previous step’s tool call (JSON):
Prompt: "Previous tool: @PREVIOUS_STEP.tool_call"
Resolves to:
{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}

@PREVIOUS_STEP.tool_result

Previous step’s tool result (JSON):
Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call correct?"
Resolves to:
{
  "status": "shipped",
  "tracking": "1Z999AA10123456784"
}

@CURRENT_STEP

Current step being evaluated. Use with nested properties:
Prompt: "Current step: @CURRENT_STEP"
Resolves to formatted step:
Thinking: Order found. Let me tell the customer.
Tool Call: {
  "name": "send_notification",
  "arguments": {"message": "Your order shipped!"}
}
Output: Your order has shipped! Tracking: 1Z999...

@CURRENT_STEP.thinking

Current step’s reasoning:
Prompt: "Evaluate reasoning: @CURRENT_STEP.thinking"
Resolves to:
"Order found. Let me tell the customer."

@CURRENT_STEP.tool_call

Current step’s tool call (JSON):
Prompt: "Tool called: @CURRENT_STEP.tool_call"
Resolves to:
{
  "name": "lookup_order",
  "arguments": {
    "order_id": "ORD-123"
  }
}

@CURRENT_STEP.tool_result

Current step’s tool result (JSON):
Prompt: "Tool result: @CURRENT_STEP.tool_result"
Resolves to:
{
  "status": "shipped",
  "tracking": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}

@CURRENT_STEP.output_content

Current step’s output text:
Prompt: "Step output: @CURRENT_STEP.output_content"
Resolves to:
"Your order has shipped! Tracking number: 1Z999AA10123456784"

@CURRENT_STEP.output_structured

Current step’s structured output (JSON):
Prompt: "Structured output: @CURRENT_STEP.output_structured"
Resolves to:
{
  "order_status": "shipped",
  "tracking_number": "1Z999AA10123456784",
  "estimated_delivery": "2024-01-25"
}

@STEP_NUMBER

Step position (1-indexed):
Prompt: "Step @STEP_NUMBER: Evaluate @CURRENT_STEP.tool_call"
Resolves to:
"1"  (for first step), "2" (for second step), etc.

Sequential Mode Variables

Available when using sequential execution mode (pipeline nodes with execution_mode: "sequential"):

@METRIC_PREVIOUS_RESULT

Previous evaluation result from the same pipeline execution. Only available in sequential mode:
Prompt: "Given previous result: @METRIC_PREVIOUS_RESULT, evaluate @CURRENT_MESSAGE.output"
Resolves to JSON of previous metric’s output:
{
  "score": 0.85,
  "reasoning": "Response was helpful but could be more concise"
}
Note: This variable is only available when:
  • Pipeline node has execution_mode: "sequential"
  • There is a previous metric result in the same execution
  • Evaluation level is message or step (not conversation)

Using Template Variables

Single Variable

Prompt: "Evaluate @CURRENT_MESSAGE.output"

Multiple Variables

Prompt: "Evaluate @CURRENT_MESSAGE.output for helpfulness given @PREVIOUS_USER_MSG and @HISTORY"

Nested Context

Prompt: "Given @PREVIOUS_STEP.tool_result, was @CURRENT_STEP.tool_call the correct next step? Consider @LIST_AGENT"

JSON Schema for Structured Outputs

When using output_type: "json", define a JSON schema:

Basic Schema

{
  "type": "object",
  "properties": {
    "score": {
      "type": "number",
      "description": "Quality score from 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Explanation of the score"
    }
  },
  "required": ["score", "reasoning"]
}

Advanced Schema

{
  "type": "object",
  "properties": {
    "helpfulness": {
      "type": "number",
      "description": "Helpfulness score 0-1"
    },
    "accuracy": {
      "type": "number",
      "description": "Accuracy score 0-1"
    },
    "tone": {
      "type": "string",
      "enum": ["polite", "neutral", "rude"],
      "description": "Tone of the response"
    },
    "completeness": {
      "type": "number",
      "description": "Completeness score 0-1"
    },
    "reasoning": {
      "type": "string",
      "description": "Detailed explanation"
    }
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}

Schema Best Practices

  1. Include Score Field: Always have a primary metric
  2. Add Reasoning: Include explanation field
  3. Use Enums: For categorical values
  4. Keep Simple: 2-5 fields typically sufficient
  5. Describe Fields: Clear descriptions help LLM

Context Resolution

TurnWise resolves template variables in this order:
  1. Fetch Conversation Data: Load from database
  2. Extract Goals: If @GOAL needed, extract user goals
  3. Create/Update Summary: If @HISTORY needed, manage rolling summary
  4. Resolve Variables: Replace @VARIABLE_NAME with actual data
  5. Build Prompt: Combine resolved variables with prompt text
  6. Execute: Send to LLM

Example Advanced Metrics

Example 1: Context-Aware Helpfulness

Name: Context-Aware Helpfulness
Level: Message
Prompt: |
  Evaluate @CURRENT_MESSAGE.output for helpfulness.
  
  Context:
  - User asked: @PREVIOUS_USER_MSG
  - Conversation history: @HISTORY
  
  Consider:
  - Does it address the user's question?
  - Is it accurate?
  - Is it complete?
  
  Provide a score from 0-1.
Output Type: Progress
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG, @HISTORY

Example 2: Tool Chain Evaluation

Name: Tool Chain Correctness
Level: Step
Prompt: |
  Evaluate if @CURRENT_STEP.tool_call is the correct next step.
  
  Context:
  - Previous tool result: @PREVIOUS_STEP.tool_result
  - Available tools: @LIST_AGENT
  - Step reasoning: @CURRENT_STEP.thinking
  
  Answer yes or no.
Output Type: Checkbox
Template Variables: @CURRENT_STEP.tool_call, @PREVIOUS_STEP.tool_result, @LIST_AGENT, @CURRENT_STEP.thinking

Example 3: Multi-Dimensional Analysis

Name: Comprehensive Quality Analysis
Level: Message
Prompt: |
  Analyze @CURRENT_MESSAGE.output across multiple dimensions:
  - Helpfulness: Does it help the user?
  - Accuracy: Is the information correct?
  - Tone: Is the tone appropriate?
  - Completeness: Does it fully address the question?
  
  Context: @PREVIOUS_USER_MSG
Output Type: JSON
Schema: {
  "type": "object",
  "properties": {
    "helpfulness": {"type": "number", "description": "0-1"},
    "accuracy": {"type": "number", "description": "0-1"},
    "tone": {"type": "string", "enum": ["polite", "neutral", "rude"]},
    "completeness": {"type": "number", "description": "0-1"},
    "reasoning": {"type": "string"}
  },
  "required": ["helpfulness", "accuracy", "tone", "completeness", "reasoning"]
}
Template Variables: @CURRENT_MESSAGE.output, @PREVIOUS_USER_MSG

Performance Considerations

Variable Resolution Cost

Some variables require additional processing:
  • @GOAL: Requires goal extraction (cached per conversation)
  • @HISTORY: May require summary creation/update
  • @LIST_AGENT: Requires agent data loading

Optimization Tips

  1. Reuse Variables: Multiple variables in one prompt = one resolution
  2. Cache Goals: Goals are cached per conversation
  3. Reuse Summaries: Summaries are reused across evaluations
  4. Choose Right Level: Step-level is most granular (and most expensive)

Best Practices

Use Template Variables

Always use variables for context-aware evaluation

Be Specific

Specify what to evaluate and how

Provide Context

Include relevant context variables

Test Schemas

Test JSON schemas before running on all data

Next Steps

Creating Metrics

Learn the basics of metric creation

Running Evaluations

Run your advanced metrics