
Welcome to TurnWise

TurnWise is a comprehensive platform for evaluating AI agent conversations. Upload your conversation data, create custom evaluation metrics, and gain insights into your AI agents' performance.

What is TurnWise?

TurnWise helps you:
  • Evaluate AI Conversations: Create custom metrics to measure quality, safety, helpfulness, and more
  • Track Performance: Monitor how your AI agents perform across different scenarios
  • Identify Issues: Spot problematic conversations and understand failure patterns
  • Improve Agents: Use insights to make data-driven improvements to your AI systems

Key Concepts

  • Datasets: Collections of conversations you want to evaluate
  • Conversations: Individual chat threads between users and AI agents
  • Metrics: Custom evaluation criteria you define
  • Pipelines: Workflows that run metrics across your data

Getting Started

You can use TurnWise in two ways:

Web Interface

1. Create a Dataset: Start by creating a new dataset to hold your conversations
2. Import Conversations: Upload your conversation data in TurnWise JSON format (see the Data Format section below)
3. Add Evaluation Metrics: Create metrics to evaluate your conversations
4. Run Evaluations: Execute your evaluation pipeline and review results

Python SDK

Use the TurnWise Python SDK to run evaluations programmatically with your own API keys:
import asyncio

from turnwise import TurnWiseClient, Metric, EvaluationLevel, OutputType

# The client takes both a TurnWise key and an OpenRouter key, since
# evaluations run against your own LLM credentials.
client = TurnWiseClient(
    turnwise_api_key="tw_xxx",
    openrouter_api_key="sk-or-xxx",
)

# A custom metric evaluated once per message; the prompt references the
# current message's output via the @CURRENT_MESSAGE.output placeholder.
metric = Metric(
    name="Helpfulness",
    prompt="Evaluate: @CURRENT_MESSAGE.output",
    evaluation_level=EvaluationLevel.MESSAGE,
    output_type=OutputType.PROGRESS,
)

# evaluate() is a coroutine, so run it inside an event loop.
async def main():
    return await client.evaluate(dataset_id=1, metric=metric)

results = asyncio.run(main())

See the Python SDK documentation for more details.

Data Format

TurnWise uses a hierarchical JSON format:
Dataset
└── Conversations
    ├── Agents (optional)
    └── Messages
        └── Steps (optional)

See the Data Format Overview for complete documentation.
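
To make the hierarchy concrete, the sketch below builds a minimal dataset as a Python dict and serializes it to JSON. Every field name in it (name, conversations, agents, messages, role, content, steps) is an assumption made for illustration, not the authoritative schema; defer to the Data Format Overview for the actual field names.

import json

# Illustrative only: all key names below are assumptions, not the
# documented TurnWise schema.
dataset = {
    "name": "support-bot-evals",
    "conversations": [
        {
            "agents": [{"name": "support-bot"}],  # optional
            "messages": [
                {"role": "user", "content": "How do I reset my password?"},
                {
                    "role": "assistant",
                    "content": "Go to Settings and choose Reset Password.",
                    "steps": [{"name": "lookup_help_article"}],  # optional
                },
            ],
        }
    ],
}

print(json.dumps(dataset, indent=2))  # the JSON document you would upload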