Running Evaluations
Once you’ve created metrics, it’s time to run evaluations. This guide covers all the ways to execute evaluations in TurnWise.
Evaluation Options
You can run evaluations at different granularities:
- Single Cell: evaluate one metric for one conversation/message/step
- Row: evaluate all metrics for one conversation/message/step
- Column: evaluate one metric for all conversations/messages/steps
- All: evaluate all metrics for all entities
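The four granularities are different slices of the same dataset grid, where rows are entities (conversations, messages, or steps) and columns are metrics. The sketch below models that grid in Python purely for illustration; `select_cells` and the sample conversation and metric names are hypothetical and are not part of TurnWise.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def select_cells(
    entities: Sequence[str],
    metrics: Sequence[str],
    entity: Optional[str] = None,
    metric: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return the (entity, metric) cells a run would cover.

    Fixing both entity and metric gives a single cell, fixing only the
    entity gives a row, fixing only the metric gives a column, and fixing
    neither gives every cell ("All").
    """
    rows = [entity] if entity is not None else list(entities)
    cols = [metric] if metric is not None else list(metrics)
    return list(product(rows, cols))


# Hypothetical dataset: three conversations scored by two metrics.
conversations = ["conv-1", "conv-2", "conv-3"]
metrics = ["helpfulness", "tone"]
print(len(select_cells(conversations, metrics)))  # 6 cells for "All"
```

A row run on `conv-2` covers two cells, a column run on `tone` covers three, and "All" covers the full 3 × 2 grid.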
Running via UI
Single Cell Evaluation
Evaluate one metric for one entity:
- Navigate to Dataset
  - Open your dataset
  - Find the conversation/message/step row
- Click Cell
  - Click the cell you want to evaluate
  - Or right-click and select “Run Evaluation”
- Wait for Results
  - The evaluation starts immediately
  - A progress indicator shows its status
  - Results stream in real time
- View Results
  - Results appear in the cell
  - Click the cell to view details
  - See the full evaluation output
Row Evaluation
Evaluate all metrics for one entity:
- Select Row
  - Click the row header (conversation/message/step)
- Click “Run All”
  - The button appears in the row header
  - Or right-click the row → “Run All Metrics”
- Monitor Progress
  - Each metric cell shows its progress
  - Results appear as they complete
- Review Results
  - All metrics are evaluated for this entity
  - Compare results across metrics
Column Evaluation
Evaluate one metric for all entities:
- Select Column
  - Click the column header (metric name)
- Click “Run All”
  - The button appears in the column header
  - Or right-click the column → “Run All”
- Monitor Progress
  - A progress bar shows for each row
  - Results stream in real time
- Review Results
  - All entities are evaluated with this metric
  - Compare results across conversations
Run All Evaluations
Evaluate everything:
- Click “Run All”
  - Use the button in the dataset header
  - Or use the keyboard shortcut
- Confirm
  - A dialog shows the number of evaluations
  - Click “Run” to confirm
- Monitor Progress
  - A progress bar shows overall progress
  - Individual cells update as they complete
- Review Results
  - All evaluations are complete
  - Export or analyze the results
Understanding Execution Modes
Execution Mode: Sync vs Async
TurnWise supports two execution modes:
Sync Mode (Default)
- Evaluations run sequentially, one at a time
- Slower but more predictable
- Better for debugging
Async Mode
- Evaluations run concurrently, several at once
- Faster execution
- Better for batch processing
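The trade-off between the two modes can be sketched with Python's `asyncio`, assuming each evaluation is an I/O-bound call such as a model request. `evaluate`, `run_sync`, `run_async`, and the `max_concurrency` cap are hypothetical illustrations, not TurnWise's implementation or settings.

```python
import asyncio


async def evaluate(cell: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for one evaluation (e.g. one model call).
    await asyncio.sleep(delay)
    return f"{cell}: done"


async def run_sync(cells):
    # Sync mode: each evaluation finishes before the next one starts.
    return [await evaluate(c) for c in cells]


async def run_async(cells, max_concurrency: int = 4):
    # Async mode: evaluations overlap; the semaphore caps how many run at
    # once (the cap is an assumption, not a TurnWise setting).
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(c):
        async with sem:
            return await evaluate(c)

    # gather returns results in input order, regardless of finish order.
    return await asyncio.gather(*(bounded(c) for c in cells))
```

With four 0.1 s evaluations, `run_sync` takes roughly 0.4 s and `run_async` roughly 0.1 s, while both return the same results in the same order. The sequential version is easier to debug because only one evaluation is in flight at any moment.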
Streaming Results
TurnWise streams evaluation results in real time.
Progress Indicators
During evaluation, you’ll see:
- Pending: not started yet
- Processing: currently evaluating (progress bar)
- Complete: evaluation finished (result shown)
- Error: evaluation failed (error message shown)
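To see how these per-cell states could roll up into an overall progress figure, here is a small Python sketch. The `summarize` helper and the choice to count both "complete" and "error" as finished are assumptions for illustration; TurnWise's own progress calculation is not documented here.

```python
from collections import Counter
from typing import Dict, Tuple

STATES = ("pending", "processing", "complete", "error")


def summarize(statuses: Dict[str, str]) -> Tuple[Dict[str, int], float]:
    """Count cells in each state and compute the fraction that has finished.

    Treating both "complete" and "error" as finished is an assumption
    about how an overall progress bar might be computed.
    """
    counts = Counter(statuses.values())
    finished = counts["complete"] + counts["error"]
    fraction = finished / len(statuses) if statuses else 1.0
    return {state: counts[state] for state in STATES}, fraction


# Hypothetical snapshot of four cells mid-run.
cells = {
    "conv-1/tone": "complete",
    "conv-2/tone": "processing",
    "conv-3/tone": "error",
    "conv-4/tone": "pending",
}
print(summarize(cells))
```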
Result Display
Results are displayed based on their output type:
- Text
- Number
- Checkbox
- Progress
- JSON
Running via API
Single Evaluation
Batch Evaluation
Streaming Results
Use Server-Sent Events (SSE) for streaming.
Evaluation Status
Pending
Evaluation hasn’t started yet.
Processing
Evaluation is running:
- A progress indicator is shown
- Estimated time remaining
- Current step
Complete
Evaluation finished successfully:
- The result is displayed
- It can be re-run
- It can be exported
Error
Evaluation failed:
- An error message is shown
- It can be retried
- Check the logs for details
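Because updates arrive as Server-Sent Events, a client has to turn the raw event stream into status changes like the ones above. The Python sketch below assumes TurnWise sends events named `status` whose JSON `data` carries a cell id and a status; that event name, the payload shape, the transition table, and all helper names are hypothetical. Only the line parsing follows the standard SSE wire format.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class SSEEvent:
    event: str
    data: str


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Parse text lines in the Server-Sent Events wire format into events.

    A blank line dispatches the buffered event, multiple "data:" lines are
    joined with newlines, one leading space after the colon is dropped, and
    lines starting with ":" are comments (often used as keep-alives).
    """
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data_lines:
                yield SSEEvent(event_type, "\n".join(data_lines))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # "id" and "retry" fields are ignored in this sketch.


# Hypothetical lifecycle: pending -> processing -> complete | error,
# and a finished cell goes back to pending when it is re-run or retried.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"complete", "error"},
    "complete": {"pending"},
    "error": {"pending"},
}


def apply_status_events(events: Iterable[SSEEvent], statuses: Dict[str, str]) -> None:
    """Update statuses in place from "status" events.

    Assumes a hypothetical payload like {"cell": "conv-1/tone", "status": "complete"}.
    """
    for ev in events:
        if ev.event != "status":
            continue
        payload = json.loads(ev.data)
        cell, new = payload["cell"], payload["status"]
        old = statuses.get(cell)
        if old is not None and new not in ALLOWED[old]:
            raise ValueError(f"invalid transition for {cell}: {old} -> {new}")
        statuses[cell] = new
```

In a real client you would feed `parse_sse` the decoded lines of the streaming HTTP response body; rejecting impossible transitions makes out-of-order or malformed events visible instead of silently overwriting a cell's state.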
Retrying Failed Evaluations
Via UI
- Click the failed cell
- Click “Retry”
- Wait for completion
Via API
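The API request that triggers a retry is not reproduced here. If you script retries yourself, a common client-side pattern is exponential backoff with jitter. The `retry` helper below is a hypothetical sketch, not part of TurnWise; you would pass it whatever zero-argument function starts one evaluation in your integration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on failure with exponential backoff and jitter.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... (plus up to 10%
    random jitter) between attempts, and re-raises the last error once
    all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay * random.uniform(1.0, 1.1))
```

For example, `retry(lambda: run_evaluation("conv-1", "tone"))`, where `run_evaluation` stands for your own (hypothetical) call. Narrowing `retry_on` to transient errors such as timeouts avoids retrying evaluations that fail for a permanent reason, like a malformed prompt.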
Canceling Evaluations
Via UI
- Click the “Cancel” button
- Confirm the cancellation
- Partial results may be saved
Via API
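The cancel request itself is not shown here, but the idea of stopping unfinished work while keeping partial results can be sketched client-side with `asyncio`. `run_with_cancel`, its deadline-based trigger, and the fake evaluation jobs are assumptions for illustration, not the TurnWise cancel endpoint.

```python
import asyncio
from typing import Awaitable, Dict, List, Tuple


async def run_with_cancel(
    jobs: Dict[str, Awaitable[str]], cancel_after: float
) -> Tuple[Dict[str, str], List[str]]:
    """Run evaluations concurrently, cancel any still running after
    cancel_after seconds, and keep the results that already finished.

    Returns (partial_results, cancelled_cell_ids). Jobs that raised an
    error appear in neither list.
    """
    if not jobs:
        return {}, []
    tasks = {cell: asyncio.ensure_future(job) for cell, job in jobs.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=cancel_after)
    for task in pending:
        task.cancel()
    if pending:
        # Let cancelled tasks unwind so none are left dangling.
        await asyncio.gather(*pending, return_exceptions=True)
    partial = {
        cell: t.result()
        for cell, t in tasks.items()
        if t in done and t.exception() is None
    }
    cancelled = [cell for cell, t in tasks.items() if t in pending]
    return partial, cancelled
```

Waiting for the cancelled tasks to finish unwinding before returning keeps the event loop clean and makes the split between saved results and cancelled cells unambiguous.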
Performance Tips
- Use Async Mode: enable async mode for faster batch evaluations
- Run in Batches: split large datasets into batches
- Monitor Progress: watch for errors and adjust as needed
- Export Results: export results periodically as a backup
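Running in batches and exporting periodically combine naturally: evaluate one batch, save what you have, then move on. The sketch below assumes hypothetical `evaluate` and `export` callables supplied by you; neither is a TurnWise function. On Python 3.12+ the standard `itertools.batched` offers the same chunking (it yields tuples rather than lists).

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_in_batches(
    items: Iterable[T],
    size: int,
    evaluate: Callable[[T], R],
    export: Callable[[List[R]], None],
) -> List[R]:
    """Evaluate items one batch at a time, exporting after every batch.

    Exporting after each batch means the results gathered so far survive
    even if a later batch fails.
    """
    results: List[R] = []
    for batch in batched(items, size):
        results.extend(evaluate(item) for item in batch)
        export(list(results))  # pass a copy so saved snapshots don't change
    return results
```

Smaller batches mean more frequent checkpoints and lower peak memory, at the cost of more export calls.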
Common Issues
Evaluation Stuck
Symptom: The progress bar doesn’t move.
Solutions:
- Refresh the page
- Check the API status
- Retry the evaluation
Slow Evaluations
Symptom: Evaluations take too long.
Solutions:
- Use a faster model (for example, gpt-5-nano instead of gpt-4)
- Enable async mode
- Reduce prompt complexity
Memory Errors
Symptom: “Out of memory” errors.
Solutions:
- Reduce the batch size
- Use rolling summaries (automatic)
- Simplify prompts