
Pipeline Executions

Pipeline executions track when and how evaluations were run. Understanding executions helps you manage evaluation history, debug issues, and analyze performance.

What Are Pipeline Executions?

A pipeline execution represents a single run of an evaluation metric:
  • When: Timestamp of execution
  • What: Which metric was evaluated
  • Where: Which dataset and entities
  • Result: Evaluation results
  • Status: Success or failure

Execution Structure

{
  "id": 123,
  "evaluation_pipeline_id": 5,
  "dataset_id": 1,
  "status": "completed",
  "started_at": "2024-01-20T14:30:00Z",
  "completed_at": "2024-01-20T14:35:00Z",
  "total_evaluations": 100,
  "successful_evaluations": 98,
  "failed_evaluations": 2,
  "meta": {
    "model": "openai/gpt-5-nano",
    "execution_mode": "async"
  }
}
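The execution record maps naturally onto a small typed container. A minimal Python sketch mirroring the fields above (the `ExecutionRecord` name is illustrative, not part of the API):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ExecutionRecord:
    """Illustrative container mirroring the execution JSON above."""
    id: int
    evaluation_pipeline_id: int
    dataset_id: int
    status: str                   # e.g. "completed"
    started_at: str               # ISO 8601 timestamp
    completed_at: Optional[str]   # None while the execution is still running
    total_evaluations: int
    successful_evaluations: int
    failed_evaluations: int
    meta: dict

# Load the example payload shown above
record = ExecutionRecord(**{
    "id": 123,
    "evaluation_pipeline_id": 5,
    "dataset_id": 1,
    "status": "completed",
    "started_at": "2024-01-20T14:30:00Z",
    "completed_at": "2024-01-20T14:35:00Z",
    "total_evaluations": 100,
    "successful_evaluations": 98,
    "failed_evaluations": 2,
    "meta": {"model": "openai/gpt-5-nano", "execution_mode": "async"},
})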

Execution Lifecycle

States

  • Pending: Execution created but not started
  • Processing: Currently running
  • Completed: Finished successfully
  • Failed: Encountered errors
  • Cancelled: User cancelled
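Pending and Processing are transient; Completed, Failed, and Cancelled are terminal. A minimal polling sketch in Python, assuming a `requests` client, a placeholder base URL, and a bearer token (the URL and auth scheme are assumptions about your deployment, not documented values):

import time
import requests

BASE_URL = "https://your-turnwise-host/api"          # assumption: your deployment URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumption: auth scheme may differ

TERMINAL_STATES = {"completed", "failed", "cancelled"}

def wait_for_execution(execution_id: int, poll_seconds: float = 5.0) -> dict:
    """Poll an execution until it reaches a terminal state, then return it."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/evaluation-pipeline-executions/{execution_id}",
            headers=HEADERS,
        )
        resp.raise_for_status()
        execution = resp.json()
        if execution["status"] in TERMINAL_STATES:
            return execution
        time.sleep(poll_seconds)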

Viewing Executions

Via UI

  1. Open Dataset
  2. Click “Executions” Tab
  3. View Execution List
    • See all executions for this dataset
    • Filter by metric, status, date
    • Sort by various columns

Via API

# List executions for dataset
GET /evaluation-pipeline-executions?dataset_id=1

# Get specific execution
GET /evaluation-pipeline-executions/123

# Get execution results
GET /evaluation-pipeline-executions/123/results
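The same endpoints can be called from a script. A short sketch, assuming the placeholder base URL and token from above and that the list endpoint returns a plain JSON array (both assumptions about your deployment):

import requests

BASE_URL = "https://your-turnwise-host/api"          # assumption
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumption

# List executions for a dataset
executions = requests.get(
    f"{BASE_URL}/evaluation-pipeline-executions",
    params={"dataset_id": 1},
    headers=HEADERS,
).json()

# Fetch one execution and its results
execution = requests.get(
    f"{BASE_URL}/evaluation-pipeline-executions/123", headers=HEADERS
).json()
results = requests.get(
    f"{BASE_URL}/evaluation-pipeline-executions/123/results", headers=HEADERS
).json()

for ex in executions:
    print(ex["id"], ex["status"], ex.get("completed_at"))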

Execution Details

Each execution includes:

Basic Info

  • ID: Unique execution identifier
  • Pipeline: Which metric was evaluated
  • Dataset: Which dataset was evaluated
  • Status: Current execution status
  • Timestamps: When the execution started and completed

Statistics

  • Total Evaluations: Number of entities evaluated
  • Successful: Number that succeeded
  • Failed: Number that failed
  • Duration: Total execution time

Results

  • Individual Results: Each conversation/message/step result
  • Aggregated Results: Summary statistics
  • Errors: Any failures with details

Execution History

TurnWise maintains a history of all executions.

Why History Matters

  • Track Changes: See how metrics perform over time
  • Debug Issues: Identify when problems occurred
  • Compare Results: Compare different evaluation runs
  • Audit Trail: Complete record of evaluations

Viewing History

  1. Open Dataset
  2. Click “Executions” Tab
  3. Browse History
    • See all past executions
    • Filter by date range
    • Search by metric name

Re-Running Evaluations

When to Re-Run

  • Metric Updated: Prompt or configuration changed
  • Data Updated: Conversations modified
  • Failed Executions: Retry failed evaluations
  • Model Changed: Using different LLM

How to Re-Run

Via UI

  1. Select Execution
  2. Click “Re-Run”
  3. Confirm
  4. Monitor Progress

Via API

POST /evaluation-pipeline-executions/run
{
  "dataset_id": 1,
  "pipeline_node_id": 5,
  "entity_type": "conversation",
  "entity_id": null
}
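Equivalently, a re-run can be triggered from a script. A sketch using the same placeholder client, with a payload that mirrors the request above:

import requests

BASE_URL = "https://your-turnwise-host/api"          # assumption
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumption

def rerun_evaluation(dataset_id: int, pipeline_node_id: int,
                     entity_type: str = "conversation", entity_id=None) -> dict:
    """Trigger a new execution with the same payload shape as the example above."""
    resp = requests.post(
        f"{BASE_URL}/evaluation-pipeline-executions/run",
        json={
            "dataset_id": dataset_id,
            "pipeline_node_id": pipeline_node_id,
            "entity_type": entity_type,
            "entity_id": entity_id,
        },
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()

new_execution = rerun_evaluation(dataset_id=1, pipeline_node_id=5)
print(new_execution["id"], new_execution["status"])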

Execution Comparison

Compare executions to see:
  • Metric Changes: How results changed
  • Performance: Execution time differences
  • Accuracy: Success rate changes

Comparing Results

  1. Select Two Executions
  2. Click “Compare”
  3. View Differences
    • Side-by-side comparison
    • Highlighted changes
    • Statistical analysis
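Outside the UI, a simple score comparison can be done client-side against two executions. A rough sketch, assuming the results endpoint returns a list of per-entity records shaped like the individual results shown later on this page:

import requests

BASE_URL = "https://your-turnwise-host/api"          # assumption
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumption

def fetch_scores(execution_id: int) -> dict:
    """Map conversation_id -> score for one execution."""
    results = requests.get(
        f"{BASE_URL}/evaluation-pipeline-executions/{execution_id}/results",
        headers=HEADERS,
    ).json()
    return {r["conversation_id"]: r["result"]["score"] for r in results}

before = fetch_scores(122)
after = fetch_scores(123)

# Report conversations whose score moved noticeably between the two runs
for conv_id in sorted(before.keys() & after.keys()):
    delta = after[conv_id] - before[conv_id]
    if abs(delta) > 0.05:
        print(f"conversation {conv_id}: {before[conv_id]:.2f} -> {after[conv_id]:.2f} ({delta:+.2f})")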

Execution Metadata

Executions store metadata:
{
  "meta": {
    "model": "openai/gpt-5-nano",
    "execution_mode": "async",
    "batch_size": 10,
    "template_variables": ["@HISTORY", "@GOAL"],
    "user_id": "user_123"
  }
}
This metadata is useful for:
  • Debugging: Understand execution context
  • Analysis: Filter by execution parameters
  • Auditing: Track who ran what
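For analysis, metadata can be filtered client-side once executions are listed. A small sketch in plain Python, using illustrative execution dictionaries with the field names shown above:

executions = [
    {"id": 122, "meta": {"model": "openai/gpt-5-nano", "execution_mode": "sync"}},
    {"id": 123, "meta": {"model": "openai/gpt-5-nano", "execution_mode": "async"}},
]

def filter_by_meta(executions, model=None, execution_mode=None):
    """Select executions whose metadata matches the given model / execution mode."""
    selected = []
    for ex in executions:
        meta = ex.get("meta", {})
        if model is not None and meta.get("model") != model:
            continue
        if execution_mode is not None and meta.get("execution_mode") != execution_mode:
            continue
        selected.append(ex)
    return selected

nano_async = filter_by_meta(executions, model="openai/gpt-5-nano", execution_mode="async")
print([ex["id"] for ex in nano_async])   # -> [123]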

Execution Results

Individual Results

Each evaluation produces a result:
{
  "conversation_id": 123,
  "pipeline_node_id": 5,
  "result": {
    "score": 0.85,
    "reasoning": "Response is helpful"
  },
  "status": "completed",
  "execution_time": 2.5,
  "created_at": "2024-01-20T14:30:00Z"
}

Aggregated Results

Summary statistics:
{
  "total": 100,
  "average_score": 0.82,
  "min_score": 0.3,
  "max_score": 1.0,
  "std_dev": 0.15,
  "distribution": {
    "0-0.5": 5,
    "0.5-0.7": 20,
    "0.7-0.9": 50,
    "0.9-1.0": 25
  }
}
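The aggregated view can also be reproduced from the individual results. A sketch of the arithmetic, using illustrative per-entity scores and the same bucket boundaries as the distribution above:

from statistics import mean, pstdev

scores = [0.85, 0.9, 0.4, 0.75, 1.0]   # illustrative per-entity scores

summary = {
    "total": len(scores),
    "average_score": round(mean(scores), 2),
    "min_score": min(scores),
    "max_score": max(scores),
    "std_dev": round(pstdev(scores), 2),
    "distribution": {
        "0-0.5": sum(1 for s in scores if s < 0.5),
        "0.5-0.7": sum(1 for s in scores if 0.5 <= s < 0.7),
        "0.7-0.9": sum(1 for s in scores if 0.7 <= s < 0.9),
        "0.9-1.0": sum(1 for s in scores if s >= 0.9),
    },
}
print(summary)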

Exporting Results

Via UI

  1. Select Execution
  2. Click “Export”
  3. Choose Format
    • CSV
    • JSON
    • Excel

Via API

GET /evaluation-pipeline-executions/123/export?format=csv
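A scripted export just saves the response body to disk. A minimal sketch with the placeholder client from earlier:

import requests

BASE_URL = "https://your-turnwise-host/api"          # assumption
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumption

resp = requests.get(
    f"{BASE_URL}/evaluation-pipeline-executions/123/export",
    params={"format": "csv"},
    headers=HEADERS,
)
resp.raise_for_status()

with open("execution_123_results.csv", "wb") as f:
    f.write(resp.content)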

Execution Performance

Monitoring Performance

Track these execution metrics (a derivation sketch follows the list):
  • Duration: How long it took
  • Throughput: Evaluations per second
  • Success Rate: Percentage successful
  • Cost: LLM API costs
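Most of these metrics can be derived from the execution record itself; cost depends on your LLM provider's pricing and is not shown. A rough sketch:

from datetime import datetime

def execution_metrics(execution: dict) -> dict:
    """Derive duration, throughput, and success rate from an execution record."""
    started = datetime.fromisoformat(execution["started_at"].replace("Z", "+00:00"))
    completed = datetime.fromisoformat(execution["completed_at"].replace("Z", "+00:00"))
    duration_s = (completed - started).total_seconds()
    total = execution["total_evaluations"]
    return {
        "duration_seconds": duration_s,
        "throughput_per_second": total / duration_s if duration_s else None,
        "success_rate": execution["successful_evaluations"] / total if total else None,
    }

# Using the example execution from the top of this page:
metrics = execution_metrics({
    "started_at": "2024-01-20T14:30:00Z",
    "completed_at": "2024-01-20T14:35:00Z",
    "total_evaluations": 100,
    "successful_evaluations": 98,
})
print(metrics)   # 300 s duration, ~0.33 evaluations/s, 0.98 success rate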

Optimizing Performance

Use Async Mode

Enable asynchronous execution so evaluations run concurrently rather than one at a time.

Batch Size

Tune the batch size: larger batches raise throughput but are more likely to hit LLM rate limits.

Model Selection

Use faster, cheaper models where the metric still performs acceptably.

Monitor Resources

Track duration, throughput, and cost over time so regressions are caught early.

Troubleshooting Executions

Failed Executions

Check:
  • Error messages in execution details
  • LLM API status
  • Data validity
  • Prompt correctness
Solutions:
  • Retry execution
  • Fix data issues
  • Update prompt
  • Check API keys

Slow Executions

Check:
  • Execution mode (sync vs async)
  • Model selection
  • Batch size
  • Network latency
Solutions:
  • Enable async mode
  • Use faster model
  • Increase batch size
  • Check network

Best Practices

Review History

Review execution history regularly to spot trends and recurring failures.

Monitor Performance

Track duration, success rate, and cost across runs.

Export Results

Export important results so they are preserved outside the platform.

Document Changes

Record why each execution was re-run (metric change, data fix, model swap) so the history stays interpretable.
