
Dataset Setup Guide

This guide walks you through creating datasets and importing your conversation data into TurnWise.

Creating a Dataset

Via UI

  1. Navigate to Datasets
    • Click “Datasets” in the sidebar
    • Or go to /datasets in your browser
  2. Create New Dataset
    • Click the “New Dataset” button in the sidebar
    • Fill in the form:
      • Name: A descriptive name (e.g., “Customer Support Q1 2024”)
      • Description: Optional details about the dataset
  3. Save
    • Click “Create” to save
    • You’ll be redirected to the dataset view

Via API

curl -X POST "http://localhost:8000/datasets" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Dataset",
    "description": "Dataset description"
  }'
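The same request can be prepared from Python. The sketch below uses only the standard library and builds the request without sending it. The helper name `build_create_dataset_request` is hypothetical, and the base URL is the local address from the curl example.

```python
import json
import urllib.request


def build_create_dataset_request(base_url, name, description=""):
    """Build (but do not send) the POST /datasets request shown above."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/datasets",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_dataset_request(
    "http://localhost:8000", "My Dataset", "Dataset description"
)
# To send it, pass the request to urllib.request.urlopen(request)
# while a TurnWise server is running at that address.
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the server.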

Importing Conversations

Step 1: Prepare Your JSON File

Your file should follow the TurnWise data format:
{
  "conversations": [
    {
      "name": "Conversation 1",
      "messages": [
        {
          "role": "user",
          "content": "Hello"
        },
        {
          "role": "assistant",
          "content": "Hi there!"
        }
      ]
    }
  ]
}
See the Data Format Overview for complete schema details.
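If your conversations live in code rather than in a file, a small converter can produce this structure. The following is a minimal sketch: the input shape (a dict of conversation names to `(role, content)` pairs) and the function name `to_turnwise_payload` are assumptions chosen for illustration. Only the output fields come from the format above.

```python
import json


def to_turnwise_payload(transcripts):
    """Convert {name: [(role, content), ...]} into the structure shown above."""
    conversations = []
    for name, turns in transcripts.items():
        # List order is preserved, so turns stay in the order they were given.
        messages = [{"role": role, "content": content} for role, content in turns]
        conversations.append({"name": name, "messages": messages})
    return {"conversations": conversations}


transcripts = {
    "Conversation 1": [("user", "Hello"), ("assistant", "Hi there!")],
}
payload = to_turnwise_payload(transcripts)

# ensure_ascii=False keeps non-ASCII text readable in the output.
text = json.dumps(payload, ensure_ascii=False, indent=2)
```

When saving `text` to disk, open the file with `encoding="utf-8"` so the result is UTF-8 encoded.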

Step 2: Import via UI

  1. Open Your Dataset
    • Click on the dataset name
    • You’ll see the dataset view
  2. Click Import
    • Click the “Import” button in the header
    • A file upload dialog appears
  3. Select File
    • Drag and drop your JSON file
    • Or click to browse and select
  4. Import
    • Click “Import” to start
    • TurnWise validates your file

Step 3: Validation

TurnWise validates your file structure:
  1. JSON Parsing: Checks for valid JSON syntax
  2. Structure Validation: Verifies required fields (conversations, messages, role)
  3. Data Validation: Validates field types and values
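To catch problems before uploading, the three stages above can be approximated locally. This is a minimal sketch, not TurnWise's own validator: the function name `validate_turnwise_json`, the `$` root path, and the exact messages are assumptions. The required fields and allowed role values are the ones named in this guide.

```python
import json

ALLOWED_ROLES = {"user", "assistant", "system", "tool"}


def validate_turnwise_json(text):
    """Return a list of {"path", "message"} dicts; an empty list means no problems found."""
    # Stage 1: JSON parsing
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [{"path": "$", "message": f"Invalid JSON: {exc.msg}"}]

    # Stage 2: structure (conversations, messages, role must be present)
    if not isinstance(data, dict) or not isinstance(data.get("conversations"), list):
        return [{"path": "$", "message": "Missing required field 'conversations'"}]

    errors = []
    for i, conversation in enumerate(data["conversations"]):
        path = f"conversations[{i}]"
        if not isinstance(conversation, dict) or not isinstance(conversation.get("messages"), list):
            errors.append({"path": path, "message": "Missing required field 'messages'"})
            continue
        for j, message in enumerate(conversation["messages"]):
            message_path = f"{path}.messages[{j}]"
            if not isinstance(message, dict) or "role" not in message:
                errors.append({"path": message_path, "message": "Missing required field 'role'"})
                continue
            # Stage 3: data (role must be one of the allowed string values)
            role = message["role"]
            if not isinstance(role, str) or role not in ALLOWED_ROLES:
                errors.append({"path": message_path, "message": f"Invalid role value: {role!r}"})
    return errors


sample = '{"conversations": [{"messages": [{"role": "user"}, {"content": "x"}, {"role": "bot"}]}]}'
errors = validate_turnwise_json(sample)
```

An empty list from this check does not guarantee the import will succeed, since the server may enforce rules this sketch does not cover.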

Step 4: Handle Validation Errors

If validation fails, TurnWise provides detailed error messages:
{
  "success": false,
  "errors": [
    {
      "path": "conversations[0].messages[1]",
      "message": "Missing required field 'role'",
      "suggestion": "Add 'role' field with value 'user', 'assistant', 'system', or 'tool'"
    }
  ]
}
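A response in this shape is easy to turn into a readable report. The sketch below assumes only the fields shown in the example above (`success`, `errors`, `path`, `message`, `suggestion`); the helper name `summarize_errors` is hypothetical.

```python
import json


def summarize_errors(response_text):
    """Return one human-readable line per validation error in the response."""
    response = json.loads(response_text)
    if response.get("success") is True:
        return []
    lines = []
    for error in response.get("errors", []):
        line = f"{error.get('path', '?')}: {error.get('message', '')}"
        suggestion = error.get("suggestion")
        if suggestion:
            line += f" (suggestion: {suggestion})"
        lines.append(line)
    return lines


sample_response = """
{
  "success": false,
  "errors": [
    {
      "path": "conversations[0].messages[1]",
      "message": "Missing required field 'role'",
      "suggestion": "Add 'role' field"
    }
  ]
}
"""
report = summarize_errors(sample_response)
```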

LLM-Powered Suggestions

If get_feedback=true (default), TurnWise uses AI to analyze your format and suggest transformations:
{
  "errors": [...],
  "llm_feedback": {
    "analysis": "Your data appears to be in OpenAI format. Here's how to transform it...",
    "suggested_transformation": {
      "from": "messages[].role",
      "to": "messages[].role",
      "mapping": "..."
    }
  }
}
Tip: Keep LLM feedback enabled. It helps identify format mismatches and suggests fixes automatically.

Step 5: Import Success

On successful import, you’ll see:
  • Conversations imported: Number of conversations added
  • Messages imported: Total messages imported
  • Steps imported: Total steps imported (if any)
  • Errors: Any warnings or errors

Import Options

Import to Existing Dataset

Use this when you want to add more conversations to an existing dataset:
POST /datasets/{dataset_id}/import

Import with New Dataset

Create a dataset and import in one step:
POST /datasets/import
Parameters:
  • file: JSON file
  • user_id: Your user ID
  • name: Dataset name
  • description: Optional description
  • get_feedback: Enable LLM feedback (default: true)
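The choice between the two endpoints can be captured in a small helper. This is a sketch under assumptions: the helper names are hypothetical, and this guide does not say whether the parameters travel as multipart form fields or as query parameters, so the code only assembles the values and leaves the transport to your HTTP client.

```python
def import_url(base_url, dataset_id=None):
    """Pick the import endpoint: an existing dataset if an ID is given, else a new one."""
    base = base_url.rstrip("/")
    if dataset_id is None:
        return f"{base}/datasets/import"
    return f"{base}/datasets/{dataset_id}/import"


def import_fields(user_id, name, description="", get_feedback=True):
    """Collect the non-file parameters listed above as strings."""
    return {
        "user_id": str(user_id),
        "name": name,
        "description": description,
        # Serializing the flag as "true"/"false" is an assumption.
        "get_feedback": "true" if get_feedback else "false",
    }


new_dataset_url = import_url("http://localhost:8000")
existing_dataset_url = import_url("http://localhost:8000", dataset_id=42)
fields = import_fields(user_id="user-1", name="My Dataset", get_feedback=False)
```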

Common Import Issues

Issue: Invalid JSON

Error: Invalid JSON: Unexpected token
Solution:
  • Check for syntax errors (missing commas, quotes, brackets)
  • Validate JSON with a JSON validator
  • Ensure file is UTF-8 encoded
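Python's standard library can act as a quick JSON validator and encoding check. The sketch below reports where parsing fails. The function name `find_json_problem` is hypothetical, and the wording of the parser's message varies between Python versions.

```python
import json


def find_json_problem(raw_bytes):
    """Return None if the bytes are valid UTF-8 JSON, else a short description."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"Not valid UTF-8 at byte offset {exc.start}"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are 1-based positions of the first syntax error.
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
    return None


problem = find_json_problem(b'{"conversations": [}')
```

To check a file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function.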

Issue: Missing Required Fields

Error: Missing required field 'conversations'
Solution:
  • Ensure root object has conversations array
  • Each conversation must have messages array
  • Each message must have role

Issue: Invalid Role Values

Error: Invalid role value: 'bot'
Solution:
  • Use only: user, assistant, system, tool
  • Map your roles to TurnWise roles:
    • bot → assistant
    • human → user
    • system → system
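The role mapping above can be applied to a whole payload before import. This is a minimal sketch: `normalize_roles` is a hypothetical helper, and it assumes the payload already has the `conversations` / `messages` / `role` structure described in this guide.

```python
import copy

ROLE_MAP = {"bot": "assistant", "human": "user", "system": "system"}
ALLOWED_ROLES = {"user", "assistant", "system", "tool"}


def normalize_roles(payload, role_map=ROLE_MAP):
    """Return a copy of payload with message roles mapped to TurnWise roles."""
    result = copy.deepcopy(payload)  # leave the caller's data untouched
    for conversation in result["conversations"]:
        for message in conversation["messages"]:
            role = message["role"]
            mapped = role_map.get(role, role)
            if mapped not in ALLOWED_ROLES:
                raise ValueError(f"No mapping for role {role!r}")
            message["role"] = mapped
    return result


original = {
    "conversations": [
        {"messages": [{"role": "human", "content": "Hello"},
                      {"role": "bot", "content": "Hi there!"}]}
    ]
}
normalized = normalize_roles(original)
```

Raising on unknown roles, rather than passing them through, surfaces unmapped values locally instead of at import time.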

Issue: Message Order

Note: Message order is automatically inferred from the array position. Keep messages in chronological order in the array.

Issue: Large File Size

Error: File too large or timeout
Solution:
  • Split large datasets into multiple files
  • Import in batches
  • Don’t rely on gzip compression to shrink uploads; TurnWise doesn’t support it yet
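Splitting can be done at the conversation level so that no conversation is cut in half. The sketch below is one way to do it; `split_into_batches` is a hypothetical helper, and a suitable batch size depends on limits this guide does not state.

```python
def split_into_batches(payload, batch_size):
    """Yield payloads containing at most batch_size conversations each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    conversations = payload["conversations"]
    for start in range(0, len(conversations), batch_size):
        yield {"conversations": conversations[start:start + batch_size]}


payload = {
    "conversations": [
        {"name": f"Conversation {n}", "messages": []} for n in range(1, 6)
    ]
}
batches = list(split_into_batches(payload, batch_size=2))
batch_sizes = [len(batch["conversations"]) for batch in batches]
```

Each batch can then be saved as its own JSON file and added to the same dataset through POST /datasets/{dataset_id}/import.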

Import Best Practices

Validate Locally First

Test your JSON structure before importing

Start Small

Import a few conversations first to verify format

Use Descriptive Names

Name conversations clearly for easier identification

Include Metadata

Use meta fields to store custom data

Data Quality Tips

1. Consistent Formatting

Keep your data format consistent:
  • Same field names across conversations
  • Consistent role values
  • Sequential sequence numbers, where your data includes them

2. Complete Conversations

Include full conversations:
  • Don’t truncate mid-conversation
  • Include all messages
  • Preserve message order

3. Rich Context

Include as much context as possible:
  • Message content
  • Steps with thinking/reasoning
  • Tool calls and results
  • Agent definitions

4. Metadata

Use meta fields for additional context:
  • Timestamps
  • User IDs
  • Session IDs
  • Custom tags

After Import

Once imported, you can:
  1. View Conversations: Browse your conversations in the hierarchical table
  2. Create Metrics: Add evaluation metrics to measure quality
  3. Run Evaluations: Evaluate conversations, messages, or steps
  4. Export Results: Export evaluation results for analysis

Next Steps