API Reference

TurnWiseClient

Main client for interacting with TurnWise.

Constructor

TurnWiseClient(
    turnwise_api_key: str,
    openrouter_api_key: str,
    turnwise_base_url: Optional[str] = None,
    default_model: str = "openai/gpt-4o-mini",
)
Parameters:
  • turnwise_api_key (str): Your TurnWise API key
  • openrouter_api_key (str): Your OpenRouter API key
  • turnwise_base_url (Optional[str]): Custom base URL (defaults to production)
  • default_model (str): Default model for evaluations
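A minimal construction sketch; the key values are placeholders (in practice, read them from environment variables), and the top-level import path is assumed:
from turnwise import TurnWiseClient

# Placeholder keys shown for illustration only.
client = TurnWiseClient(
    turnwise_api_key="tw-...",
    openrouter_api_key="sk-or-...",
    default_model="openai/gpt-4o-mini",
)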

Methods

verify()

Verify API key and connection.
result = await client.verify()
# Returns: AuthVerifyResponse(valid=True, user_id="...")

list_datasets()

List all datasets for the authenticated user.
datasets = await client.list_datasets()
# Returns: List[Dataset]

get_conversations(dataset_id)

Get all conversations for a dataset.
conversations = await client.get_conversations(dataset_id=1)
# Returns: List[Conversation]

get_pipelines(dataset_id)

Get all evaluation pipelines for a dataset.
pipelines = await client.get_pipelines(dataset_id=1)
# Returns: List[Pipeline]

register_metric(dataset_id, metric)

Register a metric in TurnWise.
result = await client.register_metric(
    dataset_id=1,
    metric=metric
)
# Returns: MetricCreateResponse(pipeline_id=1, node_id=1, name="...")

evaluate(dataset_id, metric, …)

Run evaluation on a dataset.
results = await client.evaluate(
    dataset_id=1,
    metric=metric,
    max_concurrent=3,
    auto_sync=True,
)
Parameters:
  • dataset_id (int): Dataset ID to evaluate
  • metric (Metric): Metric definition
  • max_concurrent (int): Maximum concurrent evaluations
  • auto_sync (bool): Automatically sync results to TurnWise
  • progress_callback (Optional[Callable]): Progress callback function
Returns: List[EvaluationResult]
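A usage sketch with a progress callback; the callback signature shown here (completed count, total count) is an assumption, not confirmed by this reference:
def on_progress(completed, total):
    # Assumed (completed, total) signature for the progress callback.
    print(f"Evaluated {completed}/{total} conversations")

results = await client.evaluate(
    dataset_id=1,
    metric=metric,
    max_concurrent=3,
    progress_callback=on_progress,
)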

sync_results(results)

Manually sync evaluation results to TurnWise.
result = await client.sync_results(results)
# Returns: SyncResponse(synced_count=10, execution_id=1)
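A sketch of the manual-sync workflow, assuming results returned by evaluate() with auto_sync=False can be passed directly to sync_results():
# Evaluate without automatic syncing, then push results explicitly.
results = await client.evaluate(dataset_id=1, metric=metric, auto_sync=False)
response = await client.sync_results(results)
print(response.synced_count, response.execution_id)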

close()

Close the client and cleanup resources.
await client.close()
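A cleanup sketch, assuming close() should run even if an evaluation raises; whether the client also supports an async context manager is not stated here:
client = TurnWiseClient(turnwise_api_key="tw-...", openrouter_api_key="sk-or-...")
try:
    await client.verify()
    results = await client.evaluate(dataset_id=1, metric=metric)
finally:
    # Always release HTTP connections and other resources.
    await client.close()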

Metric

Definition of an evaluation metric.

Constructor

Metric(
    name: str,
    prompt: str,
    evaluation_level: EvaluationLevel,
    output_type: OutputType,
    description: Optional[str] = None,
    output_schema: Optional[Dict] = None,
    model_name: Optional[str] = None,
    aggregate_results: bool = False,
    node_id: Optional[int] = None,
    pipeline_id: Optional[int] = None,
)
Parameters:
  • name (str): Metric name
  • prompt (str): Evaluation prompt (supports @VARIABLE syntax)
  • evaluation_level (EvaluationLevel): Level to evaluate at
  • output_type (OutputType): Type of output
  • description (Optional[str]): Metric description
  • output_schema (Optional[Dict]): JSON schema for JSON output type
  • model_name (Optional[str]): Model to use (overrides default)
  • aggregate_results (bool): Aggregate step results to message level
  • node_id (Optional[int]): Existing node ID (for reusing metrics)
  • pipeline_id (Optional[int]): Pipeline ID (for reusing metrics)
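A construction sketch for a PROGRESS-type metric; the import path and the @RESPONSE placeholder in the prompt are illustrative, since the exact variables supported by the @VARIABLE syntax are not listed in this reference:
from turnwise import Metric, EvaluationLevel, OutputType

helpfulness = Metric(
    name="helpfulness",
    description="Rates how helpful the assistant's reply is.",
    # @RESPONSE is an assumed example of the @VARIABLE prompt syntax.
    prompt="Rate the helpfulness of @RESPONSE on a scale from 0 to 1.",
    evaluation_level=EvaluationLevel.MESSAGE,
    output_type=OutputType.PROGRESS,
)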

EvaluationLevel

Enumeration of evaluation levels.
class EvaluationLevel(str, Enum):
    CONVERSATION = "conversation"
    MESSAGE = "message"
    STEP = "step"

OutputType

Enumeration of output types.
class OutputType(str, Enum):
    TEXT = "text"
    NUMBER = "number"
    CHECKBOX = "checkbox"
    PROGRESS = "progress"
    JSON = "json"

EvaluationResult

Result of an evaluation.

Properties

  • entity_id (int): ID of evaluated entity
  • entity_type (str): Type of entity ("conversation", "message", "step")
  • result (Any): Raw evaluation result
  • metadata (Dict): Additional metadata

Methods

get_score()

Extract score from result.
score = result.get_score()
# Returns: Optional[float] (for PROGRESS output type)
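A small sketch of collecting scores from a batch of results; entries without a numeric score are skipped, since get_score() may return None for non-PROGRESS output types:
# Gather numeric scores and average them, ignoring None values.
scores = [r.get_score() for r in results]
scores = [s for s in scores if s is not None]
average = sum(scores) / len(scores) if scores else None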

EvaluationOrchestrator

Lower-level orchestrator for manual evaluation control.

Constructor

EvaluationOrchestrator(
    llm_provider: LLMProvider,
    default_model: str,
    extract_goals: bool = True,
)
Parameters:
  • llm_provider (LLMProvider): LLM provider instance
  • default_model (str): Default model name
  • extract_goals (bool): Whether to extract goals from conversations
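A wiring sketch, assuming OpenRouterProvider (documented below) satisfies the LLMProvider interface the orchestrator expects; the import paths are assumptions:
from turnwise import EvaluationOrchestrator, OpenRouterProvider

provider = OpenRouterProvider(api_key="sk-or-...")
orchestrator = EvaluationOrchestrator(
    llm_provider=provider,
    default_model="openai/gpt-4o-mini",
)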

Methods

evaluate_conversation(conversation, metric)

Evaluate a single conversation.
results = await orchestrator.evaluate_conversation(
    conversation=conversation,
    metric=metric,
)
# Returns: List[EvaluationResult]

OpenRouterProvider

LLM provider for OpenRouter API.

Constructor

OpenRouterProvider(
    api_key: str,
    base_url: str = "https://openrouter.ai/api/v1",
)

Models

Dataset

class Dataset(BaseModel):
    id: int
    name: str
    description: Optional[str]
    conversation_count: int

Conversation

class Conversation(BaseModel):
    id: int
    name: Optional[str]
    messages: List[Message]
    agents: Optional[List[Agent]]

Message

class Message(BaseModel):
    id: int
    role: str
    output: Optional[str]
    steps: Optional[List[Step]]

Step

class Step(BaseModel):
    id: int
    thinking: Optional[str]
    tool_call: Optional[Dict]
    tool_result: Optional[Dict]
    output_content: Optional[str]
    output_structured: Optional[Dict]
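A traversal sketch showing how the models nest (Conversation → Message → Step); it only reads fields listed above and guards the Optional ones:
# Walk a conversation's messages and their steps.
for message in conversation.messages:
    print(message.role, message.output)
    for step in message.steps or []:
        if step.tool_call:
            print("tool call:", step.tool_call)
        if step.output_content:
            print("step output:", step.output_content)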

Utility Functions

setup_logging()

Configure logging for the SDK.
from turnwise import setup_logging

setup_logging(level="INFO")

Error Handling

TurnWiseAPIError

Exception raised for TurnWise API errors.
try:
    await client.verify()
except TurnWiseAPIError as e:
    print(f"API Error: {e.status_code} - {e.message}")