# BiasedEvaluator Trial

The BiasedEvaluator Trial tests whether your AI evaluation system shows bias when scoring content. It is designed to detect whether your AI treats inputs differently based on demographic factors that should be irrelevant to the evaluation task.
## Purpose

This trial evaluates whether your evaluator agent:
- Maintains consistent scoring across demographic differences
- Avoids bias based on factors like gender, age, race, or nationality
- Makes decisions based solely on relevant criteria
## How It Works

1. The trial generates pairs of nearly identical inputs that differ only in a specific demographic factor
2. It sends these inputs to your evaluation agent and collects the scores
3. It analyzes the score differences to detect potential bias
4. It generates a report showing where bias may exist in the evaluation process
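The pairwise check at the heart of these steps can be sketched in plain Python. This is an illustration of the idea only, not aigauntlet's actual implementation: the probe-pair construction, the toy scorer, and the flagging rule are all assumptions made for the example.

```python
# Illustrative sketch of the trial's core check (not aigauntlet's actual code):
# score two near-identical inputs that differ only in one demographic field,
# then flag the pair if the score gap exceeds a tolerance.

def score(application: dict) -> float:
    # Stand-in for your evaluator; a real agent would weigh the whole input.
    # This toy scorer is deliberately biased so the check below fires.
    base = 0.7
    return base + (0.15 if application["applicant_gender"] == "M" else 0.0)

def check_pair(input_a: dict, input_b: dict, tolerance: float = 0.1) -> bool:
    """Return True if the pair's score gap exceeds the tolerance."""
    diff = abs(score(input_a) - score(input_b))
    return diff > tolerance

# A probe pair: identical except for the demographic field under test.
pair_a = {"applicant_gender": "M", "education_level": "Masters"}
pair_b = {"applicant_gender": "F", "education_level": "Masters"}

print(check_pair(pair_a, pair_b))  # the toy scorer is biased, so: True
```

A fair evaluator would keep the gap within the tolerance for every probe pair, so `check_pair` would return `False` throughout.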
## Usage
Here's an example of how to use the BiasedEvaluator Trial:
```python
from aigauntlet.BiasedEvaluatorTrial import BiasedEvaluatorTrial, BiasedEvaluatorTrial_OutputFormat
from pydantic import BaseModel, Field
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
EMAIL = os.getenv("ACTUALIZATION_EMAIL")
API_KEY = os.getenv("ACTUALIZATION_API_KEY")

# Define your custom input format for your evaluator
class JobApplicationFormat(BaseModel):
    applicant_name: str = Field(..., description="The full name of the applicant")
    applicant_gender: str = Field(..., description="The gender (either 'M' or 'F') of the applicant")
    applicant_age: int = Field(..., description="The age of the applicant")
    applicant_citizenship: str = Field(..., description="The citizenship of the applicant. Can only be a country name.")
    education_level: str = Field(..., description="The highest education level achieved. Can only be Bachelors, Masters, or PhD.")
    personal_statement: str = Field(..., description="The personal statement of the applicant")

# Define your agent function that evaluates applications
def evaluate_application(input_data: JobApplicationFormat) -> BiasedEvaluatorTrial_OutputFormat:
    # Your evaluation logic here.
    # It should produce a score between 0 and 1.
    score = 0.8  # Example score

    # Must return BiasedEvaluatorTrial_OutputFormat
    return BiasedEvaluatorTrial_OutputFormat(evaluation_score=score)

# Example usage with a test application
test_application = JobApplicationFormat(
    applicant_name="Kaladin Stormblessed",
    applicant_gender="M",
    applicant_age=27,
    applicant_citizenship="Alethkar",
    education_level="Masters",
    personal_statement="I am a skilled warrior with experience leading diverse teams. My background as a surgeon combined with military service has equipped me with both analytical and leadership skills. I excel in high-pressure situations and am committed to protecting those who cannot protect themselves."
)

# Evaluate the test application
test_result = evaluate_application(test_application)
print(f"Evaluation score: {test_result.evaluation_score}")

# Create and run the trial
trial = BiasedEvaluatorTrial(
    email=EMAIL,
    api_key=API_KEY,
    input_format=JobApplicationFormat,
    interact_function=evaluate_application,
    agent_description="Job application evaluator for tech positions",
    bias_types=["gender", "age"],  # Types of bias to test for
    tolerance=0.1,  # Optional: maximum acceptable score difference (default: 0.1)
    num_probe_pairs=3,  # Optional: number of test pairs per bias type (default: 2)
    trial_id="job-evaluator-test",  # Optional: identifier
    user_notes="Testing our job application scoring model"  # Optional notes
)

# Run the trial
report = trial.run()

# Visualize results
fig = report.visualize_report()
fig.show()
```
## Parameters

The `BiasedEvaluatorTrial` constructor accepts the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `email` | `str` | Yes | Your registered email with Actualization.ai |
| `api_key` | `str` | Yes | Your API key from Actualization.ai |
| `input_format` | `Type[BaseModel]` | Yes | Pydantic model defining your input format |
| `interact_function` | `Callable` | Yes | Function that wraps your evaluator agent |
| `agent_description` | `str` | Yes | Description of what your evaluator does |
| `bias_types` | `list[str]` | Yes | Types of bias to test (e.g., "gender", "age", "race") |
| `tolerance` | `float` | No | Maximum acceptable score difference (default: 0.1) |
| `num_probe_pairs` | `int` | No | Number of test pairs per bias type (default: 2) |
| `trial_id` | `str` | No | Optional identifier for the trial |
| `user_notes` | `str` | No | Optional notes about the trial |
## Input and Output Formats

Your `interact_function` must:

- Accept your custom `input_format` (a Pydantic `BaseModel`)
- Return `BiasedEvaluatorTrial_OutputFormat`:

```python
# Import the BiasedEvaluatorTrial_OutputFormat class
from aigauntlet.BiasedEvaluatorTrial import BiasedEvaluatorTrial_OutputFormat

# The class is structured like this:
class BiasedEvaluatorTrial_OutputFormat(BaseModel):
    evaluation_score: float  # A score between 0.0 (lowest) and 1.0 (highest)
```
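Since `evaluation_score` must fall in [0.0, 1.0], raw scores from your model may need rescaling before you construct the output object. A minimal sketch, assuming a raw 0–10 scale (`to_evaluation_score` is a hypothetical helper, not part of aigauntlet):

```python
# Hypothetical helper (not an aigauntlet API): rescale a raw model score into
# the [0.0, 1.0] range that evaluation_score requires, clamping outliers.

def to_evaluation_score(raw: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Map a raw score from [lo, hi] onto [0.0, 1.0], clamping out-of-range values."""
    scaled = (raw - lo) / (hi - lo)
    return max(0.0, min(1.0, scaled))

print(to_evaluation_score(8.0))   # 0.8
print(to_evaluation_score(12.0))  # clamped to 1.0
```

Clamping at the boundaries keeps an occasional out-of-range model output from producing an invalid `evaluation_score`.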
## Report Interpretation
The trial report presents results grouped by bias type. For each bias type (e.g., gender, age), the report includes:
- A table showing the pairs of inputs that were tested
- The scores assigned to each input
- The difference in scores between similar inputs
- A radar chart visualizing bias magnitude by category
Lower score differences indicate less bias in your evaluator. The visualization helps identify patterns in how your evaluator treats different demographic categories.
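The per-category aggregation behind the report can be approximated as follows. The `pair_results` structure and the averaging rule are assumptions for illustration; the actual report schema may differ:

```python
# Illustrative aggregation (assumed structure, not the report's actual schema):
# average the absolute score differences of the probe pairs for each bias type.

pair_results = [
    {"bias_type": "gender", "score_a": 0.80, "score_b": 0.65},
    {"bias_type": "gender", "score_a": 0.70, "score_b": 0.70},
    {"bias_type": "age",    "score_a": 0.60, "score_b": 0.58},
]

def average_bias(results: list[dict]) -> dict[str, float]:
    """Average the absolute score gap per bias type."""
    gaps: dict[str, list[float]] = {}
    for r in results:
        gaps.setdefault(r["bias_type"], []).append(abs(r["score_a"] - r["score_b"]))
    return {bias: sum(d) / len(d) for bias, d in gaps.items()}

print(average_bias(pair_results))  # gender gap averages ~0.075, age ~0.02
```

A per-category summary like this is what a radar chart plots: one axis per bias type, with larger values indicating larger average score gaps.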
## Example Report Visualization
The BiasedEvaluatorTrial report visualization includes:
- A table displaying the differences between input pairs and their respective scores
- A radar chart showing the average scores by demographic category
The visualization makes it easy to spot patterns of bias in your evaluation system.
## Common Issues

- **Superficial Fairness**: An evaluator may appear fair on basic tests but show bias in more complex scenarios
- **Correlation vs. Causation**: Some differences in scores might be due to relevant factors correlated with demographics
- **Implicit Bias**: Bias may emerge subtly through word choice or framing preferences
## Best Practices
To reduce bias in your evaluator:
- Train on diverse, representative datasets
- Implement blind evaluation procedures where possible
- Use consistent evaluation criteria
- Regularly test for bias using this trial
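One way to implement blind evaluation is to redact demographic fields before the input reaches your scorer. This is a sketch under stated assumptions: the field names follow the `JobApplicationFormat` example above, and the redaction scheme is illustrative, not an aigauntlet feature:

```python
# Illustrative redaction step (an assumption, not an aigauntlet feature):
# strip demographic fields so the scorer only sees evaluation-relevant ones.

DEMOGRAPHIC_FIELDS = {"applicant_name", "applicant_gender",
                      "applicant_age", "applicant_citizenship"}

def blind(application: dict) -> dict:
    """Return a copy with demographic fields replaced by a neutral marker."""
    return {k: ("[REDACTED]" if k in DEMOGRAPHIC_FIELDS else v)
            for k, v in application.items()}

app = {
    "applicant_name": "Kaladin Stormblessed",
    "applicant_gender": "M",
    "applicant_age": 27,
    "applicant_citizenship": "Alethkar",
    "education_level": "Masters",
    "personal_statement": "I am a skilled warrior with leadership experience.",
}
print(blind(app)["applicant_name"])   # [REDACTED]
print(blind(app)["education_level"])  # Masters
```

Calling `blind` inside your `interact_function` before scoring keeps demographic fields out of the evaluation path while leaving the relevant criteria untouched.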