# BiasedEvaluator Trial

The BiasedEvaluator Trial tests whether your AI evaluation system shows bias when scoring content. It is designed to detect whether your AI treats inputs differently based on demographic factors that should be irrelevant to the evaluation task.
## Purpose

This trial evaluates whether your evaluator agent:
- Maintains consistent scoring across demographic differences
- Avoids bias based on factors like gender, age, race, or nationality
- Makes decisions based solely on relevant criteria
## How It Works

1. The trial generates pairs of nearly identical inputs that differ only in a specific demographic factor
2. It sends these inputs to your evaluation agent and collects the scores
3. It analyzes the score differences to detect potential bias
4. It generates a report showing where bias may exist in the evaluation process
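The pairwise check at the heart of these steps can be sketched in plain Python. This is an illustration of the idea only, not aigauntlet's actual implementation: the probe-pair construction, the toy scorer, and the flagging rule are all assumptions made for the example.

```python
# Illustrative sketch of the trial's core check (not aigauntlet's actual code):
# score two near-identical inputs that differ only in one demographic field,
# then flag the pair if the score gap exceeds a tolerance.

def score(application: dict) -> float:
    # Stand-in for your evaluator; a real agent would weigh the whole input.
    # This toy scorer is deliberately biased so the check below fires.
    base = 0.7
    return base + (0.15 if application["applicant_gender"] == "M" else 0.0)

def check_pair(input_a: dict, input_b: dict, tolerance: float = 0.1) -> bool:
    """Return True if the pair's score gap exceeds the tolerance."""
    diff = abs(score(input_a) - score(input_b))
    return diff > tolerance

# A probe pair: identical except for the demographic field under test.
pair_a = {"applicant_gender": "M", "education_level": "Masters"}
pair_b = {"applicant_gender": "F", "education_level": "Masters"}

print(check_pair(pair_a, pair_b))  # the toy scorer is biased, so: True
```

A fair evaluator would keep the gap within the tolerance for every probe pair, so `check_pair` would return `False` throughout.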
## Usage
Here's an example of how to use the BiasedEvaluator Trial:
```python
from aigauntlet.BiasedEvaluatorTrial import BiasedEvaluatorTrial, BiasedEvaluatorTrial_OutputFormat
from pydantic import BaseModel, Field
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
EMAIL = os.getenv("ACTUALIZATION_EMAIL")
API_KEY = os.getenv("ACTUALIZATION_API_KEY")

# Define your custom input format for your evaluator
class JobApplicationFormat(BaseModel):
    applicant_name: str = Field(..., description="The full name of the applicant")
    applicant_gender: str = Field(..., description="The gender (either 'M' or 'F') of the applicant")
    applicant_age: int = Field(..., description="The age of the applicant")
    applicant_citizenship: str = Field(..., description="The citizenship of the applicant. Can only be a country name.")
    education_level: str = Field(..., description="The highest education level achieved. Can only be Bachelors, Masters, or PhD.")
    personal_statement: str = Field(..., description="The personal statement of the applicant")

# Define your agent function that evaluates applications
def evaluate_application(input_data: JobApplicationFormat) -> BiasedEvaluatorTrial_OutputFormat:
    # Your evaluation logic here.
    # It should produce a score between 0 and 1.
    score = 0.8  # Example score

    # Must return BiasedEvaluatorTrial_OutputFormat
    return BiasedEvaluatorTrial_OutputFormat(evaluation_score=score)

# Example usage with a test application
test_application = JobApplicationFormat(
    applicant_name="Kaladin Stormblessed",
    applicant_gender="M",
    applicant_age=27,
    applicant_citizenship="Alethkar",
    education_level="Masters",
    personal_statement="I am a skilled warrior with experience leading diverse teams. My background as a surgeon combined with military service has equipped me with both analytical and leadership skills. I excel in high-pressure situations and am committed to protecting those who cannot protect themselves."
)

# Evaluate the test application
test_result = evaluate_application(test_application)
print(f"Evaluation score: {test_result.evaluation_score}")

# Create and run the trial
trial = BiasedEvaluatorTrial(
    email=EMAIL,
    api_key=API_KEY,
    input_format=JobApplicationFormat,
    interact_function=evaluate_application,
    agent_description="Job application evaluator for tech positions",
    bias_types=["gender", "age"],  # Types of bias to test for
    tolerance=0.1,  # Optional: maximum acceptable score difference (default: 0.1)
    num_probe_pairs=3,  # Optional: number of test pairs per bias type (default: 2)
    trial_id="job-evaluator-test",  # Optional: identifier
    user_notes="Testing our job application scoring model"  # Optional notes
)

# Run the trial
report = trial.run()

# Visualize results
fig = report.visualize_report()
fig.show()
```
## Parameters

The `BiasedEvaluatorTrial` constructor accepts the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `email` | `str` | Yes | Your registered email with Actualization.ai |
| `api_key` | `str` | Yes | Your API key from Actualization.ai |
| `input_format` | `Type[BaseModel]` | Yes | Pydantic model defining your input format |
| `interact_function` | `Callable` | Yes | Function that wraps your evaluator agent |
| `agent_description` | `str` | Yes | Description of what your evaluator does |
| `bias_types` | `list[str]` | Yes | Types of bias to test (e.g., "gender", "age", "race") |
| `tolerance` | `float` | No | Maximum acceptable score difference (default: 0.1) |
| `num_probe_pairs` | `int` | No | Number of test pairs per bias type (default: 2) |
| `trial_id` | `str` | No | Optional identifier for the trial |
| `user_notes` | `str` | No | Optional notes about the trial |
## Input and Output Formats

Your `interact_function` must:

- Accept your custom `input_format` (a Pydantic `BaseModel`)
- Return `BiasedEvaluatorTrial_OutputFormat`:

```python
# Import the BiasedEvaluatorTrial_OutputFormat class
from aigauntlet.BiasedEvaluatorTrial import BiasedEvaluatorTrial_OutputFormat

# The class is structured like this:
class BiasedEvaluatorTrial_OutputFormat(BaseModel):
    evaluation_score: float  # A score between 0.0 (lowest) and 1.0 (highest)
```
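Since `evaluation_score` must fall in [0.0, 1.0], raw scores from your model may need rescaling before you construct the output object. A minimal sketch, assuming a raw 0–10 scale (`to_evaluation_score` is a hypothetical helper, not part of aigauntlet):

```python
# Hypothetical helper (not an aigauntlet API): rescale a raw model score into
# the [0.0, 1.0] range that evaluation_score requires, clamping outliers.

def to_evaluation_score(raw: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Map a raw score from [lo, hi] onto [0.0, 1.0], clamping out-of-range values."""
    scaled = (raw - lo) / (hi - lo)
    return max(0.0, min(1.0, scaled))

print(to_evaluation_score(8.0))   # 0.8
print(to_evaluation_score(12.0))  # clamped to 1.0
```

Clamping at the boundaries keeps an occasional out-of-range model output from producing an invalid `evaluation_score`.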
## Report Interpretation
The trial report presents results grouped by bias type. For each bias type (e.g., gender, age), the report includes:
- A table showing the pairs of inputs that were tested
- The scores assigned to each input
- The difference in scores between similar inputs
- A radar chart visualizing bias magnitude by category
Lower score differences indicate less bias in your evaluator. The visualization helps identify patterns in how your evaluator treats different demographic categories.
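The per-category aggregation behind the report can be approximated as follows. The `pair_results` structure and the averaging rule are assumptions for illustration; the actual report schema may differ:

```python
# Illustrative aggregation (assumed structure, not the report's actual schema):
# average the absolute score differences of the probe pairs for each bias type.

pair_results = [
    {"bias_type": "gender", "score_a": 0.80, "score_b": 0.65},
    {"bias_type": "gender", "score_a": 0.70, "score_b": 0.70},
    {"bias_type": "age",    "score_a": 0.60, "score_b": 0.58},
]

def average_bias(results: list[dict]) -> dict[str, float]:
    """Average the absolute score gap per bias type."""
    gaps: dict[str, list[float]] = {}
    for r in results:
        gaps.setdefault(r["bias_type"], []).append(abs(r["score_a"] - r["score_b"]))
    return {bias: sum(d) / len(d) for bias, d in gaps.items()}

print(average_bias(pair_results))  # gender gap averages ~0.075, age ~0.02
```

A per-category summary like this is what a radar chart plots: one axis per bias type, with larger values indicating larger average score gaps.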
## Example Report Visualization
The BiasedEvaluatorTrial report visualization includes:
- A table displaying the differences between input pairs and their respective scores
- A radar chart showing the average scores by demographic category
The visualization makes it easy to spot patterns of bias in your evaluation system.
## Common Issues

- **Superficial Fairness**: An evaluator may appear fair on basic tests but show bias in more complex scenarios
- **Correlation vs. Causation**: Some differences in scores might be due to relevant factors correlated with demographics
- **Implicit Bias**: Bias may emerge subtly through word choice or framing preferences
## Best Practices
To reduce bias in your evaluator:
- Train on diverse, representative datasets
- Implement blind evaluation procedures where possible
- Use consistent evaluation criteria
- Regularly test for bias using this trial
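One way to implement blind evaluation is to redact demographic fields before the input reaches your scorer. This is a sketch under stated assumptions: the field names follow the `JobApplicationFormat` example above, and the redaction scheme is illustrative, not an aigauntlet feature:

```python
# Illustrative redaction step (an assumption, not an aigauntlet feature):
# strip demographic fields so the scorer only sees evaluation-relevant ones.

DEMOGRAPHIC_FIELDS = {"applicant_name", "applicant_gender",
                      "applicant_age", "applicant_citizenship"}

def blind(application: dict) -> dict:
    """Return a copy with demographic fields replaced by a neutral marker."""
    return {k: ("[REDACTED]" if k in DEMOGRAPHIC_FIELDS else v)
            for k, v in application.items()}

app = {
    "applicant_name": "Kaladin Stormblessed",
    "applicant_gender": "M",
    "applicant_age": 27,
    "applicant_citizenship": "Alethkar",
    "education_level": "Masters",
    "personal_statement": "I am a skilled warrior with leadership experience.",
}
print(blind(app)["applicant_name"])   # [REDACTED]
print(blind(app)["education_level"])  # Masters
```

Calling `blind` inside your `interact_function` before scoring keeps demographic fields out of the evaluation path while leaving the relevant criteria untouched.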