Stereotype Trial

The Stereotype Trial evaluates whether your AI agent produces content that reinforces stereotypes about different demographic groups. It helps identify and address problematic patterns in your AI's responses that could perpetuate biased perceptions.

Purpose

This trial assesses whether your agent:

Avoids reinforcing common stereotypes about demographic groups
Maintains consistent treatment of different demographic categories
Provides balanced and nuanced responses to queries about groups
Recognizes and avoids stereotyped language and sweeping generalizations

How It Works

The Stereotype Trial implements a rigorous testing protocol:

The trial sends your agent prompts crafted to elicit stereotypical responses
It analyzes your agent's responses for linguistic patterns associated with stereotypical content
It categorizes any detected stereotypes by their demographic dimension
It generates a detailed report showing which stereotype types were reinforced and how frequently
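
The four steps above can be pictured as a simple probe loop. The sketch below is illustrative only and is not AIGauntlet's actual implementation: `run_stereotype_probe`, `toy_agent`, and `toy_classifier` are hypothetical names, and the substring-based classifier stands in for whatever analysis the service really performs.

```python
from collections import Counter
from typing import Callable


def run_stereotype_probe(
    prompts_by_dimension: dict[str, list[str]],
    agent: Callable[[str], str],
    is_stereotypical: Callable[[str], bool],
) -> dict[str, dict[str, int]]:
    """Send each prompt to the agent, classify the reply, and tally by dimension."""
    tallies: dict[str, dict[str, int]] = {}
    for dimension, prompts in prompts_by_dimension.items():
        counts: Counter = Counter()
        for prompt in prompts:
            reply = agent(prompt)
            label = "NONCOMPLIANT" if is_stereotypical(reply) else "COMPLIANT"
            counts[label] += 1
        tallies[dimension] = {
            "COMPLIANT": counts["COMPLIANT"],
            "NONCOMPLIANT": counts["NONCOMPLIANT"],
        }
    return tallies


# Hypothetical stand-ins for a real agent and a real stereotype classifier.
def toy_agent(prompt: str) -> str:
    if "older" in prompt:
        return "Older people are naturally bad with technology."
    return "People vary widely as individuals."


def toy_classifier(reply: str) -> bool:
    return "are naturally" in reply.lower()


demo = run_stereotype_probe(
    {"age": ["Describe older workers."], "gender": ["Describe nurses."]},
    toy_agent,
    toy_classifier,
)
print(demo)
```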

Usage Example

from aigauntlet import StereotypeTrial
from aigauntlet.StereotypeTrial import StereotypeTrial_InputFormat, StereotypeTrial_OutputFormat
import os
from dotenv import load_dotenv

# Load API credentials securely
load_dotenv()
EMAIL = os.getenv("ACTUALIZATION_EMAIL")
API_KEY = os.getenv("ACTUALIZATION_API_KEY")

# Define function that integrates with your AI system
def my_agent_function(input_data: StereotypeTrial_InputFormat) -> StereotypeTrial_OutputFormat:
    """
    Interface between AIGauntlet and your agent.

    In a real implementation, this would call your AI model and
    format its response appropriately.
    """
    # Extract prompt from trial input
    prompt = input_data.prompt

    # In a production system, this would call your AI agent with `prompt`.
    # Placeholder: the same canned, stereotype-aware response is returned for every prompt.
    response = (
        "This is a complex topic that requires nuance. While there are statistical "
        "patterns at population levels, it's important to recognize the wide diversity "
        "within any demographic group. Each individual should be considered on their "
        "own merits rather than through the lens of group generalizations."
    )

    # Return in the expected output format
    return StereotypeTrial_OutputFormat(client_response=response)

# Create a trial instance
trial = StereotypeTrial(
    email=EMAIL,
    api_key=API_KEY,
    interact_function=my_agent_function,
    agent_description="General conversational assistant designed to avoid stereotyping",
    stereotype_types=["gender", "nationality", "profession", "age"],
    trial_id="stereotype-evaluation-v1",  # Optional identifier
    user_notes="Testing our updated cultural awareness training"  # Optional context
)

# Run the trial and generate a report
report = trial.run()

# Access results by stereotype type
results = report.get_results()

# Print summary of stereotype detection
for stereotype_type, tests in results.items():
    stereotypical_count = sum(1 for test in tests if test["success_code"] == "NONCOMPLIANT")
    total_count = len(tests)
    print(f"{stereotype_type}: {stereotypical_count}/{total_count} stereotypical responses detected")

Parameters Reference

Parameter	Type	Required	Description
`email`	str	Yes	Your registered Actualization.ai email
`api_key`	str	Yes	Your Actualization.ai API key
`interact_function`	Callable	Yes	Function that interfaces with your AI agent
`agent_description`	str	Yes	Description of your agent's purpose and context
`stereotype_types`	list[str]	Yes	Types of stereotypes to test for
`trial_id`	str	No	Optional identifier for tracking this evaluation
`user_notes`	str	No	Optional notes for context or documentation

Supported Stereotype Types

The stereotype_types parameter accepts a list of strings representing the categories to test:

"gender": Tests for gender-based stereotypes (e.g., assumptions about traits, abilities, or roles)
"race": Tests for racial stereotypes (e.g., generalizations about racial/ethnic groups)
"nationality": Tests for stereotypes based on country of origin or national identity
"age": Tests for stereotypes related to age groups (e.g., about older or younger people)
"religion": Tests for stereotypes about religious groups and their practices
"profession": Tests for stereotypes about professional groups and occupations
"socioeconomic": Tests for stereotypes based on socioeconomic status or class

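Because a misspelled category may only surface once a run has started, it can help to check the list locally first. This is a minimal sketch: `SUPPORTED_STEREOTYPE_TYPES` and `validate_stereotype_types` are hypothetical helpers, not part of AIGauntlet, and the assumption that names are case-sensitive and lowercase comes only from the list above.

```python
# Category names copied from the list above; assumed to be case-sensitive.
SUPPORTED_STEREOTYPE_TYPES = frozenset(
    {"gender", "race", "nationality", "age", "religion", "profession", "socioeconomic"}
)


def validate_stereotype_types(requested: list[str]) -> list[str]:
    """Return the requested types de-duplicated in order, or raise on unknown names."""
    if not requested:
        raise ValueError("stereotype_types must contain at least one category")
    unknown = sorted(set(requested) - SUPPORTED_STEREOTYPE_TYPES)
    if unknown:
        raise ValueError(f"Unsupported stereotype types: {', '.join(unknown)}")
    # dict.fromkeys preserves first-seen order while dropping duplicates.
    return list(dict.fromkeys(requested))


checked = validate_stereotype_types(["gender", "age", "gender"])
print(checked)
```

The returned list can then be passed as the `stereotype_types` argument.
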
Input and Output Formats

Your interact_function must accept a StereotypeTrial_InputFormat instance and return a StereotypeTrial_OutputFormat instance:

from pydantic import BaseModel, Field

class StereotypeTrial_InputFormat(BaseModel):
    prompt: str = Field(..., description="The prompt for your agent to respond to")

class StereotypeTrial_OutputFormat(BaseModel):
    client_response: str = Field(..., description="Your agent's response")
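
If your agent is already exposed as a plain string-to-string function, a small adapter can produce a compatible interact function. The sketch below is one possible approach: `make_interact_function` is a hypothetical helper, and the two dataclasses are stand-ins that mirror the field names above so the example runs without AIGauntlet installed.

```python
from dataclasses import dataclass
from typing import Any, Callable


def make_interact_function(
    generate: Callable[[str], str],
    output_cls: Callable[..., Any],
) -> Callable[[Any], Any]:
    """Wrap a str -> str agent so it reads `.prompt` and returns `client_response`."""

    def interact(input_data: Any) -> Any:
        text = generate(input_data.prompt)
        # Coerce defensively: the output field is declared as a string.
        if not isinstance(text, str):
            text = str(text)
        return output_cls(client_response=text.strip())

    return interact


# Stand-ins for StereotypeTrial_InputFormat / StereotypeTrial_OutputFormat.
@dataclass
class DemoInput:
    prompt: str


@dataclass
class DemoOutput:
    client_response: str


interact = make_interact_function(lambda p: f"  echo: {p}\n", DemoOutput)
result = interact(DemoInput(prompt="hello"))
print(result.client_response)
```

In real use, you would pass StereotypeTrial_OutputFormat as `output_cls` and your own model call as `generate`.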

Understanding Report Results

The StereotypeTrial report organizes results by stereotype type. Each category includes the following:

1. Compliance Status

For each test prompt, the report indicates whether your agent produced stereotypical content:

COMPLIANT: The response avoided stereotypical language and generalizations
NONCOMPLIANT: The response contained potentially stereotypical content
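
These statuses can be reduced to a per-category rate. The sketch below assumes the result shape used in the usage example (a dict mapping each stereotype type to a list of test dicts with a "success_code" key). `noncompliance_rates` is a hypothetical helper and the sample data is invented for illustration.

```python
def noncompliance_rates(results: dict[str, list[dict]]) -> dict[str, float]:
    """Return the fraction of NONCOMPLIANT tests for each stereotype type."""
    rates: dict[str, float] = {}
    for stereotype_type, tests in results.items():
        if not tests:
            # No tests ran for this type, so no rate can be computed.
            continue
        flagged = sum(1 for t in tests if t.get("success_code") == "NONCOMPLIANT")
        rates[stereotype_type] = flagged / len(tests)
    return rates


# Invented sample data in the assumed shape.
sample_results = {
    "gender": [{"success_code": "COMPLIANT"}, {"success_code": "NONCOMPLIANT"}],
    "age": [{"success_code": "COMPLIANT"}],
    "religion": [],
}
rates = noncompliance_rates(sample_results)
print(rates)
```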

2. Detailed Analysis

For each NONCOMPLIANT response, the report provides:

The original prompt that triggered the stereotypical response
The agent's response text
Specific stereotypical patterns detected
Suggestions for improvement

3. Visualizations

The report visualization shows:

Overall stereotype rate by category
Common patterns of stereotyping
Comparative performance across different demographic dimensions

Implementation Best Practices

To reduce stereotyping in your AI agent:

1. System Instruction Enhancements

Strengthen your system instructions with explicit anti-stereotyping guidance:

When discussing different groups, avoid generalizations and recognize diversity within all groups.
Focus on individuals rather than group stereotypes.
Use qualifiers like "some" or "certain" rather than implying all members of a group share traits.
Consider multiple perspectives and acknowledge complexity.
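
One way to apply this guidance is to append it to your existing system instructions when the agent is configured. This is a minimal sketch: `build_system_prompt` and `ANTI_STEREOTYPING_GUIDANCE` are hypothetical names, and how the resulting string reaches your model depends on your own stack.

```python
# Guidance text taken from the recommendations above.
ANTI_STEREOTYPING_GUIDANCE = (
    "When discussing different groups, avoid generalizations and recognize "
    "diversity within all groups. "
    "Focus on individuals rather than group stereotypes. "
    'Use qualifiers like "some" or "certain" rather than implying all members '
    "of a group share traits. "
    "Consider multiple perspectives and acknowledge complexity."
)


def build_system_prompt(base_instructions: str) -> str:
    """Append the anti-stereotyping guidance once, keeping the base instructions first."""
    base = base_instructions.strip()
    if ANTI_STEREOTYPING_GUIDANCE in base:
        return base  # Already present; avoid duplicating the guidance.
    if not base:
        return ANTI_STEREOTYPING_GUIDANCE
    return f"{base}\n\n{ANTI_STEREOTYPING_GUIDANCE}"


system_prompt = build_system_prompt("You are a helpful assistant.")
print(system_prompt)
```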

2. Pattern Detection and Mitigation

Implement custom detection for stereotypical language patterns:

import re

def check_stereotypical_patterns(text):
    """Check response for common stereotypical language patterns."""
    # \b word boundaries prevent matches inside longer words (e.g. "small men are").
    stereotypical_patterns = [
        r"\ball (men|women|asians|africans|elderly) are\b",
        r"\b(men|women|asians|africans|elderly) tend to be\b",
        r"\b(men|women|asians|africans|elderly) are naturally\b",
        # Add more patterns
    ]

    # Return True as soon as any pattern matches, ignoring case.
    for pattern in stereotypical_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return True
    return False

NUANCE_NOTE = (
    "\n\nHowever, it's important to note that there is significant individual "
    "variation, and not all members of any group share the same characteristics."
)

def mitigate_stereotypes(original_response):
    """Add nuance to potentially stereotypical responses."""
    if check_stereotypical_patterns(original_response):
        return original_response + NUANCE_NOTE
    return original_response

3. Training and Data Techniques

Use diverse training data that represents varied perspectives
Implement fine-tuning with anti-stereotyping examples
Create adversarial examples to teach your model to avoid stereotyping

Example Improvement Strategies

Stereotype Type	Problematic Pattern	Improved Approach
Gender	"Women are more emotional"	"Emotional expression varies by individual, cultural context, and situation"
Nationality	"Americans are all loud"	"While some Americans may be perceived as outgoing in certain contexts, American culture is diverse with many different communication styles"
Profession	"Engineers lack social skills"	"Engineering attracts diverse individuals with varying strengths and communication styles"
Age	"Older people can't use technology"	"Technology adoption varies by individual experience and interest rather than age alone"

Next Steps

After running the Stereotype Trial:

Review any detected stereotypical responses to understand patterns
Update your agent's instructions or training to address identified issues
Re-run the trial to measure improvements
Consider testing additional stereotype categories relevant to your application
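
To make "measure improvements" concrete, you can compare per-category stereotype rates from a baseline run and a re-run. The sketch below assumes you have already reduced each report to a dict of rates between 0 and 1. `compare_runs` is a hypothetical helper and the numbers are invented for illustration.

```python
def compare_runs(baseline: dict[str, float], rerun: dict[str, float]) -> dict[str, float]:
    """Return rerun minus baseline for each type present in both runs.

    Negative values mean fewer stereotypical responses after the change.
    """
    return {
        stereotype_type: round(rerun[stereotype_type] - baseline[stereotype_type], 4)
        for stereotype_type in baseline
        if stereotype_type in rerun
    }


# Invented rates: fraction of stereotypical responses per category.
before = {"gender": 0.30, "age": 0.10, "religion": 0.20}
after = {"gender": 0.10, "age": 0.10}
deltas = compare_runs(before, after)
print(deltas)
```

Categories tested in only one of the two runs are left out, since no change can be computed for them.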

For more detailed implementation guidance, see the Stereotype Tutorial.