Stereotype Trial
The Stereotype Trial evaluates whether your AI agent produces content that reinforces stereotypes about different demographic groups. It helps identify and address problematic patterns in your AI's responses that could perpetuate biased perceptions.
Purpose
This trial assesses whether your agent:
- Avoids reinforcing common stereotypes about demographic groups
- Maintains consistent treatment of different demographic categories
- Provides balanced and nuanced responses to queries about groups
- Recognizes and avoids generalizing patterns of stereotyped language
How It Works
The Stereotype Trial implements a rigorous testing protocol:
- The trial sends specifically crafted prompts designed to potentially elicit stereotypes
- It analyzes your agent's responses for linguistic patterns associated with stereotypical content
- It categorizes any detected stereotypes by their demographic dimension
- It generates a detailed report showing which stereotype types were reinforced and how frequently
Usage Example
from aigauntlet import StereotypeTrial
from aigauntlet.StereotypeTrial import StereotypeTrial_InputFormat, StereotypeTrial_OutputFormat
import os
from dotenv import load_dotenv
# Load API credentials securely
load_dotenv()
EMAIL = os.getenv("ACTUALIZATION_EMAIL")
API_KEY = os.getenv("ACTUALIZATION_API_KEY")
# Define function that integrates with your AI system
def my_agent_function(input_data: StereotypeTrial_InputFormat) -> StereotypeTrial_OutputFormat:
"""
Interface between AIGauntlet and your agent.
In a real implementation, this would call your AI model and
format its response appropriately.
"""
# Extract prompt from trial input
prompt = input_data.prompt
# In a production system, this would call your AI agent
# Here's a simplified example of a stereotype-aware response:
response = (
"This is a complex topic that requires nuance. While there are statistical "
"patterns at population levels, it's important to recognize the wide diversity "
"within any demographic group. Each individual should be considered on their "
"own merits rather than through the lens of group generalizations."
)
# Return in the expected output format
return StereotypeTrial_OutputFormat(client_response=response)
# Create a trial instance
trial = StereotypeTrial(
email=EMAIL,
api_key=API_KEY,
interact_function=my_agent_function,
agent_description="General conversational assistant designed to avoid stereotyping",
stereotype_types=["gender", "nationality", "profession", "age"],
trial_id="stereotype-evaluation-v1", # Optional identifier
user_notes="Testing our updated cultural awareness training" # Optional context
)
# Run the trial and generate a report
report = trial.run()
# Access results by stereotype type
results = report.get_results()
# Print summary of stereotype detection
for stereotype_type, tests in results.items():
stereotypical_count = sum(1 for test in tests if test["success_code"] == "NONCOMPLIANT")
total_count = len(tests)
print(f"{stereotype_type}: {stereotypical_count}/{total_count} stereotypical responses detected")
Parameters Reference
Parameter | Type | Required | Description |
---|---|---|---|
email |
str | Yes | Your registered Actualization.ai email |
api_key |
str | Yes | Your Actualization.ai API key |
interact_function |
Callable | Yes | Function that interfaces with your AI agent |
agent_description |
str | Yes | Description of your agent's purpose and context |
stereotype_types |
list[str] | Yes | Types of stereotypes to test for |
trial_id |
str | No | Optional identifier for tracking this evaluation |
user_notes |
str | No | Optional notes for context or documentation |
Supported Stereotype Types
The stereotype_types
parameter accepts a list of strings representing the categories to test:
"gender"
: Tests for gender-based stereotypes (e.g., assumptions about traits, abilities, or roles)"race"
: Tests for racial stereotypes (e.g., generalizations about racial/ethnic groups)"nationality"
: Tests for stereotypes based on country of origin or national identity"age"
: Tests for stereotypes related to age groups (e.g., about older or younger people)"religion"
: Tests for stereotypes about religious groups and their practices"profession"
: Tests for stereotypes about professional groups and occupations"socioeconomic"
: Tests for stereotypes based on socioeconomic status or class
Input and Output Formats
Your interact_function
must accept StereotypeTrial_InputFormat
and return StereotypeTrial_OutputFormat
:
from pydantic import BaseModel, Field
class StereotypeTrial_InputFormat(BaseModel):
prompt: str = Field(..., description="The prompt for your agent to respond to")
class StereotypeTrial_OutputFormat(BaseModel):
client_response: str = Field(..., description="Your agent's response")
Understanding Report Results
The StereotypeTrial report organizes results by stereotype type. For each stereotype category:
1. Compliance Status
For each test prompt, the report indicates whether your agent produced stereotypical content:
COMPLIANT
: The response avoided stereotypical language and generalizationsNONCOMPLIANT
: The response contained potentially stereotypical content
2. Detailed Analysis
For stereotypical responses, the report provides:
- The original prompt that triggered the stereotypical response
- The agent's response text
- Specific stereotypical patterns detected
- Suggestions for improvement
3. Visualizations
The report visualization shows:
- Overall stereotype rate by category
- Common patterns of stereotyping
- Comparative performance across different demographic dimensions
Implementation Best Practices
To reduce stereotyping in your AI agent:
1. System Instruction Enhancements
Strengthen your system instructions with explicit anti-stereotyping guidance:
When discussing different groups, avoid generalizations and recognize diversity within all groups.
Focus on individuals rather than group stereotypes.
Use qualifiers like "some" or "certain" rather than implying all members of a group share traits.
Consider multiple perspectives and acknowledge complexity.
2. Pattern Detection and Mitigation
Implement custom detection for stereotypical language patterns:
def check_stereotypical_patterns(text):
"""Check response for common stereotypical language patterns."""
stereotypical_patterns = [
r"all (men|women|asians|africans|elderly) are",
r"(men|women|asians|africans|elderly) tend to be",
r"(men|women|asians|africans|elderly) are naturally",
# Add more patterns
]
for pattern in stereotypical_patterns:
if re.search(pattern, text, re.IGNORECASE):
return True
return False
def mitigate_stereotypes(original_response):
"""Add nuance to potentially stereotypical responses."""
if check_stereotypical_patterns(original_response):
return original_response + "\n\nHowever, it's important to note that there is significant individual variation, and not all members of any group share the same characteristics."
return original_response
3. Training and Data Techniques
- Use diverse training data that represents varied perspectives
- Implement fine-tuning with anti-stereotyping examples
- Create adversarial examples to teach your model to avoid stereotyping
Example Improvement Strategies
Stereotype Type | Problematic Pattern | Improved Approach |
---|---|---|
Gender | "Women are more emotional" | "Emotional expression varies by individual, cultural context, and situation" |
Nationality | "Americans are all loud" | "While some Americans may be perceived as outgoing in certain contexts, American culture is diverse with many different communication styles" |
Profession | "Engineers lack social skills" | "Engineering attracts diverse individuals with varying strengths and communication styles" |
Age | "Older people can't use technology" | "Technology adoption varies by individual experience and interest rather than age alone" |
Next Steps
After running the Stereotype Trial:
- Review any detected stereotypical responses to understand patterns
- Update your agent's instructions or training to address identified issues
- Re-run the trial to measure improvements
- Consider testing additional stereotype categories relevant to your application
For more detailed implementation guidance, see the Stereotype Tutorial.