Testing for Stereotypes with the Stereotype Trial
This tutorial walks you through using the Stereotype Trial to test whether your AI agent produces stereotypical responses based on demographic characteristics.
Prerequisites
Before starting, make sure you have:
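- Installed AIGauntlet (`pip install aigauntlet`)
- An API key from Actualization.ai, plus the email address associated with it
- An AI agent that can respond to conversational prompts
- For the example agent in this tutorial, the `openai` and `python-dotenv` packages and an OpenAI API key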
Step 1: Set Up Your Environment
First, let's set up your environment and import the necessary dependencies:
```python
import os
import openai
from dotenv import load_dotenv
from aigauntlet import StereotypeTrial
from aigauntlet.StereotypeTrial import StereotypeTrial_InputFormat, StereotypeTrial_OutputFormat

# Load environment variables from .env file
load_dotenv()

# Configure your API keys
ACTUALIZATION_API_KEY = os.getenv("ACTUALIZATION_API_KEY")
ACTUALIZATION_EMAIL = os.getenv("ACTUALIZATION_EMAIL")
openai.api_key = os.getenv("OPENAI_API_KEY")

# Verify credentials are loaded
if not all([ACTUALIZATION_API_KEY, ACTUALIZATION_EMAIL, openai.api_key]):
    raise ValueError("Missing required API credentials. Check your .env file.")
```
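The check above raises the same error no matter which credential is missing. If you would rather the message name the missing variables, a small helper like the following would do it. This is a sketch; `missing_env_vars` and `require_env_vars` are hypothetical names, not AIGauntlet functions.

```python
import os
from typing import Iterable, List


def missing_env_vars(names: Iterable[str]) -> List[str]:
    """Return the names of environment variables that are unset or empty."""
    return [name for name in names if not os.getenv(name)]


def require_env_vars(names: Iterable[str]) -> None:
    """Raise a ValueError that lists every required variable that is missing."""
    missing = missing_env_vars(names)
    if missing:
        raise ValueError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

You could call `require_env_vars(["ACTUALIZATION_API_KEY", "ACTUALIZATION_EMAIL", "OPENAI_API_KEY"])` right after `load_dotenv()` in place of the `all(...)` check.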
Step 2: Create an Agent Function
Create a function that wraps your AI agent. This function must:
- Accept a `StereotypeTrial_InputFormat` object
- Return a `StereotypeTrial_OutputFormat` object
Here's an example using OpenAI's API:
```python
def my_agent_function(input_data: StereotypeTrial_InputFormat) -> StereotypeTrial_OutputFormat:
    """Handle a prompt and return a response using OpenAI's API."""
    # Extract the prompt from the input format
    prompt = input_data.prompt

    # Define a system message for your agent
    system_message = """
You are a helpful, respectful, and balanced assistant.
When discussing different demographic groups, be careful to:
- Avoid making generalizations about groups
- Recognize diversity within all groups
- Base your responses on factual information, not stereotypes
- Present balanced viewpoints
- Acknowledge societal context when relevant
"""

    # Call the OpenAI API
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
        max_tokens=500,
    )

    # Extract the response text (message content can be None, so fall back to "")
    response_text = response.choices[0].message.content or ""

    # Return in the expected output format
    return StereotypeTrial_OutputFormat(client_response=response_text)
```
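Optionally, you can keep your own record of every prompt the trial sends and every answer your agent returns, which helps in Step 5 if a result's `extra` field does not include them. The sketch below is a hypothetical helper (`record_interactions` is not part of AIGauntlet) and relies only on the `.prompt` and `.client_response` attributes used in this tutorial.

```python
import functools
from typing import Any, Callable, Dict, List


def record_interactions(
    agent_fn: Callable[[Any], Any], log: List[Dict[str, str]]
) -> Callable[[Any], Any]:
    """Wrap an agent function so each prompt/response pair is appended to `log`.

    Assumes the input exposes `.prompt` and the output exposes
    `.client_response`, as in the tutorial's agent function.
    """

    @functools.wraps(agent_fn)
    def wrapper(input_data: Any) -> Any:
        output = agent_fn(input_data)
        # Record after the call so only completed interactions are logged
        log.append({"prompt": input_data.prompt, "response": output.client_response})
        return output

    return wrapper
```

For example, create `interaction_log = []`, build `logged_agent = record_interactions(my_agent_function, interaction_log)`, and pass `interact_function=logged_agent` when you create the trial in Step 3.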
Step 3: Create and Run the Trial
Now let's create and run the trial with your agent function:
```python
# Create the trial
trial = StereotypeTrial(
    email=ACTUALIZATION_EMAIL,
    api_key=ACTUALIZATION_API_KEY,
    interact_function=my_agent_function,
    agent_description="General purpose conversational AI assistant",
    stereotype_types=["gender", "race", "nationality", "profession"],
    trial_id="stereotype-tutorial",  # Optional identifier
    user_notes="Testing our agent for stereotypical responses",  # Optional notes
)

# Run the trial
print("Running Stereotype trial...")
report = trial.run()
print("Trial completed!")
```
Step 4: Analyze the Results
After the trial completes, you can analyze the results:
```python
# Get the results grouped by stereotype type
results = report.get_results()

# Print summary statistics for each stereotype type
for stereotype_type, tests in results.items():
    print(f"\n--- {stereotype_type.capitalize()} Stereotype Analysis ---")

    # Count stereotypical responses
    stereotypical = sum(1 for test in tests if test["success_code"] == "NONCOMPLIANT")
    total_tests = len(tests)
    stereotype_rate = stereotypical / total_tests if total_tests > 0 else 0

    print(f"Tests detecting stereotypes: {stereotypical}/{total_tests} ({stereotype_rate*100:.1f}%)")
```
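The loop above prints each type in whatever order `get_results()` returns them. To see the most problematic categories first, plus a single figure pooled across all tests, you could collect the rates and sort them. This sketch assumes only the result shape used above (a dict mapping each stereotype type to a list of test dicts with a `"success_code"` key); `stereotype_rates` and `overall_rate` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def stereotype_rates(results: Dict[str, List[dict]]) -> List[Tuple[str, int, int, float]]:
    """Return (type, noncompliant, total, rate) tuples, highest rate first."""
    rows = []
    for stereotype_type, tests in results.items():
        total = len(tests)
        flagged = sum(1 for test in tests if test.get("success_code") == "NONCOMPLIANT")
        rate = flagged / total if total else 0.0
        rows.append((stereotype_type, flagged, total, rate))
    # Highest rate first; ties are broken alphabetically by type name
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


def overall_rate(results: Dict[str, List[dict]]) -> float:
    """Pooled rate over every test, so types with more tests weigh more."""
    total = sum(len(tests) for tests in results.values())
    flagged = sum(
        1
        for tests in results.values()
        for test in tests
        if test.get("success_code") == "NONCOMPLIANT"
    )
    return flagged / total if total else 0.0
```

Iterating over `stereotype_rates(results)` then gives you the rows worst-first for printing or charting.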
Step 5: Review Problematic Responses
To improve your agent, it's helpful to review specific cases where stereotypical responses were detected:
```python
# Find and print examples of stereotypical responses
for stereotype_type, tests in results.items():
    stereotypical_tests = [test for test in tests if test["success_code"] == "NONCOMPLIANT"]
    if not stereotypical_tests:
        continue

    print(f"\n=== Stereotypical {stereotype_type.capitalize()} Responses ===")
    for i, test in enumerate(stereotypical_tests[:3]):  # Show up to 3 examples per type
        print(f"\nExample {i+1}:")
        print(f"Prompt: {test.get('extra', {}).get('prompt', 'No prompt available')}")
        print(f"\nResponse: {test.get('extra', {}).get('response', 'No response available')}\n")

        # If available, show what specific stereotype was detected
        stereotype_details = test.get('extra', {}).get('stereotype_details')
        if stereotype_details:
            print(f"Detected stereotype: {stereotype_details}")
        print("-" * 50)
```
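Three examples per type are enough for a quick look; for a fuller review you might write every flagged case to a CSV file. The hypothetical `write_flagged_cases` helper below reads the same `success_code` and `extra` fields as the loop above and leaves a cell empty when a field is absent.

```python
import csv
from typing import Dict, List, TextIO


def write_flagged_cases(results: Dict[str, List[dict]], out: TextIO) -> int:
    """Write every NONCOMPLIANT test to `out` as CSV and return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(["stereotype_type", "prompt", "response", "stereotype_details"])
    count = 0
    for stereotype_type, tests in results.items():
        for test in tests:
            if test.get("success_code") != "NONCOMPLIANT":
                continue
            # Treat a missing or None `extra` field as empty
            extra = test.get("extra") or {}
            writer.writerow([
                stereotype_type,
                extra.get("prompt", ""),
                extra.get("response", ""),
                extra.get("stereotype_details", ""),
            ])
            count += 1
    return count
```

To write a file, open it with `open("flagged_cases.csv", "w", newline="", encoding="utf-8")` and pass the handle along with `results`; `newline=""` is what the `csv` module recommends for files it writes.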
Common Issues and Solutions
If your agent is producing stereotypical responses, consider these improvements:
1. Enhance your system message to explicitly counter stereotypes:
```python
system_message = """
You are a helpful, fair, and balanced assistant. Follow these principles:
1. NEVER make generalizations about demographic groups - each individual is unique
2. Avoid describing groups in terms of stereotypical traits, behaviors, or characteristics
3. Present factual information with appropriate qualifiers and context
4. When discussing sensitive topics about groups:
   - Acknowledge diversity within the group
   - Use precise language and avoid overgeneralizations (avoid terms like "all", "always", "never")
   - Consider multiple perspectives
   - Cite specific examples rather than general claims
5. Recognize historical context and current realities
6. Focus on individuals rather than group stereotypes
"""
```
2. Implement a stereotype detection layer:
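```python
import re

# Simplified keyword lists; real detection would need to be far more sophisticated.
STEREOTYPE_PHRASES = {
    "gender": ["women are more emotional", "men are more logical", "girls are better at",
               "boys are better at", "women can't", "men don't"],
    "race": ["asians are good at", "black people are", "white people always",
             "latinos tend to", "indians are"],
    "nationality": ["americans are", "british are", "chinese are", "mexicans are",
                    "french love to", "germans are efficient", "italians are"],
    "profession": ["lawyers are greedy", "programmers are antisocial", "artists are flaky",
                   "teachers are underpaid", "doctors have god complex"],
}

def detect_stereotype(text: str, stereotype_type: str) -> bool:
    """Return True if `text` contains a listed phrase for `stereotype_type`."""
    phrases = STEREOTYPE_PHRASES.get(stereotype_type.lower(), [])
    # Word boundaries (\b) stop "men don't" from matching inside "women don't"
    return any(re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE)
               for p in phrases)

# Illustrative helper: asks the model again with a corrective instruction.
# Uses `openai`, the format classes, and `system_message` defined earlier.
def generate_balanced_response(input_data: StereotypeTrial_InputFormat,
                               stereotype_type: str) -> StereotypeTrial_OutputFormat:
    correction = (f"\nA draft answer relied on a {stereotype_type} stereotype. "
                  "Answer again without generalizing about any group.")
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_message + correction},
            {"role": "user", "content": input_data.prompt},
        ],
        temperature=0.7,
        max_tokens=500,
    )
    return StereotypeTrial_OutputFormat(client_response=response.choices[0].message.content or "")

def stereotype_aware_agent(input_data: StereotypeTrial_InputFormat) -> StereotypeTrial_OutputFormat:
    # Get an initial response from the Step 2 agent
    response = my_agent_function(input_data)
    # Regenerate once if any listed stereotype phrase is detected
    for stereotype_type in ["gender", "race", "nationality", "profession"]:
        if detect_stereotype(response.client_response, stereotype_type):
            response = generate_balanced_response(input_data, stereotype_type)
            break
    return response
```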
3. Use more balanced language when discussing groups:
Instruct your agent (through its system message or fine-tuning) to:
- Use qualifiers like "some," "certain," or "a portion of" instead of generalizations
- Emphasize individual differences within groups
- Present multiple perspectives or viewpoints
- Cite specific evidence rather than relying on common assumptions
- Acknowledge historical and social context when relevant
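To spot-check whether responses follow this guidance, you could flag sentences that use absolute quantifiers such as "all", "always", or "never" for manual review. The function and term list below are hypothetical and make up a crude heuristic, not a stereotype detector: they will flag harmless uses ("always consult a doctor") and miss generalizations phrased without these words.

```python
import re
from typing import List

# Illustrative, non-exhaustive list of absolute terms that often signal overgeneralization
ABSOLUTE_TERMS = ("all", "always", "never", "every", "none", "nobody", "everyone")

_TERM_RE = re.compile(r"\b(" + "|".join(ABSOLUTE_TERMS) + r")\b", re.IGNORECASE)
# Naive sentence split: break after ., !, or ? followed by whitespace
_SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def flag_absolute_sentences(text: str) -> List[str]:
    """Return the sentences in `text` that contain one of ABSOLUTE_TERMS."""
    sentences = _SENTENCE_SPLIT_RE.split(text.strip())
    return [s for s in sentences if s and _TERM_RE.search(s)]
```

Because the terms are matched as whole words, a word like "Overall" does not trigger the "all" pattern.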
Next Steps
After improving your agent's stereotype avoidance:
- Run the trial again to see if your changes reduced stereotypical responses
- Test with additional stereotype types (e.g., "religion", "age", "socioeconomic")
- Implement a more sophisticated stereotype detection system
- Develop specific strategies for different types of stereotypes
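When you rerun the trial, comparing per-type rates against the earlier run shows where your changes helped. A minimal sketch, assuming you keep the `results` dict from each run in the shape used in Step 4; `rates_by_type` and `rate_changes` are hypothetical names.

```python
from typing import Dict, List


def rates_by_type(results: Dict[str, List[dict]]) -> Dict[str, float]:
    """Map each stereotype type to its share of NONCOMPLIANT tests."""
    rates = {}
    for stereotype_type, tests in results.items():
        flagged = sum(1 for test in tests if test.get("success_code") == "NONCOMPLIANT")
        rates[stereotype_type] = flagged / len(tests) if tests else 0.0
    return rates


def rate_changes(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return after-minus-before for types present in both runs.

    Negative values mean the stereotype rate went down.
    """
    shared = before.keys() & after.keys()
    return {name: after[name] - before[name] for name in sorted(shared)}
```

With `temperature=0.7`, responses vary from run to run, so treat small differences as possibly noise rather than a real change.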
By following this tutorial, you've learned how to:
- Create an agent function compatible with the Stereotype Trial
- Set up and run the trial to detect stereotypical responses
- Analyze and address stereotyping issues in your AI agent
For more information on the Stereotype Trial, see the trial reference documentation.