Testing for Stereotypes with the Stereotype Trial

This tutorial walks you through using the Stereotype Trial to test whether your AI agent produces stereotypical responses based on demographic characteristics.

Prerequisites

Before starting, make sure you have:

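Installed AIGauntlet (pip install aigauntlet)
Installed the openai and python-dotenv packages used in the examples (pip install openai python-dotenv)
An Actualization.ai API key and the email address you use with it
An OpenAI API key, if you follow the OpenAI-based example in Step 2
An AI agent that can respond to conversational prompts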

Step 1: Set Up Your Environment

First, let's set up your environment and import the necessary dependencies:

import os
import openai
from dotenv import load_dotenv
from aigauntlet import StereotypeTrial
from aigauntlet.StereotypeTrial import StereotypeTrial_InputFormat, StereotypeTrial_OutputFormat

# Load environment variables from .env file
load_dotenv()

# Configure your API keys
ACTUALIZATION_API_KEY = os.getenv("ACTUALIZATION_API_KEY")
ACTUALIZATION_EMAIL = os.getenv("ACTUALIZATION_EMAIL")
openai.api_key = os.getenv("OPENAI_API_KEY")

# Verify credentials are loaded
if not all([ACTUALIZATION_API_KEY, ACTUALIZATION_EMAIL, openai.api_key]):
    raise ValueError("Missing required API credentials. Check your .env file.")
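
The generic error above does not say which variable is missing. The following is a minimal sketch of a more specific check. The helper name find_missing_env_vars is hypothetical and not part of AIGauntlet; the variable names are the ones used in this tutorial.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names this tutorial expects in the .env file
REQUIRED_VARS = ("ACTUALIZATION_API_KEY", "ACTUALIZATION_EMAIL", "OPENAI_API_KEY")


def find_missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# Demonstration with a made-up environment instead of the real one
sample_env = {"ACTUALIZATION_API_KEY": "abc123", "ACTUALIZATION_EMAIL": ""}
print(find_missing_env_vars(env=sample_env))
# prints ['ACTUALIZATION_EMAIL', 'OPENAI_API_KEY']
```

Calling find_missing_env_vars() with no arguments checks the real process environment, so it can run right after load_dotenv().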

Step 2: Create an Agent Function

Create a function that wraps your AI agent. This function must:

Accept a StereotypeTrial_InputFormat object
Return a StereotypeTrial_OutputFormat object

Here's an example using OpenAI's API:

def my_agent_function(input_data: StereotypeTrial_InputFormat) -> StereotypeTrial_OutputFormat:
    """Handle a prompt and return a response using OpenAI's API."""

    # Extract the prompt from the input format
    prompt = input_data.prompt

    # Define a system message for your agent
    system_message = """
    You are a helpful, respectful, and balanced assistant.
    When discussing different demographic groups, be careful to:
    - Avoid making generalizations about groups
    - Recognize diversity within all groups
    - Base your responses on factual information, not stereotypes
    - Present balanced viewpoints
    - Acknowledge societal context when relevant
    """

    # Call the OpenAI API
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )

    # Extract the response text
    response_text = response.choices[0].message.content

    # Return in the expected output format
    return StereotypeTrial_OutputFormat(client_response=response_text)
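
Before spending API calls, it can help to check the wrapper contract locally. The sketch below uses stand-in dataclasses that mirror only the two fields used in this tutorial (prompt and client_response). The real AIGauntlet classes may carry more fields or validation, and the names FakeInput, FakeOutput, echo_agent, and check_agent_contract are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeInput:
    """Stand-in for StereotypeTrial_InputFormat (assumption: only `prompt`)."""
    prompt: str


@dataclass
class FakeOutput:
    """Stand-in for StereotypeTrial_OutputFormat (assumption: only `client_response`)."""
    client_response: str


def echo_agent(input_data: FakeInput) -> FakeOutput:
    """Trivial agent that needs no network access."""
    return FakeOutput(client_response=f"You asked: {input_data.prompt}")


def check_agent_contract(agent: Callable[[FakeInput], FakeOutput]) -> bool:
    """Return True if the agent yields a non-empty string in `client_response`."""
    output = agent(FakeInput(prompt="Describe a typical engineer."))
    text = getattr(output, "client_response", None)
    return isinstance(text, str) and bool(text.strip())


print(check_agent_contract(echo_agent))  # prints True
```

The same check could be pointed at your real agent function once it accepts an object with a prompt attribute.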

Step 3: Create and Run the Trial

Now let's create and run the trial with your agent function:

# Create the trial
trial = StereotypeTrial(
    email=ACTUALIZATION_EMAIL,
    api_key=ACTUALIZATION_API_KEY,
    interact_function=my_agent_function,
    agent_description="General purpose conversational AI assistant",
    stereotype_types=["gender", "race", "nationality", "profession"],
    trial_id="stereotype-tutorial",  # Optional identifier
    user_notes="Testing our agent for stereotypical responses"  # Optional notes
)

# Run the trial
print("Running Stereotype trial...")
report = trial.run()
print("Trial completed!")
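
A trial presumably calls your agent function many times, so a transient API error in a single call could interrupt the run. How AIGauntlet handles exceptions raised by interact_function is not covered in this tutorial, so the following is only a sketch of a generic retry wrapper. The name with_retries is a hypothetical helper, not an AIGauntlet feature.

```python
import time
from functools import wraps
from typing import Callable


def with_retries(attempts: int = 3, base_delay: float = 1.0) -> Callable:
    """Build a decorator that retries a function with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:  # In practice, narrow this to your client's errors
                    if attempt == attempts:
                        raise  # Out of attempts: surface the last error
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


# Demonstration with a function that fails twice, then succeeds
calls = {"count": 0}


@with_retries(attempts=3, base_delay=0.0)
def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


print(flaky(), calls["count"])  # prints: ok 3
```

You could then pass with_retries()(my_agent_function) as interact_function instead of the bare function.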

Step 4: Analyze the Results

After the trial completes, you can analyze the results:

# Get the results grouped by stereotype type
results = report.get_results()

# Print summary statistics for each stereotype type
for stereotype_type, tests in results.items():
    print(f"\n--- {stereotype_type.capitalize()} Stereotype Analysis ---")

    # Count stereotypical responses
    stereotypical = sum(1 for test in tests if test["success_code"] == "NONCOMPLIANT")
    total_tests = len(tests)
    stereotype_rate = stereotypical / total_tests if total_tests > 0 else 0

    print(f"Tests detecting stereotypes: {stereotypical}/{total_tests} ({stereotype_rate*100:.1f}%)")
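
The same calculation can be packaged as a function so the rates can be sorted or stored. This is a sketch that assumes the result structure used above (a dictionary mapping each stereotype type to a list of test dictionaries with a success_code field). The sample data is made up, and "COMPLIANT" is an assumed label for a passing test.

```python
from typing import Dict, List


def stereotype_rates(results: Dict[str, List[dict]]) -> Dict[str, float]:
    """Map each stereotype type to its fraction of NONCOMPLIANT tests."""
    rates = {}
    for stereotype_type, tests in results.items():
        flagged = sum(1 for t in tests if t.get("success_code") == "NONCOMPLIANT")
        # Guard against empty categories to avoid division by zero
        rates[stereotype_type] = flagged / len(tests) if tests else 0.0
    return rates


# Made-up sample data shaped like the structure assumed above
sample_results = {
    "gender": [{"success_code": "NONCOMPLIANT"}, {"success_code": "COMPLIANT"}],
    "race": [{"success_code": "COMPLIANT"}],
    "profession": [],
}

# List the categories with the highest stereotype rate first
ranked = sorted(stereotype_rates(sample_results).items(), key=lambda kv: kv[1], reverse=True)
for name, rate in ranked:
    print(f"{name}: {rate:.0%}")
```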

Step 5: Review Problematic Responses

To improve your agent, it's helpful to review specific cases where stereotypical responses were detected:

# Find and print examples of stereotypical responses
for stereotype_type, tests in results.items():
    stereotypical_tests = [test for test in tests if test["success_code"] == "NONCOMPLIANT"]

    if not stereotypical_tests:
        continue

    print(f"\n=== Stereotypical {stereotype_type.capitalize()} Responses ===")

    for i, test in enumerate(stereotypical_tests[:3], start=1):  # Show up to 3 examples per type
        extra = test.get("extra", {})  # Per-test details; may be absent
        print(f"\nExample {i}:")
        print(f"Prompt: {extra.get('prompt', 'No prompt available')}")
        print(f"\nResponse: {extra.get('response', 'No response available')}\n")

        # If available, show what specific stereotype was detected
        stereotype_details = extra.get("stereotype_details")
        if stereotype_details:
            print(f"Detected stereotype: {stereotype_details}")

        print("-" * 50)
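
For a larger review, the flagged cases can be flattened into rows and written out as CSV. This is a sketch that assumes the same result structure as above (success_code, plus an extra dictionary with prompt and response). The function names and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["stereotype_type", "prompt", "response"]


def flagged_rows(results: Dict[str, List[dict]]) -> List[dict]:
    """Collect one row per NONCOMPLIANT test."""
    rows = []
    for stereotype_type, tests in results.items():
        for test in tests:
            if test.get("success_code") != "NONCOMPLIANT":
                continue
            extra = test.get("extra") or {}
            rows.append({
                "stereotype_type": stereotype_type,
                "prompt": extra.get("prompt", ""),
                "response": extra.get("response", ""),
            })
    return rows


def rows_to_csv(rows: List[dict]) -> str:
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data for illustration
sample = {
    "gender": [
        {"success_code": "NONCOMPLIANT", "extra": {"prompt": "p1", "response": "r1"}},
        {"success_code": "COMPLIANT", "extra": {"prompt": "p2", "response": "r2"}},
    ],
}
print(rows_to_csv(flagged_rows(sample)))
```

Writing the returned text to a file gives reviewers a spreadsheet-friendly list of the responses to inspect.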

Common Issues and Solutions

If your agent is producing stereotypical responses, consider these improvements:

1. Enhance your system message to explicitly counter stereotypes:

system_message = """
You are a helpful, fair, and balanced assistant. Follow these principles:

1. NEVER make generalizations about demographic groups - each individual is unique
2. Avoid describing groups in terms of stereotypical traits, behaviors, or characteristics
3. Present factual information with appropriate qualifiers and context
4. When discussing sensitive topics about groups:
   - Acknowledge diversity within the group
   - Use precise language and avoid overgeneralizations (avoid terms like "all", "always", "never")
   - Consider multiple perspectives
   - Cite specific examples rather than general claims
5. Recognize historical context and current realities
6. Focus on individuals rather than group stereotypes
"""

2. Implement a stereotype detection layer:

def detect_stereotype(text, stereotype_type):
    """Simple stereotype detection function."""
    # This is a simplified example - real detection would be more sophisticated
    stereotype_phrases = {
        "gender": ["women are more emotional", "men are more logical", "girls are better at",
                  "boys are better at", "women can't", "men don't"],
        "race": ["asians are good at", "black people are", "white people always",
                "latinos tend to", "indians are"],
        "nationality": ["americans are", "british are", "chinese are", "mexicans are",
                       "french love to", "germans are efficient", "italians are"],
        "profession": ["lawyers are greedy", "programmers are antisocial", "artists are flaky",
                     "teachers are underpaid", "doctors have god complex"]
    }

    phrases = stereotype_phrases.get(stereotype_type.lower(), [])
    text_lower = text.lower()  # Lowercase once instead of once per phrase
    return any(phrase.lower() in text_lower for phrase in phrases)

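def stereotype_aware_agent(input_data):
    """Wrap the base agent and regenerate any response that trips the detector."""
    # Get the initial response from your base agent (here, my_agent_function from Step 2)
    response = my_agent_function(input_data)

    # Check for stereotypes
    for stereotype_type in ["gender", "race", "nationality", "profession"]:
        if detect_stereotype(response.client_response, stereotype_type):
            # Generate a new, more balanced response.
            # generate_balanced_response is a placeholder that you must implement,
            # for example by re-prompting the model with a stricter system message.
            response = generate_balanced_response(input_data, stereotype_type)
            break

    return response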
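
Plain substring matching can misfire: the phrase "men don't" also matches inside "women don't". The following sketch shows a variant that uses word boundaries from the standard re module. The phrase list is shortened for illustration, find_stereotype_phrases is a hypothetical name, and phrase matching of any kind remains a crude heuristic.

```python
import re
from typing import Dict, List

# Shortened, illustrative phrase list
PHRASES: Dict[str, List[str]] = {
    "gender": ["men don't", "women can't", "men are more logical"],
    "profession": ["lawyers are greedy", "programmers are antisocial"],
}


def _compile(phrases: List[str]) -> "re.Pattern[str]":
    """Build one case-insensitive pattern that matches whole phrases only."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


PATTERNS = {name: _compile(phrases) for name, phrases in PHRASES.items()}


def find_stereotype_phrases(text: str) -> Dict[str, List[str]]:
    """Return the matched phrases per stereotype type (empty dict if none)."""
    hits: Dict[str, List[str]] = {}
    for stereotype_type, pattern in PATTERNS.items():
        found = [m.group(0).lower() for m in pattern.finditer(text)]
        if found:
            hits[stereotype_type] = found
    return hits


print(find_stereotype_phrases("Women don't like that."))   # prints {}
print(find_stereotype_phrases("Some say MEN DON'T cry."))  # prints {'gender': ["men don't"]}
```

Returning the matched phrases, rather than a boolean, also tells you which rule fired when you review a flagged response.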

3. Use more balanced language when discussing groups:

Guide your agent, through its system message or fine-tuning, to:

Use qualifiers like "some," "certain," or "a portion of" instead of generalizations
Emphasize individual differences within groups
Present multiple perspectives or viewpoints
Cite specific evidence rather than relying on common assumptions
Acknowledge historical and social context when relevant
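
As a rough aid for the first point, a response can be scanned for absolute quantifiers before it is returned. This is a sketch with an illustrative word list and a hypothetical helper name. It cannot tell whether a flagged word actually refers to a group, so treat hits as prompts for review, not verdicts.

```python
import re
from typing import List

# Illustrative list of absolute quantifiers; extend as needed
ABSOLUTE_TERMS = ("all", "always", "never", "every", "none")
_ABSOLUTE_RE = re.compile(r"\b(" + "|".join(ABSOLUTE_TERMS) + r")\b", re.IGNORECASE)


def find_absolute_terms(text: str) -> List[str]:
    """Return the absolute quantifiers found in text, lowercased, in order."""
    return [match.group(1).lower() for match in _ABSOLUTE_RE.finditer(text)]


print(find_absolute_terms("All engineers always overlook design."))
# prints ['all', 'always']
```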

Next Steps

After improving your agent's stereotype avoidance:

Run the trial again to see if your changes reduced stereotypical responses
Test with additional stereotype types (e.g., "religion", "age", "socioeconomic")
Implement a more sophisticated stereotype detection system
Develop specific strategies for different types of stereotypes
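
To judge whether a change helped, compare the per-type stereotype rates between two runs. This is a sketch that assumes you have saved each run's rates as a dictionary of fractions (for example, computed as in Step 4). The name compare_rates is a hypothetical helper and the numbers are made up.

```python
from typing import Dict


def compare_rates(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return the change (after minus before) for types present in both runs."""
    return {
        name: round(after[name] - before[name], 4)
        for name in before
        if name in after
    }


# Made-up rates from two hypothetical runs
before_run = {"gender": 0.30, "race": 0.10}
after_run = {"gender": 0.10, "race": 0.10, "age": 0.20}

for name, change in compare_rates(before_run, after_run).items():
    print(f"{name}: {change:+.1%}")
```

Negative values mean fewer stereotypical responses after the change. Types that appear in only one run, such as "age" here, are skipped because there is nothing to compare them against.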

By following this tutorial, you've learned how to:

Create an agent function compatible with the Stereotype Trial
Set up and run the trial to detect stereotypical responses
Analyze and address stereotyping issues in your AI agent

For more information on the Stereotype Trial, see the trial reference documentation.