Trials Overview

AIGauntlet provides a suite of specialized evaluation trials designed to assess different aspects of AI safety and responsibility. Each trial targets specific behavioral patterns that are crucial for responsible AI deployment.

What are Trials?

Trials are structured evaluation protocols that:

  1. Send carefully crafted inputs to your AI agent
  2. Analyze responses against defined behavioral expectations
  3. Generate detailed reports with actionable insights
  4. Provide visualization tools to interpret results

All trials follow a common interface while specializing in testing specific aspects of AI behavior.
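Conceptually, that shared interface can be pictured as in the sketch below. This is illustrative only: the class and method names (`TrialLike`, `run`, `visualize`) are assumptions used to convey the shape of the interface, not the library's documented API.

```python
from typing import Any, Protocol


class TrialLike(Protocol):
    """Illustrative shape of the interface shared by all trials.

    The method names here are assumptions for this sketch; see the
    individual trial pages for the actual API.
    """

    def run(self) -> Any:
        """Send generated test prompts to the agent and return a report."""
        ...

    def visualize(self) -> None:
        """Render interactive visualizations of the results."""
        ...
```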

Available Trials

| Trial | Purpose | Key Metrics | Typical Use Cases |
| --- | --- | --- | --- |
| QuickPrivacy | Privacy protection | Privacy Protection Score, Information Disclosure Rate | Customer support assistants with access to PII |
| BiasedEvaluator | Evaluator bias detection | Bias Magnitude, Category Favoritism | Content moderation, hiring systems, credit scoring |
| Stereotype | Stereotype detection | Stereotype Rate by Category | Content generation, conversational AI |
| SelfHarm | Crisis handling | Resource Provision Rate, Safety Compliance | Mental health, community support |

Common Trial Parameters

All trials share these core parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| email | str | Your registered email with Actualization.ai |
| api_key | str | Your API key from Actualization.ai |
| interact_function | Callable | Function that wraps your AI agent |
| agent_description | str | Description of what your agent does |
| trial_id | str (optional) | Identifier for tracking this evaluation |
| user_notes | str (optional) | Notes for documentation purposes |

Additionally, each trial has specific parameters related to its testing focus.
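A minimal invocation might look like the following sketch. The `aigauntlet` import path, the `QuickPrivacy` constructor, and the `run()` method are assumptions for illustration; consult the trial-specific pages for the exact API.

```python
from aigauntlet import QuickPrivacy  # hypothetical import path


def my_agent_respond(prompt: str) -> str:
    # Placeholder for your real agent call (e.g. an LLM API request).
    return "I'm sorry, I can't share other customers' personal information."


def interact(prompt: str) -> str:
    """interact_function: take a generated test prompt, return the agent's reply."""
    return my_agent_respond(prompt)


trial = QuickPrivacy(
    email="you@example.com",        # registered email with Actualization.ai
    api_key="YOUR_API_KEY",         # API key from Actualization.ai
    interact_function=interact,     # function that wraps your AI agent
    agent_description="Customer support bot with access to order history",
    # ...plus any QuickPrivacy-specific parameters described on its page.
)
report = trial.run()  # assumed entry point; returns the structured report
```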

How Trials Work

When you run a trial:

  1. Initialization: The trial validates your credentials and parameters
  2. Request Generation: Test prompts are generated based on the trial type
  3. Interaction: Prompts are sent to your agent via your interact_function
  4. Analysis: Responses are analyzed against expected behavior patterns
  5. Reporting: A structured report is created with detailed metrics
  6. Visualization: Interactive visualizations help interpret the results
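Roughly, steps 2 through 5 correspond to the loop sketched below. This is purely illustrative pseudocode of the flow above; the real prompt generation and analysis happen inside the library and the Actualization.ai service, and none of these helper names are actual API.

```python
from typing import Callable, Dict, List


def run_trial_conceptually(
    interact_function: Callable[[str], str],
    generate_prompts: Callable[[], List[str]],
    analyze: Callable[[str, str], Dict],
) -> List[Dict]:
    """Illustrative outline of steps 2-5; not the library's real code."""
    prompts = generate_prompts()                # 2. Request Generation
    results = []
    for prompt in prompts:
        reply = interact_function(prompt)       # 3. Interaction with your agent
        results.append(analyze(prompt, reply))  # 4. Analysis against expectations
    return results                              # 5. feeds the structured report
```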

Choosing the Right Trial

Select the appropriate trial based on your evaluation needs; the Available Trials table above summarizes each trial's purpose, key metrics, and typical use cases.

Integrating Trials into Your Workflow

AIGauntlet trials are designed to integrate with your development pipeline:

  • Development: Quickly test behavioral changes during iterative development
  • QA Process: Include trials as part of regular quality assurance
  • Continuous Integration: Automate trial runs as part of CI/CD (see the sketch after this list)
  • Pre-deployment: Run comprehensive evaluations before production release
  • Monitoring: Periodically evaluate production systems
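For the CI/CD case, one possible pattern is a test that fails the pipeline when a key metric drops below a threshold. The sketch below assumes a hypothetical `QuickPrivacy` trial whose report exposes a numeric `privacy_protection_score`, and a `my_project.agent` module holding your wrapper; all of these names are assumptions to adapt to your project and the actual report fields.

```python
# test_privacy_trial.py -- hypothetical CI gate; field names are assumptions.
import os

from aigauntlet import QuickPrivacy     # hypothetical import path
from my_project.agent import interact   # placeholder for your interact_function


def test_privacy_score_above_threshold():
    trial = QuickPrivacy(
        email=os.environ["ACTUALIZATION_EMAIL"],
        api_key=os.environ["ACTUALIZATION_API_KEY"],
        interact_function=interact,
        agent_description="Customer support bot with access to order history",
        trial_id=os.environ.get("GIT_COMMIT", "local"),  # tie the run to a commit
    )
    report = trial.run()  # assumed entry point
    # Fail the pipeline if the (assumed) score field drops below the bar.
    assert report.privacy_protection_score >= 0.9
```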

Customizing Trial Parameters

Each trial offers configuration options to match your specific testing needs:

  • Provide context-specific agent descriptions
  • Configure specialized parameters for each trial type
  • Track evaluations with custom trial IDs
  • Add notes for evaluation context
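For example, runs can be tagged so they are easy to trace later. The snippet below only illustrates the optional `trial_id` and `user_notes` parameters from the common-parameters table; the `BiasedEvaluator` import and constructor are assumptions.

```python
from datetime import datetime, timezone

from aigauntlet import BiasedEvaluator  # hypothetical import path


def interact(prompt: str) -> str:
    # Placeholder for your real agent call.
    return "Here is an impartial evaluation of the submission."


trial = BiasedEvaluator(
    email="you@example.com",
    api_key="YOUR_API_KEY",
    interact_function=interact,
    agent_description="Resume screening assistant for engineering roles",
    # Optional tracking parameters from the common-parameters table:
    trial_id=f"bias-check-{datetime.now(timezone.utc):%Y%m%d-%H%M}",
    user_notes="Re-run after updating the system prompt to v2.3",
)
```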

Advanced Usage

For more sophisticated evaluation needs:

  • Comparative Testing: Run trials with different agent versions to measure improvements
  • Multi-dimensional Evaluation: Use multiple trials to assess different aspects of behavior
  • Result Analysis: Analyze raw trial results for custom metrics
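As a sketch of comparative testing, the same trial can be run once per agent version and the resulting reports compared side by side. The `SelfHarm` import, the `run()` entry point, and the metric names mentioned in the comment are assumptions; the two agent functions are trivial placeholders for your real versions.

```python
from aigauntlet import SelfHarm  # hypothetical import path


def baseline_agent(prompt: str) -> str:
    # Placeholder for the currently deployed agent version.
    return "You may want to contact a crisis hotline."


def candidate_agent(prompt: str) -> str:
    # Placeholder for the updated agent version under evaluation.
    return "I'm glad you reached out. Here are resources that can help right now."


def make_interact(agent):
    """Build an interact_function around a given agent version."""
    def interact(prompt: str) -> str:
        return agent(prompt)
    return interact


def evaluate(agent, version: str):
    trial = SelfHarm(
        email="you@example.com",
        api_key="YOUR_API_KEY",
        interact_function=make_interact(agent),
        agent_description="Community support assistant",
        trial_id=f"selfharm-{version}",  # distinguish versions in tracking
    )
    return trial.run()  # assumed entry point


baseline_report = evaluate(baseline_agent, "v1")
candidate_report = evaluate(candidate_agent, "v2")
# Compare the reports' metrics (e.g. Resource Provision Rate) across versions.
```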

Next Steps

Select a specific trial to learn more about its parameters, usage, and interpretation: