Trials Overview
AIGauntlet provides a suite of specialized evaluation trials designed to assess different aspects of AI safety and responsibility. Each trial targets specific behavioral patterns that are crucial for responsible AI deployment.
What are Trials?
Trials are structured evaluation protocols that:
- Send carefully crafted inputs to your AI agent
- Analyze responses against defined behavioral expectations
- Generate detailed reports with actionable insights
- Provide visualization tools to interpret results
All trials follow a common interface while specializing in testing specific aspects of AI behavior.
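For illustration, here is a minimal sketch of the interaction point every trial shares: a callable that receives a trial-generated prompt and returns your agent's reply as text. The stub body is a placeholder; swap it for a call to your own model or service.

```python
def interact_function(prompt: str) -> str:
    """Receive a trial-generated prompt and return the agent's reply as text."""
    # Placeholder response: replace with a call to your actual agent,
    # e.g. an SDK chat call or an HTTP request to your service.
    return "I'm sorry, I can't share personal account details."
```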
Available Trials
| Trial | Purpose | Key Metrics | Typical Use Cases |
| --- | --- | --- | --- |
| QuickPrivacy | Privacy protection | Privacy Protection Score, Information Disclosure Rate | Customer support assistants with access to PII |
| BiasedEvaluator | Evaluator bias detection | Bias Magnitude, Category Favoritism | Content moderation, hiring systems, credit scoring |
| Stereotype | Stereotype detection | Stereotype Rate by Category | Content generation, conversational AI |
| SelfHarm | Crisis handling | Resource Provision Rate, Safety Compliance | Mental health, community support |
Common Trial Parameters
All trials share these core parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| email | str | Your registered email with Actualization.ai |
| api_key | str | Your API key from Actualization.ai |
| interact_function | Callable | Function that wraps your AI agent |
| agent_description | str | Description of what your agent does |
| trial_id | str (optional) | Identifier for tracking this evaluation |
| user_notes | str (optional) | Notes for documentation purposes |
Additionally, each trial has specific parameters related to its testing focus.
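As a rough sketch, constructing a trial with these shared parameters might look like the following. The import path and class name (`QuickPrivacy`) are illustrative assumptions; only the parameter names come from the table above.

```python
from aigauntlet import QuickPrivacy  # assumed import path and class name; check the installation docs

trial = QuickPrivacy(
    email="you@example.com",              # registered email with Actualization.ai
    api_key="YOUR_API_KEY",               # API key from Actualization.ai
    interact_function=interact_function,  # wrapper around your agent (see above)
    agent_description="Customer support assistant with access to order history",
    trial_id="privacy-baseline-001",      # optional: identifier for tracking this run
    user_notes="Baseline run before prompt hardening",  # optional documentation notes
)
```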
How Trials Work
When you run a trial:
- Initialization: The trial validates your credentials and parameters
- Request Generation: Test prompts are generated based on the trial type
- Interaction: Prompts are sent to your agent via your interact_function
- Analysis: Responses are analyzed against expected behavior patterns
- Reporting: A structured report is created with detailed metrics
- Visualization: Interactive visualizations help interpret the results
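Continuing the sketch above, a single call typically drives the first five steps, with visualization as a follow-up. The method names (`run`, `visualize`) are assumptions for illustration, not a confirmed API.

```python
# Steps 1-5: validate credentials, generate prompts, call interact_function,
# analyze the responses, and build the structured report.
result = trial.run()

# Step 6: render interactive visualizations to help interpret the results.
result.visualize()
```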
Choosing the Right Trial
Select the appropriate trial based on your evaluation needs:
- Privacy Concerns? Use QuickPrivacy Trial to evaluate information protection
- Evaluation Fairness? Use BiasedEvaluator Trial to detect bias in scoring systems
- Content Generation? Use Stereotype Trial to identify stereotypical patterns
- Crisis Management? Use SelfHarm Trial to assess handling of concerning content
Integrating Trials into Your Workflow
AIGauntlet trials are designed to integrate with your development pipeline:
- Development: Quickly test behavioral changes during iterative development
- QA Process: Include trials as part of regular quality assurance
- Continuous Integration: Automate trial runs as part of CI/CD (see the gating sketch after this list)
- Pre-deployment: Run comprehensive evaluations before production release
- Monitoring: Periodically evaluate production systems
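For example, a continuous-integration gate could run a trial and fail the build when a key metric falls below a threshold. The result attribute used here (`privacy_protection_score`) is an assumption based on the metrics listed earlier, not a documented result schema.

```python
# Minimal CI gate sketch, reusing the `trial` constructed earlier.
result = trial.run()

MIN_PRIVACY_SCORE = 0.90  # acceptance threshold chosen by your team
if result.privacy_protection_score < MIN_PRIVACY_SCORE:
    raise SystemExit(
        f"Privacy trial failed: {result.privacy_protection_score:.2f} "
        f"< required {MIN_PRIVACY_SCORE:.2f}"
    )
```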
Customizing Trial Parameters
Each trial offers configuration options to match your specific testing needs:
- Provide context-specific agent descriptions
- Configure specialized parameters for each trial type
- Track evaluations with custom trial IDs
- Add notes for evaluation context
Advanced Usage
For more sophisticated evaluation needs:
- Comparative Testing: Run trials with different agent versions to measure improvements (sketched below)
- Multi-dimensional Evaluation: Use multiple trials to assess different aspects of behavior
- Result Analysis: Analyze raw trial results for custom metrics
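As a sketch of comparative testing, the same trial configuration can be run against two agent versions and a shared metric compared. Names such as `interact_v1`, `interact_v2`, and `privacy_protection_score` are placeholders for your own agent wrappers and the metric you care about; the `QuickPrivacy` constructor is the same assumed class as above.

```python
def evaluate(interact_fn, label: str):
    """Run the same trial configuration against one agent version."""
    trial = QuickPrivacy(                     # assumed class name, as above
        email="you@example.com",
        api_key="YOUR_API_KEY",
        interact_function=interact_fn,
        agent_description="Customer support assistant with access to order history",
        trial_id=f"privacy-compare-{label}",  # distinct ID per version for tracking
    )
    return trial.run()

baseline = evaluate(interact_v1, "baseline")    # interact_v1/v2: your agent wrappers
candidate = evaluate(interact_v2, "candidate")
print("Score delta:", candidate.privacy_protection_score - baseline.privacy_protection_score)
```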
Next Steps
Select a specific trial to learn more about its parameters, usage, and interpretation: