Trials Overview

AIGauntlet provides a suite of specialized evaluation trials designed to assess different aspects of AI safety and responsibility. Each trial targets specific behavioral patterns that are crucial for responsible AI deployment.

What are Trials?

Trials are structured evaluation protocols that:

  1. Send carefully crafted inputs to your AI agent
  2. Analyze responses against defined behavioral expectations
  3. Generate detailed reports with actionable insights
  4. Provide visualization tools to interpret results

All trials follow a common interface while specializing in testing specific aspects of AI behavior.
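Conceptually, that shared interface can be pictured as in the sketch below. This is illustrative only: the class and method names (`TrialLike`, `run`, `visualize`) are assumptions used to convey the shape of the interface, not the library's documented API.

```python
from typing import Any, Protocol


class TrialLike(Protocol):
    """Illustrative shape of the interface shared by all trials.

    The method names here are assumptions for this sketch; see the
    individual trial pages for the actual API.
    """

    def run(self) -> Any:
        """Send generated test prompts to the agent and return a report."""
        ...

    def visualize(self) -> None:
        """Render interactive visualizations of the results."""
        ...
```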

Available Trials

| Trial | Purpose | Key Metrics | Typical Use Cases |
| --- | --- | --- | --- |
| QuickPrivacy | Privacy protection | Privacy Protection Score, Information Disclosure Rate | Customer support assistants with access to PII |
| BiasedEvaluator | Evaluator bias detection | Bias Magnitude, Category Favoritism | Content moderation, hiring systems, credit scoring |
| Stereotype | Stereotype detection | Stereotype Rate by Category | Content generation, conversational AI |
| SelfHarm | Crisis handling | Resource Provision Rate, Safety Compliance | Mental health, community support |

Common Trial Parameters

All trials share these core parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| email | str | Your registered email with Actualization.ai |
| api_key | str | Your API key from Actualization.ai |
| interact_function | Callable | Function that wraps your AI agent |
| agent_description | str | Description of what your agent does |
| trial_id | str (optional) | Identifier for tracking this evaluation |
| user_notes | str (optional) | Notes for documentation purposes |

Additionally, each trial has specific parameters related to its testing focus.
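A minimal invocation might look like the following sketch. The `aigauntlet` import path, the `QuickPrivacy` constructor, and the `run()` method are assumptions for illustration; consult the trial-specific pages for the exact API.

```python
from aigauntlet import QuickPrivacy  # hypothetical import path


def my_agent_respond(prompt: str) -> str:
    # Placeholder for your real agent call (e.g. an LLM API request).
    return "I'm sorry, I can't share other customers' personal information."


def interact(prompt: str) -> str:
    """interact_function: take a generated test prompt, return the agent's reply."""
    return my_agent_respond(prompt)


trial = QuickPrivacy(
    email="you@example.com",        # registered email with Actualization.ai
    api_key="YOUR_API_KEY",         # API key from Actualization.ai
    interact_function=interact,     # function that wraps your AI agent
    agent_description="Customer support bot with access to order history",
    # ...plus any QuickPrivacy-specific parameters described on its page.
)
report = trial.run()  # assumed entry point; returns the structured report
```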

How Trials Work

When you run a trial:

  1. Initialization: The trial validates your credentials and parameters
  2. Request Generation: Test prompts are generated based on the trial type
  3. Interaction: Prompts are sent to your agent via your interact_function
  4. Analysis: Responses are analyzed against expected behavior patterns
  5. Reporting: A structured report is created with detailed metrics
  6. Visualization: Interactive visualizations help interpret the results
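Roughly, steps 2 through 5 correspond to the loop sketched below. This is purely illustrative pseudocode of the flow above; the real prompt generation and analysis happen inside the library and the Actualization.ai service, and none of these helper names are actual API.

```python
from typing import Callable, Dict, List


def run_trial_conceptually(
    interact_function: Callable[[str], str],
    generate_prompts: Callable[[], List[str]],
    analyze: Callable[[str, str], Dict],
) -> List[Dict]:
    """Illustrative outline of steps 2-5; not the library's real code."""
    prompts = generate_prompts()                # 2. Request Generation
    results = []
    for prompt in prompts:
        reply = interact_function(prompt)       # 3. Interaction with your agent
        results.append(analyze(prompt, reply))  # 4. Analysis against expectations
    return results                              # 5. feeds the structured report
```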

Choosing the Right Trial

Select the appropriate trial based on your evaluation needs; the Available Trials table above summarizes each trial's purpose, key metrics, and typical use cases.

Integrating Trials into Your Workflow

AIGauntlet trials are designed to integrate with your development pipeline:

  • Development: Quickly test behavioral changes during iterative development
  • QA Process: Include trials as part of regular quality assurance
  • Continuous Integration: Automate trial runs as part of CI/CD (see the sketch after this list)
  • Pre-deployment: Run comprehensive evaluations before production release
  • Monitoring: Periodically evaluate production systems
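For the CI/CD case, one possible pattern is a test that fails the pipeline when a key metric drops below a threshold. The sketch below assumes a hypothetical `QuickPrivacy` trial whose report exposes a numeric `privacy_protection_score`, and a `my_project.agent` module holding your wrapper; all of these names are assumptions to adapt to your project and the actual report fields.

```python
# test_privacy_trial.py -- hypothetical CI gate; field names are assumptions.
import os

from aigauntlet import QuickPrivacy     # hypothetical import path
from my_project.agent import interact   # placeholder for your interact_function


def test_privacy_score_above_threshold():
    trial = QuickPrivacy(
        email=os.environ["ACTUALIZATION_EMAIL"],
        api_key=os.environ["ACTUALIZATION_API_KEY"],
        interact_function=interact,
        agent_description="Customer support bot with access to order history",
        trial_id=os.environ.get("GIT_COMMIT", "local"),  # tie the run to a commit
    )
    report = trial.run()  # assumed entry point
    # Fail the pipeline if the (assumed) score field drops below the bar.
    assert report.privacy_protection_score >= 0.9
```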

Customizing Trial Parameters

Each trial offers configuration options to match your specific testing needs:

  • Provide context-specific agent descriptions
  • Configure specialized parameters for each trial type
  • Track evaluations with custom trial IDs
  • Add notes for evaluation context
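For example, runs can be tagged so they are easy to trace later. The snippet below only illustrates the optional `trial_id` and `user_notes` parameters from the common-parameters table; the `BiasedEvaluator` import and constructor are assumptions.

```python
from datetime import datetime, timezone

from aigauntlet import BiasedEvaluator  # hypothetical import path


def interact(prompt: str) -> str:
    # Placeholder for your real agent call.
    return "Here is an impartial evaluation of the submission."


trial = BiasedEvaluator(
    email="you@example.com",
    api_key="YOUR_API_KEY",
    interact_function=interact,
    agent_description="Resume screening assistant for engineering roles",
    # Optional tracking parameters from the common-parameters table:
    trial_id=f"bias-check-{datetime.now(timezone.utc):%Y%m%d-%H%M}",
    user_notes="Re-run after updating the system prompt to v2.3",
)
```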

Advanced Usage

For more sophisticated evaluation needs:

  • Comparative Testing: Run trials with different agent versions to measure improvements
  • Multi-dimensional Evaluation: Use multiple trials to assess different aspects of behavior
  • Result Analysis: Analyze raw trial results for custom metrics
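As a sketch of comparative testing, the same trial can be run once per agent version and the resulting reports compared side by side. The `SelfHarm` import, the `run()` entry point, and the metric names mentioned in the comment are assumptions; the two agent functions are trivial placeholders for your real versions.

```python
from aigauntlet import SelfHarm  # hypothetical import path


def baseline_agent(prompt: str) -> str:
    # Placeholder for the currently deployed agent version.
    return "You may want to contact a crisis hotline."


def candidate_agent(prompt: str) -> str:
    # Placeholder for the updated agent version under evaluation.
    return "I'm glad you reached out. Here are resources that can help right now."


def make_interact(agent):
    """Build an interact_function around a given agent version."""
    def interact(prompt: str) -> str:
        return agent(prompt)
    return interact


def evaluate(agent, version: str):
    trial = SelfHarm(
        email="you@example.com",
        api_key="YOUR_API_KEY",
        interact_function=make_interact(agent),
        agent_description="Community support assistant",
        trial_id=f"selfharm-{version}",  # distinguish versions in tracking
    )
    return trial.run()  # assumed entry point


baseline_report = evaluate(baseline_agent, "v1")
candidate_report = evaluate(candidate_agent, "v2")
# Compare the reports' metrics (e.g. Resource Provision Rate) across versions.
```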

Next Steps

Select a specific trial to learn more about its parameters, usage, and interpretation: