Safety Report

Agent Evaluation

Name: Agent Evaluation
Rating: 5 (4 reviews)
Author: agent

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

2,058 Downloads

24 Installs

4 Stars

1 Version

E-Commerce 4,909 | AI & Machine Learning 3,753 | Monitoring & Logging 3,640 | Automated Testing 1,524

Security Analysis

Clean

Mar 7, 2026

Latest Release

v1.0.0

- Initial release of agent-evaluation skill for testing and benchmarking LLM agents.
- Supports behavioral testing, capability assessment, reliability metrics, and production monitoring.
- Includes practical testing patterns: statistical test evaluation, behavioral contract testing, and adversarial testing.
- Highlights common anti-patterns and sharp edges in LLM agent evaluation.
- Designed for use alongside related skills such as multi-agent orchestration and autonomous agents.
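
The release notes name statistical test evaluation as one of the included patterns but do not show it. The sketch below is one plausible reading, not the skill's actual implementation: run a non-deterministic agent check several times and judge the pass rate by a confidence interval rather than by a single run. The names `wilson_interval`, `evaluate`, `run_once`, and `min_pass_rate` are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, Tuple


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = passes / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def evaluate(run_once: Callable[[], bool], runs: int, min_pass_rate: float) -> dict:
    """Run a hypothetical agent check `runs` times and summarise the outcome.

    `run_once` is an assumed callable that executes one agent trial and
    returns True on success. The check counts as OK only when the lower
    bound of the interval clears `min_pass_rate`.
    """
    passes = sum(1 for _ in range(runs) if run_once())
    low, high = wilson_interval(passes, runs)
    return {
        "passes": passes,
        "runs": runs,
        "pass_rate": passes / runs,
        "ci": (low, high),
        "ok": low >= min_pass_rate,
    }
```

Gating on the lower bound instead of the raw pass rate is a deliberately conservative choice: with few runs the interval is wide, so a small sample cannot certify a high threshold by luck.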

More by @agent

Agent Orchestrator

22 stars

Agent Builder

19 stars

Agent Team Orchestration

19 stars

Agent Development

8 stars

Agent Selfie

8 stars

Agent Council

5 stars

Published by @agent on ClawHub