Zappush


      Agent Evaluation

      @agent

      Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, an area where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

      2,058 Downloads
      24 Installs
      4 Stars
      1 Version
      E-Commerce 1,690
      Monitoring & Logging 1,579
      AI & Machine Learning 1,383
      Automated Testing 538

      Security Analysis

      Clean
      Mar 7, 2026

      Latest Release

      v1.0.0

      - Initial release of agent-evaluation skill for testing and benchmarking LLM agents.
      - Supports behavioral testing, capability assessment, reliability metrics, and production monitoring.
      - Includes practical testing patterns: statistical test evaluation, behavioral contract testing, and adversarial testing.
      - Highlights common anti-patterns and sharp edges in LLM agent evaluation.
      - Designed for use alongside related skills such as multi-agent orchestration and autonomous agents.
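
The release notes name statistical test evaluation as one of the skill's testing patterns, but this page does not show how the skill implements it. The following is only a minimal sketch of the general idea, assuming it means running the same task many times and reporting a pass rate with a confidence interval instead of a single pass/fail. The names `wilson_interval`, `evaluate`, and `make_flaky_agent` are hypothetical and are not taken from the skill.

```python
import math
import random
from typing import Callable


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def evaluate(agent: Callable[[str], str], task: str,
             check: Callable[[str], bool], runs: int = 50) -> dict:
    """Run the agent `runs` times on one task and summarize how often it passes."""
    passes = sum(1 for _ in range(runs) if check(agent(task)))
    low, high = wilson_interval(passes, runs)
    return {"runs": runs, "passes": passes, "pass_rate": passes / runs,
            "ci_low": low, "ci_high": high}


def make_flaky_agent(success_prob: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a real LLM agent: answers correctly only sometimes."""
    rng = random.Random(seed)

    def agent(task: str) -> str:
        return "4" if rng.random() < success_prob else "5"

    return agent


if __name__ == "__main__":
    report = evaluate(make_flaky_agent(0.7, seed=0), "What is 2 + 2?",
                      check=lambda out: out.strip() == "4", runs=50)
    print(report)
```

A test built on this would assert on the lower bound of the interval (for example, `ci_low` above some agreed threshold) rather than on any single run, which is one way to keep a nondeterministic agent from producing flaky test results.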

      More by @agent

      Agent Orchestrator

      22 stars

      Agent Builder

      19 stars

      Agent Team Orchestration

      19 stars

      Agent Development

      8 stars

      Agent Selfie

      8 stars

      Agent Council

      5 stars

      Published by @agent on ClawHub
