      Safety Report

      EvalScope

      @yunnglin

      Translates natural language requests into evalscope CLI commands. Core capabilities: (1) Model accuracy evaluation (eval) — runs 156+ benchmarks (Math, Codin...

      169 Downloads
      0 Installs
      2 Stars
      2 Versions
      CLI & Shell Tools (3,679) · AI & Machine Learning (3,159) · Translation & i18n (3,065) · Math & Science (1,012)

      Security Analysis

      high confidence
      Clean · 0.08 risk

      The skill's requirements and instructions are consistent with its stated purpose (translating requests into evalscope CLI commands); it does ask users/agents to run the evalscope CLI and optionally provide API keys, so treat installations and secret handling as you would any third‑party CLI tool.

      Mar 25, 2026 · 4 files · 2 concerns
      Purpose & Capability: ok

      The name and description match the instructions: the SKILL.md converts natural‑language requests into evalscope CLI commands for evaluation, perf, discovery, and visualization. There are no unrelated environment variables, binaries, or config paths declared.

      Instruction Scope: note

      Instructions stay within evaluation, performance, and visualization workflows. They direct the agent to run evalscope CLI commands, read/write output directories (./outputs), and optionally launch a local Gradio UI. The doc also contains examples showing use of API endpoints and API keys—so the agent may be instructed to send requests to network endpoints and to accept user-provided secrets for those endpoints.
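Because the instructions can direct requests to network endpoints, one way to keep them on intended targets is a host allowlist checked before any command is run. The following is a minimal sketch, not part of the skill or of evalscope; the function name and the allowlist contents are assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: replace with the hosts you actually intend to evaluate.
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def is_allowed_endpoint(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # hostname is lowercased and excludes any port or embedded credentials.
    return parsed.hostname in allowed_hosts
```

An agent wrapper could call this check on any user-supplied API URL and refuse to build the command when it returns False.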

      Install Mechanism: note

      The skill is instruction‑only (no install spec), but SKILL.md recommends installing evalscope via pip (pip install evalscope or extras). Installing an external PyPI package (and extras) can pull many dependencies; that is expected for a CLI tool but is a moderate operational risk if you don't trust the upstream package or want to avoid installing packages system‑wide.
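To avoid a system-wide install, the package can be installed into a dedicated virtual environment. The sketch below only builds the two commands (create the environment, then install into it) so they can be reviewed before anything runs. The helper names and the environment directory are hypothetical; `pip install evalscope` is the command the report quotes from SKILL.md.

```python
import sys
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtual environment at env_dir."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_commands(env_dir: Path, package: str = "evalscope") -> list[list[str]]:
    """Commands that create a venv and install the package into it only."""
    return [
        # Step 1: create the isolated environment with the current interpreter.
        [sys.executable, "-m", "venv", str(env_dir)],
        # Step 2: install using the environment's own interpreter, not the system one.
        [str(venv_python(env_dir)), "-m", "pip", "install", package],
    ]
```

Each returned command can then be passed to `subprocess.run(cmd, check=True)` once you have verified the package source.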

      Credentials: ok

      The registry metadata requests no environment variables or credentials. The runtime instructions, however, include many optional flags that accept API URLs and API keys (e.g., --api-key, --judge-model-args, wandb API keys). These are reasonable for a benchmarking tool, but they mean the agent or user may be prompted to provide secrets when evaluating remote or API‑served models.
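Since secrets may be passed on the command line, a wrapper can read the key from the environment and mask it before a command is printed or logged. This is a minimal sketch under stated assumptions: only `--api-key` comes from the report, while the environment variable name `EVAL_API_KEY` and both helper names are hypothetical.

```python
import os

# Flags whose values should never be logged. --api-key is named in the report;
# extend the set for any other secret-bearing flags you use.
SECRET_FLAGS = {"--api-key"}


def redact(argv: list[str]) -> list[str]:
    """Return a copy of argv that is safe to print or log."""
    safe = []
    hide_next = False
    for arg in argv:
        name = arg.split("=", 1)[0]
        if hide_next:
            # Value of a secret flag given as a separate token.
            safe.append("***")
            hide_next = False
        elif arg in SECRET_FLAGS:
            safe.append(arg)
            hide_next = True
        elif "=" in arg and name in SECRET_FLAGS:
            # Value attached with "=", e.g. --api-key=VALUE.
            safe.append(name + "=***")
        else:
            safe.append(arg)
    return safe


def api_key_from_env(var: str = "EVAL_API_KEY") -> str:
    """Read the key from the environment (the variable name is hypothetical)."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```

Keeping the key in the environment keeps it out of the agent's transcript, and `redact` keeps it out of any echoed command.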

      Persistence & Privilege: ok

      The skill is not always‑enabled and does not request persistent privileges. It does not instruct modifying other skills or global agent config. Running evalscope commands may create output directories and logs under ./outputs, which is normal.

      Guidance

      This skill appears to do what it says (build evalscope CLI commands). Before installing or running commands: (1) Verify the evalscope package or source you will install, prefer an official PyPI or GitHub release, and install it in a virtualenv or container; (2) Be cautious when providing API keys or endpoint URLs: supply credentials only to services you trust, and never send keys to remote or untrusted endpoints; (3) When running perf tests, confirm that the endpoints are the intended targets, since benchmarks can generate heavy traffic; (4) Use mock_llm or sandbox modes if you want to test without contacting external models; (5) Review the reports under ./outputs before sharing them, and do not expose sensitive logs. If you want a deeper review, provide the evalscope PyPI project URL or the package code so it can be inspected.
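For the last point, a quick check before sharing is to search the output directory for the literal secret values you supplied during the run. The sketch below assumes you still know those values; the function name is hypothetical, and it matches exact strings only, so it will not catch encoded or truncated keys.

```python
from pathlib import Path


def files_containing(root: Path, secrets: list[str]) -> list[Path]:
    """List files under root whose text contains any of the given secret values."""
    needles = [s for s in secrets if s]  # skip empty strings, which match everything
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Undecodable bytes are ignored rather than raising on binary files.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(n in text for n in needles):
            hits.append(path)
    return hits
```

Calling `files_containing(Path("./outputs"), [key])` returns the files to inspect or scrub before the reports leave your machine.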

      Latest Release

      v1.0.1

      No code changes; skill description clarified for accuracy and scope.
      - Expanded the skill description to concisely enumerate core EvalScope capabilities: model evaluation, performance benchmarking, benchmark discovery, and results visualization.
      - Clarified trigger scenarios for when this skill should be invoked.
      - No changes to CLI guidance, workflows, or example commands.

      Published by @yunnglin on ClawHub