Translates natural language requests into evalscope CLI commands. Core capabilities: (1) Model accuracy evaluation (eval): runs 156+ benchmarks (Math, Coding, ...); (2) Performance benchmarking (perf); (3) Benchmark discovery; (4) Results visualization.
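For illustration, the kind of command this skill emits might look like the following (a sketch only: the model and dataset names are placeholders, and all flags should be verified against `evalscope --help`):

```bash
# Hypothetical example of a command the skill might produce; the model
# and dataset are placeholders -- check flag names on your version.
evalscope eval \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --datasets gsm8k \
  --limit 10
```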
Security Analysis
High confidence: The skill's requirements and instructions are consistent with its stated purpose (translating requests into evalscope CLI commands); it does ask users/agents to run the evalscope CLI and optionally provide API keys, so treat installations and secret handling as you would any third‑party CLI tool.
The name and description match the instructions: the SKILL.md converts natural‑language requests into evalscope CLI commands for evaluation, perf, discovery, and visualization. There are no unrelated environment variables, binaries, or config paths declared.
Instructions stay within evaluation, performance, and visualization workflows. They direct the agent to run evalscope CLI commands, read and write output directories (./outputs), and optionally launch a local Gradio UI. The document also contains examples that use API endpoints and API keys, so the agent may be instructed to send requests to network endpoints and to accept user-provided secrets for those endpoints.
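A sketch of that workflow (the app subcommand for the Gradio UI follows evalscope's documented usage, but confirm it on your installed version):

```bash
# Evaluation results are written under ./outputs by default
ls ./outputs

# Launch the local Gradio visualization UI (assumed subcommand name;
# confirm with `evalscope --help` on your version)
evalscope app
```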
The skill is instruction‑only (no install spec), but SKILL.md recommends installing evalscope via pip (pip install evalscope or extras). Installing an external PyPI package (and extras) can pull many dependencies; that is expected for a CLI tool but is a moderate operational risk if you don't trust the upstream package or want to avoid installing packages system‑wide.
The registry metadata requests no environment variables or credentials. The runtime instructions, however, include many optional flags that accept API URLs and API keys (e.g., --api-key, judge-model-args, wandb API keys). These are reasonable for a benchmarking tool but mean the agent or user may be prompted to provide secrets when evaluating remote/API‑served models.
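For example, evaluating an API-served model might look like the sketch below; --api-key comes from the skill text, while the model name, endpoint URL, and --eval-type service are assumptions to be checked against the evalscope docs. Reading the key from an environment variable avoids hard-coding it in scripts:

```bash
# Read the key from the environment rather than pasting it inline.
# The model name and endpoint URL below are placeholders.
export API_KEY="..."

evalscope eval \
  --model gpt-4o \
  --api-url https://api.example.com/v1/chat/completions \
  --api-key "$API_KEY" \
  --eval-type service \
  --datasets gsm8k
```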
The skill is not always‑enabled and does not request persistent privileges. It does not instruct modifying other skills or global agent config. Running evalscope commands may create output directories and logs under ./outputs, which is normal.
Guidance
This skill appears to do what it says (build evalscope CLI commands). Before installing or running commands:
(1) Verify the evalscope package/source you will install; use a virtualenv or container (see the install sketch after this list) and prefer an official PyPI or GitHub release.
(2) Be cautious when providing API keys or endpoint URLs: only supply credentials you trust, and avoid sending keys to remote or untrusted services.
(3) When running perf tests, ensure endpoints are intended targets, since benchmarks can generate heavy traffic.
(4) Use mock_llm or sandbox modes if you want to test without contacting external models.
(5) Review outputs/ reports before sharing, and do not expose sensitive logs.
If you want a deeper review, provide the evalscope PyPI project URL or the package code so it can be inspected.
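A minimal sketch of the isolated install recommended in item (1), assuming a standard Python toolchain; the [perf] extra name is taken from evalscope's docs but should be verified:

```bash
# Create an isolated environment so evalscope's dependencies stay contained
python -m venv .venv
source .venv/bin/activate

# Install from PyPI; the [perf] extra (assumed name) pulls the
# performance-benchmarking dependencies if you need them
pip install evalscope
# pip install 'evalscope[perf]'

# Sanity-check the CLI before pointing it at any real endpoint
evalscope --help
```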
Latest Release
v1.0.1
No code changes; skill description clarified for accuracy and scope.
- Expanded the skill description to concisely enumerate core EvalScope capabilities: model evaluation, performance benchmarking, benchmark discovery, and results visualization.
- Clarified trigger scenarios for when this skill should be invoked.
- No changes to CLI guidance, workflows, or example commands.
Published by @yunnglin on ClawHub