Translates natural language requests into evalscope CLI commands. Core capabilities: (1) Model accuracy evaluation (eval): runs 156+ benchmarks (Math, Coding, ...); (2) Performance benchmarking (perf); (3) Benchmark discovery; (4) Results visualization.
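For illustration, the kind of command this skill emits might look like the following (a sketch only: the model and dataset names are placeholders, and all flags should be verified against `evalscope --help`):

```bash
# Hypothetical example of a command the skill might produce; the model
# and dataset are placeholders -- check flag names on your version.
evalscope eval \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --datasets gsm8k \
  --limit 10
```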
Security Analysis
High confidence: The skill's requirements and instructions are consistent with its stated purpose (translating requests into evalscope CLI commands); it does ask users/agents to run the evalscope CLI and optionally provide API keys, so treat installations and secret handling as you would any third‑party CLI tool.
The name and description match the instructions: the SKILL.md converts natural‑language requests into evalscope CLI commands for evaluation, perf, discovery, and visualization. There are no unrelated environment variables, binaries, or config paths declared.
Instructions stay within evaluation, performance, and visualization workflows. They direct the agent to run evalscope CLI commands, read and write output directories (./outputs), and optionally launch a local Gradio UI. The document also contains examples that use API endpoints and API keys, so the agent may be instructed to send requests to network endpoints and to accept user-provided secrets for those endpoints.
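A sketch of that workflow (the app subcommand for the Gradio UI follows evalscope's documented usage, but confirm it on your installed version):

```bash
# Evaluation results are written under ./outputs by default
ls ./outputs

# Launch the local Gradio visualization UI (assumed subcommand name;
# confirm with `evalscope --help` on your version)
evalscope app
```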
The skill is instruction‑only (no install spec), but SKILL.md recommends installing evalscope via pip (pip install evalscope or extras). Installing an external PyPI package (and extras) can pull many dependencies; that is expected for a CLI tool but is a moderate operational risk if you don't trust the upstream package or want to avoid installing packages system‑wide.
The registry metadata requests no environment variables or credentials. The runtime instructions, however, include many optional flags that accept API URLs and API keys (e.g., --api-key, judge-model-args, wandb API keys). These are reasonable for a benchmarking tool but mean the agent or user may be prompted to provide secrets when evaluating remote/API‑served models.
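For example, evaluating an API-served model might look like the sketch below; --api-key comes from the skill text, while the model name, endpoint URL, and --eval-type service are assumptions to be checked against the evalscope docs. Reading the key from an environment variable avoids hard-coding it in scripts:

```bash
# Read the key from the environment rather than pasting it inline.
# The model name and endpoint URL below are placeholders.
export API_KEY="..."

evalscope eval \
  --model gpt-4o \
  --api-url https://api.example.com/v1/chat/completions \
  --api-key "$API_KEY" \
  --eval-type service \
  --datasets gsm8k
```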
The skill is not always‑enabled and does not request persistent privileges. It does not instruct modifying other skills or global agent config. Running evalscope commands may create output directories and logs under ./outputs, which is normal.
Guidance
This skill appears to do what it says (build evalscope CLI commands). Before installing or running commands:
(1) Verify the evalscope package/source you will install; use a virtualenv or container (see the install sketch after this list) and prefer an official PyPI or GitHub release.
(2) Be cautious when providing API keys or endpoint URLs: only supply credentials you trust, and avoid sending keys to remote or untrusted services.
(3) When running perf tests, ensure endpoints are intended targets, since benchmarks can generate heavy traffic.
(4) Use mock_llm or sandbox modes if you want to test without contacting external models.
(5) Review outputs/ reports before sharing, and do not expose sensitive logs.
If you want a deeper review, provide the evalscope PyPI project URL or the package code so it can be inspected.
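A minimal sketch of the isolated install recommended in item (1), assuming a standard Python toolchain; the [perf] extra name is taken from evalscope's docs but should be verified:

```bash
# Create an isolated environment so evalscope's dependencies stay contained
python -m venv .venv
source .venv/bin/activate

# Install from PyPI; the [perf] extra (assumed name) pulls the
# performance-benchmarking dependencies if you need them
pip install evalscope
# pip install 'evalscope[perf]'

# Sanity-check the CLI before pointing it at any real endpoint
evalscope --help
```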
Latest Release
v1.0.1
No code changes; skill description clarified for accuracy and scope.
- Expanded the skill description to concisely enumerate core EvalScope capabilities: model evaluation, performance benchmarking, benchmark discovery, and results visualization.
- Clarified trigger scenarios for when this skill should be invoked.
- No changes to CLI guidance, workflows, or example commands.
Published by @yunnglin on ClawHub