
      Safety Report

      rag-eval

      @JonathanJing

      Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision).

      263 Downloads
      0 Installs
      2 Stars
      6 Versions

      Security Analysis

      medium confidence
      Clean

      The skill is internally consistent with its stated purpose (RAG evaluation using Ragas): it asks for the expected LLM keys, runs Ragas via Python, and its instructions match the included scripts. Still, review the code and follow safe installation practices (use a virtualenv, inspect the scripts) before running it.

      Mar 7, 2026 · 8 files · 1 concern

      Purpose & Capability: ok

      Name/description (RAG evaluation with Ragas) aligns with code and instructions. Declared required binaries (python3, pip), optional env vars (OPENAI/ANTHROPIC/RAGAS_LLM), and the included scripts (run_eval.py, batch_eval.py, setup.sh) are all appropriate for performing LLM-judged RAG evals.

      Instruction Scope: ok

      SKILL.md instructs the agent to accept question/answer/contexts, write a temp JSON file, and call the provided Python scripts; it explicitly warns against shell-injecting user content. The scripts only reference expected files/paths (memory/eval-results) and expected env vars. No instructions request unrelated system data or unrelated credentials.
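A minimal sketch of that input handling, assuming a hypothetical `--input` flag (the report confirms the temp-JSON step, not `run_eval.py`'s actual CLI):

```python
import json
import tempfile
from pathlib import Path

def write_payload(question: str, answer: str, contexts: list[str]) -> Path:
    """Write eval inputs to a temp JSON file rather than interpolating
    them into a shell command, mirroring the anti-injection rule above."""
    payload = {"question": question, "answer": answer, "contexts": contexts}
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False, encoding="utf-8"
    ) as f:
        json.dump(payload, f)
        return Path(f.name)

def build_eval_command(payload_path: Path) -> list[str]:
    # Passing argv as a list means no shell ever parses user content.
    # The --input flag name is hypothetical; check run_eval.py's real CLI.
    return ["python3", "scripts/run_eval.py", "--input", str(payload_path)]
```

Invoking the command with `subprocess.run(cmd)` (list form, no `shell=True`) keeps hostile strings in `question`/`contexts` inert.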

      Install Mechanism: note

      No registry install spec is provided; the included scripts/setup.sh installs dependencies via pip from public PyPI packages (ragas, datasets, langchain integrations). This is expected for a Python tool but may modify system Python if a virtualenv isn't used. No downloads from untrusted URLs or URL-shortened installers were found.
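The virtualenv precaution can be scripted; this is a stdlib-only sketch (package names are those the report mentions, versions are unpinned, and setup.sh may list further langchain integrations):

```python
import sys
import venv
from pathlib import Path

def make_isolated_env(root: Path) -> Path:
    """Create a virtualenv so pip installs stay out of the system Python;
    returns the path to the env's own interpreter."""
    env_dir = root / ".venv"
    venv.EnvBuilder(with_pip=True).create(env_dir)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return env_dir / bin_dir / "python"

def install_command(env_python: Path) -> list[str]:
    # Running pip via the env's interpreter guarantees packages land in
    # the virtualenv, not the host site-packages.
    return [str(env_python), "-m", "pip", "install", "ragas", "datasets"]
```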

      Credentials: ok

      Requested environment access is limited to LLM-related keys and optional RAGAS_* tuning variables. These are justified by the skill's need to call an LLM judge and (optionally) embeddings. No unrelated secrets or multiple unrelated service credentials are requested.
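Key resolution along those lines can be sketched as follows; the variable names are the ones the report cites, while the precedence order is an assumption:

```python
import os

def resolve_judge_key() -> tuple[str, str]:
    """Return (variable name, key) for the LLM judge, failing loudly
    when neither expected credential is present."""
    for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
        key = os.environ.get(var)
        if key:
            return var, key
    raise RuntimeError("set OPENAI_API_KEY or ANTHROPIC_API_KEY before running")
```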

      Persistence & Privilege: ok

      The skill does not request always:true and does not modify other skills. It persists evaluation outputs under memory/eval-results (expected for a reporting tool). The setup script may install packages on the host environment but does not request elevated system privileges.
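The persistence step might look like this; memory/eval-results is the directory the report names, while the timestamped filename is an assumption:

```python
import json
import time
from pathlib import Path

def save_results(results: dict, root: Path = Path("memory/eval-results")) -> Path:
    """Persist one run's scores under the skill's output directory,
    creating it on first use (no privileges beyond file writes needed)."""
    root.mkdir(parents=True, exist_ok=True)
    out = root / f"eval-{int(time.time())}.json"
    out.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return out
```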

      Guidance

      This skill appears to do what it claims, but take these precautions before installing or running it:

      1) Inspect the included scripts locally (scripts/run_eval.py, scripts/batch_eval.py, scripts/setup.sh); don't run arbitrary shell scripts without review.
      2) Use a Python virtual environment (python -m venv .venv; source .venv/bin/activate) before running setup.sh to avoid global pip installs.
      3) Protect your LLM keys: the skill uses OPENAI_API_KEY/ANTHROPIC_API_KEY to call remote LLMs; grant least-privilege keys where possible and monitor usage.
      4) The tool writes evaluation files to memory/eval-results in the working directory; verify this location suits your data-retention policies.
      5) There is a truncated, possibly buggy section in the provided run_eval.py excerpt (the sample here was truncated); make sure you have the complete, reviewed script before running explain/advanced features.
      6) Expect runtime costs for LLM judge calls.

      If you need higher assurance, ask for a line-by-line code review or a reproducible test run in an isolated environment.

      Latest Release

      v1.2.1

      Fixed versioning regression and added simplified installation instructions.

      More by @JonathanJing

      openclaw-dashboard

      3 stars

      glass2claw

      1 star

      Token Ledger (SQLite)

      0 stars

      deep-scout

      0 stars

      openclaw-tally

      0 stars

      ground-control

      0 stars

      Published by @JonathanJing on ClawHub

      © 2026 Zappush

      Something feels unusual? We want to help: [email protected]