Evaluate your RAG pipeline quality using Ragas metrics (faithfulness, answer relevancy, context precision).
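For orientation, here is a minimal sketch of what a Ragas evaluation over these three metrics typically looks like. It is illustrative only, not the skill's own code; exact column names and metric imports vary across ragas releases (this follows the 0.1.x style), and an LLM judge key (e.g. OPENAI_API_KEY) must be set in the environment.

```python
# Minimal Ragas evaluation sketch (illustrative; not the skill's script).
# Assumes a ragas 0.1.x-style API and an OPENAI_API_KEY in the environment.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# One sample: question, the pipeline's answer, retrieved contexts, and a
# reference answer (context_precision needs a ground truth in most versions).
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris has been the capital of France since 508 AD."]],
    "ground_truth": ["Paris"],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores, e.g. {'faithfulness': 1.0, ...}
```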
Security Analysis
Medium confidence: The skill is internally consistent with its stated purpose (RAG evaluation using Ragas). It asks for the expected LLM keys, runs ragas via Python, and its instructions match the included scripts, but review and safe installation practices (use a virtualenv, inspect the code) are recommended before running.
Name and description (RAG evaluation with Ragas) align with the code and instructions. Declared required binaries (python3, pip), optional env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, RAGAS_LLM), and the included scripts (run_eval.py, batch_eval.py, setup.sh) are all appropriate for performing LLM-judged RAG evals.
SKILL.md instructs the agent to accept question/answer/contexts, write a temp JSON file, and call the provided Python scripts; it explicitly warns against shell-injecting user content. The scripts only reference expected files/paths (memory/eval-results) and expected env vars. No instructions request unrelated system data or unrelated credentials.
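As a rough illustration of that pattern, the sketch below writes user-supplied fields to a temporary JSON file and invokes the script with an argument list rather than an interpolated shell string. The --input flag and the placeholder values are assumptions for illustration, not documented behavior of run_eval.py.

```python
# Illustrative sketch: pass user content via a temp JSON file, never via a shell string.
import json
import subprocess
import tempfile

# Placeholder inputs; in the skill these come from the calling agent or user.
user_question = "What is our refund policy?"
generated_answer = "Refunds are available within 30 days."
retrieved_contexts = ["Refund policy: customers may return items within 30 days."]

payload = {
    "question": user_question,
    "answer": generated_answer,
    "contexts": retrieved_contexts,
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(payload, f)
    tmp_path = f.name

# Argument-list invocation (no shell=True): user content stays inside the JSON
# file and cannot be interpreted as shell syntax. The --input flag is assumed.
subprocess.run(
    ["python3", "scripts/run_eval.py", "--input", tmp_path],
    check=True,
)
```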
No registry install spec is provided; the included scripts/setup.sh installs dependencies via pip from public PyPI packages (ragas, datasets, langchain integrations). This is expected for a Python tool but may modify system Python if a virtualenv isn't used. No downloads from untrusted URLs or URL-shortened installers were found.
Requested environment access is limited to LLM-related keys and optional RAGAS_* tuning variables. These are justified by the skill's need to call an LLM judge and (optionally) embeddings. No unrelated secrets or multiple unrelated service credentials are requested.
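A hedged sketch of how such keys are typically checked before an eval run, assuming the variable names listed above; how the skill actually interprets RAGAS_LLM is an assumption here.

```python
# Illustrative: verify the expected judge credential is present before spending tokens.
import os

llm_choice = os.environ.get("RAGAS_LLM", "openai")  # optional selector; semantics assumed
if llm_choice == "openai" and not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is required for an OpenAI judge")
if llm_choice == "anthropic" and not os.environ.get("ANTHROPIC_API_KEY"):
    raise SystemExit("ANTHROPIC_API_KEY is required for an Anthropic judge")
```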
The skill does not request always:true and does not modify other skills. It persists evaluation outputs under memory/eval-results (expected for a reporting tool). The setup script may install packages on the host environment but does not request elevated system privileges.
Guidance
This skill appears to do what it claims, but take these precautions before installing or running it:

1. Inspect the included scripts locally (scripts/run_eval.py, scripts/batch_eval.py, scripts/setup.sh); don't run arbitrary shell scripts without review.
2. Use a Python virtual environment (python -m venv .venv; source .venv/bin/activate) before running setup.sh to avoid global pip installs (a minimal sketch follows this list).
3. Protect your LLM keys. The skill uses OPENAI_API_KEY/ANTHROPIC_API_KEY to call remote LLMs; grant least-privilege keys where possible and monitor usage.
4. The tool writes evaluation files to memory/eval-results in the working directory; verify this location suits your data-retention policies.
5. There is a truncated, possibly buggy section in the provided run_eval.py excerpt (the sample here was truncated); ensure you have the complete, reviewed script before running explain/advanced features.
6. Expect runtime costs for LLM judge calls.

If you need higher assurance, ask for a line-by-line code review or a reproducible test run in an isolated environment.
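For the virtual-environment precaution (item 2), a minimal sketch of an isolated install using only the standard library. The package names mirror the analysis above (ragas, datasets); confirm them against scripts/setup.sh, which may pin versions or add langchain integrations.

```python
# Isolate the skill's pip installs from the system Python before running setup.sh.
import subprocess
import venv
from pathlib import Path

env_dir = Path(".venv")
venv.create(env_dir, with_pip=True)

pip = env_dir / "bin" / "pip"  # on Windows: .venv\Scripts\pip.exe
subprocess.run([str(pip), "install", "ragas", "datasets"], check=True)
```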
Latest Release
v1.2.1
Fixed versioning regression and added simplified installation instructions.
Published by @JonathanJing on ClawHub