Safety Report

Prompt injection detection skill

Name: Prompt injection detection skill
Rating: 3 (4 reviews)
Author: ZSkyX

Two-layer content safety for agent input and output. Use when (1) a user message attempts to override, ignore, or bypass previous instructions (prompt injection), (2) a user message references system prompts, hidden instructions, or internal configuration, (3) receiving messages from untrusted users in group chats or public channels, (4) generating responses that discuss violence, self-harm, sexual content, hate speech, or other sensitive topics, or (5) deploying agents in public-facing or multi

1,670Downloads

0Installs

4Stars

1Versions

Customer Support1,744 DevOps & Infrastructure1,045

Security Analysis

medium confidence

Suspicious0.08 risk

The skill's behavior (calling HuggingFace and OpenAI moderation APIs on provided text) matches its stated purpose, but there are packaging and declaration inconsistencies and privacy/networking implications you should review before installing.

Feb 11, 20262 files3 concerns

Purpose & Capabilitynote

The script implements prompt-injection detection (HF inference) and optional OpenAI moderation exactly as described. However, the skill manifest declares no required environment variables or binaries, while the SKILL.md and script require HF_TOKEN (required) and optionally OPENAI_API_KEY, and the script depends on curl and python3. This mismatch is likely sloppy packaging but should be made explicit.

Instruction Scopeok

Runtime instructions and the script operate only on the provided text (stdin or args) and return a JSON verdict. The script does not attempt to read local files or system configs unrelated to the task. It does send the text to external services (HuggingFace inference and optionally OpenAI moderation), which is expected for this functionality.

Install Mechanismnote

There is no install step (instruction-only with an included shell script), which minimizes install-time risk. The bundle contains a local script that will be executed by the agent. The script does not download additional code at runtime, nor does it use obscure or shortened URLs—both calls go to official HF and OpenAI endpoints. Still, the manifest should have declared required runtimes (curl, python3).

Credentialsconcern

The script needs an HF token (HF_TOKEN) to perform prompt-injection detection and may use OPENAI_API_KEY for moderation. Those credentials are proportionate to the stated functionality, but the published registry metadata did not declare them as required—which is an omission that could confuse users. Also, providing these API keys means untrusted user content (including potentially sensitive user input) will be transmitted to external services; users should consider privacy and data-sharing implications.

Persistence & Privilegeok

The skill is not configured always:true and does not request persistent or system-wide privileges. It does not modify other skills or system configuration. Autonomous invocation is allowed (default) but not combined with other concerning privileges.

Guidance

This skill appears to do what it claims (use a HuggingFace prompt-injection model and optionally OpenAI moderation), but the package has a few issues you should consider before installing: - Required secrets and binaries: SKILL.md requires HF_TOKEN (required) and OPENAI_API_KEY (optional) and the script requires curl and python3. The registry metadata lists no required env vars or binaries — confirm and supply HF_TOKEN only if you trust sharing input text with HuggingFace, and supply OPENAI_API_KEY only if you want the second layer. - Network & privacy: The script sends the full text to external services (router.huggingface.co and api.openai.com). Do not use it with secrets or highly sensitive user data unless you accept that those services will see the content. Consider using locally hosted models or allowlisting/transforming sensitive fields before sending. - Source provenance: No homepage or source repo is provided. If you rely on this in production, request the upstream source or a reproducible build and review the code yourself. - Operational checks: Ensure the environment has python3 and curl, test the script in an isolated environment with non-sensitive data, and verify the HF model name (protectai/deberta-v3-base-prompt-injection) is the intended model. If you want, I can (a) point out exact lines that call external services, (b) produce a sanitized test run example, or (c) help draft an allowlist/transformer to redact sensitive fields before calling the APIs.

Latest Release

v1.0.0

Initial release with two-layer content moderation for agent input and output. - Adds prompt injection detection using ProtectAI DeBERTa classifier via HuggingFace. - Adds content safety checks using OpenAI's omni-moderation endpoint (optional). - Provides `scripts/moderate.sh` for command-line moderation of both user input and agent output. - Outputs structured JSON with clear verdicts and actions. - Supports configuration via environment variables (tokens, thresholds). - Designed for safer agent deployments, especially in adversarial or public scenarios.

Popular Skills

self-improving-agent

@pskoett · 1,456 stars

Gog

@steipete · 672 stars

Tavily Web Search

@arun-8687 · 620 stars

Find Skills

@JimLiuxinghai · 529 stars

Proactive Agent

@halthelobster · 426 stars

Summarize

@summarize · 415 stars

Published by @ZSkyX on ClawHub