Use this skill when users need to extract text from images, PDFs, or documents. Supports URLs and local files. Returns structured JSON containing recognized...
Security Analysis
High confidence. The skill's files, runtime instructions, and required credentials align with an OCR wrapper that calls a PaddleOCR-style HTTP API. Nothing in the package indicates it is doing unrelated or hidden work.
Name/description match the code and declared requirements. Requested binaries (python) and env vars (API URL, access token, timeout) are exactly what's needed for a networked OCR client. Required items are proportionate to the stated purpose.
The SKILL.md restricts the agent to calling the included Python CLI (ocr_caller.py) and explicitly forbids the agent from 'reading images directly' or trying alternate OCR methods. That is consistent with the provided scripts, which perform HTTP POSTs to the configured API and, when given a local file path, encode the file as base64 and send it to the API. Two operational notes: (1) the caller auto-saves the full raw JSON result to a temp-file path by default, and the SKILL.md instructs the agent to read that file before responding; this is normal but means intermediate results are persisted to disk. (2) The SKILL.md mandates always displaying the COMPLETE recognized text (no truncation), which is a policy/privacy decision and may expose sensitive contents from images. This is not a technical inconsistency, but it is important for users to consider.
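The input handling and result persistence described above can be sketched roughly as follows. This is a minimal illustration, not the code in ocr_caller.py: the function names, the "file" request field, and the temp-file prefix are all assumptions made for the example, and no network call is performed.

```python
import base64
import json
import tempfile
from pathlib import Path


def build_payload(source: str) -> dict:
    """Return a JSON-serializable request body for a URL or a local file path.

    The "file" field name is hypothetical; the real API may expect another key.
    """
    if source.startswith(("http://", "https://")):
        # Remote inputs are passed through as a URL string.
        return {"file": source}
    # Local inputs are read from disk and sent as base64 text.
    data = Path(source).read_bytes()
    return {"file": base64.b64encode(data).decode("ascii")}


def save_raw_result(result: dict) -> Path:
    """Persist the full raw result as JSON under the system temp directory."""
    fd, name = tempfile.mkstemp(prefix="ocr_result_", suffix=".json")
    # Opening the descriptor hands ownership to the file object, which closes it.
    with open(fd, "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)
    return Path(name)
```

The second function is the part with privacy implications: whatever the endpoint returns, including all recognized text, stays on disk until something deletes the file.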
No install spec; the skill is script-based and depends on standard Python packages (httpx, python-dotenv) listed in requirements.txt. There are no external downloads, unknown release URLs, or archive extraction steps. Risk from installation is low.
The skill only requires PADDLEOCR_OCR_API_URL, PADDLEOCR_ACCESS_TOKEN, and an optional timeout; the primary credential is the access token which is appropriate. Minor caution: configure.py writes (and may overwrite) a local .env in the project root (Path(__file__).parent.parent.parent/.env) and attempts to preserve other keys; if run in a host application directory this could modify a shared .env — the SKILL.md does warn about that. The code reads environment variables (via dotenv if present) consistent with the declared requirements and does not attempt to access unrelated credentials.
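To make the .env concern concrete, here is a hedged sketch of a merge that updates selected keys while leaving unrelated lines alone. It is not the logic in configure.py; `merge_env` is a hypothetical helper written only to illustrate what "preserve other keys" could mean in practice.

```python
from pathlib import Path


def merge_env(env_path: Path, updates: dict[str, str]) -> None:
    """Write `updates` into a .env file, keeping unrelated lines untouched."""
    lines = []
    if env_path.exists():
        lines = env_path.read_text(encoding="utf-8").splitlines()

    remaining = dict(updates)  # keys not yet written
    merged = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the value of a key we were asked to update.
            merged.append(f"{key}={remaining.pop(key)}")
        else:
            # Comments, blank lines, and other keys pass through unchanged.
            merged.append(line)

    # Append any requested keys that were not already present.
    merged.extend(f"{key}={value}" for key, value in remaining.items())
    env_path.write_text("\n".join(merged) + "\n", encoding="utf-8")
```

Even with a careful merge like this, the file is still rewritten in place, which is why pointing it at a shared host-application .env deserves caution.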
always:false and no special privileges are requested. The skill persists OCR results to files under the system temp directory by design (ocr_caller.py) and can also print to stdout. It does not modify other skills or system-wide agent settings. The only persistent write behavior is optional configuration (.env) and result JSON files under the OS temp dir.
Guidance
This skill appears to do exactly what it says: send images or PDFs to a configurable PaddleOCR-style HTTP endpoint and return the provider's OCR result. Before installing:
1) Ensure the PADDLEOCR_OCR_API_URL points to a legitimate/trusted service (don’t use unknown or shortened URLs).
2) Treat PADDLEOCR_ACCESS_TOKEN like any secret: only provide a token with minimal scope and rotate it if unsure.
3) Be aware that OCR results are saved by default to the system temp directory (full raw JSON including recognized text and any returned image URLs) and that the SKILL.md requires displaying the complete recognized text to users. This can leak sensitive information from images, so avoid running the skill on confidential images unless you accept that disclosure.
4) If you run configure.py, it will write a .env file in the skill project root, possibly overwriting an existing one while attempting to preserve other keys; do not run it inside a host application's config directory unless you intend to modify that .env.
5) Review the configured endpoint’s privacy/security practices, since image bytes (or image URLs) and derived text are sent over the network.
If any of these points are unacceptable, do not install or use the skill until you have a trusted endpoint and appropriate operational controls.
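As one way to act on the endpoint check in the guidance above, a quick pre-flight validation of the configured URL might look like the sketch below. The `check_endpoint` helper and the shortener list are illustrative assumptions, not part of the skill, and passing this check does not prove that an endpoint is trustworthy.

```python
from urllib.parse import urlparse

# Illustrative, deliberately incomplete list of URL-shortener hosts.
SHORTENER_HOSTS = {"bit.ly", "t.co", "tinyurl.com"}


def check_endpoint(url: str) -> list[str]:
    """Return a list of human-readable problems found in an endpoint URL."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("endpoint does not use https")
    host = (parsed.hostname or "").lower()
    if not host:
        problems.append("endpoint has no hostname")
    elif host in SHORTENER_HOSTS:
        problems.append("endpoint is a known URL shortener")
    return problems
```

An empty list only means none of these basic red flags were found; the provider's privacy and security practices still need a manual review.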
Latest Release
v1.0.5
Version 1.0.5
- Updated configuration instructions to recommend secure credential setup via the host application, rather than pasting credentials in chat.
- Added explicit security warning if credentials are provided in chat, highlighting that such information may be stored in conversation history.
- Clarified environment variable setup steps and emphasized secure configuration.
- No functional changes to the skill’s OCR handling or results workflow.
Published by @Bobholamovic on ClawHub