Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.
Security Analysis
High confidence: The skill's README and SKILL.md claim 'zero dependencies' and OCR support, but the package files and code contradict both claims (pdfjs-dist is required, and there is no Tesseract implementation). There are also runtime/API bugs. The bundle is internally inconsistent and needs clarification and fixes before use.
The manifest and SKILL.md claim 'zero dependencies' and OCR via Tesseract.js, but package.json declares pdfjs-dist as a dependency and package-lock.json lists many packages; Tesseract.js is not present. The code dynamically requires 'pdfjs-dist' (it will error if not installed), so the 'zero dependencies' claim is false. OCR is advertised but index.js contains no Tesseract integration or OCR fallback — it always attempts text-layer extraction. These mismatches show the claimed capability does not align with the actual required components.
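The dynamic require described above only fails at call time, once pdfjs-dist turns out to be absent. A minimal sketch of a pre-flight check follows, assuming a CommonJS Node.js environment; `isInstalled` and `missingDependencies` are hypothetical helper names and are not part of the skill.

```javascript
// Hypothetical helpers for illustration; not taken from the skill's index.js.

// Returns true if Node can resolve the module from this file's location.
function isInstalled(moduleName) {
  try {
    require.resolve(moduleName);
    return true;
  } catch (err) {
    // Only treat "module not found" as "not installed"; rethrow anything else.
    if (err.code === 'MODULE_NOT_FOUND') return false;
    throw err;
  }
}

// Returns the subset of names that cannot be resolved.
function missingDependencies(names) {
  return names.filter((name) => !isInstalled(name));
}
```

Calling `missingDependencies(['pdfjs-dist', 'tesseract.js'])` before any extraction would surface the gap between the 'zero dependencies' claim and the actual runtime requirements as a clear message instead of a late failure.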
SKILL.md and the README instruct users to call extractText with ocr:true and show examples that assume synchronous, fully implemented OCR behavior. The runtime index.js does not implement OCR (no Tesseract usage) and also contains API misuse and bugs: extractText calls countWords(fullText), but countWords expects an object {text, options}, which will cause a runtime error; test.js treats extractText's Promise result as synchronous in places. The instructions therefore do not reflect what the code actually does, and they give no clear guidance for installing the required dependencies.
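The countWords mismatch can be illustrated with a short sketch. The function bodies below are a hypothetical reconstruction based only on the signature named in this analysis ({text, options}); the real index.js may differ, and the `minLength` option is invented for illustration.

```javascript
// Hypothetical reconstruction: countWords takes a single object argument.
function countWords({ text, options = {} }) {
  const minLength = options.minLength ?? 1; // illustrative option, an assumption
  return text
    .split(/\s+/)
    .filter((word) => word.length >= minLength).length;
}

// Call shape reported in the analysis: a bare string is passed.
// Destructuring a string yields text === undefined, so text.split throws.
function countWordsBuggyCall(fullText) {
  return countWords(fullText);
}

// Corrected call shape: wrap the string in the expected object.
function countWordsFixedCall(fullText) {
  return countWords({ text: fullText });
}
```

Under these assumptions the buggy call throws a TypeError, while the fixed call returns the word count. The alternative fix is to change countWords to accept a plain string, but that would break any other caller relying on the object form.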
There is no install spec in the registry, but package.json and package-lock.json are included and declare pdfjs-dist and many nested packages. README suggests running 'npm install pdfjs-dist'. The absence of an install spec combined with packaged dependency manifests is inconsistent with the 'zero dependencies' marketing and means a user may need to run npm install (which pulls many packages and optional native build scripts). That increases friction and risk compared to the claimed zero-dependency design.
The skill does not request any environment variables, credentials, or config paths. Nothing in the files reads external secrets or unrelated system config.
Flags show the skill is not always-enabled and uses the default model-invocation behavior. It does not request persistent system-wide privileges or modify other skills' configuration.
Guidance
This package is internally inconsistent: it advertises 'zero dependencies' and OCR support but includes package.json requiring pdfjs-dist and does not implement Tesseract OCR. Before installing or using it, consider: (1) Do not feed sensitive PDFs to an untrusted/unclear package. (2) Inspect package.json/package-lock and run npm install in an isolated sandbox if you want to test; optional native builds (canvas, etc.) may run build scripts. (3) Ask the author to clarify and fix: (a) add or remove OCR support and include Tesseract if intended, (b) correct the countWords API misuse (extractText currently calls countWords incorrectly), and (c) provide a proper install spec or update documentation to match real dependencies. (4) If you need reliable OCR now, use a maintained library with clear dependency docs. If you decide to test this skill, do it in an isolated environment and review the code changes and installed packages first.
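For step (2), one way to review a lockfile before running npm install is to list the entries that declare install scripts. The sketch below assumes a package-lock.json in lockfile version 2 or 3, where entries under `packages` may carry a `hasInstallScript` flag; `packagesWithInstallScripts` is a hypothetical helper name, and the sample data is invented for illustration rather than taken from the skill's lockfile.

```javascript
// Hypothetical helper: returns lockfile paths of packages flagged as
// having a preinstall, install, or postinstall script.
function packagesWithInstallScripts(lockfile) {
  const packages = lockfile.packages ?? {};
  return Object.entries(packages)
    .filter(([, meta]) => meta.hasInstallScript === true)
    .map(([path]) => path)
    .sort();
}

// Invented sample data, shaped like a lockfileVersion 3 document.
const sampleLockfile = {
  lockfileVersion: 3,
  packages: {
    '': { name: 'example-skill' },
    'node_modules/pdfjs-dist': { version: '0.0.0' },
    'node_modules/canvas': { version: '0.0.0', optional: true, hasInstallScript: true },
  },
};

const flagged = packagesWithInstallScripts(sampleLockfile);
```

In practice the input would come from `JSON.parse` on the real package-lock.json. Running `npm install --ignore-scripts` inside the sandbox is another way to avoid executing those scripts during a first inspection.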
Latest Release
v1.0.0
Initial release: Extract text from PDFs with OCR support for digitizing documents
Published by @Michael-laffin on ClawHub