Safety Report

PDF Text Extractor

Name: PDF Text Extractor
Rating: 3 (12 reviews)
Author: Michael-laffin

Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.

Downloads: 6,857
Installs: 63
Stars: 12
Versions: 1

Categories: PDF & Documents (4,383), Customer Support (4,248), Finance & Accounting (3,023), Web Scraping (2,627)

Security Analysis

Verdict: Suspicious (high confidence)

The skill's README and SKILL.md claim 'zero dependencies' and OCR support, but the package files and code contradict both claims: they require pdfjs-dist and contain no Tesseract implementation. There are also runtime and API bugs. The bundle is internally inconsistent and needs clarification or fixes before use.

Feb 11, 2026 · 7 files · 3 concerns

Purpose & Capability: concern

The manifest and SKILL.md claim 'zero dependencies' and OCR via Tesseract.js, but package.json declares pdfjs-dist as a dependency, package-lock.json lists many more packages, and Tesseract.js is not present at all. The code dynamically requires 'pdfjs-dist' and will throw if it is not installed, so the 'zero dependencies' claim is false. OCR is advertised, but index.js contains no Tesseract integration or OCR fallback; it always attempts text-layer extraction. The claimed capabilities therefore do not match the components the skill actually needs.
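The mismatch between the advertised claims and the declared dependencies can be checked mechanically. Since the skill is a Node.js package, here is a minimal TypeScript sketch. It assumes the claims are reduced to two hypothetical flags (zeroDependencies, ocr) and that OCR would come from the tesseract.js npm package, as SKILL.md states. The function name auditClaims and the manifest below are illustrative, not taken from the skill.

```typescript
// Only the package.json fields this check reads.
interface PackageManifest {
  dependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}

// Hypothetical flags standing in for the listing's marketing claims.
interface AdvertisedClaims {
  zeroDependencies: boolean;
  ocr: boolean;
}

// Returns one finding per advertised claim that the manifest contradicts.
function auditClaims(pkg: PackageManifest, claims: AdvertisedClaims): string[] {
  const deps: Record<string, string> = {
    ...(pkg.dependencies ?? {}),
    ...(pkg.optionalDependencies ?? {}),
  };
  const names = Object.keys(deps);
  const out: string[] = [];

  if (claims.zeroDependencies && names.length > 0) {
    out.push(`claims zero dependencies but declares: ${names.join(", ")}`);
  }
  // Assumes OCR would be provided by the tesseract.js package.
  if (claims.ocr && !("tesseract.js" in deps)) {
    out.push("claims OCR but does not declare tesseract.js");
  }
  return out;
}

// Placeholder manifest mirroring the report; "<version>" is not the real pin.
const manifest: PackageManifest = {
  dependencies: { "pdfjs-dist": "<version>" },
};

const findings = auditClaims(manifest, { zeroDependencies: true, ocr: true });
findings.forEach((f) => console.log(f));
```

For a manifest like the one the report describes, this prints one finding per contradicted claim. A manifest check only covers what is declared; confirming that index.js never calls Tesseract still requires reading the code.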

Instruction Scope: concern

SKILL.md and README instruct users to call extractText with ocr:true, and their examples assume synchronous or fully implemented OCR behavior. The runtime code in index.js does not implement OCR (it never uses Tesseract) and also contains API misuse: extractText calls countWords(fullText), but countWords expects an object {text, options}, so the call will cause a runtime error. In addition, test.js treats the Promise returned by extractText as a synchronous result in places. The instructions therefore do not reflect what the code actually does, and they give no clear guidance on installing the required dependencies.
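Both bugs can be illustrated with a small TypeScript sketch. It assumes countWords has the object-style signature the report describes ({text, options}); the counting logic, the minWordLength option, and extractTextSketch (which takes page strings instead of reading a PDF with pdfjs-dist) are hypothetical stand-ins, not the skill's code.

```typescript
// Hypothetical option; the report only says countWords takes { text, options }.
interface CountOptions {
  minWordLength?: number;
}

// Object-style signature as the report describes it. If this sketch were
// called with a bare string, as extractText does with countWords(fullText),
// plain JavaScript would leave `text` undefined and text.trim() would throw.
function countWords({ text, options = {} }: { text: string; options?: CountOptions }): number {
  const min = options.minWordLength ?? 1;
  return text.trim().split(/\s+/).filter((w) => w.length >= min).length;
}

interface ExtractResult {
  text: string;
  wordCount: number;
}

// Hypothetical stand-in for extractText: takes page strings instead of a PDF,
// but is async like the real function, so callers receive a Promise.
async function extractTextSketch(pages: string[]): Promise<ExtractResult> {
  const fullText = pages.join("\n");
  // Corrected call: wrap the string in the object countWords expects.
  return { text: fullText, wordCount: countWords({ text: fullText }) };
}

// The test.js mistake: reading fields off the Promise instead of its result.
const pending = extractTextSketch(["Hello world", "from a PDF"]);
console.log((pending as unknown as ExtractResult).text); // undefined

// Correct: wait for the Promise to settle first.
pending.then((result) => console.log(result.wordCount)); // 5
```

In an async test runner the same fix is simply `const result = await extractText(...)` before reading any fields.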

Install Mechanism: concern

The registry has no install spec, yet the bundle includes package.json and package-lock.json, which declare pdfjs-dist and many nested packages, and the README suggests running 'npm install pdfjs-dist'. Shipping dependency manifests without an install spec contradicts the 'zero dependencies' marketing and means a user may have to run npm install, which pulls in many packages and may run build scripts for optional native dependencies. That adds friction and risk compared with the claimed zero-dependency design.
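Before running npm install, the lockfile itself shows which packages would execute install scripts. A minimal TypeScript sketch, assuming a lockfile written by npm 7 or later (lockfileVersion 2 or 3), whose packages entries carry optional and hasInstallScript flags. The function name and the sample entries are hypothetical, not copied from the skill's package-lock.json.

```typescript
// Only the lockfile fields this check reads (npm 7+ "packages" map).
interface LockEntry {
  optional?: boolean;
  hasInstallScript?: boolean;
}

interface Lockfile {
  lockfileVersion?: number;
  packages?: Record<string, LockEntry>;
}

// Returns the install paths of packages that declare install scripts,
// i.e. packages for which `npm install` would run code at install time.
function packagesWithInstallScripts(lock: Lockfile): string[] {
  const found: string[] = [];
  for (const [path, entry] of Object.entries(lock.packages ?? {})) {
    if (path === "") continue; // the "" key describes the root project itself
    if (entry.hasInstallScript) {
      found.push(entry.optional ? `${path} (optional)` : path);
    }
  }
  return found.sort();
}

// Hypothetical sample: illustrative entries, not the skill's real lockfile.
const sampleLock: Lockfile = {
  lockfileVersion: 3,
  packages: {
    "": {},
    "node_modules/pdfjs-dist": {},
    "node_modules/canvas": { optional: true, hasInstallScript: true },
  },
};

console.log(packagesWithInstallScripts(sampleLock)); // [ 'node_modules/canvas (optional)' ]
```

If the list is non-empty, npm install --ignore-scripts installs the packages without running those scripts, though a dependency that needs a native build may then fail at runtime.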

Credentials: ok

The skill does not request any environment variables, credentials, or config paths. Nothing in the files reads external secrets or unrelated system config.

Persistence & Privilege: ok

Flags show the skill is not always-enabled and uses the default model-invocation behavior. It does not request persistent system-wide privileges or modify other skills' configuration.

Guidance

This package is internally inconsistent: it advertises 'zero dependencies' and OCR support, but its package.json requires pdfjs-dist and it does not implement Tesseract OCR. Before installing or using it, consider the following:

(1) Do not feed sensitive PDFs to a package whose behavior is untrusted or unclear.
(2) Inspect package.json and package-lock.json, and if you want to test the skill, run npm install only in an isolated sandbox; optional native builds (canvas, etc.) may run build scripts.
(3) Ask the author to clarify and fix the package: (a) either add OCR support, including Tesseract if that is intended, or remove the OCR claim; (b) correct the countWords API misuse in extractText; (c) provide a proper install spec, or update the documentation to match the real dependencies.
(4) If you need reliable OCR now, use a maintained library with clear dependency documentation.

If you do decide to test this skill, do it in an isolated environment and review the code and the installed packages first.

Latest Release

v1.0.0

Initial release: Extract text from PDFs with OCR support for digitizing documents

Published by @Michael-laffin on ClawHub