Use when the user sends a video file or asks about video content. Extracts frames and injects them as an image grid directly into the LLM context — no proxy...
Security Analysis
High confidence: The skill's code, runtime instructions, and install requirements are consistent with its stated purpose (extracting frames and injecting a grid image into the LLM context); nothing requested is disproportionate or unexplained.
The name and description declare ffmpeg and node as requirements, and the packaged script uses ffmpeg (via the llm-frames npm library) to extract frames and produce a JPEG grid, so the declared binaries and npm dependency align with the stated purpose. No unrelated credentials or unusual tools are requested.
SKILL.md instructs running the provided node script, parsing its JSON output, and using the platform 'read' tool to inject the produced jpg. The script only reads the provided video file, checks its size, extracts frames, writes a single grid JPEG to the system tmpdir, and emits metadata — it does not access other files, environment variables, or external endpoints.
Install is standard: npm install (which pulls llm-frames from the public npm registry) plus an optional brew/apt install of ffmpeg. This is expected for the task but carries the normal supply-chain risk of an npm dependency; the package-lock includes an integrity hash for llm-frames.
No environment variables, secrets, or external credentials are requested. The skill does not require unrelated permissions or configuration paths.
The skill sets always:false and does not attempt to modify other skills or global agent settings. It writes ephemeral output to the OS tmpdir (one JPEG per run) and exits; no background services or persistent privileges are requested.
Guidance
This skill appears to do exactly what it says: it needs node and ffmpeg, runs a local script that extracts frames, writes a grid image to the system tmpdir, and outputs JSON for injection into a vision-capable model. Before installing:

1. Be aware that npm install will fetch the llm-frames package from the public registry; review that package (and the integrity hash in package-lock.json) if you have supply-chain concerns.
2. The grid image is written to the system tmpdir and may be readable by other local users on shared systems; delete sensitive files after use.
3. The README mentions future audio transcription, but the included code does not perform network calls or transcription today.
4. Run the skill in an isolated environment if you will process highly sensitive video.
5. Ensure your model and platform correctly handle injected images (the 'read' tool will place the JPEG into the LLM context).
Latest Release
v1.0.0
Initial release: video frame extraction for multimodal LLM context injection
Published by @john-ver on ClawHub