Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Security Analysis
Medium confidence. The skill's purpose (local WhatsApp voice-to-voice) matches its instructions, but it relies on unspecified local scripts/binaries and makes assumptions about the runtime that are not declared. Verify those components before use.
Name and description align with the SKILL.md: the instructions describe local transcription and local TTS for WhatsApp voice messages. However the skill references local artifacts (tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, whisper-cpp) yet the registry metadata lists no required binaries/install steps and no code files are supplied. It also assumes the agent has a 'message' tool capable of sending WhatsApp voice notes; that capability is not documented here. The mismatch between declared requirements (none) and actual referenced tools is an incoherence.
SKILL.md gives concrete runtime steps (transcribe incoming audio with a local script, produce TTS with a local binary, send .ogg via message tool). It does not instruct reading unrelated system files or exfiltrating data. The concern is that the instructions direct execution of unspecified local scripts/binaries — those could run arbitrary code. The RTF < 0.5s constraint is unrealistic and may lead to aggressive behavior or retries.
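To make the RTF concern concrete, here is a minimal sketch, assuming RTF means real-time factor (processing time divided by the duration of the audio produced), which is a unitless ratio. The function names and the 0.5 default are illustrative and are not taken from the skill. The check only reports whether a run met the budget; it deliberately does not retry.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def meets_budget(processing_seconds: float, audio_seconds: float,
                 max_rtf: float = 0.5) -> bool:
    """Report whether a single run stayed under the assumed RTF budget.

    This only measures; it never retries, so a slow host cannot
    trigger a loop of repeated synthesis attempts.
    """
    return real_time_factor(processing_seconds, audio_seconds) < max_rtf
```

Under this reading, synthesizing a 4-second reply in 1 second gives an RTF of 0.25, which is within a 0.5 budget, while taking 3 seconds for the same reply gives 0.75 and is not.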
There is no install spec (instruction-only), which minimizes direct install risk. That said, the skill requires local tools to be present; since none are provided or listed, the runtime will depend on whatever binaries/scripts exist on the host.
The skill declares no required environment variables or credentials. This is proportionate to an instruction-only local TTS/transcription flow. Caveat: WhatsApp integration implies the agent/runtime already has messaging credentials or tools; those are external to this skill and not declared here.
always:false and default invocation rules are set. The skill does not request persistent installation or elevated platform privileges in its metadata.
Guidance
This skill is instruction-only and expects local transcription and TTS tools that are not included or declared. Before installing or enabling it:
1) Verify the existence and provenance of tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, and whisper-cpp on the host; inspect the contents of any referenced scripts to ensure they don't execute unexpected commands.
2) Confirm your agent's 'message' tool legitimately has WhatsApp access and that any WhatsApp tokens live in a secure place you control (the skill does not request or document credentials).
3) Run the skill in a sandbox or isolated environment first to observe behavior (it executes local binaries and writes/reads /tmp/* files).
4) Beware the strict RTF requirement: it may cause retries or other aggressive actions.
If you cannot review or vet the referenced binaries/scripts, treat this skill as untrusted and avoid enabling it.
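The first verification step can be partly automated. The following is a minimal sketch, not part of the skill: it assumes ffmpeg and whisper-cpp are expected on PATH under exactly those names and that the two other artifacts live at the relative paths quoted above. The helper names (`preflight`, `missing`) are hypothetical. It only reports presence and a SHA-256 digest for later comparison; it does not execute anything and is no substitute for reading the scripts.

```python
import hashlib
import shutil
from pathlib import Path

# Names come from the analysis above. That the first two are resolved
# via PATH, and the last two relative to the skill directory, is an assumption.
PATH_TOOLS = ["ffmpeg", "whisper-cpp"]
LOCAL_FILES = ["tools/transcribe_voice.sh", "bin/sherpa-onnx-tts"]


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, for comparing against a known-good digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def preflight(skill_root, path_tools=PATH_TOOLS, local_files=LOCAL_FILES):
    """Map each expected tool to where/what was found, or None if absent."""
    root = Path(skill_root)
    report = {}
    for name in path_tools:
        # shutil.which returns the resolved executable path, or None.
        report[name] = shutil.which(name)
    for rel in local_files:
        candidate = root / rel
        report[rel] = sha256_of(candidate) if candidate.is_file() else None
    return report


def missing(report):
    """Sorted names of every tool the preflight could not find."""
    return sorted(name for name, found in report.items() if found is None)
```

Calling `missing(preflight("."))` from the skill directory lists whatever is absent; an empty list means every referenced artifact exists, not that it is trustworthy.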
Latest Release
v1.0.0
Initial version: voice-to-voice on WhatsApp with local TTS/ASR
Published by @rubenfb23 on ClawHub