Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Security Analysis
Medium confidence. The skill's instructions expect local transcription and TTS binaries and scripts, but the declared manifest lists no required binaries, installs, or credentials. The pieces don't line up and need clarification before use.
The description (voice-to-voice on WhatsApp) is plausible, but the manifest declares no required binaries, no install steps, and no WhatsApp integration credentials or endpoints. The SKILL.md explicitly requires local tools (ffmpeg, whisper-cpp, sherpa-onnx-tts) and scripts (tools/transcribe_voice.sh, bin/sherpa-onnx-tts), none of which are declared in the registry metadata. The undeclared requirements are out of proportion to what the manifest claims, so the skill may fail or assume access it has not requested.
The instructions tell the agent to run local scripts and binaries and to send audio via a `message` tool, but they do not explain how incoming audio is surfaced to the agent, where the scripts come from, or what the `message` tool's required parameters/permissions are. The SKILL.md restricts use to 'local tools only' (no cloud) and asks the agent to always return both text and audio — no steps ask to read unrelated files or environment variables, but the instructions assume filesystem and binary access that aren't guaranteed.
There is no install spec (instruction-only), which lowers install risk. However, the skill depends on external binaries and scripts that would need to be present on the host. The lack of an install mechanism or references to known release sources means the agent or operator must manually install/verify those dependencies; that operational gap is noteworthy but not inherently malicious.
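Since the operator has to verify these dependencies by hand, a small preflight check can help. The following is a minimal sketch, not part of the skill. It assumes the executable names match the names quoted from the SKILL.md (real installs may use different names) and that the script paths are relative to a skill directory you pass in. The function name `find_missing` is hypothetical.

```python
import os
import shutil

# Names as quoted from the SKILL.md in this analysis. Assumption: the
# executables on your host use exactly these names; adjust if they do not.
REQUIRED_BINARIES = ["ffmpeg", "whisper-cpp", "sherpa-onnx-tts"]
REQUIRED_SCRIPTS = ["tools/transcribe_voice.sh", "bin/sherpa-onnx-tts"]


def find_missing(binaries, scripts, base_dir="."):
    """Return (missing_binaries, missing_scripts).

    A binary counts as missing if it is not found on PATH. A script counts
    as missing if it is not an executable regular file under base_dir.
    """
    missing_binaries = [name for name in binaries if shutil.which(name) is None]
    missing_scripts = []
    for rel_path in scripts:
        full_path = os.path.join(base_dir, rel_path)
        if not (os.path.isfile(full_path) and os.access(full_path, os.X_OK)):
            missing_scripts.append(rel_path)
    return missing_binaries, missing_scripts


if __name__ == "__main__":
    bins, scripts = find_missing(REQUIRED_BINARIES, REQUIRED_SCRIPTS)
    for name in bins:
        print(f"missing binary on PATH: {name}")
    for rel_path in scripts:
        print(f"missing or non-executable script: {rel_path}")
    if not bins and not scripts:
        print("all declared dependencies found")
```

This only confirms that the files exist and are executable. It does not establish where they came from or whether they are trustworthy.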
The skill declares no environment variables or credentials, which is consistent with its claim to use local-only tools. However, because it targets WhatsApp conversations, the absence of any declared messaging/WhatsApp credential or integration details is suspicious — the skill assumes the agent has access to a messaging tool capable of sending files but doesn't declare what access is required.
The skill does not request always:true and uses default invocation settings. It does not attempt to modify system-wide settings in the provided instructions. No persistence or elevated platform privileges are requested in the manifest.
Guidance
Before installing or enabling this skill, verify the following:
(1) Confirm which binaries and scripts it requires (ffmpeg, whisper-cpp, sherpa-onnx-tts, tools/transcribe_voice.sh, bin/sherpa-onnx-tts) and install them from trusted sources; the manifest currently lists none.
(2) Ensure your agent actually has a 'message' tool and a WhatsApp integration set up, and understand what credentials or API access that requires; the skill does not declare any credentials.
(3) Ask the publisher to update the manifest to list required binaries, install instructions, and any needed credentials.
(4) Consider running the skill in a sandbox or with a test account first. Audio processing can involve sensitive content, and the skill assumes local filesystem access, which could fail or be abused.
(5) Note that the performance constraint (RTF < 0.5, presumably real-time factor) may be unrealistic for local models and could lead to degraded behavior; confirm resource needs.
If the publisher cannot clarify these gaps, treat the skill as untrusted.
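To check the performance constraint on your own hardware, you can measure it directly. The sketch below assumes RTF means real-time factor, that is, processing time divided by the duration of the audio, so RTF < 0.5 means processing takes less than half the audio's length. The listing does not say whether the limit applies to transcription, TTS, or both. The helper names and the `run_stage` callable are hypothetical stand-ins for whatever invokes the local tool.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(run_stage, audio_seconds):
    """Time one call to run_stage() and return its real-time factor.

    run_stage is a hypothetical zero-argument callable that performs the
    transcription or synthesis for a clip lasting audio_seconds.
    """
    start = time.perf_counter()
    run_stage()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # Placeholder workload; replace with a call into the real local tool.
    rtf = measure_rtf(lambda: time.sleep(0.1), audio_seconds=10.0)
    print(f"RTF = {rtf:.3f} (target from the skill: < 0.5)")
```

Running this over several clips of different lengths gives a more reliable picture than a single measurement, since model load time can dominate short clips.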
Latest Release
v1.0.0
- Updated skill.
Published by @rubenfb23 on ClawHub