Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Security Analysis
High confidence: The skill's description matches its behavior, but the runtime instructions require local binaries, scripts, and model tooling that the skill metadata does not declare. This mismatch should be resolved before installing.
The name/description (voice-to-voice WhatsApp) matches the SKILL.md workflow, but the skill metadata declares no required binaries, env vars, or installs while the instructions explicitly require local tooling (ffmpeg, whisper-cpp, sherpa-onnx-tts), a helper script (tools/transcribe_voice.sh), and a local TTS binary (bin/sherpa-onnx-tts). That inconsistency means the skill either omits necessary requirements or assumes access to arbitrary local executables.
Runtime instructions tell the agent to run local scripts/binaries and read/write files (e.g., /tmp/reply.ogg) and to use a 'message' tool to send files. These actions are coherent with the stated purpose, but they reference specific local paths and tools not declared in metadata. This grants the skill broad discretion to execute unspecified local programs and rely on local model artifacts.
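The fixed output path mentioned above (/tmp/reply.ogg) can be read or pre-created by other local users on a shared machine. As a minimal sketch, assuming a POSIX system and assuming the TTS step can write to any path it is given, an owner-only temporary file could be used instead. The helper name `make_private_reply_path` and the `reply-` prefix are hypothetical and do not come from the skill.

```python
import os
import stat
import tempfile


def make_private_reply_path(suffix=".ogg"):
    """Create an empty temp file with a unique name and return its path.

    Hypothetical helper, not part of the skill. On POSIX systems
    tempfile.mkstemp creates the file readable and writable only by
    the current user.
    """
    fd, path = tempfile.mkstemp(prefix="reply-", suffix=suffix)
    os.close(fd)  # only the path is needed; the TTS tool would reopen it
    return path


if __name__ == "__main__":
    path = make_private_reply_path()
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        print(path, oct(mode))
    finally:
        os.remove(path)  # clean up once the reply has been sent
```

The caller remains responsible for deleting the file after the audio has been sent.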
There is no install spec (lowest install risk), which is fine for an instruction-only skill — but here it's problematic because the skill expects several local binaries and scripts. Because nothing will be installed by the skill, the operator must supply these dependencies; the missing install/dependency declarations are an integrity/usability risk.
The skill requests no environment variables or credentials (appropriate). However, it implicitly requires access to local filesystem paths and local model binaries; the SKILL.md does not request or document any permissions or configuration for those resources.
The skill does not request always:true and does not declare persistent/system-wide changes. It appears to be user-invocable only and does not request elevated persistent privileges.
Guidance
This skill's behavior (transcribe incoming audio, produce local TTS, send .ogg back) matches its description, but the SKILL.md depends on local tools and scripts that are not declared anywhere. Before installing or enabling:
1) Verify the agent environment actually has the required binaries (ffmpeg, whisper-cpp, sherpa-onnx-tts) and the helper script paths (tools/transcribe_voice.sh, bin/sherpa-onnx-tts).
2) Ask the author to update the metadata to list required binaries, exact paths, and any model files or hardware needs.
3) Confirm the 'message' tool used to send files is the authorized platform tool (so audio is sent only to the intended chat) and that no unexpected external endpoints are contacted.
4) Review file permissions around /tmp and any model data to avoid exposing unrelated data.
5) Test in a sandboxed agent first. If the required local tools are missing, the skill will fail, and it may run whatever programs are later placed at those paths.
If you cannot verify or supply the required dependencies, treat this skill as untrusted.
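The dependency check described in this guidance can be sketched as a short preflight script. This is an illustration only: the binary names and relative paths are the ones listed in the analysis, but whether each binary is installed on PATH under exactly that name is an assumption, and the function name `find_missing` is hypothetical.

```python
import os
import shutil

# Names taken from the analysis; the exact executable names on PATH
# are an assumption and may differ per installation.
REQUIRED_BINARIES = ["ffmpeg", "whisper-cpp", "sherpa-onnx-tts"]
# Relative paths referenced by the SKILL.md, resolved against `root`.
REQUIRED_FILES = ["tools/transcribe_voice.sh", "bin/sherpa-onnx-tts"]


def find_missing(binaries, files, root="."):
    """Return binaries not found on PATH plus files that are absent or not executable."""
    missing = []
    for name in binaries:
        if shutil.which(name) is None:
            missing.append(name)
    for rel in files:
        path = os.path.join(root, rel)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_BINARIES, REQUIRED_FILES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies found.")
```

Running it from the skill's working directory before enabling the skill shows which of the undeclared dependencies the operator still has to supply. It does not verify that the binaries found are the expected ones, so it complements rather than replaces the sandbox test.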
Latest Release
v1.0.0
Initial release of walkie-talkie-mode: enables seamless voice-to-voice conversations on WhatsApp.
- Automatically transcribes incoming WhatsApp audio messages using local tools.
- Generates voice note replies using local TTS and replies with both audio and text.
- Activates when users send audio messages or use command phrases such as "activa modo walkie-talkie".
- Prioritizes fast, fully offline processing for privacy and speed.
- Includes manual execution instructions for internal use.
Published by @rubenfb23 on ClawHub