Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Security Analysis
Medium confidence. The skill's stated purpose (local WhatsApp voice transcription + TTS) is plausible, but the runtime instructions reference local scripts and specific binaries that are neither declared nor included, an inconsistency that warrants caution.
The description (voice-to-voice on WhatsApp using local transcription and TTS) matches the actions described in SKILL.md. However, the skill metadata declares no required binaries or files, while the instructions explicitly reference tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, and whisper-cpp. These are necessary for the stated purpose but are neither included nor declared.
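A quick way to see how large the gap is on a given host is a preflight check. The following is a minimal sketch, not part of the skill: it assumes ffmpeg and whisper-cpp are expected on PATH under exactly those names, and that the two relative paths resolve against a skill root directory that you supply (`skill_root` and `find_missing` are hypothetical names).

```python
import os
import shutil

# Names copied from the analysis above. Whether they are found on PATH or
# under a skill root directory is an assumption made for this sketch.
PATH_BINARIES = ["ffmpeg", "whisper-cpp"]
RELATIVE_FILES = ["tools/transcribe_voice.sh", "bin/sherpa-onnx-tts"]


def find_missing(skill_root, path_binaries=PATH_BINARIES, relative_files=RELATIVE_FILES):
    """Return the dependencies that cannot be found on this host."""
    missing = []
    for name in path_binaries:
        # shutil.which returns None when no executable of that name is on PATH.
        if shutil.which(name) is None:
            missing.append(name)
    for rel in relative_files:
        full = os.path.join(skill_root, rel)
        # A file that exists but is not executable is still reported as missing.
        if not (os.path.isfile(full) and os.access(full, os.X_OK)):
            missing.append(rel)
    return missing
```

Calling `find_missing(".")` from the directory where the skill is installed lists whatever is absent. An empty list only shows that something exists at each location; it says nothing about whether that code is trustworthy.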
Instructions tell the agent to execute local scripts/binaries and to send .ogg files via the message tool. They do not ask for extra env vars or unrelated files, but they require executing code at host paths (tools/transcribe_voice.sh, bin/sherpa-onnx-tts). Because those files are not provided, the skill will rely on whatever binaries exist on the host—this gives the agent power to run arbitrary local code if those paths are populated.
This is instruction-only (no install spec or code). That reduces the risk of the skill dropping arbitrary code during installation. However, the runtime depends on externally installed local binaries which the user must provide.
The skill requests no environment variables or credentials, which is proportionate to its described local-only operation. There is no unexplained request for unrelated secrets. Be aware that sending messages via the agent's messaging integration still requires whatever platform credentials the agent normally uses, but those are not requested by this skill.
The skill is not always-enabled and does not request elevated platform privileges or attempt to modify other skills. Autonomous invocation is allowed (platform default), which is normal and not by itself a red flag.
Guidance
This skill's behavior is plausible but inconsistent: SKILL.md requires local scripts/binaries (tools/transcribe_voice.sh, bin/sherpa-onnx-tts, ffmpeg, whisper-cpp) that are neither included nor declared. Before installing or enabling it, verify:
1) where those binaries/scripts will come from and that they are from trusted sources;
2) the exact content of tools/transcribe_voice.sh (so it doesn't run unexpected commands);
3) that you are comfortable with the agent executing local binaries on the host.
If you can't audit or control the referenced scripts/binaries, consider not installing the skill, or ask the author for a clear dependency list and safe installation instructions. Providing the missing files or explicit dependency declarations (and ideally checksums or official sources) would reduce the concern and could change the assessment to benign.
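If the author does publish checksums, the comparison could be scripted along these lines. This is a sketch under assumptions: the skill currently ships no checksums, so the `expected` mapping (path to SHA-256 hex digest) is hypothetical and would have to be filled in from a source you trust; `sha256_of` and `audit` are illustrative names.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(expected):
    """Map each path to True only if its SHA-256 matches the expected digest."""
    results = {}
    for path, want in expected.items():
        try:
            results[path] = sha256_of(path) == want.lower()
        except OSError:
            # A missing or unreadable file counts as a failed check.
            results[path] = False
    return results
```

A path that maps to `False` means the file is missing, unreadable, or differs from the published version, and in each case the skill should not be enabled until the mismatch is explained.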
Latest Release
v1.0.0
Walkie-Talkie skill initial release:
- Enables automatic voice-to-voice conversations on WhatsApp via local transcription and text-to-speech.
- Transcribes incoming audio messages and processes them as prompts.
- Responds with both text and synthetic voice audio using local TTS.
- Activates when receiving audio messages or upon user request ("activa modo walkie-talkie", "hablemos por voz").
- Uses only local tools for processing and aims to ensure rapid response times.
Published by @rubenfb23 on ClawHub