yapsnap
Snap any video URL or audio file into plaintext. No GPU. No cloud. One command.
yapsnap "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
That's it. You get a .txt next to your shell, transcribed on your CPU, in less time than it took the video to play.
Why yapsnap
Fast on CPU. Streaming Zipformer transducer (Kroko English) chews through audio at several times realtime on a laptop. No CUDA. No M-series-only tricks. Plain old cores.
Any video URL, plus local files. YouTube. X. TikTok. Instagram Reels. Direct .mp4 / .mp3 links. Or just point it at a file on disk. yt-dlp handles the fetch, ffmpeg handles the decode, the rest is yours.
Offline after first run. ~80 MB model downloads once to your cache and stays there. No API keys. No quotas. Your audio never leaves your machine.
One file, three deps. sherpa-onnx, numpy, yt-dlp. The whole tool is a single Python module.
Sentence-level timestamps when you want them. --timestamps adds [MM:SS] per sentence using Kroko's built-in punctuation. Timing stays correct even when you transcribe at 2x.
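How can timestamps stay correct at 2x? Here is a minimal sketch of one way to do it, not yapsnap's actual code. It assumes the recognizer reports each sentence's start time in seconds on the sped-up audio's clock, so multiplying by the speed factor maps it back to the original video. The function names and the sample sentences are hypothetical, made up for illustration.

```python
def to_original_time(t_fast: float, speed: float) -> float:
    """Map a time measured on sped-up audio back to the original timeline."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # One second of 2x audio covers two seconds of the original recording.
    return t_fast * speed


def format_mmss(seconds: float) -> str:
    """Render seconds as [MM:SS], truncating any fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def stamp_sentences(sentences, speed=1.0):
    """Prefix each (start_seconds, text) pair with an original-timeline stamp."""
    return [
        f"{format_mmss(to_original_time(start, speed))} {text}"
        for start, text in sentences
    ]


# Hypothetical sentence starts, measured on audio that was sped up 2x.
sentences = [(0.0, "Hello there."), (31.5, "Second sentence."), (95.0, "Third.")]
lines = stamp_sentences(sentences, speed=2.0)
print("\n".join(lines))
```

Rescaling at the very end keeps the recognizer unaware of the speed-up. Only the final formatting step needs to know the factor.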