Claude-real-video － any LLM can watch a video

Let Claude — or any LLM — actually watch a video.

Most AI tools don't really see a video. Paste a YouTube link into ChatGPT and it reads the transcript, not the picture. Claude won't take a video file at all. Even Gemini, which can read video natively, has to send it up to Google and samples frames at a fixed interval (1 fps by default), so fast cuts slip past.

claude-real-video does it differently, and locally: point it at a URL or a file, and it pulls the frames that actually matter (every scene change, not a fixed quota), throws away the near-duplicates, transcribes the audio, and hands you a clean folder any LLM can read — on your own machine, nothing uploaded.

crv " https://www.youtube.com/watch?v=... " # → crv-out/frames/*.jpg + crv-out/transcript.txt + crv-out/MANIFEST.txt

Then drop the frames + MANIFEST.txt into Claude / ChatGPT / Gemini and ask away.

Why not just sample frames?

Most "let an LLM watch a video" scripts (and Gemini's own pipeline) grab frames at a fixed interval — e.g. one per second. That over-samples a static screencast and under-samples a fast-cut reel. claude-real-video is smarter:

fixed-interval sampling claude-real-video Frame selection every N seconds scene-change detection + density floor Repeated shots (A-B-A cuts) sent again every time sliding-window dedup sends each shot once Static slide (10 min) ~600 near-identical frames collapses to 1 (dedup) Fast-cut reel misses frames between samples catches each visual change Audio often ignored Whisper transcript w/ language detect Where the video goes often uploaded to a cloud stays on your machine Input usually local file only URL (yt-dlp) or local file

You feed the model fewer, more meaningful frames — cheaper context, better understanding.

Install

... continue reading