Why Do This in Your Browser?
Cloud-based subtitle generators are everywhere — Rev's auto-captions, Veed, Kapwing, Descript, Submagic — and they all share two costs: usage limits or per-minute fees, and the upload step. The actual transcription engine that most of these tools wrap is OpenAI's Whisper, which is MIT-licensed and runs anywhere with enough memory to load it.
Whisper's tiny and base models (39M and 74M parameters, respectively) are small enough to run in a browser. The pipeline: extract the audio track from your video with FFmpeg.wasm, run Whisper via Transformers.js to produce timestamped segments, format them as SRT, then burn the SRT back into the video with FFmpeg's `subtitles=` filter. End-to-end, entirely in your browser.
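The "format them as SRT" step is plain string work. A minimal sketch, assuming Whisper output in the shape Transformers.js returns (an array of chunks, each with a `timestamp: [start, end]` pair in seconds and a `text` field):

```javascript
// Convert seconds to an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const pad = (n, w) => String(n).padStart(w, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Format transcription chunks ({ timestamp: [start, end], text }) as an SRT file:
// a 1-based index, a "start --> end" line, the caption text, then a blank line.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) => {
      const [start, end] = seg.timestamp;
      return `${i + 1}\n${toSrtTime(start)} --> ${toSrtTime(end)}\n${seg.text.trim()}\n`;
    })
    .join("\n");
}
```

The resulting string is written into FFmpeg.wasm's virtual filesystem so the burn-in stage can read it as `subs.srt`.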
How It Works
The tool runs three stages. First, FFmpeg.wasm extracts the audio track as 16 kHz mono WAV (Whisper's expected input). Second, Transformers.js loads the chosen Whisper model (tiny / base / small) and runs ASR on the audio, producing word-level timestamps in any of Whisper's 99 supported languages. Third, the timestamped segments are formatted as an SRT file and passed back to FFmpeg with `subtitles=subs.srt` to burn the captions into the video with the chosen style.
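The two FFmpeg invocations (stages one and three) can be sketched as the argument lists you would hand to `ffmpeg.exec()`. The filenames here (`input.mp4`, `audio.wav`, `subs.srt`, `output.mp4`) are illustrative, not the tool's actual internals:

```javascript
// Stage 1: extract the audio track as 16 kHz mono WAV, Whisper's expected input.
const extractArgs = [
  "-i", "input.mp4",
  "-vn",           // drop the video stream
  "-ar", "16000",  // resample to 16 kHz
  "-ac", "1",      // downmix to mono
  "audio.wav",
];

// Stage 3: burn the generated SRT into the video with the subtitles filter.
const burnArgs = [
  "-i", "input.mp4",
  "-vf", "subtitles=subs.srt",
  "output.mp4",
];
```

Both runs read and write FFmpeg.wasm's in-memory filesystem, so nothing touches disk or network.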
Subtitle styling supports font size, color, outline, background opacity, vertical position, and the choice between burned-in and soft-subtitle export. Soft subtitles (an SRT stream muxed into MKV) keep the captions toggleable; burned-in subtitles guarantee they appear on every platform regardless of player support.
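Those style options map onto the `subtitles` filter's `force_style` parameter, which takes ASS style-field overrides. A sketch, with hypothetical option names; ASS colours are stored as `&HAABBGGRR&` (blue-green-red with an alpha byte, `00` = opaque):

```javascript
// "#RRGGBB" -> "&HAABBGGRR&" in ASS byte order.
function toAssColor(hexRgb, alpha = 0) {
  const r = hexRgb.slice(1, 3), g = hexRgb.slice(3, 5), b = hexRgb.slice(5, 7);
  const aa = alpha.toString(16).padStart(2, "0");
  return `&H${aa}${b}${g}${r}&`.toUpperCase();
}

// Build the full filter string from UI style options (names are illustrative).
function buildSubtitlesFilter(opts) {
  const style = [
    `FontSize=${opts.fontSize}`,
    `PrimaryColour=${toAssColor(opts.color)}`,
    `Outline=${opts.outline}`,
    `BackColour=${toAssColor("#000000", opts.backgroundAlpha)}`,
    `MarginV=${opts.marginV}`,  // vertical position: distance from the bottom edge
  ].join(",");
  return `subtitles=subs.srt:force_style='${style}'`;
}
```

For soft subtitles there is no filter at all: the SRT is muxed as an extra stream (`-i subs.srt -c copy -c:s srt output.mkv`-style arguments), which is why the captions stay toggleable.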
Tip: If you only need a transcript without burning subtitles into the video, the standalone Audio Transcription tool exports plain TXT / SRT / VTT / JSON. If you need to chop the clip first, use the Video Trimmer before running auto-subtitle to save processing time.
Common Use Cases
How We Compare
Honest read on free, paid, and self-hosted options for this kind of job: