Why Transcribe in Your Browser?
Most audio transcription services upload your audio to a remote server, process it there, and send back text. For a podcast interview or YouTube voiceover, that's fine. For a confidential client call, a deposition, a therapy session, a medical recording, or a private journal entry, it's a real privacy problem — you're handing sensitive audio to an unknown third party.
This tool is different. OpenAI released Whisper, a speech recognition model that many paid transcription services build on, under the MIT license, which covers both the code and the model weights. Combined with Transformers.js (Hugging Face's library that runs ML models directly in the browser via WebAssembly and WebGPU), the entire pipeline can run on your device.
The first time you transcribe something, the model file downloads (40–250 MB depending on which model you pick) and is cached in your browser. After that, you can disconnect from the internet entirely and the tool still works. Your audio never touches a server: not ours, not OpenAI's, not anyone's.
Working from a video? Strip the audio track first with the Audio Extractor; Whisper handles the resulting MP3/WAV/AAC/OGG file directly. If your source video is too big to load comfortably, run it through the Video Compressor first to bring it down to a manageable size before extraction. Once you have an SRT or VTT file, the Subtitle Studio lets you fix misheard words, retime drifting captions, and convert between subtitle formats before you publish. If you'd rather skip the manual chain and just get a captioned video back, the Video Auto-Subtitle tool combines this exact Whisper pipeline with FFmpeg.wasm subtitle burn-in.
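For readers curious what an SRT or VTT file actually contains, here is a minimal sketch of turning timestamped transcript segments into both formats. The Segment shape and the function names are hypothetical, chosen for illustration; this is not the tool's own code. The two formats differ mainly in the millisecond separator (comma in SRT, period in WebVTT), the cue numbering in SRT, and the WEBVTT header line.

```typescript
// Hypothetical segment shape: start/end in seconds plus the recognized text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm.
// SRT uses "," before the milliseconds; WebVTT uses ".".
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, each followed by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`)
    .join("\n");
}

// WebVTT: a "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(seg =>
    `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`);
  return ["WEBVTT\n"].concat(cues).join("\n");
}
```

For example, formatTimestamp(3661.5, ",") returns "01:01:01,500", and the same value with "." as the separator gives the WebVTT form "01:01:01.500".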
Choosing the Right Model
Whisper comes in several sizes. Larger models are more accurate but take longer to download and run. Here's the trade-off:
The .en variants (whisper-tiny.en, whisper-base.en) are trained on English-only data; they run slightly faster and are marginally more accurate on English. Use the non-suffixed versions for any other language or for mixed-language audio.
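The naming rule above can be sketched as a small helper. This is an illustration only: the function name, the WhisperSize type, and the idea of passing a list of language codes are assumptions, and only the two sizes named above are covered. It picks the .en variant when every expected language is English and falls back to the multilingual model otherwise, including when the language is unknown.

```typescript
// Sizes named in the text above; other sizes are deliberately left out.
type WhisperSize = "tiny" | "base";

// Hypothetical helper: choose a model name from the size and the languages
// expected in the audio (e.g. ["en"], ["en-US"], or ["en", "es"]).
function pickWhisperModel(size: WhisperSize, languageCodes: string[]): string {
  const isEnglish = (code: string): boolean => {
    const c = code.toLowerCase();
    return c === "en" || c.indexOf("en-") === 0;
  };
  // An empty list means the language is unknown, so stay multilingual.
  const englishOnly = languageCodes.length > 0 && languageCodes.every(isEnglish);
  return englishOnly ? `whisper-${size}.en` : `whisper-${size}`;
}
```

So pickWhisperModel("tiny", ["en"]) returns "whisper-tiny.en", while pickWhisperModel("base", ["en", "es"]) returns "whisper-base" because the audio mixes languages.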
Common Use Cases
How We Compare
Honest read on free, paid, and self-hosted options for transcription: