NEW · Whisper · FFmpeg.wasm · 100% Browser-Based

Video Auto-Subtitle

Auto-transcribe and burn subtitles into video — OpenAI Whisper runs locally for speech-to-text, FFmpeg.wasm burns the SRT into the video. 99 languages. No upload, no signup, no watermark.

Drop a video file
MP4 · MOV · WebM · MKV · AVI · GIF
Files stay on your device · Never uploaded

Related Tools on UDT

Audio Transcription →
Whisper-based AI speech-to-text — 99 languages, in your browser.
Video Trimmer →
Trim videos with frame-accurate precision in your browser.
Video Watermark →
Add text or image watermarks — position, opacity, size, font controls.
All Video Tools →
Browse the full Video Suite — 12+ tools, all in-browser.

Why Do This in Your Browser?

Cloud-based subtitle generators are everywhere (Rev's auto-captions, Veed, Kapwing, Descript, Submagic), and they all share two costs: usage limits or per-minute fees, and the upload step. Many tools in this space build on OpenAI's Whisper, which is MIT-licensed and runs anywhere with enough memory to load it.

Whisper's tiny and base models (39MB and 74MB respectively) are small enough to run in a browser. The pipeline: extract the audio track from your video with FFmpeg.wasm, run Whisper via Transformers.js to produce timestamped segments, format them as SRT, then burn the SRT back into the video with FFmpeg's subtitles= filter. End-to-end, entirely in your browser.

How It Works

The tool runs in three stages. First, FFmpeg.wasm extracts the audio track as 16kHz mono WAV (Whisper's expected input). Second, Transformers.js loads the chosen Whisper model (tiny / base / small) and runs ASR on the audio, producing word-level timestamps in any of the 99 supported languages. Third, the timestamps are formatted as an SRT file and passed back to FFmpeg with `subtitles=subs.srt` to burn the captions into the video in the chosen style.
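The three stages can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's actual source: the function names, the `Segment` shape, and the file names are assumptions, and the FFmpeg argument lists use standard FFmpeg options (`-vn`, `-ac`, `-ar`, `-vf subtitles=`) that would be handed to whatever FFmpeg.wasm call the tool uses.

```typescript
// One transcribed segment: start/end in seconds plus the recognized text.
// (Hypothetical shape; the tool's internal format is not documented here.)
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to a fixed width.
function zeroPad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// Convert seconds to the SRT timestamp form HH:MM:SS,mmm.
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return zeroPad(h, 2) + ":" + zeroPad(m, 2) + ":" + zeroPad(s, 2) + "," + zeroPad(ms, 3);
}

// Stage 1: FFmpeg arguments to drop the video stream and write
// 16 kHz mono 16-bit PCM WAV.
function extractAudioArgs(input: string, output: string): string[] {
  return ["-i", input, "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", output];
}

// Between stages 2 and 3: serialize segments as SRT cues
// (1-based index, time range, text, blank line between cues).
function segmentsToSrt(segments: Segment[]): string {
  const cues = segments.map((seg, i) =>
    [
      String(i + 1),
      toSrtTimestamp(seg.start) + " --> " + toSrtTimestamp(seg.end),
      seg.text.trim(),
    ].join("\n")
  );
  return cues.join("\n\n") + "\n";
}

// Stage 3: FFmpeg arguments to burn the SRT into the video,
// re-encoding video and copying the audio stream unchanged.
function burnInArgs(input: string, srt: string, output: string): string[] {
  return ["-i", input, "-vf", "subtitles=" + srt, "-c:a", "copy", output];
}
```

SRT uses a comma before the milliseconds, which is the detail most often gotten wrong when converting from WebVTT-style timestamps.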

Subtitle styling covers font size, color, outline, background opacity, and vertical position, and you can choose between burned-in and soft-subtitle export. Soft subtitles (an SRT track embedded in MKV) keep the captions toggleable; burned-in subtitles guarantee they appear on every platform regardless of player support.
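One plausible way to map such styling controls onto FFmpeg is the `force_style` option of the `subtitles` filter, which takes ASS style fields such as `FontSize`, `PrimaryColour`, `Outline`, and `MarginV`. The sketch below assumes that approach; the `CaptionStyle` interface and function names are hypothetical, background opacity is left out for brevity, and the tool's real mapping may differ.

```typescript
// Hypothetical style options mirroring some of the controls described above.
interface CaptionStyle {
  fontSize: number; // ASS FontSize
  colorHex: string; // text colour as "#RRGGBB"
  outline: number;  // ASS Outline width
  marginV: number;  // ASS MarginV, distance from the frame edge
}

// ASS colours are written &HAABBGGRR (alpha first, then blue, green, red).
// Alpha 00 means fully opaque.
function hexToAssColour(hex: string): string {
  const match = /^#([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})$/.exec(hex);
  if (!match) throw new Error("Expected a colour like #RRGGBB, got: " + hex);
  const r = match[1];
  const g = match[2];
  const b = match[3];
  return ("&H00" + b + g + r).toUpperCase();
}

// Build the -vf value for burned-in captions with style overrides.
// The single quotes protect the commas inside force_style from
// FFmpeg's filtergraph parser.
function buildSubtitleFilter(srtPath: string, style: CaptionStyle): string {
  const forceStyle = [
    "FontSize=" + style.fontSize,
    "PrimaryColour=" + hexToAssColour(style.colorHex),
    "Outline=" + style.outline,
    "MarginV=" + style.marginV,
  ].join(",");
  return "subtitles=" + srtPath + ":force_style='" + forceStyle + "'";
}

// Soft-subtitle export: copy the existing streams and mux the SRT
// into an MKV container as a toggleable subtitle track.
function softSubtitleArgs(input: string, srt: string, output: string): string[] {
  return ["-i", input, "-i", srt, "-map", "0", "-map", "1", "-c", "copy", "-c:s", "srt", output];
}
```

Because the soft-subtitle path copies streams rather than re-encoding, it should finish far faster than a burn-in, which has to re-encode every video frame.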

Tip: If you only need a transcript without burning subtitles into the video, the standalone Audio Transcription tool exports plain TXT / SRT / VTT / JSON. If you need to chop the clip first, use the Video Trimmer before running auto-subtitle to save processing time.

Common Use Cases

Social Media Auto-Captions
85% of mobile video is watched muted. Burned-in captions are not optional for TikTok, Reels, Shorts, or LinkedIn.
Course & Lecture Subtitles
Auto-caption a course recording for accessibility, then download the SRT for manual cleanup before publishing.
Foreign-Language Content
Whisper handles 99 languages. Subtitle Mandarin, Spanish, Arabic, or Hindi videos without a human transcriber.
Podcast Video Clips
Repurpose podcast audio + video as captioned clips for social — Whisper transcription accuracy is competitive with paid services on clean audio.
Internal Training Videos
Caption recordings before posting to a wiki or LMS; saves the manual transcription pass for HR or compliance.
Translated Captions
Whisper's translate mode outputs English subtitles for non-English source audio, which is useful for reaching international audiences.

How We Compare

Honest read on free, paid, and self-hosted options for this kind of job:

UDT Video Auto-Subtitle (this tool): Free, browser-based, OpenAI Whisper runs locally via Transformers.js. 99 languages, customizable styling, burn-in or soft-subtitle export. No upload, no watermark, no signup. Limitation: tiny/base/small models only. Whisper large (1.5GB) is too big for the browser; for top-tier accuracy on professional content, a desktop Whisper install running the large model is the better choice.
Rev.com — Auto Captions: Excellent accuracy on clean audio; $0.25/minute pay-as-you-go. Cloud-based; videos upload to Rev's servers.
Descript — Auto Captions: Bundled with the Descript editor ($12–$30/mo). Caps on free tier; cloud-based.
Veed.io — Auto Subtitles: Free tier limited to 25 minutes/month and watermarks output; $12+/mo to remove.
Kapwing — Auto Subtitles: Free tier limited to 4 minutes and watermarks output; $16/mo paid plans.
Submagic / Captions.ai: Designed for animated 'pop' captions on social-first content. Cloud-based; $24+/mo. This tool produces standard SRT captions; for animated word-by-word styles, Submagic is the right choice.

Frequently Asked Questions

Is auto-subtitle really free?
Yes — completely free, no watermark, no time limits, no signup. The site is supported by ads elsewhere; the tool is unrestricted.
How accurate is the transcription?
On clean English audio at a conversational pace, the base model produces 90–95% word accuracy. Heavy accents, background noise, multiple speakers talking over each other, and technical jargon reduce accuracy, as with every ASR system. The small model is more accurate but slower; pick based on your tolerance for processing time.
Are my videos uploaded anywhere?
No. Whisper runs in your browser via Transformers.js, and FFmpeg.wasm handles encoding locally. The Whisper model (~74MB for base, ~244MB for small) and FFmpeg engine (~32MB) download once and cache. After that, the tool works fully offline.
Which languages are supported?
All 99 languages Whisper supports: English, Spanish, Mandarin, French, German, Japanese, Korean, Arabic, Hindi, Portuguese, Russian, Italian, Dutch, Polish, Turkish, Vietnamese, Indonesian, and 82 more. Translate-to-English mode is also available for non-English sources.
Can I edit the captions before burning them in?
Yes. After transcription, the tool shows an editable timeline view of every segment — you can correct misheard words, merge or split segments, and adjust timestamps. Click 'Burn subtitles' once edits are complete.
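As a rough illustration of what merging and splitting segments involves, here is a small TypeScript sketch. The `Cue` shape and function names are hypothetical, and the split point is estimated from character counts, which is an assumption made for this example rather than a description of how the editor actually picks the time.

```typescript
// A caption cue: start/end in seconds plus its text (hypothetical shape).
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Merge cue i with the cue that follows it. Returns a new array;
// the input is left untouched. Out-of-range indexes return a copy.
function mergeWithNext(cues: Cue[], i: number): Cue[] {
  const a = cues[i];
  const b = cues[i + 1];
  if (!a || !b) return cues.slice();
  const merged: Cue = {
    start: a.start,
    end: b.end,
    text: (a.text.trim() + " " + b.text.trim()).trim(),
  };
  return cues.slice(0, i).concat([merged], cues.slice(i + 2));
}

// Split cue i before the word at wordIndex. The time boundary is
// placed proportionally to the character length of each half.
function splitAtWord(cues: Cue[], i: number, wordIndex: number): Cue[] {
  const cue = cues[i];
  if (!cue) return cues.slice();
  const words = cue.text.trim().split(/\s+/);
  if (wordIndex <= 0 || wordIndex >= words.length) return cues.slice();
  const left = words.slice(0, wordIndex).join(" ");
  const right = words.slice(wordIndex).join(" ");
  const ratio = left.length / (left.length + right.length);
  const boundary = cue.start + (cue.end - cue.start) * ratio;
  const first: Cue = { start: cue.start, end: boundary, text: left };
  const second: Cue = { start: boundary, end: cue.end, text: right };
  return cues.slice(0, i).concat([first, second], cues.slice(i + 1));
}
```

If word-level timestamps are available, using the start time of the word at the split point would give a more accurate boundary than the proportional estimate.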
Burn-in or soft subtitles: which should I pick?
Burn-in (the default) is universal: every player, platform, and device will show the captions. Use it for social uploads. Soft subtitles (a separate SRT file, or MKV with an embedded subtitle track) let viewers toggle captions on and off and keep the original video unmodified. Use them for archives, broadcast workflows, or sites that handle captions client-side (YouTube, Vimeo).
How long does processing take?
On a recent laptop with WebGPU, Whisper runs at roughly real-time on the tiny model, 0.7x real-time on base, and 0.3x real-time on small (a lower factor means slower processing). A 5-minute video with the base model takes a little over 7 minutes to transcribe (5 / 0.7 is about 7.1), plus FFmpeg burn-in time. Tiny is faster; small is more accurate.
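The speed factors quoted in this answer translate into a simple estimate: transcription time is the video duration divided by the factor. The TypeScript sketch below uses those quoted figures as constants; the names are illustrative, and real timings will vary with hardware and WebGPU availability.

```typescript
type ModelSize = "tiny" | "base" | "small";

// Audio seconds processed per wall-clock second, as quoted above
// for a recent laptop with WebGPU. Treat these as rough figures.
const SPEED_FACTOR: Record<ModelSize, number> = {
  tiny: 1.0,
  base: 0.7,
  small: 0.3,
};

// Estimated transcription time in seconds (excludes FFmpeg burn-in).
function estimateTranscriptionSeconds(videoSeconds: number, model: ModelSize): number {
  return videoSeconds / SPEED_FACTOR[model];
}

// Same estimate in minutes, rounded to one decimal place.
function estimateTranscriptionMinutes(videoSeconds: number, model: ModelSize): number {
  return Math.round((estimateTranscriptionSeconds(videoSeconds, model) / 60) * 10) / 10;
}
```

For a 5-minute (300-second) clip this gives 5 minutes on tiny, about 7.1 minutes on base, and about 16.7 minutes on small.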
What's the underlying engine and license?
OpenAI Whisper (MIT licensed) via Hugging Face Transformers.js (Apache 2.0); FFmpeg.wasm (MIT wrapper, LGPL core) for audio extraction and burn-in. Models are hosted on Hugging Face Hub. The tool's UI is the only proprietary layer.