Question 1

Is this tool really free?

Accepted Answer

Yes, completely free: no time limits, no file size caps, no watermarks, and no signup. Most browser-based transcription services either cap free use at a few minutes per day or upload audio to their servers. This tool does neither. The site is supported by ads and affiliate links elsewhere; the tools themselves are unrestricted.

Question 2

Is my audio uploaded to any server?

Accepted Answer

No. The entire transcription runs in your browser using OpenAI's Whisper model (MIT-licensed, open source) loaded via Transformers.js. The model file downloads once and caches locally. After the first load, the tool works offline.

Question 3

Why does it download a model the first time?

Accepted Answer

Whisper is a neural network, and the model file contains the trained weights (its "knowledge") needed to convert speech to text. The file downloads once (about 40MB for tiny, up to about 250MB for small) and is cached in your browser. All subsequent transcriptions run from the cached copy without any network activity.

Question 4

What languages are supported?

Accepted Answer

The multilingual models support 99 languages, including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, and many more. For English-only audio, the English-tuned variant is faster and slightly more accurate.

Question 5

How accurate is browser-based Whisper?

Accepted Answer

The tiny model gets ~90% accuracy on clear English speech and is fast. The base model gets ~94% and is the recommended balance. The small model gets ~96% but is slower. Whisper handles accents, background noise, and technical vocabulary noticeably better than older speech-to-text systems.

Question 6

Does it handle accents and background noise?

Accepted Answer

Whisper was trained on 680,000 hours of diverse audio and handles accents, dialects, and moderate background noise well. The base or small models are recommended over tiny for non-native speakers, heavy accents, or noisy recordings.

Question 7

What audio formats are supported?

Accepted Answer

MP3, WAV, M4A, OGG, FLAC, WebM, and any format your browser's Web Audio API can decode. Video files like MP4 and MOV also work; the audio track is extracted automatically. Maximum recommended file size is around 200MB.

Question 8

What's the maximum audio length?

Accepted Answer

There's no hard limit. Transcription happens in 30-second chunks, so a 2-hour audio file is processed the same way as a 2-minute one; it just takes longer. Files over ~200MB may strain browser memory on older devices.
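As a rough illustration of the 30-second chunking mentioned above, the sketch below splits a recording's duration into consecutive windows. The name `chunkBoundaries` and the choice of non-overlapping windows are assumptions made for this example; the tool's actual chunking logic is not shown here and may differ (for instance, adjacent windows may overlap).

```typescript
// Hypothetical helper (not the tool's real code): split a recording of
// `durationSec` seconds into consecutive windows of at most `chunkSec` seconds.
interface Chunk {
  start: number; // seconds from the beginning of the recording
  end: number;   // seconds from the beginning of the recording
}

function chunkBoundaries(durationSec: number, chunkSec: number = 30): Chunk[] {
  if (durationSec < 0 || chunkSec <= 0) {
    throw new RangeError("durationSec must be >= 0 and chunkSec must be > 0");
  }
  const chunks: Chunk[] = [];
  for (let start = 0; start < durationSec; start += chunkSec) {
    // The last chunk is shortened so it never runs past the end of the audio.
    chunks.push({ start, end: Math.min(start + chunkSec, durationSec) });
  }
  return chunks;
}

// A 2-minute file and a 2-hour file go through the same loop;
// only the number of chunks changes.
console.log(chunkBoundaries(2 * 60).length);      // 4
console.log(chunkBoundaries(2 * 60 * 60).length); // 240
```

The amount of work grows linearly with the length of the recording, which is why long files take longer but need no special handling.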

Question 9

Can I get timestamps for subtitles?

Accepted Answer

Yes. The tool produces segment-level timestamps and exports to SRT and VTT formats, ready for YouTube, Premiere, Final Cut, DaVinci Resolve, or any subtitle editor. Word-level timestamps are also available as an option.
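For readers curious what the SRT and VTT exports look like, here is a minimal sketch that turns segment-level timestamps into both formats. The `Segment` shape and the function names are assumptions for illustration, not the tool's internal code. The formats themselves are standard: SRT uses a comma before the milliseconds and numbers each cue, while WebVTT uses a period and starts with a `WEBVTT` header.

```typescript
// Hypothetical segment shape: times in seconds plus the transcribed text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a number with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for VTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues separated by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`)
    .join("\n");
}

// WebVTT: a "WEBVTT" header, a blank line, then the cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`);
  return "WEBVTT\n\n" + cues.join("\n");
}

console.log(toSrt([{ start: 0, end: 2.5, text: " Hello and welcome. " }]));
```

Word-level timestamps would use the same formatting; only the granularity of the segments changes.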

Question 10

Can it separate speakers (diarization)?

Accepted Answer

Not yet. Whisper transcribes speech but doesn't label "Speaker 1" vs "Speaker 2." For meetings or interviews where speaker labels matter, consider Otter.ai or a Whisper + pyannote-audio pipeline (advanced, requires Python).

Question 11

Can I edit the transcript?

Accepted Answer

Yes. Click any segment to edit the text inline. Edits persist in the export. For heavier editing, copy the plain text into your preferred editor.

Question 12

How long does transcription take?

Accepted Answer

On a modern laptop with the tiny model, transcription runs at roughly 5–10x real-time speed, so a 10-minute recording takes about 1–2 minutes. The base and small models are slower but more accurate. On WebGPU-capable browsers (Chrome, Edge) the speed-up is significant.
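The arithmetic behind that estimate is simply audio length divided by the real-time speed factor. The small sketch below works through it; the function name is made up for this example, and the 5x and 10x figures are the rough tiny-model numbers quoted above, not measurements.

```typescript
// Estimated processing time = audio duration / real-time speed factor.
// `speedFactor` of 5 means "5x real time": 5 seconds of audio per second of work.
function estimateProcessingSeconds(audioSeconds: number, speedFactor: number): number {
  if (speedFactor <= 0) {
    throw new RangeError("speedFactor must be > 0");
  }
  return audioSeconds / speedFactor;
}

const tenMinutes = 10 * 60; // 600 seconds of audio

console.log(estimateProcessingSeconds(tenMinutes, 10) / 60); // 1 (minute) at 10x
console.log(estimateProcessingSeconds(tenMinutes, 5) / 60);  // 2 (minutes) at 5x
```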

Question 13

Do I need a fast computer?

Accepted Answer

Any device with 4GB+ RAM and a modern browser will work. WebGPU acceleration (Chrome, Edge) makes it significantly faster but is not required. The tool runs on phones and tablets, though desktops handle large files more comfortably.

Question 14

Does it work on phones?

Accepted Answer

Yes, on modern phones (2022+). Performance is roughly 2–3x slower than a laptop. The tiny model is recommended on mobile. iOS Safari occasionally has memory issues with large audio files; Chrome on Android typically handles them better.

Question 15

What's WebGPU and do I need it?

Accepted Answer

WebGPU is a newer browser API that lets web apps use your GPU for ML workloads. When available (Chrome 113+, Edge 113+), transcription runs 3–10x faster. The tool falls back to WebAssembly automatically if WebGPU is unavailable, so it works everywhere, just more slowly without it.
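A fallback like the one described can be sketched with standard WebGPU feature detection: check for `navigator.gpu`, then ask for an adapter with `requestAdapter()`. The `pickBackend` function, the `Backend` type, and the `NavigatorLike` interface below are assumptions for illustration; how the tool itself decides is not shown here.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural stand-in for the browser's `navigator`, so the
// function can also be exercised outside a browser (an assumption of this sketch).
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

async function pickBackend(nav: NavigatorLike | undefined): Promise<Backend> {
  // No navigator or no `gpu` property: the browser does not expose WebGPU.
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Any failure while probing the GPU also falls back to WebAssembly.
    return "wasm";
  }
}

// Demo with mock objects standing in for different browsers.
const withGpu: NavigatorLike = { gpu: { requestAdapter: async () => ({}) } };
const withoutGpu: NavigatorLike = {};

pickBackend(withGpu).then((b) => console.log(b));    // webgpu
pickBackend(withoutGpu).then((b) => console.log(b)); // wasm
```

In a real page, the browser's own `navigator` object would be passed in instead of the mocks. Checking the adapter and not just the presence of `navigator.gpu` matters because a browser can expose the API while having no usable GPU behind it.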

Free Audio Transcription
