Pro Audio Editing in Your Browser: When You Actually Need a DAW

By Derek Giordano May 14, 2026 ~10 min read Audio

FFmpeg.wasm matured. libsoxr matured. The pitch/tempo algorithms that used to require a DAW or a paid plugin chain are now built into the standard FFmpeg distribution that runs in any modern browser. The 13-tool UDT Audio Suite that shipped today covers what used to require Audacity for most podcast and music post-production routine work. Here's what changed, what still needs a DAW, and a quick read on where in-browser audio editing fits in the 2026 workflow.

What changed: three things, all underneath the surface

The shift from "audio editing requires Audacity" to "audio editing fits in a browser tab" came from three pieces of infrastructure maturing roughly together, none of which got much attention as it happened.

FFmpeg.wasm at production quality. FFmpeg has been the standard audio engine for two decades. The WebAssembly build, FFmpeg.wasm, started as a curiosity in 2019 and quietly became production-grade by 2024. The current v0.12 series ships almost every FFmpeg filter that matters — afade, atempo, asetrate, alimiter, silenceremove, aresample with the SoX resampler backend, all working in the browser at roughly 30-60% of native CPU speed. For audio (which is small compared to video), that's fast enough to feel instant on most files.

libsoxr resampling baked in. The Sound eXchange resampler — the gold-standard open-source resampling library — got bundled into the default @ffmpeg/core wasm distribution. Before this, in-browser sample rate conversion produced audible aliasing on extreme conversions. Now it's transparent. 44.1k to 48k for film delivery, 48k to 16k for Whisper input, any common conversion — clean enough that blind tests fail.

Pitch-preserved time stretching. The trick — asetrate=BASE*2^(N/12),aresample=BASE,atempo=2^(-N/12) — has been documented forever, but it required the right combination of filters to be present and the right precision in the resampler to sound clean. As of 2024 both conditions are reliably met. The result is that pitch shifting up to ±12 semitones, and tempo changes from 50% to 200%, run in the browser at quality close to a desktop DAW's standard algorithm. Not transparent enough for high-end music mastering, but more than good enough for podcast voice correction or music practice tracks.

The 13-tool chain that replaces Audacity for routine work

The Audio Suite at /tools/category/audio/ now has 13 tools covering the full post-production pipeline:

Compress & Convert: Compressor (MP3/AAC/Opus presets) · Converter (all-format bidirectional) · Sample Rate Converter (explicit SR with SoX-quality resampling)

Trim & Edit: Trimmer (waveform scrub) · Silence Remover (auto-detection) · Merger (concatenate + crossfade) · Fade In/Out (4 curve shapes)

Pitch & Tempo: Pitch Shifter (±12 semitones) · Tempo Changer (50-200%)

Master & Normalize: Normalizer (EBU R128, all streaming targets) · Peak Limiter (brickwall)

Channel Routing: Channel Tool (split/merge/downmix)

Extract from Video: Extractor

For a routine podcast workflow — record three takes, edit, polish, ship — the chain is: Trim each take, Silence Remove to tighten gaps, Merge with a 250ms voice-to-voice crossfade, Fade the bookends, Normalize to Spotify's -14 LUFS or Apple's -16 LUFS preset, Limit at -1dB ceiling, Compress to 128kbps MP3 for distribution. Seven steps, seven different tools, one engine. The FFmpeg.wasm core downloads once (about 32MB) and stays cached — every operation after the first starts instantly.

The rubberband question: why asetrate + atempo, not rubberband itself

If you've used a DAW for pitch correction, you've probably encountered the Rubber Band library — the open-source phase vocoder used by Audacity 3.0+, Reaper, Ardour, and many others. It's the high-quality choice for pitch-preserved time stretching, particularly for vocal work where preserving formants matters.

Rubber Band isn't in the standard @ffmpeg/core wasm build — only certain custom builds include the rubberband filter. Including it would mean shipping a non-standard FFmpeg core, which would mean every user downloads a custom build (no shared cache with the Video Suite, no benefit from the jsDelivr CDN edge caching). For the Audio Suite to keep the "one 32MB engine for everything" architecture, we use the standard asetrate + aresample + atempo chain instead.

The cost: pitch shifts beyond ±6 semitones color the audio more than Rubber Band would, and tempo changes outside 75-150% can introduce subtle phasing on transient sounds. For podcast voice work and music practice, the difference is rarely audible. For broadcast-quality vocal pitch correction or fine music mastering, a DAW with Rubber Band (or Melodyne, or Auto-Tune) is still the right tool.

EBU R128: the unsexy step that matters most

Loudness normalization is the unglamorous part of audio post-production that has the biggest impact on listener experience. Spotify normalizes everything to -14 LUFS integrated loudness. Apple Music sits at -16 LUFS. Podcasts target -16 to -19. YouTube does its own thing around -14. If you ship audio at -10 LUFS (loud), the platform turns it down; you've used dynamic range you didn't need. If you ship at -22 LUFS (quiet), the platform turns it up; you've wasted headroom and the limiter on the platform's side adds its own coloration.

EBU R128 is the broadcast standard that codified this. The loudnorm filter in FFmpeg implements a two-pass version of R128 normalization — measure the integrated loudness in pass one, apply the corrective gain in pass two — and produces output that matches any target LUFS precisely. The Audio Normalizer exposes this with platform presets so you can hit -14, -16, -18, or -23 LUFS without thinking about the underlying math.

The catch: aggressive normalization can push peaks past 0dB and clip. That's why limit-after-normalize is the standard mastering chain. The Audio Peak Limiter at -1dB ceiling catches anything the normalization pushed near 0dB. Together they replace the "loudness maximization" plugin chains that DAW users assemble for mastering.

How we compare: the honest table

UDT Audio SuiteFree, browser-based, 13 tools sharing FFmpeg.wasm + libsoxr. No upload, no signup, no watermark. Quality good enough for 95% of podcast and music post-production. Limitations: no multi-track mixing, no plugin hosting, no formant-preserving pitch shift.

AudacityFree desktop, GPL. Mature multi-track editor with full plugin support (LADSPA, VST, AU). The full DAW alternative to UDT for anyone who needs sessions, plugins, or unlimited tracks. UDT replaces the routine single-file edits Audacity overserves.

Adobe Audition$20-$60/month (Creative Cloud). Industry-standard for podcast post-production. Spectral repair, multitrack sessions, broadcast-grade limiters and noise reduction. The right tool when client deliverables require it.

Reaper$60 personal / $225 commercial. Cheap pro DAW. Lighter than Audition, full plugin host, full multi-track. The DIY podcaster's pro choice when budget matters more than the Adobe brand.

Descript$15-30/month subscription. Cloud-based, transcription-first editor. Excellent for "edit the audio by editing the transcript." Different paradigm than UDT — Descript optimizes for scripted talk-show production, UDT for general audio file editing.

Auphonic$11+/month for 2+ hours. Cloud-based automated podcast mastering. Excellent at what it does (single-pass clean-up + loudness). UDT replaces Auphonic's free tier and exposes the underlying steps as separate tools for cases where you want more control.

When you actually need a DAW

The honest list of cases where in-browser editing isn't enough:

1. Multi-track sessions

If you're producing music — multiple instrument tracks routed through individual effect chains and bussed to a master — you need a DAW. UDT operates on one file at a time. Recording a band, mixing eight microphones from a podcast roundtable, scoring video to picture — DAW territory.

2. Plugin hosting (VST, AU, LADSPA)

Browser-based audio can't host third-party plugins. If your workflow requires FabFilter Pro-Q, Waves bundles, iZotope RX for advanced repair, or any of the studio-standard plugin chains, you need a desktop DAW. UDT exposes the filters built into FFmpeg; that's a deep set but not equivalent to the third-party plugin ecosystem.

3. Large-scale podcast post-production (>20 episodes/week)

At high volume the per-file friction of dragging files into a browser tool one at a time adds up. Desktop tools with batch processing (Audition's Match Loudness panel, Reaper's batch-render) handle 50-episode-per-week shops more efficiently. UDT's chain becomes painful past about 20 files/day.

4. MIDI and software instruments

Any task involving MIDI sequencing, virtual instruments, or sample triggering is DAW-only. UDT works with rendered audio files; it has no concept of MIDI or note data.

5. Spectral editing

Removing a single cough from the middle of a podcast take by literally drawing on the frequency spectrum is a feature Audition and iZotope RX excel at. FFmpeg has no spectral editor. UDT can't do this; if you need it, the tool is RX or Audition.

6. High-end vocal pitch correction

Auto-Tune, Melodyne, and similar formant-aware pitch correction tools produce results UDT's asetrate-atempo chain cannot match. For commercial music vocal work, those are still the right tools.

Three heuristics for picking the side

Heuristic 1: How many files? One to five files: UDT every time. Five to twenty: UDT works but watch the per-file friction. Twenty-plus: desktop DAW with batch processing.

Heuristic 2: How many simultaneous tracks? One at a time: UDT. Multiple tracks routed together: DAW.

Heuristic 3: Are you using plugins? No: UDT or DAW, your call. Yes: DAW.