NEW · MediaPipe · 100% Browser-Based

Video Face Auto-Crop

Smart-crop landscape videos to vertical with face tracking — MediaPipe Face Detection keeps the subject framed as they move. 9:16 / 1:1 / 4:5 / custom. Browser-based, no upload, no signup.

Supported input formats: MP4 · MOV · WebM · MKV · AVI · GIF
Files stay on your device · Never uploaded

Related Tools on UDT

Video Background Remover →
Remove or replace video backgrounds with MediaPipe Selfie Segmentation — color, image, or transparent WebM.
Video Resizer →
Resize to 9:16 vertical, 1:1 square, or any aspect ratio with smart crop.
Video Trimmer →
Trim videos with frame-accurate precision in your browser.
All Video Tools →
Browse the full Video Suite — 12+ tools, all in-browser.

Why Do This in Your Browser?

Auto-reframing has become a paid feature in most major editors: Premiere's Auto Reframe, Descript's Reframe, Vimeo's smart crop, Veed's reframe. They all do roughly the same thing: detect the most important content in each frame and pan the crop window to follow it. The detection logic is small; the friction is in paying for it and uploading your footage.

MediaPipe Face Detection runs in 30–60 ms per frame on a recent laptop and produces a per-frame face bounding box that can drive a crop window. Combined with FFmpeg.wasm for the final re-encode, an entire auto-reframe runs in your browser at roughly 2–3x faster than real time.

How It Works

Drop a landscape video. The tool decodes frames via an HTMLVideoElement, runs MediaPipe Face Detection on each, and records the face centroid over time. Then it generates a smoothed crop path: a Kalman-filtered pan that follows the face without jittering on frames where detection wobbles. The final pass re-encodes at the target aspect ratio with FFmpeg.wasm.
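
For the curious, here is a minimal sketch of what Kalman smoothing of a face-centroid track can look like. It is not the tool's actual code: it assumes the simplest scalar, random-walk form of the filter applied to the horizontal centroid only, and the function name `kalmanSmooth` and the noise values `q` and `r` are illustrative picks.

```typescript
// Scalar Kalman filter with a random-walk (constant-position) model.
// Hypothetical helper: q and r are illustrative tuning values, not the tool's settings.
function kalmanSmooth(
  measurements: number[],
  q: number = 0.5, // process noise variance: how quickly the true position may drift
  r: number = 25,  // measurement noise variance: how noisy the detector is
): number[] {
  const out: number[] = [];
  let x = 0;        // current position estimate (pixels)
  let p = r;        // variance of that estimate
  let seeded = false;
  for (const z of measurements) {
    if (!seeded) {
      x = z;        // seed the state with the first detection
      seeded = true;
    } else {
      p += q;                 // predict: uncertainty grows between frames
      const k = p / (p + r);  // Kalman gain: how much to trust the new measurement
      x += k * (z - x);       // update: move part of the way toward the measurement
      p *= 1 - k;             // uncertainty shrinks after the update
    }
    out.push(x);
  }
  return out;
}

// Usage: jittery per-frame centroid x-positions in, a calmer pan path out.
const smoothedPath = kalmanSmooth([960, 972, 951, 968, 955, 963]);
```

A larger `r` relative to `q` gives a steadier but slower-to-react pan; a smaller `r` follows the face more tightly at the cost of more wobble.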

If multiple faces are detected in a frame, the tool defaults to the largest face (assumed to be the foreground subject). You can also lock the crop to a specific face by clicking it in the preview, or fall back to motion-based smart crop for clips without faces. On frames where no face is detected, the crop holds at its last known position for up to 60 frames before falling back to a center crop.
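
The "largest face wins" default can be sketched in a few lines. This is an illustration under assumptions, not the tool's source: the `FaceBox` shape and the helper names `pickLargestFace` and `centroid` are hypothetical, and "largest" is taken to mean largest bounding-box area.

```typescript
// Hypothetical bounding-box shape in pixels; not a MediaPipe type.
interface FaceBox {
  x: number;      // left edge
  y: number;      // top edge
  width: number;
  height: number;
}

// Return the face with the largest bounding-box area, or null if none were detected.
function pickLargestFace(faces: FaceBox[]): FaceBox | null {
  let best: FaceBox | null = null;
  for (const face of faces) {
    if (best === null || face.width * face.height > best.width * best.height) {
      best = face;
    }
  }
  return best;
}

// Center point of a box; this is the value a crop path would follow.
function centroid(box: FaceBox): { cx: number; cy: number } {
  return { cx: box.x + box.width / 2, cy: box.y + box.height / 2 };
}

// Usage: a small background face and a large foreground face.
const subject = pickLargestFace([
  { x: 100, y: 100, width: 80, height: 80 },
  { x: 900, y: 200, width: 200, height: 220 },
]);
```

Area is a cheap proxy for "closest to the camera", which is why a manual lock is useful when the person you want is not the one nearest the lens.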

Tip: The Video Resizer's smart crop is motion-based and works on any content. This tool is face-specific and produces better results on talking-head clips. To reframe a clip that has no face in it, use the Video Resizer instead. Combine with the Video Watermark tool to add captions to the reframed vertical clip.

Common Use Cases

Landscape Interview → TikTok
Recorded in 16:9 widescreen, post in 9:16 vertical — the face stays centered as the speaker moves.
Webinar Clips for Shorts
Cut a 60-second highlight from a Zoom recording, then auto-reframe to 9:16 for YouTube Shorts and Reels.
Podcast Video Reels
Multi-camera podcast cuts often switch between hosts. Lock the crop to one host's face for a single-speaker reel.
Course Lesson Excerpts
Pull a single talking moment from a long course video and post it as a vertical promo clip.
Sales Demo Highlights
Extract a 30-second product walk-through from a wider sales recording, reframed for mobile preview.
Event Speaker Recap
Conference talk → vertical highlight reel without a manual keyframed pan.

How We Compare

Honest read on free, paid, and self-hosted options for this kind of job:

UDT Video Face Auto-Crop (this tool): Free, browser-based, MediaPipe Face Detection runs locally. Kalman-smoothed crop path, multi-face support with manual lock, motion-fallback for no-face content. No upload, no watermark, no signup.
Adobe Premiere Pro — Auto Reframe: The gold standard for auto-reframing. Requires Creative Cloud ($20+/mo) and renders on your local machine. Higher quality but not free, not browser-based.
Descript — Reframe: Built into Descript's editor; $12–$30/mo. Cloud-based, with usage limits on free tier.
Veed.io — Smart Crop: Web-based but cloud. Free tier watermarks the output; $12+/mo to remove.
Kapwing — Smart Cut: Cloud-based reframe; 4-minute free cap and a watermark; $16/mo paid.
OpenShot / Shotcut (desktop): Free open-source editors, but auto-reframe is not built in — you manually keyframe the crop window. Hours of work for what this tool does in seconds.

Frequently Asked Questions

Is this auto-crop really free?
Yes — completely free, no watermark, no time limits, no signup. The site is supported by ads elsewhere; the tool is unrestricted.
How does face tracking work?
MediaPipe Face Detection (BlazeFace) is a small (~200KB) on-device model that returns face bounding boxes for each frame. The tool runs detection on every frame, collects the face centroids, then applies a Kalman filter to produce a smooth crop path. This suppresses the jitter that raw frame-by-frame detection would otherwise introduce.
Are my videos uploaded anywhere?
No. MediaPipe runs in your browser; FFmpeg.wasm re-encodes locally. The MediaPipe model (~200KB) and the FFmpeg engine (~32MB) download once and cache. After that, the tool works fully offline.
What if there are multiple people in the frame?
By default, the tool tracks the largest face (assumed foreground subject). You can override this by clicking the face you want to lock onto in the preview pane. The tool then follows that subject through the rest of the clip and switches to another face only if the locked one disappears for more than 30 frames.
What happens on frames without a face?
The crop holds at its last known position for up to 60 frames (~2 seconds at 30 fps). Beyond that, it falls back to center-crop. If your clip has long no-face segments, the motion-based Video Resizer is a better fit.
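
The hold-then-center rule is easy to express as a pass over the per-frame centroids. The sketch below is a guess at the shape of that logic, not the tool's code: `fillMissing` is a hypothetical name, `null` stands for "no face on this frame", and the jump to center is instantaneous here, where a real implementation would presumably ease into it.

```typescript
// Fill gaps in a centroid track: hold the last known x-position for up to
// `holdFrames` consecutive misses, then fall back to the horizontal center.
// Hypothetical helper; the 60-frame default is the figure quoted in the text.
function fillMissing(
  centers: (number | null)[],
  sourceWidth: number,
  holdFrames: number = 60,
): number[] {
  const frameCenter = sourceWidth / 2;
  const out: number[] = [];
  let last: number | null = null; // most recent detected position
  let missing = 0;                // consecutive frames without a face
  for (const c of centers) {
    if (c !== null) {
      last = c;
      missing = 0;
      out.push(c);
    } else {
      missing += 1;
      const canHold = last !== null && missing <= holdFrames;
      out.push(canHold && last !== null ? last : frameCenter);
    }
  }
  return out;
}

// Usage with a short hold window so the fallback is visible:
// the third consecutive miss exceeds holdFrames = 2 and snaps to center.
const filledTrack = fillMissing([400, null, null, null, 500], 1920, 2);
```

Running the filled track through a smoothing step afterwards would soften the snap to center.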
Which output aspect ratios are supported?
9:16 (TikTok / Reels / Shorts), 1:1 (square Instagram), 4:5 (Instagram portrait feed), 21:9 (cinematic), and custom ratios. For ratios narrower than the source, the output preserves your source video's height and computes the output width from the chosen aspect ratio.
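
As a worked example of the sizing rule (keep the source height, derive the width from the ratio), here is a small sketch. The helper names `cropSize` and `cropLeft` are hypothetical, rounding to even dimensions is an assumption made because common 4:2:0 video encoders require it, and the branch for ratios wider than the source (such as 21:9 from a 16:9 clip) is a guess at reasonable behavior rather than something the text specifies.

```typescript
// Compute the crop size for a target aspect ratio (ratioW:ratioH).
// Hypothetical helper, not the tool's code.
function cropSize(
  srcW: number,
  srcH: number,
  ratioW: number,
  ratioH: number,
): { width: number; height: number } {
  const toEven = (n: number): number => Math.round(n / 2) * 2;
  const widthAtFullHeight = (srcH * ratioW) / ratioH;
  if (widthAtFullHeight <= srcW) {
    // Target is narrower than the source: keep the height, derive the width.
    return { width: toEven(widthAtFullHeight), height: srcH };
  }
  // Assumption: target is wider than the source, so keep the width instead.
  return { width: srcW, height: toEven((srcW * ratioH) / ratioW) };
}

// Left edge of the crop window so it is centered on the face but stays in frame.
function cropLeft(centerX: number, cropW: number, srcW: number): number {
  const left = Math.round(centerX - cropW / 2);
  return Math.max(0, Math.min(srcW - cropW, left));
}

// Usage: a 1920x1080 source cropped to 9:16, face at the center of the frame.
const vertical = cropSize(1920, 1080, 9, 16);
const leftEdge = cropLeft(960, vertical.width, 1920);
```

For a 1920x1080 source, 9:16 works out to 607.5 pixels wide before rounding, which is why the sketch rounds to the nearest even number instead of truncating.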
How long does processing take?
Roughly 2–3x faster than real time on a recent laptop with WebGPU enabled: a 60-second source clip takes about 25–40 seconds end-to-end. Phones run 4–6x slower. Long clips (5+ minutes) are workable; very long ones (20+ minutes) push browser memory.
What's the underlying engine and license?
MediaPipe Tasks Vision (Apache 2.0) for face detection; FFmpeg.wasm (MIT wrapper, LGPL core) for re-encoding. Both are served unmodified from public CDNs.