v34: Four Image-Side AI Models in the Browser
v33 brought text AI into the UDT AI Suite with a summarizer, paraphraser, grammar checker, and translator. v34 extends the same browser-AI infrastructure to images: an upscaler, a higher-quality background remover, an image-to-prompt tool, and a depth estimator. All four reuse the existing transformers.js loader at /js/transformers-loader.js, which means no new shared infrastructure shipped — the same WebGPU detection, WASM fallback, and IndexedDB caching that the text tools use. The Suite grows from 7 tools to 11. Each model is Apache 2.0, MIT, or BSD-3-Clause.
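For readers who have not seen the loader pattern, here is a minimal sketch of what it can look like, assuming transformers.js v3 (@huggingface/transformers). The pickDevice and loadModel names are hypothetical; the post does not show the actual source of /js/transformers-loader.js, and the IndexedDB layer is omitted. navigator.gpu, env, and the pipeline options are real APIs.

```js
// A sketch in the spirit of /js/transformers-loader.js, not its actual source.
// The real loader's IndexedDB caching layer is omitted here.
import { env, pipeline } from '@huggingface/transformers';

env.allowLocalModels = false; // always resolve model ids against the Hub

// Detect WebGPU; fall back to WASM when no adapter is available.
async function pickDevice() {
  if (navigator.gpu) {
    try {
      if (await navigator.gpu.requestAdapter()) return 'webgpu';
    } catch {
      // fall through to WASM
    }
  }
  return 'wasm';
}

// One entry point per tool: task + pinned model id in, ready pipeline out.
export async function loadModel(task, modelId) {
  return pipeline(task, modelId, {
    device: await pickDevice(),
    dtype: 'q8', // int8 weights, matching the quantized sizes quoted below
  });
}
```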
The licensing audit was once again the part that took the longest. Same lesson as v33 — a popular browser-AI model often has a license you cannot ship commercially — with a new wrinkle this time: even within a single model family, the license can vary by variant. More on that below.
AI Image Upscaler
AI Image Upscaler ships two Swin2SR variants in a single dropdown. The classical 2x model was trained on bicubic-downsampled clean inputs and is the cleanest pick for line art, logos, vector exports rasterized at low resolution, and UI screenshots. The real-world 4x model was trained with the BSRGAN degradation pipeline, which adds JPEG compression, noise, and blur to inputs at training time, making it the better pick for phone photos, social-media images, and anything you found on the web. Each variant is about 22 MB quantized. Both are released by CAIDAS at the University of Würzburg under the Apache 2.0 license. The Xenova ONNX ports inherit that license, which is the thing to actually verify: the ONNX wrapper repo points to the upstream repo (caidas/swin2SR-classical-sr-x2-64), and the license lives there, not on the Xenova page.
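Through transformers.js, loading either variant is a one-liner per model. A hedged sketch of the dropdown wiring, reusing the hypothetical loadModel helper from the loader sketch above; the two model ids are the public Xenova ports, but whether UDT structures the tool exactly this way is an assumption.

```js
// The two Xenova ports behind the dropdown (ids as published on Hugging Face).
const UPSCALERS = {
  'classical-2x': 'Xenova/swin2SR-classical-sr-x2-64',
  'realworld-4x': 'Xenova/swin2SR-realworld-sr-x4-64-bsrgan-psnr',
};

// Upscale a File/Blob with the selected variant; returns a RawImage
// at 2x or 4x the input resolution.
async function upscale(file, variant) {
  const upscaler = await loadModel('image-to-image', UPSCALERS[variant]);
  const url = URL.createObjectURL(file);
  try {
    return await upscaler(url);
  } finally {
    URL.revokeObjectURL(url);
  }
}
```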
The first candidate here was Xenova/2x_APISR_RRDB_GAN_generator-onnx, an anime-style upscaler. Good output on illustrated inputs, but licensed GPL-3.0. UDT's policy is to ship only permissive licenses (Apache, MIT, BSD, ISC, CC0); GPL is excluded regardless of the model-weights-as-software debate, so APISR was out before it was in.
AI Background Remover v2
AI Background Remover v2 uses BiRefNet lite, a Bilateral Reference Network for dichotomous image segmentation from Peng Zheng and collaborators. About 85 MB quantized. MIT licensed. The existing UDT background remover is based on MediaPipe Selfie Segmentation, which is fast and reliable on portraits and frustrating on anything else. A coffee mug on a table picks up edges along the table seam; a sneaker comes out with a halo where the sole meets the floor; a cat keeps the chair behind it. BiRefNet lite handles general subjects (people, products, pets, plants, vehicles, objects) and produces noticeably cleaner edges on hair, fur, leaves, and translucent fabric. The original MediaPipe tool stays live because it remains the faster option for portrait batch work; v2 is the new default for general use.
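A hedged sketch of the cutout step. The 'onnx-community/BiRefNet_lite' id and its compatibility with the image-segmentation pipeline are assumptions, not details taken from the post; the alpha-compositing code around the mask is illustrative.

```js
// Sketch: apply the predicted subject mask as an alpha channel. The model id
// and the image-segmentation task are assumptions, not confirmed by the post.
async function removeBackground(imageUrl) {
  const segment = await loadModel('image-segmentation', 'onnx-community/BiRefNet_lite');
  const [{ mask }] = await segment(imageUrl); // single-channel RawImage

  const img = new Image();
  img.src = imageUrl;
  await img.decode();

  const canvas = new OffscreenCanvas(img.width, img.height);
  const ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);

  // Copy the mask (resized to the source dimensions) into the alpha channel.
  const resized = await mask.resize(img.width, img.height);
  const pixels = ctx.getImageData(0, 0, img.width, img.height);
  for (let i = 0; i < resized.data.length; i++) {
    pixels.data[i * 4 + 3] = resized.data[i];
  }
  ctx.putImageData(pixels, 0, 0);
  return canvas.convertToBlob({ type: 'image/png' }); // transparent PNG
}
```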
The model that gets pitched most loudly for in-browser background removal is BRIA RMBG (versions 1.4 and 2.0). The model card explicitly states that the model is released for non-commercial use and that commercial use requires a paid agreement with BRIA. UDT runs ads on the surrounding pages, so the commercial test fails: the same trap as v33's NLLB-200. Most public tutorials and demos use RMBG, which is fine for learning but a real problem for anyone actually deploying a service. BiRefNet lite is an equivalent or better model under a permissive license.
AI Image to Prompt
AI Image to Prompt uses BLIP base from Salesforce Research, fine-tuned for image captioning on the COCO dataset. About 280 MB quantized — a vision encoder of about 87 MB and a text decoder of about 194 MB, downloaded together on first use. BSD-3-Clause licensed. The tool ships three output modes. Caption returns BLIP's raw factual description, usually 10-20 words covering the dominant subjects and actions. SD prompt wraps that caption with neutral quality and style tokens (detailed, professional photography, sharp focus, high quality) so it can be pasted directly into Stable Diffusion or Midjourney as a starting prompt. Conditional caption lets you provide a prefix like "a watercolor painting of" and BLIP continues the prompt from there, giving you more control over the framing.
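The first two modes reduce to a caption call plus string wrapping. A sketch, assuming the public Xenova port of BLIP base and the quality tokens quoted above; conditional captioning needs the prefix fed to BLIP's text decoder, which the plain pipeline call does not expose, so that mode is left out here.

```js
// Sketch of the caption and SD-prompt modes. The model id is the public
// Xenova port of BLIP base; the suffix tokens are the ones quoted above.
const SD_SUFFIX = ', detailed, professional photography, sharp focus, high quality';

async function imageToPrompt(imageUrl, mode) {
  const captioner = await loadModel('image-to-text', 'Xenova/blip-image-captioning-base');
  const [{ generated_text }] = await captioner(imageUrl);
  return mode === 'sd-prompt'
    ? generated_text + SD_SUFFIX // paste straight into SD or Midjourney
    : generated_text;            // raw factual caption
}
```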
BLIP is not state of the art at this point — BLIP-2 and LLaVA produce longer and more detailed captions — but the larger models are not yet small enough for in-browser use. BLIP base is the right size-quality trade-off today. The model card on the Salesforce repo lists the license as BSD-3-Clause; the Xenova ONNX port mirrors it.
AI Depth Estimator
AI Depth Estimator uses Depth Anything V2 Small. About 27 MB at int8 quantization — the smallest model in the AI Suite by a wide margin, including the v33 text models. Apache 2.0 licensed. The tool produces a grayscale depth map where white is near and black is far, the standard convention used in compositing and 3D software. Drop the depth PNG into an image editor as a mask, into After Effects as a displacement texture, or into ControlNet as a depth guide. Monocular depth estimation produces relative depth (ordering of pixels by distance), not metric depth (distances in meters); for design and compositing work, relative is what you want.
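A sketch of the depth-map render. The depth-estimation task is a standard transformers.js pipeline; the 'onnx-community/depth-anything-v2-small' id is an assumption, and the pixel loop simply writes the single-channel output into a grayscale PNG.

```js
// Sketch: run depth estimation and render the result as a grayscale PNG.
// The 'onnx-community/depth-anything-v2-small' id is an assumption.
async function depthMap(imageUrl) {
  const estimator = await loadModel('depth-estimation', 'onnx-community/depth-anything-v2-small');
  const { depth } = await estimator(imageUrl); // RawImage, one byte per pixel

  const canvas = new OffscreenCanvas(depth.width, depth.height);
  const ctx = canvas.getContext('2d');
  const out = ctx.createImageData(depth.width, depth.height);
  for (let i = 0; i < depth.data.length; i++) {
    const v = depth.data[i];            // higher value = nearer (white = near)
    out.data.set([v, v, v, 255], i * 4);
  }
  ctx.putImageData(out, 0, 0);
  return canvas.convertToBlob({ type: 'image/png' });
}
```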
This is where the v34 licensing wrinkle showed up. Depth Anything V2 ships in four sizes: Small, Base, Large, and Giant. The Small variant is Apache 2.0. The Base, Large, and Giant variants are CC-BY-NC-4.0 (non-commercial). Same model family, same architecture, same upstream organization (depth-anything on Hugging Face), different licenses depending on the size you pick. The model card for the Base variant on Hugging Face has license: cc-by-nc-4.0 right at the top, but until you click into each variant individually it is easy to assume the whole family shares one license. The lesson, added to UDT's conventions document this release, is to pin the exact model identifier you intend to ship, not the family name, and to confirm the license on that specific model's card before integration.
What Got Cut
Three categories of model were considered for v34 and did not make it in.
NAFNet for image deblurring. NAFNet from megvii-research is genuinely a strong image-restoration model and the upstream repository is MIT licensed. The problem is integration: no transformers.js-compatible processor config exists for NAFNet, which means using it would require writing custom raw ONNX Runtime Web inference outside the shared loader. UDT's rule for v34 was "reuse /js/transformers-loader.js, no new shared infrastructure" — partly to keep this release focused and partly because adding a second inference path would create ongoing maintenance overhead. NAFNet is deferred to a future release when either a Xenova or onnx-community port lands, or when there is a strong enough demand signal to justify the infrastructure work.
Color restoration models. Most candidates (DDColor, ColorMNet, Bringing Old Photos Back to Life) carry GPL or research-only licenses. The few permissively-licensed alternatives have the same problem as NAFNet — no transformers.js processor config — so they would also require a new inference path. Deferred for the same reasons.
Depth Anything V2 Base, Large, and Giant. Better depth quality than Small, but CC-BY-NC-4.0. The Small variant is the only one in this family UDT can ship.
Same Family, Different License
The strengthened v34 convention reads: a Hugging Face model family is not a license unit. Within Depth Anything V2, the Small variant is Apache 2.0 and the Base, Large, and Giant variants are CC-BY-NC-4.0. Within BRIA, both RMBG 1.4 and RMBG 2.0 are non-commercial despite being part of an "open" series. Within Swin2SR, the classical and realworld variants are both Apache 2.0, but other models carrying the swin2sr tag on Hugging Face (Xenova/2x_APISR_RRDB_GAN_generator-onnx) are GPL-3.0. The only safe approach is to pin the exact model identifier you intend to ship, click through to the source repository the Xenova or onnx-community page points at, and confirm the license on that specific model's card. v33 captured the first version of this rule. v34 has now refined it twice.
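The pinning half of that rule is scriptable. A sketch of a pre-integration gate against the public Hugging Face model API: the /api/models endpoint and the license:* tag convention are Hub behavior, the allowlist mirrors the policy stated in this post, and everything else is illustrative. Per the convention above, run it against the upstream repo the wrapper page points at, since the wrapper itself may not carry the license tag.

```js
// Sketch: fail fast when a pinned model id does not carry a permissive
// license tag on the Hub. Allowlist mirrors UDT's stated policy.
const PERMISSIVE = new Set(['apache-2.0', 'mit', 'bsd-3-clause', 'isc', 'cc0-1.0']);

async function assertPermissive(modelId) {
  const res = await fetch(`https://huggingface.co/api/models/${modelId}`);
  if (!res.ok) throw new Error(`${modelId}: lookup failed (${res.status})`);
  const meta = await res.json();
  const tag = (meta.tags ?? []).find((t) => t.startsWith('license:'));
  const license = tag ? tag.slice('license:'.length) : 'unknown';
  if (!PERMISSIVE.has(license)) {
    throw new Error(`${modelId}: license "${license}" is not on the allowlist`);
  }
  return license;
}
```

Run against the Depth Anything V2 family, a gate like this passes the Small variant and throws on Base, Large, and Giant, which is exactly the split that motivated the convention.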
What Comes Next
v34.5 is a planned cleanup pass — article-body backfill on the legacy tail, the remaining I-CATEGORIES-005 hero-count drifts, and a verify-script regex fix for category-page link detection. v35 picks up the Color category expansion that has been queued since v32. The AI Suite is unlikely to get a v35 batch — at 11 tools the Suite is full enough that the next AI addition will probably wait until a meaningful new model becomes practical in the browser (vision LLMs in the gigabyte range are not there yet for default loads).
Until then, the AI Suite is live at /tools/category/ai/. Four new image tools, all permissively licensed, all running on your machine.