AI Suite · May 14, 2026

v33: Four Real AI Models, Running in Your Browser

The UDT AI category went live in v32 with three reference tools. This release adds the part that actually runs models: a summarizer, a paraphraser, a grammar checker, and a translator across 10 language pairs. All four models are Apache 2.0. All four run entirely in the browser via transformers.js. No upload, no API key, no signup. The category is now a 7-tool Suite.

Picking the four models took longer than building the tools around them, because the first three candidates I planned to use were licensed in ways that ruled out commercial deployment. The notes below cover both what shipped and what got cut, since the licensing audit is the part that other browser-AI projects most often skip.

AI Summarizer

AI Summarizer uses distilbart-cnn-6-6, Sam Shleifer's distilled BART model from Hugging Face, fine-tuned on the CNN/DailyMail summarization dataset. The model is about 155 MB compressed, downloads once on first use, and is cached in IndexedDB for subsequent visits. Apache 2.0 licensed. On a modern laptop with WebGPU, a 2,000-word article summarizes in a handful of seconds; browsers without WebGPU fall back to WebAssembly and run the same model, slower but functional. Length controls (min/max new tokens) let you bias toward shorter or longer summaries, and a beam-width slider trades latency for slightly more polished output. Long inputs are automatically chunked at word boundaries, each chunk summarized, and the chunk summaries concatenated. The model was trained on news prose, so it does best on factual, structured text and worst on transcripts and marketing copy.
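
For the curious, the whole path is only a few lines of transformers.js. Here is a minimal sketch, assuming the Xenova ONNX port of the model and the library's standard pipeline API; summarizeLong and CHUNK_WORDS are illustrative names, not the shipped code:

```js
import { pipeline } from '@xenova/transformers';

const CHUNK_WORDS = 450; // keep each chunk safely under the model's input window

// Loads on first call; transformers.js caches the downloaded weights in the browser.
const summarizer = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');

async function summarizeLong(text, { minTokens = 30, maxTokens = 120, beams = 1 } = {}) {
  // Chunk at word boundaries, summarize each chunk, join the results.
  const words = text.split(/\s+/);
  const parts = [];
  for (let i = 0; i < words.length; i += CHUNK_WORDS) {
    const chunk = words.slice(i, i + CHUNK_WORDS).join(' ');
    const [out] = await summarizer(chunk, {
      min_new_tokens: minTokens,
      max_new_tokens: maxTokens,
      num_beams: beams, // the beam-width slider maps to this
    });
    parts.push(out.summary_text);
  }
  return parts.join(' ');
}
```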

AI Paraphraser

AI Paraphraser uses Google's Flan-T5-small, about 80 MB compressed, also Apache 2.0. Flan-T5 is an instruction-tuned T5 variant: it responds to natural-language prompts like "Rewrite the following text in a more formal tone" rather than relying on a fixed task head. The tool ships six built-in rewrite styles (Formal, Casual, Shorter, Longer, Simpler, Custom), where Custom lets you write your own instruction prefix. Custom is the most powerful mode: try "Rewrite for a 10-year-old reader," "Convert to bulleted list," or "Translate to passive voice." The small variant is fast enough to run interactively without WebGPU; on lower-end machines a paragraph rewrites in 4-8 seconds.
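
The style dropdown is nothing more than a prompt prefix in front of your text. A minimal sketch, assuming the Xenova port of Flan-T5-small; STYLE_PREFIXES and rewrite are illustrative names, not the shipped code:

```js
import { pipeline } from '@xenova/transformers';

// Illustrative style map; the real tool ships six of these.
const STYLE_PREFIXES = {
  formal:  'Rewrite the following text in a more formal tone:',
  casual:  'Rewrite the following text in a casual tone:',
  simpler: 'Rewrite the following text in simpler language:',
};

const flan = await pipeline('text2text-generation', 'Xenova/flan-t5-small');

async function rewrite(text, style, customPrefix) {
  // Custom mode is just a user-supplied prefix in place of a built-in one.
  const prefix = style === 'custom' ? customPrefix : STYLE_PREFIXES[style];
  const [out] = await flan(`${prefix}\n\n${text}`, { max_new_tokens: 256 });
  return out.generated_text;
}
```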

The first candidate here was a popular T5 paraphraser called humarin/chatgpt_paraphraser_on_T5_base. It produces better output than Flan-T5-small, but it is licensed under OpenRAIL, which carries use-based restrictions on what people can generate with the model. For a public tool where the operator cannot vet every input, use-based restrictions create real legal exposure even when the typical use is fine. Apache 2.0 Flan-T5 was the safer call.

AI Grammar Checker

AI Grammar Checker uses pszemraj/grammar-synthesis-small, a T5-based model fine-tuned on a synthetically augmented version of the JFLEG benchmark. About 80 MB compressed. Apache 2.0. The tool splits input into sentences with an abbreviation-aware splitter, feeds each sentence to the model, and displays the corrected text with word-level diff highlighting (struck-through removals, underlined additions). A copy button copies the corrected text without the diff markup.
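
The diff view is plain dynamic programming, no model involved. Here is a minimal sketch of one way to compute the word-level operations, assuming a standard longest-common-subsequence walk; diffWords is an illustrative name, and the shipped splitter and renderer are more involved:

```js
function diffWords(before, after) {
  const a = before.split(/\s+/);
  const b = after.split(/\s+/);
  // dp[i][j] = length of the LCS of a[i..] and b[j..]
  const dp = Array.from({ length: a.length + 1 },
    () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j]
        ? dp[i + 1][j + 1] + 1
        : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  // Walk the table, emitting keep / remove / add operations in order.
  const ops = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { ops.push({ op: 'keep', word: a[i] }); i++; j++; }
    else if (dp[i + 1][j] >= dp[i][j + 1]) { ops.push({ op: 'remove', word: a[i] }); i++; }
    else { ops.push({ op: 'add', word: b[j] }); j++; }
  }
  while (i < a.length) ops.push({ op: 'remove', word: a[i++] });
  while (j < b.length) ops.push({ op: 'add', word: b[j++] });
  return ops; // render: remove -> struck-through, add -> underlined, keep -> plain
}
```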

The first candidate was vennify/t5-base-grammar-correction, which is the model most public tutorials reach for. It works well, but it is licensed CC BY-NC-SA 4.0, and the NC stands for non-commercial. UDT runs ads on the surrounding pages, so even though the tool itself is free, the commercial-context test fails and that model is out. The pszemraj alternative is genuinely Apache 2.0 and produces comparable output for sentence-level errors. It is conservative by design: it prefers leaving text alone over guessing, which means it sometimes misses real errors but rarely introduces new ones. For aggressive style rewriting, use the Paraphraser.

AI Translator

AI Translator uses Helsinki-NLP's opus-mt family of bilingual models, ported to ONNX for browser execution by Xenova. Each pair is its own model (Xenova/opus-mt-en-es for English-to-Spanish, Xenova/opus-mt-es-en for the reverse), typically 30-45 MB compressed. The tool only downloads a pair when you actually pick it. Switch from English-Spanish to English-French and you fetch another ~40 MB; switch back and the original loads from cache. Coverage at launch is 10 languages, both directions: Spanish, French, German, Italian, Portuguese, Russian, Chinese (Simplified), Japanese, Arabic, and Dutch — 20 individual models, downloaded on demand.
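
The on-demand loading is a small map keyed by pair. A minimal sketch, assuming the Xenova repo naming scheme; loadPair and the cache are illustrative names, and the Japanese exception is handled below:

```js
import { pipeline } from '@xenova/transformers';

const loaded = new Map(); // "en-es" -> translation pipeline

async function loadPair(src, tgt) {
  const key = `${src}-${tgt}`;
  if (!loaded.has(key)) {
    // First use of a pair triggers the ~30-45 MB download; after that the
    // weights come from the browser cache.
    loaded.set(key, await pipeline('translation', `Xenova/opus-mt-${key}`));
  }
  return loaded.get(key);
}

const enEs = await loadPair('en', 'es');
const [out] = await enEs('The model downloads on first use.');
console.log(out.translation_text);
```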

The big multilingual models that cover hundreds of languages in one shot (Meta's NLLB-200, Facebook's M2M-100) are released under CC-BY-NC, so they were never an option. Helsinki-NLP opus-mt is Apache 2.0, with no attribution requirement (we credit Helsinki-NLP anyway because the work deserves it). The trade-off is one model per pair instead of one model for everything, but for the 10 launch languages this is actually a feature — you only fetch what you use, and each download is smaller.

The Japanese pair has an inherited quirk worth flagging: the Xenova ONNX repo uses "jap" as the target-side code in the EN→JA direction (Xenova/opus-mt-en-jap) and "ja" as the source-side code in JA→EN (Xenova/opus-mt-ja-en). The translator-loader in /js/transformers-loader.js handles the asymmetry so the UI can stay clean with a single "ja" code.
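
A minimal sketch of that mapping, with the caveat that repoIdFor is an illustrative name, not necessarily what the loader actually exports:

```js
// The UI uses "ja" everywhere, but the inherited repo names differ by direction.
function repoIdFor(src, tgt) {
  if (src === 'en' && tgt === 'ja') return 'Xenova/opus-mt-en-jap';
  if (src === 'ja' && tgt === 'en') return 'Xenova/opus-mt-ja-en';
  return `Xenova/opus-mt-${src}-${tgt}`; // every other pair is symmetric
}
```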

Why In The Browser?

For most users, the difference between a cloud AI tool and a browser AI tool is invisible: paste, click, get output. The differences show up at the edges. Nothing you paste ever leaves your machine, heavy use costs nothing because there is no metered API behind it, and once a model is cached the inference itself no longer needs a connection.

The trade-off is honest: the small models that fit in a browser produce noticeably less polished output than GPT-4-class or Claude-class models. For sensitive content, casual rewrites, quick translations, and high-volume use, the privacy and cost advantages usually win. For final-pass writing, the hosted models still produce better output. The AI Suite is built around that reality: the four model-powered tools handle the local path, and the three reference tools (template builder, comparison table, system prompt library) make working with hosted models more efficient when you reach for them.

What Comes Next

v34 will focus on the image side of the AI Suite: an upscaler, a background remover (a higher-quality alternative to the existing MediaPipe-based one), and image-to-prompt, all using ONNX models on the same browser-only pattern. The translator will grow new language pairs based on demand. A future release will add a service worker so the Suite is fully offline-capable after the first visit, not just for inference.

Until then, the AI Suite is live at /tools/category/ai/. Apache 2.0 all the way down.