Chrome ships the Prompt API as stable — any web page can now run a local AI model
For most of the last two years, running a language model inside a web page meant either shipping a multi-gigabyte model yourself or sending every keystroke off to a paid cloud endpoint. At Google I/O 2026, Chrome closed that gap: the Prompt API graduated to stable in Chrome 148, letting any website call the browser's built-in Gemini Nano model directly from JavaScript — no server round-trip, no API key, and no per-token bill.
What actually shipped
The Prompt API exposes the on-device Gemini Nano model through a single JavaScript entry point, LanguageModel. You check whether a model is available, create a session, and send it a natural-language request: the call looks roughly like await LanguageModel.create() followed by a call to the session's prompt() method. The stable release adds multimodal inputs (text, image, and audio), structured JSON output that's reliable enough to feed straight into application logic, and broader language coverage. It joins three other now-stable built-in APIs (Summarizer, Translator, and Language Detector), with a few more still in origin trial.
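As a rough illustration of that check, create, prompt sequence, here is a minimal sketch. The interfaces are hand-written assumptions that cover only the calls used, not official typings, and askLocalModel is a hypothetical helper name. The API object is passed in as a parameter so the function can also be exercised outside a browser.

```typescript
// Minimal structural types for the parts of the Prompt API used here.
// These are assumptions written for this sketch, not official typings.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

interface PromptSession {
  prompt(input: string): Promise<string>;
  destroy(): void;
}

interface LanguageModelLike {
  availability(): Promise<Availability>;
  create(): Promise<PromptSession>;
}

// Hypothetical helper: returns the model's reply, or null when no local
// model can be used on this device.
async function askLocalModel(
  api: LanguageModelLike | undefined,
  text: string,
): Promise<string | null> {
  if (!api) return null; // the browser does not expose the API at all

  const status = await api.availability();
  if (status === "unavailable") return null;

  // For "downloadable" or "downloading", create() may have to wait for the
  // model download to finish before it resolves.
  const session = await api.create();
  try {
    return await session.prompt(text);
  } finally {
    session.destroy(); // release the session's resources
  }
}
```

In a page, the first argument would be the browser's global LanguageModel object if it exists, for example (globalThis as any).LanguageModel. A null result is the signal to skip the feature or use another path.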
The pitch Google leaned on at I/O was Trip.com generating personalized travel summaries entirely on-device, sidestepping server overhead and scaling to unlimited queries with no cloud cost. That's the shape of the use case this unlocks: high-volume, low-stakes text work where paying per token never made sense.
Where the limits still are
This is not a drop-in replacement for a frontier model. Gemini Nano is small by frontier standards, but it is not lightweight to host: it takes up several gigabytes on disk, needs a meaningful chunk of RAM or GPU memory, and has to be downloaded by the browser before the first run. The honest framing that's circulating among developers who've shipped against it is that Nano is the autocomplete of language models: reach for it where you'd use a smart-suggest, not where you'd use a full hosted model for complex reasoning.
Coverage is the other catch. The Gemini Nano APIs run in Chrome on desktop operating systems and Chromebook Plus devices; Chrome on Android and iOS isn't supported yet. So the realistic adoption pattern is progressive enhancement: check availability at runtime, use the local model when it's there, and fall back to a server when it isn't.
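One way to express that progressive-enhancement pattern is sketched below. Everything named here is an assumption for illustration: the interfaces describe only the calls the sketch needs, generateWithFallback is a hypothetical helper, and the remote argument stands in for whatever server endpoint an application already has.

```typescript
// Assumed minimal shape of the browser's local model API (illustrative only).
interface LocalSession {
  prompt(input: string): Promise<string>;
  destroy(): void;
}

interface LocalModelApi {
  availability(): Promise<string>;
  create(): Promise<LocalSession>;
}

// Hypothetical server fallback: any function that turns a prompt into text.
type RemoteFallback = (input: string) => Promise<string>;

interface GenerationResult {
  text: string;
  source: "local" | "server";
}

async function generateWithFallback(
  local: LocalModelApi | undefined,
  remote: RemoteFallback,
  input: string,
): Promise<GenerationResult> {
  if (local) {
    try {
      // Use the local path only when the model is already on the device,
      // so the user never waits on a multi-gigabyte download.
      if ((await local.availability()) === "available") {
        const session = await local.create();
        try {
          return { text: await session.prompt(input), source: "local" };
        } finally {
          session.destroy();
        }
      }
    } catch {
      // Local inference failed; fall through to the server path.
    }
  }
  return { text: await remote(input), source: "server" };
}
```

Returning the source alongside the text is a deliberate choice in this sketch: the server path sends the input off the device, so the interface can tell the user which path actually ran.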
Why this matters for tool builders
If you build browser-based tools, this is the same philosophy we've leaned on for years at Ultimate Design Tools: do the work on the user's device, keep their data off our servers, and skip the infrastructure bill entirely. A stable Prompt API makes that approach viable for a whole class of lightweight AI features (rewriting a snippet, classifying an input, drafting alt text) that previously forced a choice between bundling a big model and routing data to a third party.
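To make "classifying an input" concrete, here is a minimal sketch that asks a session for JSON constrained by a schema and validates the reply before using it. The responseConstraint option name follows Chrome's Prompt API documentation as we understand it. The session typing, the three labels, and the classify helper are assumptions invented for this example.

```typescript
// Assumed minimal session shape; these typings are illustrative, not official.
interface ConstrainedSession {
  prompt(
    input: string,
    options?: { responseConstraint?: object },
  ): Promise<string>;
}

// Hypothetical label set for this example.
type Label = "bug" | "feature" | "question";
const LABELS: readonly Label[] = ["bug", "feature", "question"];

// JSON Schema asking for an object such as {"label": "bug"}.
const labelSchema = {
  type: "object",
  properties: { label: { type: "string", enum: LABELS } },
  required: ["label"],
  additionalProperties: false,
};

// Returns one of the known labels, or null if the reply cannot be trusted.
async function classify(
  session: ConstrainedSession,
  text: string,
): Promise<Label | null> {
  const raw = await session.prompt(
    `Classify this message as bug, feature, or question:\n${text}`,
    { responseConstraint: labelSchema },
  );
  try {
    const parsed: unknown = JSON.parse(raw);
    const label = (parsed as { label?: unknown } | null)?.label;
    // Validate even with a schema in place: treat model output as untrusted.
    return LABELS.indexOf(label as Label) !== -1 ? (label as Label) : null;
  } catch {
    return null; // reply was not valid JSON
  }
}
```

The validation step is defensive rather than required by the API: it keeps application logic safe if a reply ever arrives malformed or carries an unexpected label.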
There's a privacy footnote worth keeping in view: when a page calls Gemini Nano, the page itself can see the inputs and outputs, and there's no browser-level badge yet that proves to a user that a given site is staying local. If your selling point is "your data never leaves your device," it's still on you to make that verifiable. But the direction is clear — on-device inference is becoming a default capability of the platform rather than a special-case experiment.