AI Depth Estimator
Generate a grayscale depth map from any image. Useful for subject masking, parallax composites, displacement maps, and layered designs. Depth Anything V2 Small runs entirely in your browser at 27 MB.
Why an AI Tool That Runs In Your Browser
Depth maps are one of those quietly useful outputs that designers reach for more often than they expect: once you have one you can mask the foreground, blur the background by depth, build a parallax composite, drive a displacement filter, or feed it into a 3D layered scene. The catch is that monocular depth estimation has historically required either an expensive proprietary service or a half-gigabyte model. Depth Anything V2 Small breaks that pattern. It is the smallest model in the Depth Anything V2 family, about 27 MB at int8 quantization, and produces relative depth maps that are good enough for most design uses.

Licensing matters here: only the Small variant of Depth Anything V2 is released under Apache 2.0. The Base, Large, and Giant variants are CC-BY-NC-4.0 (non-commercial). Because UDT runs ads, the tool counts as commercial use, so we ship Small only. Depth Anything V2 Small was developed by Lihe Yang and collaborators and is hosted on Hugging Face by the depth-anything organization; the ONNX weights are mirrored by onnx-community.
How AI Depth Estimator Works
Click Load model on first visit. The browser downloads the quantized ONNX weights (about 27 MB) from the Hugging Face CDN and caches them in IndexedDB. Drop or pick an image from your device. The model runs through the transformers.js depth-estimation pipeline, which returns a normalized depth tensor at the model's native resolution, then resizes it to match the input dimensions. The depth map is rendered as a grayscale PNG where white is near and black is far, the standard convention for depth maps in compositing and 3D software. A side-by-side preview shows the original image and the depth map together, and a download button saves the depth map as a PNG; the original is untouched.

The map can be dropped directly into a layered composite as a mask, fed into a displacement filter in your image editor, or used as a guide for parallax work. Note that Depth Anything V2 produces relative depth, not metric depth: the values represent ordering (near versus far) rather than absolute distances. For nearly all design uses, relative depth is what you want.
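For developers curious about the wiring, the sketch below shows that flow with transformers.js. It is a minimal sketch, not the tool's actual source: the model id follows the onnx-community mirror named above, and the dtype option reflects the transformers.js v3 API, both of which may differ from what the tool ships.

    import { pipeline } from '@huggingface/transformers';

    // First call downloads ~27 MB of quantized weights; later calls hit the cache.
    const estimator = await pipeline(
      'depth-estimation',
      'onnx-community/depth-anything-v2-small', // assumed mirror id
      { dtype: 'q8' }                           // int8 quantization
    );

    // 'photo.jpg' stands in for any image URL or object URL. The pipeline
    // returns the raw tensor plus a normalized single-channel image
    // (0-255, white = near, black = far) resized to the input dimensions.
    const { depth } = await estimator('photo.jpg');

    // Draw the grayscale map to a canvas for preview or PNG export.
    const canvas = document.createElement('canvas');
    canvas.width = depth.width;
    canvas.height = depth.height;
    const ctx = canvas.getContext('2d');
    const img = ctx.createImageData(depth.width, depth.height);
    for (let i = 0; i < depth.width * depth.height; i++) {
      const v = depth.data[i];
      img.data.set([v, v, v, 255], i * 4); // replicate gray into RGB, opaque alpha
    }
    ctx.putImageData(img, 0, 0);
    document.body.append(canvas); // canvas.toDataURL('image/png') gives the download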
Frequently Asked Questions
What is the download size for Depth Anything V2 Small?
Depth Anything V2 Small is approximately 27 MB at int8 quantization, the smallest model in the AI Suite. The browser caches it in IndexedDB after the first download, so later visits load almost instantly.
Will the input image be uploaded to a server?
No. After the model finishes downloading on first use, every depth estimation runs entirely in your browser. The image stays on your device and is never sent to our servers or to any third-party API.
What is depth estimation useful for in design work?
The most common uses are subject masking (separating foreground from background by depth threshold), depth-of-field simulation (blurring background pixels), parallax composites (layered movement between near and far elements), displacement maps for 3D-style effects, and feeding the depth into ControlNet for diffusion-guided edits.
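To make the first of those concrete, here is a minimal sketch of depth-threshold masking in plain canvas code. The function name and the assumption that the photo and its depth map are same-sized canvases are ours for illustration, not part of the tool:

    // Hypothetical helper: cut out the foreground of photoCanvas using a
    // same-sized grayscale depth map (white = near) in depthCanvas.
    function maskForeground(photoCanvas, depthCanvas, threshold = 128) {
      const w = photoCanvas.width, h = photoCanvas.height;
      const out = document.createElement('canvas');
      out.width = w;
      out.height = h;
      const ctx = out.getContext('2d');
      ctx.drawImage(photoCanvas, 0, 0);
      const photo = ctx.getImageData(0, 0, w, h);
      const depth = depthCanvas.getContext('2d').getImageData(0, 0, w, h);
      for (let i = 0; i < w * h; i++) {
        // Pixels darker (farther) than the threshold become transparent.
        if (depth.data[i * 4] < threshold) photo.data[i * 4 + 3] = 0;
      }
      ctx.putImageData(photo, 0, 0);
      return out; // opaque subject on a transparent background
    }

Raising the threshold keeps only the nearest elements; lowering it keeps more of the scene.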
Which depth model and license does this tool use?
The tool uses Depth Anything V2 Small from the depth-anything organization on Hugging Face. The Small variant is released under the Apache 2.0 license, which permits commercial use. Note that the Base, Large, and Giant variants of Depth Anything V2 are CC-BY-NC-4.0 (non-commercial) and are not used by this tool.
Is the depth map metric or relative?
Relative. Depth Anything V2 Small produces relative depth values — pixel values represent ordering from nearest to farthest, not absolute distances in meters. For nearly all design and compositing uses this is what you want. For absolute distance measurement (robotics, autonomous driving) a metric depth model is required.
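Concretely, "relative" means each map is min-max normalized per image, so a value of 255 means nearest in this image rather than any fixed distance. A sketch of that normalization, assuming the raw per-pixel scores arrive as a Float32Array (as the pipeline's raw tensor does in transformers.js):

    // Min-max normalize a raw relative-depth array to 0-255 grayscale.
    function toGrayscale(raw) {
      let min = Infinity, max = -Infinity;
      for (const v of raw) {
        if (v < min) min = v;
        if (v > max) max = v;
      }
      const range = max - min || 1; // guard against a constant map
      // Only the ordering of values survives this rescaling, which is
      // exactly why the output is relative rather than metric.
      return Uint8ClampedArray.from(raw, v => ((v - min) / range) * 255);
    }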
Why is the output grayscale?
The standard depth map convention is grayscale: white means near, black means far. This format drops directly into image editors as a mask, into 3D software as a displacement texture, and into ControlNet without any color-channel manipulation. A colorized visualization preview is also rendered alongside the grayscale output for at-a-glance review.
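The colorized preview can be derived from the same grayscale data. A minimal sketch using a simple warm-to-cool ramp; the tool's actual colormap may differ:

    // Map 0-255 gray values (white = near) to an RGBA ramp: near = warm, far = cool.
    function colorizeDepth(gray) {
      const rgba = new Uint8ClampedArray(gray.length * 4);
      for (let i = 0; i < gray.length; i++) {
        const t = gray[i] / 255;                     // 1 = near, 0 = far
        rgba[i * 4 + 0] = Math.round(255 * t);       // red rises toward near
        rgba[i * 4 + 1] = Math.round(128 * t);       // green adds warmth up close
        rgba[i * 4 + 2] = Math.round(255 * (1 - t)); // blue rises toward far
        rgba[i * 4 + 3] = 255;                       // fully opaque
      }
      return rgba; // wrap in ImageData for a canvas preview
    }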
What image sizes work well?
The model resizes inputs to a standard working size internally and then upsamples the depth map to match the original image dimensions. Any common input size works. Very wide or very tall images sometimes produce slightly blurrier depth at the edges, which is a known limitation of monocular depth estimation.
Why does the depth look wrong on flat or textureless areas?
Monocular depth estimation relies on visual cues (perspective, occlusion, focus, lighting). Flat textureless surfaces — clean walls, blank sky, plain backgrounds — have few cues, so the model can produce confident-looking but unreliable depth there. The output is most accurate on images with clear depth cues like overlapping objects, perspective lines, or visible texture.
Built by Derek Giordano · Part of Ultimate Design Tools