File size: 3,356 Bytes
7544a9f 9103bc8 066feae 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 25216a9 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 7544a9f 9103bc8 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 |
---
license: apache-2.0
tags:
- vision
pipeline_tag: depth-estimation
widget:
- inference: false
---
# Depth Anything (small-sized model, Transformers version)
Depth Anything model. It was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al. and first released in [this repository](https://github.com/LiheYoung/Depth-Anything).
[Online demo](https://huggingface.co/spaces/LiheYoung/Depth-Anything) is also provided.
Disclaimer: The team releasing Depth Anything did not write a model card for this model so this model card has been written by the Hugging Face team.
## Model description
Depth Anything leverages the [DPT](https://huggingface.co/docs/transformers/model_doc/dpt) architecture with a [DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2) backbone.
The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/depth_anything_overview.jpg"
alt="drawing" width="600"/>
<small> Depth Anything overview. Taken from the <a href="https://arxiv.org/abs/2401.10891">original paper</a>.</small>
## Intended uses & limitations
You can use the raw model for tasks like zero-shot depth estimation. See the [model hub](https://huggingface.co/models?search=depth-anything) to look for
other versions on a task that interests you.
### How to use
Here is how to use this model to perform zero-shot depth estimation:
```python
from transformers import pipeline
from PIL import Image
import requests
# load pipe
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
# inference
depth = pipe(image)["depth"]
```
Alternatively, one can use the classes themselves:
```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-small-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-small-hf")
# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
predicted_depth = outputs.predicted_depth
# interpolate to original size
prediction = torch.nn.functional.interpolate(
predicted_depth.unsqueeze(1),
size=image.size[::-1],
mode="bicubic",
align_corners=False,
)
```
For more code examples, we refer to the [documentation](https://huggingface.co/transformers/main/model_doc/depth_anything.html#).
### BibTeX entry and citation info
```bibtex
@misc{yang2024depth,
title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
year={2024},
eprint={2401.10891},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
``` |