|
--- |
|
license: cc-by-4.0 |
|
datasets: |
|
- RussRobin/SpatialQA |
|
language: |
|
- en |
|
tags: |
|
- Embodied AI |
|
- MLLM |
|
- VLM |
|
- Spatial Understanding |
|
- Phi-2 |
|
pipeline_tag: visual-question-answering |
|
--- |
|
|
|
SpatialBot is a VLM with spatial understanding and reasoning abilities: it precisely understands depth maps and uses them to perform high-level tasks. |
|
|
|
In this HF repo, we provide the LoRA checkpoints of SpatialBot-3B, which is based on Phi-2 and SigLIP. It performs well on general VLM tasks and on spatial understanding benchmarks such as SpatialBench. |
|
|
|
To use these LoRA checkpoints, you will also need to download the [pretrained checkpoint](https://huggingface.co/RussRobin/SpatialBot-3B-pretrain). |
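
One possible way to fetch the pretrained checkpoint is the `huggingface_hub` client; the following is a minimal sketch, not the official setup procedure (see the GitHub repo for that). The helper names `parse_hub_url` and `download` are hypothetical and exist only for this illustration. The URLs are the ones listed in this model card, and `huggingface_hub` is assumed to be installed (`pip install huggingface_hub`).

```python
from urllib.parse import urlparse

# URLs copied from this model card.
PRETRAIN_URL = "https://huggingface.co/RussRobin/SpatialBot-3B-pretrain"
BENCH_URL = "https://huggingface.co/datasets/RussRobin/SpatialBench"


def parse_hub_url(url):
    """Split a Hugging Face URL into (repo_id, repo_type). Hypothetical helper."""
    parts = urlparse(url).path.strip("/").split("/")
    if parts[0] == "datasets":
        # Dataset URLs look like /datasets/<owner>/<name>.
        return "/".join(parts[1:3]), "dataset"
    # Model URLs look like /<owner>/<name>.
    return "/".join(parts[0:2]), "model"


def download(url, local_dir=None):
    """Download a full repo snapshot and return its local path. Hypothetical helper."""
    # Imported lazily so the URL parsing above works without the package.
    from huggingface_hub import snapshot_download

    repo_id, repo_type = parse_hub_url(url)
    return snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)


# Example (requires network access and enough disk space):
# pretrain_path = download(PRETRAIN_URL)
# bench_path = download(BENCH_URL)
```

The returned path can then be passed to whatever argument the SpatialBot scripts expect for the pretrained weights; the exact flag is not specified here, so check the GitHub repo.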
|
### Paper: |
|
https://arxiv.org/abs/2406.13642 |
|
|
|
### GitHub repo: |
|
https://github.com/BAAI-DCAI/SpatialBot |
|
|
|
<!-- ### SpatialQA, the training set: |
|
https://huggingface.co/datasets/RussRobin/SpatialQA --> |
|
|
|
### SpatialBench, the benchmark: |
|
https://huggingface.co/datasets/RussRobin/SpatialBench |
|
|
|
### Merged SpatialBot-3B: |
|
https://huggingface.co/RussRobin/SpatialBot-3B |