SpatialBot-3B-LoRA


	---
	license: cc-by-4.0
	datasets:
	- RussRobin/SpatialQA
	language:
	- en
	tags:
	- Embodied AI
	- MLLM
	- VLM
	- Spatial Understanding
	- Phi-2
	pipeline_tag: visual-question-answering
	---

	SpatialBot is a VLM with spatial understanding and reasoning abilities. It precisely understands depth maps and uses them to perform high-level tasks.

	In this HF repo, we provide the LoRA checkpoints of SpatialBot-3B, which is based on Phi-2 and SigLIP. It can perform well on general VLM tasks and on spatial understanding benchmarks such as SpatialBench.

	You will also need to download the [pretrained checkpoint](https://huggingface.co/RussRobin/SpatialBot-3B-pretrain).
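
Since two repos are needed (this LoRA repo and the pretrained checkpoint), a minimal sketch for fetching both with `huggingface_hub.snapshot_download` is given here. The repo IDs come from the links in this card. The `checkpoints/` folder layout and the helper names `local_dir_for` and `download_checkpoints` are assumptions made for illustration, not part of the SpatialBot codebase. See the GitHub repo for how the downloaded files are actually loaded.

```python
from pathlib import Path

# Repo IDs taken from the links in this model card.
LORA_REPO = "RussRobin/SpatialBot-3B-LoRA"
PRETRAIN_REPO = "RussRobin/SpatialBot-3B-pretrain"


def local_dir_for(repo_id: str, root: str = "checkpoints") -> Path:
    """Map a repo ID to a local folder (hypothetical layout: <root>/<repo name>)."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoints(root: str = "checkpoints") -> dict:
    """Download both repos and return a {repo_id: local_path} mapping."""
    # Imported lazily so local_dir_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id in (LORA_REPO, PRETRAIN_REPO):
        target = local_dir_for(repo_id, root)
        # snapshot_download fetches every file in the repo into local_dir.
        paths[repo_id] = snapshot_download(repo_id=repo_id, local_dir=str(target))
    return paths
```

Calling `download_checkpoints()` requires network access and `pip install huggingface_hub`. Nothing is downloaded until the function is called.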
	### Paper:
	https://arxiv.org/abs/2406.13642

	### GitHub repo:
	https://github.com/BAAI-DCAI/SpatialBot

	<!-- ### SpatialQA, the training set:
	https://huggingface.co/datasets/RussRobin/SpatialQA -->

	### SpatialBench, the benchmark:
	https://huggingface.co/datasets/RussRobin/SpatialBench

	### Merged SpatialBot-3B:
	https://huggingface.co/RussRobin/SpatialBot-3B