---
license: apache-2.0
---
# Omni-VideoAssistant

This is the code base for a video question answering large language model.
## 🔥 Updates

- [2023.12.09] 🤗 A better model, V6.1, is available on Hugging Face now! Watch this repository for the latest updates.
- [2023.12.06] Gradio & CLI inference demos are available now.
- [2023.12.01] 🤗 The Hugging Face preview model is available now!
💡 I also have other video-language projects that may interest you ✨.

**OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation**
Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An
## 🔨 Preparation

```bash
git clone https://github.com/wanghao-cst/Omni-VideoAssistant
cd Omni-VideoAssistant
```

```bash
conda create -n omni python=3.10 -y
conda activate omni
pip install --upgrade pip
pip install -e .
```
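As a quick sanity check (a minimal sketch, assuming a CUDA-capable GPU and that the editable install exposes the top-level `llava` package), you can verify the environment before moving on:

```bash
# Confirm PyTorch sees a GPU and the llava package is importable
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
python -c "import llava; print('llava imported:', llava.__name__)"
```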
## 🚀 Start here

### Download Omni Preview Model
Downloading manually is required for CLI inference only; the Gradio web UI fetches the checkpoint automatically: Omni Preview Model 6.1.
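If you would rather fetch the checkpoint from the command line, `huggingface-cli` (shipped with the `huggingface_hub` package) can download it. The repository id below is a placeholder; substitute the actual Hugging Face repo for Omni Preview Model 6.1:

```bash
# Download the checkpoint to a local directory (repo id is a placeholder)
huggingface-cli download <hf-user>/Omni-VideoAssistant --local-dir ./checkpoints/omni-v6.1
```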
### Inference in Gradio Web UI

```bash
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.gradio_demo
```
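The demo uses Gradio's default bind address and port; to serve it on a specific interface and port, Gradio's standard `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` environment variables apply (these are generic Gradio settings, not flags specific to this repo):

```bash
# Expose the demo on all interfaces at port 7860 via Gradio's standard env vars
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 \
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.gradio_demo
```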
### Inference in CLI

For an image:

```bash
CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --image-file "llava/serve/examples/extreme_ironing.jpg" \
    --query "What is unusual about this image?"
```

For a video:

```bash
CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --video-file "llava/serve/examples/0A8CF.mp4" \
    --query "Describe the activity in the video"
```
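To process several clips in one go, a plain shell loop over the documented command is enough; the glob below is illustrative and assumes your videos sit in a single directory:

```bash
# Batch inference: run the documented CLI over every .mp4 in a directory (paths illustrative)
for vid in llava/serve/examples/*.mp4; do
  CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --video-file "$vid" \
    --query "Describe the activity in the video"
done
```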
## 🔥 Results Comparison (based on model 5.3; evaluation on 6.1 is in progress)
### Image understanding

### Video understanding
## 🙏 Acknowledgment

This work is based on MVCE for unlimited training data generation and on LLaVA for the pretrained model.