---
license: apache-2.0
---

# Omni-VideoAssistant

This is the code base for a Video Question Answering Large Language Model.

## 📝 Updates

- [2023.12.09] 🤗 A better model, V6.1, is available now on Hugging Face! Watch this repository for the latest updates.
- [2023.12.06] Gradio & CLI inference demos are available now.
- [2023.12.01] 🤗 The preview model is available now on Hugging Face!
💡 I also have other video-language projects that may interest you ✨.

**OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation**
Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An
github | arXiv

## 🔨 Preparation

```bash
git clone https://github.com/wanghao-cst/Omni-VideoAssistant
cd Omni-VideoAssistant
conda create -n omni python=3.10 -y
conda activate omni
pip install --upgrade pip
pip install -e .
```
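
After installation, a quick sanity check helps confirm the environment is set up correctly. This is a minimal sketch; it assumes PyTorch is pulled in as a dependency by `pip install -e .`:

```bash
# Confirm the package imports and that a CUDA device is visible
# (assumes torch was installed as a dependency)
python -c "import llava, torch; print('CUDA available:', torch.cuda.is_available())"
```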

## 🌟 Start here

### Download Omni Preview Model

Download the checkpoint manually only for CLI inference; the Gradio web UI will download it automatically: Omni Preview Model 6.1
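
If you prefer to fetch the checkpoint from the command line instead of through the browser, something like the following should work with the Hugging Face CLI. The repo id below is a placeholder, not the confirmed name; substitute the actual Omni Preview Model 6.1 repository:

```bash
# Download the checkpoint into a local directory
# (repo id is a placeholder; replace it with the real Omni Preview Model 6.1 repo)
pip install -U "huggingface_hub[cli]"
huggingface-cli download <omni-preview-6.1-repo-id> --local-dir ./checkpoints/omni-6.1
```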

### Inference in Gradio Web UI

```bash
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.gradio_demo
```

### Inference in CLI

```bash
CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --image-file "llava/serve/examples/extreme_ironing.jpg" \
    --query "What is unusual about this image?"
CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --video-file "llava/serve/examples/0A8CF.mp4" \
    --query "Describe the activity in the video"
```
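
To run inference over several videos in one pass, a plain shell loop over the same entry point works. This is a sketch; the glob and the query below are illustrative:

```bash
# Batch inference: ask the same question about every .mp4 in a directory
for f in llava/serve/examples/*.mp4; do
    CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
        --model-path "path to omni checkpoints" \
        --video-file "$f" \
        --query "Describe the activity in the video"
done
```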

## 🔥 Results Comparison (based on model 5.3; evaluation on 6.1 is in progress)

### Image understanding

### Video understanding

## 😊 Acknowledgment

This work is based on MVCE for unlimited training data generation and LLaVA for the pretrained model.