--- license: apache-2.0 --- # Omni-VideoAssistant This is a Video Question Answering Large Language model. [code base](https://github.com/wanghao-cst/Omni-VideoAssistant). ## 📝 Updates * **[2023.12.09]** 🤗[Hugging Face](https://huggingface.co/harvey2333/omni_video_assistant_6_1) **A Better Model V6.1** are available now! Welcome to **watch** this repository for the latest updates. * **[2023.12.06]** Gradio & CLI **Inference Demo** are available now. * **[2023.12.01]** 🤗[Hugging Face](https://huggingface.co/harvey2333/omni_video_assistant_5_3) **Preview Model** are available now!
💡 I also have other video-language projects that may interest you ✨.

> [**OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation**](https://arxiv.org/abs/2308.04126)
> Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An
[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/shajiayu1/MVCE/) [![arXiv](https://img.shields.io/badge/Arxiv-2310.01852-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2308.04126)

## 🔨 Preparation ```bash git clone https://github.com/wanghao-cst/Omni-VideoAssistant cd Omni-VideoAssistant ``` ```shell conda create -n omni python=3.10 -y conda activate omni pip install --upgrade pip pip install -e . ``` ## 🌟 Start here ### Download Omni Preview Model Download for CLI inference only, gradio web UI will download it automatically. [Omni Preview Model 6.1](https://huggingface.co/harvey2333/omni_video_assistant_6_1) ### Inference in Gradio Web UI ```Shell CUDA_VISIBLE_DEVICES=0 python -m llava.serve.gradio_demo ```

### Inference in CLI ``` CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \ --model-path "path to omni checkpoints" \ --image-file "llava/serve/examples/extreme_ironing.jpg" \ --query "What is unusual about this image?" CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \ --model-path "path to omni checkpoints" \ --video-file "llava/serve/examples/0A8CF.mp4" \ --query "Describe the activity in the video" ``` ## 🔥 Results Comparision (based on model 5.3, evaluation on 6.1 is doing) ### Image understanding

### Video understanding

## 😊 Acknowledgment This work is based on [MVCE for unlimited training data generation.](https://github.com/shajiayu1/MVCE/), [LLaVA for pretrained model](https://github.com/haotian-liu/LLaVA/)