---
license: apache-2.0
---

# Omni-VideoAssistant
Omni-VideoAssistant is a large language model for video question answering.
[Code base](https://github.com/wanghao-cst/Omni-VideoAssistant)

## 📝 Updates
* **[2023.12.09]** 🤗[Hugging Face](https://huggingface.co/harvey2333/omni_video_assistant_6_1) **An improved model, V6.1,** is available now! Welcome to **watch** this repository for the latest updates.
* **[2023.12.06]** Gradio & CLI **inference demos** are available now.
* **[2023.12.01]** 🤗[Hugging Face](https://huggingface.co/harvey2333/omni_video_assistant_5_3) **A preview model** is available now!

<details open><summary>💡 I also have other video-language projects that may interest you ✨.</summary><p>

> [**OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation**](https://arxiv.org/abs/2308.04126) <br>
> Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An <br>
[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/shajiayu1/MVCE/) [![arXiv](https://img.shields.io/badge/Arxiv-2308.04126-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2308.04126) <br></p></details>

## 🔨 Preparation
```shell
git clone https://github.com/wanghao-cst/Omni-VideoAssistant
cd Omni-VideoAssistant
```
```shell
conda create -n omni python=3.10 -y
conda activate omni
pip install --upgrade pip
pip install -e .
```
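
As a quick sanity check after the editable install, the package should import cleanly inside the new environment. A minimal sketch, assuming the package is importable as `llava` (as the demo commands below suggest):
```shell
# Verify the editable install: the import should succeed inside the omni env.
conda activate omni
python -c "import llava; print('llava imported successfully')"
```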

## 🌟 Start here
### Download Omni Preview Model
The checkpoint only needs to be downloaded manually for CLI inference; the Gradio web UI downloads it automatically.
[Omni Preview Model 6.1](https://huggingface.co/harvey2333/omni_video_assistant_6_1)
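
Either of the standard Hugging Face download routes should work for fetching the checkpoint; this is a sketch, and the local directory name is just an example:
```shell
# Option 1: clone the model repository (requires git-lfs for the weight files).
git lfs install
git clone https://huggingface.co/harvey2333/omni_video_assistant_6_1

# Option 2: download via the huggingface_hub CLI (pip install -U huggingface_hub).
huggingface-cli download harvey2333/omni_video_assistant_6_1 \
    --local-dir ./omni_video_assistant_6_1
```
Pass the resulting directory as `--model-path` in the CLI commands below.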

### Inference in Gradio Web UI

```shell
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.gradio_demo
```
<p align="left">
<img src="assets/gradio_demo.png" width=100%>
</p>

### Inference in CLI
```shell
CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --image-file "llava/serve/examples/extreme_ironing.jpg" \
    --query "What is unusual about this image?"

CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --video-file "llava/serve/examples/0A8CF.mp4" \
    --query "Describe the activity in the video"
```
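
The same entry point can be scripted over a batch of clips. A minimal sketch, assuming the checkpoint directory from the download step above and a hypothetical `videos/` folder of `.mp4` files:
```shell
# Ask the same question about every video in a folder (paths are examples).
MODEL=./omni_video_assistant_6_1
for vid in videos/*.mp4; do
    echo "=== ${vid} ==="
    CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
        --model-path "${MODEL}" \
        --video-file "${vid}" \
        --query "Describe the activity in the video"
done
```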

## 🔥 Results Comparison (based on model 5.3; evaluation of model 6.1 is in progress)
### Image understanding
<p align="left">
<img src="assets/val_img.png" width=100%>
</p>

### Video understanding
<p align="left">
<img src="assets/val_vid.png" width=100%>
</p>

## 😊 Acknowledgment

This work builds on [MVCE](https://github.com/shajiayu1/MVCE/) for unlimited training data generation and [LLaVA](https://github.com/haotian-liu/LLaVA/) for the pretrained model.