harvey2333 committed on
Commit
9249cbd
·
1 Parent(s): a6fe732

Update README.md

Files changed (1)
  1. README.md +2 -64
README.md CHANGED
@@ -4,71 +4,9 @@ license: apache-2.0
  
  # Omni-VideoAssistant
  This is a Video Question Answering Large Language model.
- [code base](https://github.com/wanghao-cst/Omni-VideoAssistant).
+ [code base is here for more details:](https://github.com/wanghao-cst/Omni-VideoAssistant).
  
  ## 📝 Updates
  * **[2023.12.09]** 🤗[Hugging Face](https://huggingface.co/harvey2333/omni_video_assistant_6_1) **A Better Model V6.1** are available now! Welcome to **watch** this repository for the latest updates.
  * **[2023.12.06]** Gradio & CLI **Inference Demo** are available now.
- * **[2023.12.01]** 🤗[Hugging Face](https://huggingface.co/harvey2333/omni_video_assistant_5_3) **Preview Model** are available now!
-
- <details open><summary>💡 I also have other video-language projects that may interest you ✨. </summary><p>
- <!-- may -->
-
- > [**OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation**](https://arxiv.org/abs/2308.04126) <br>
- > Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An <br>
- [![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/shajiayu1/MVCE/) [![arXiv](https://img.shields.io/badge/Arxiv-2310.01852-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2308.04126) <br></p></details>
-
-
- ## 🔨 Preparation
- ```bash
- git clone https://github.com/wanghao-cst/Omni-VideoAssistant
- cd Omni-VideoAssistant
- ```
- ```shell
- conda create -n omni python=3.10 -y
- conda activate omni
- pip install --upgrade pip
- pip install -e .
- ```
-
- ## 🌟 Start here
- ### Download Omni Preview Model
- Download for CLI inference only, gradio web UI will download it automatically.
- [Omni Preview Model 6.1](https://huggingface.co/harvey2333/omni_video_assistant_6_1)
-
- ### Inference in Gradio Web UI
-
- ```Shell
- CUDA_VISIBLE_DEVICES=0 python -m llava.serve.gradio_demo
- ```
- <p align="left">
- <img src="assets/gradio_demo.png" width=100%>
- </p>
-
- ### Inference in CLI
- ```
- CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
- --model-path "path to omni checkpoints" \
- --image-file "llava/serve/examples/extreme_ironing.jpg" \
- --query "What is unusual about this image?"
- CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
- --model-path "path to omni checkpoints" \
- --video-file "llava/serve/examples/0A8CF.mp4" \
- --query "Describe the activity in the video"
- ```
-
- ## 🔥 Results Comparision (based on model 5.3, evaluation on 6.1 is doing)
- ### Image understanding
- <p align="left">
- <img src="assets/val_img.png" width=100%>
- </p>
-
- ### Video understanding
- <p align="left">
- <img src="assets/val_vid.png" width=100%>
- </p>
-
-
- ## 😊 Acknowledgment
-
- This work is based on [MVCE for unlimited training data generation.](https://github.com/shajiayu1/MVCE/), [LLaVA for pretrained model](https://github.com/haotian-liu/LLaVA/)
+ * **[2023.12.01]** 🤗[Hugging Face](https://huggingface.co/harvey2333/omni_video_assistant_5_3) **Preview Model** are available now!