Human-related vision and language tasks are widely applied across various social scenarios. Recent studies demonstrate that large vision-language models can enhance performance on a wide range of downstream visual-language understanding tasks. However, models trained on the general domain often do not perform well in specialized fields. In this study, we train a domain-specific large vision-language model, Human-LLaVA, which aims to serve as a unified multimodal vision-language model for human-related tasks.

Specifically, (1) we first construct **a large-scale, high-quality human-related image-text (caption) dataset** extracted from the Internet for domain-specific alignment in the first stage (Coming soon); (2) we also construct **multi-granularity captions for human-related images** (Coming soon), covering the human face, the human body, and the whole image, and use them to fine-tune the large language model. Lastly, we evaluate our model on a series of downstream tasks; Human-LLaVA achieves the best overall performance among multimodal models of similar scale. In particular, it delivers the best results on a range of human-related tasks, significantly surpassing similar models and GPT-4o. We believe that the Human-LLaVA model and the series of datasets presented in this work can promote research in related fields.
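As a concrete illustration, one record in such a multi-granularity caption dataset would pair a single image with captions at three levels of granularity. The sketch below is purely hypothetical; the datasets are not yet released, so all field names and values are assumptions:

``` python
# Hypothetical multi-granularity caption record (illustrative only; the actual
# HumanCaption datasets are unreleased, so every field name here is an assumption).
record = {
    "image": "example_person.jpg",
    "captions": {
        "face": "A middle-aged man with short gray hair, glasses, and a slight smile.",
        "body": "He wears a navy suit and stands with his arms crossed.",
        "whole_image": "A man in a navy suit posing in front of a conference banner.",
    },
}
```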
## Results

## News and Updates 🔥🔥🔥

* Sep. 8, 2024: **🤗 [Human-LLaVA-8B](https://huggingface.co/OpenFace-CQUPT/Human_LLaVA) is released! 🎉🎉🎉**

## 🤗 Transformers

To run inference with Human-LLaVA, you only need a few lines of code, as demonstrated below. Just make sure you are using the latest version of the code.

``` python
import requests
from PIL import Image
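import torch
from transformers import AutoProcessor, AutoModelForPreTraining

# NOTE: the body of this snippet is elided in the diff view; everything from here
# to `print(predict)` is a hedged reconstruction of a standard `transformers`
# vision-language inference loop. The model class, prompt template, example image
# URL, and generation settings are assumptions; check the model card for exact usage.
model_id = "OpenFace-CQUPT/Human_LLaVA"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = AutoModelForPreTraining.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Assumed LLaVA-style prompt with an <image> placeholder token.
prompt = "USER: <image>\nPlease describe this picture.\nASSISTANT:"
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # placeholder test image
raw_image = Image.open(requests.get(image_url, stream=True).raw)

inputs = processor(images=raw_image, text=prompt, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=400, do_sample=False)
predict = processor.decode(output[0], skip_special_tokens=True)
print(predict)
```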
HumanCaption-10M (self-constructed): Coming soon!

#### Instruction Tuning Stage

All public datasets have been filtered, and we will consider releasing all of the processed text in the future.

HumanCaptionHQ-300K (self-constructed): Coming soon!

humanvg_high_reg (self-constructed): Coming soon!

humanvg_high_rec (self-constructed): Coming soon!

celeba_attribute (self-constructed): [CelebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)

ShareGPT4V: [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md)

LLaVA-Instruct_zh:

verified_ref3rec: