---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
pipeline_tag: visual-question-answering
tags:
- indox
- phoenix
- osllm.ai
- language
---
# Model Card for Llama-3.2V-11B-cot

<!-- Provide a quick summary of what the model is/does. -->

Llama-3.2V-11B-cot is the first release of [LLaVA-o1](https://github.com/PKU-YuanGroup/LLaVA-o1), a visual language model capable of spontaneous, systematic reasoning.

## Model Details

<!-- Provide a longer summary of what this model is. -->

- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Llama-3.2-11B-Vision-Instruct

## Benchmark Results

| MMStar | MMBench | MMVet | MathVista | AI2D | Hallusion | Average |
|--------|---------|-------|-----------|------|-----------|---------|
| 57.6   | 75.0    | 60.3  | 54.8      | 85.7 | 47.8      | 63.5    |

## Reproduction

<!-- This section describes the evaluation protocols and provides the results. -->

To reproduce our results, use [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) with the following settings.

| Parameter      | Value |
|----------------|-------|
| do_sample      | True  |
| temperature    | 0.6   |
| top_p          | 0.9   |
| max_new_tokens | 2048  |

You can change these in [this file](https://github.com/open-compass/VLMEvalKit/blob/main/vlmeval/vlm/llama_vision.py), lines 80-83, and adjust max_new_tokens wherever it appears in the file.

Note: We follow the same settings as Llama-3.2-11B-Vision-Instruct, except that we extend max_new_tokens to 2048.
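
For reference, these decoding settings correspond to the following Hugging Face `GenerationConfig`; this is an illustrative sketch of the values above, not VLMEvalKit's own code.

```python
# Sketch of the evaluation-time decoding settings listed above. In practice these
# values are set inside VLMEvalKit's llama_vision.py as described in the text.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=2048,  # extended from the Llama-3.2-11B-Vision-Instruct default
)
print(generation_config)
```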

After you obtain the results, filter the model output and **keep only the text between \<CONCLUSION\> and \</CONCLUSION\>**.

In theory this filtering should make no difference, but in practice we observe some performance differences because the GPT-4o judge can occasionally be inaccurate.

By keeping only the text between \<CONCLUSION\> and \</CONCLUSION\>, most answers can be extracted directly by VLMEvalKit, which makes the evaluation much less biased.
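
A minimal post-processing helper along these lines (our own illustration, not part of VLMEvalKit) could look like:

```python
# Hypothetical helper: keep only the text between <CONCLUSION> and </CONCLUSION>
# before handing an answer to the evaluator.
import re

def extract_conclusion(output: str) -> str:
    """Return the text between <CONCLUSION> and </CONCLUSION>, or the full output if the tags are missing."""
    match = re.search(r"<CONCLUSION>(.*?)</CONCLUSION>", output, flags=re.DOTALL)
    return match.group(1).strip() if match else output.strip()

print(extract_conclusion("<REASONING>...</REASONING><CONCLUSION>The answer is B.</CONCLUSION>"))
# -> The answer is B.
```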

## How to Get Started with the Model

You can use the inference code for Llama-3.2-11B-Vision-Instruct, for example:
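
The sketch below follows the standard `transformers` example for Llama-3.2-11B-Vision-Instruct; the checkpoint path is a placeholder, and the decoding arguments mirror the Reproduction section above.

```python
# Hedged sketch based on the standard Llama-3.2-11B-Vision-Instruct inference code.
# "path/to/Llama-3.2V-11B-cot" is a placeholder -- point it at the checkpoint you use.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "path/to/Llama-3.2V-11B-cot"  # placeholder local path or Hub id
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # any local image

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image step by step."},
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

# Decoding settings from the Reproduction section.
output = model.generate(**inputs, do_sample=True, temperature=0.6, top_p=0.9, max_new_tokens=2048)
print(processor.decode(output[0]))
```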

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model is trained on the LLaVA-o1-100k dataset (to be released).

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

The model is finetuned with [llama-recipes](https://github.com/Meta-Llama/llama-recipes) using the following settings.
Using the same settings should accurately reproduce our results.

| Parameter                   | Value   |
|-----------------------------|---------|
| FSDP                        | enabled |
| lr                          | 1e-5    |
| num_epochs                  | 3       |
| batch_size_training         | 4       |
| use_fast_kernels            | True    |
| run_validation              | False   |
| batching_strategy           | padding |
| context_length              | 4096    |
| gradient_accumulation_steps | 1       |
| gradient_clipping           | False   |
| gradient_clipping_threshold | 1.0     |
| weight_decay                | 0.0     |
| gamma                       | 0.85    |
| seed                        | 42      |
| use_fp16                    | False   |
| mixed_precision             | True    |
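
As a rough illustration (not the authors' exact command), these hyperparameters map onto a llama-recipes fine-tuning launch along the following lines; the 8-GPU torchrun launch, model path, and dataset handling are assumptions.

```python
# Rough sketch of a llama-recipes fine-tuning launch with the settings above.
# Adjust the GPU count, model path, and dataset arguments to your environment;
# consult the llama-recipes documentation for the authoritative flag list.
import subprocess

cmd = [
    "torchrun", "--nnodes", "1", "--nproc_per_node", "8",   # assumed single 8-GPU node
    "-m", "llama_recipes.finetuning",
    "--enable_fsdp",                                         # FSDP: enabled
    "--model_name", "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "--lr", "1e-5",
    "--num_epochs", "3",
    "--batch_size_training", "4",
    "--use_fast_kernels", "True",
    "--run_validation", "False",
    "--batching_strategy", "padding",
    "--context_length", "4096",
    "--gradient_accumulation_steps", "1",
    "--gradient_clipping", "False",
    "--weight_decay", "0.0",
    "--gamma", "0.85",
    "--seed", "42",
    "--use_fp16", "False",
    "--mixed_precision", "True",
]
subprocess.run(cmd, check=True)
```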

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Like other VLMs, the model may generate biased or offensive content due to limitations in its training data.
Technically, the model's performance in aspects such as instruction following still falls short of leading industry models.

**About [osllm.ai](https://osllm.ai)**:

[osllm.ai](https://osllm.ai) is a community-driven platform that provides access to a wide range of open-source language models.

1. **[IndoxJudge](https://github.com/indoxJudge)**: A free, open-source tool for evaluating large language models (LLMs).
   It provides key metrics to assess performance, reliability, and risks like bias and toxicity, helping ensure model safety.

2. **[inDox](https://github.com/inDox)**: An open-source retrieval-augmentation tool for extracting data from various
   document formats (text, PDFs, HTML, Markdown, LaTeX). It handles structured and unstructured data and supports both
   online and offline LLMs.

3. **[IndoxGen](https://github.com/IndoxGen)**: A framework for generating high-fidelity synthetic data using LLMs and
   human feedback, designed for enterprise use with high flexibility and precision.

4. **[Phoenix](https://github.com/Phoenix)**: A multi-platform, open-source chatbot that interacts with documents
   locally, without internet or GPU. It integrates inDox and IndoxJudge to improve accuracy and prevent hallucinations,
   ideal for sensitive fields like healthcare.

5. **[Phoenix_cli](https://github.com/Phoenix_cli)**: A multi-platform command-line tool that runs LLaMA models locally,
   supporting up to eight concurrent tasks through multithreading and eliminating the need for cloud-based services.

**Disclaimers**

[osllm.ai](https://osllm.ai) is not the creator, originator, or owner of any Model featured in the Community Model Program.
Each Community Model is created and provided by third parties. osllm.ai does not endorse, support, represent,
or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand
that Community Models can produce content that might be offensive, harmful, inaccurate, or otherwise
inappropriate or deceptive. Each Community Model is the sole responsibility of the person or entity who
originated it. osllm.ai may not monitor or control the Community Models and cannot, and does not, take
responsibility for any such Model. osllm.ai disclaims all warranties or guarantees about the accuracy,
reliability, or benefits of the Community Models. osllm.ai further disclaims any warranty that a Community
Model will meet your requirements, be secure, uninterrupted, or available at any time or location, or be
error-free or virus-free, or that any errors will be corrected. You will be solely responsible for
any damage resulting from your use of or access to the Community Models, your downloading of any Community
Model, or your use of any other Community Model provided by or through [osllm.ai](https://osllm.ai).