iseesaw committed
Commit ed30435 · 1 Parent(s): 8b4d6db

update readme

Files changed (1): README.md (+29, -24)
README.md CHANGED
@@ -17,30 +17,9 @@ Llama-3-8B-UltraMedical has achieved top average scores across several popular m
  In these benchmarks, Llama-3-8B-UltraMedical significantly outperforms Flan-PaLM, OpenBioLM-8B, Gemini-1.0, GPT-3.5, and Meditron-70b.
  We extend our gratitude to Meta for the Llama model, which provided an excellent foundation for our fine-tuning efforts.

- ## Model Details
-
- <!-- Provide a longer summary of what this model is. -->
-
- This model is trained using the full parameters and the Fully Sharded Data Parallel (FSDP) framework.
- The training process was performed on 8 x A6000 GPUs for about 50 hours.
-
- Hyperparameters:
-
- - torch type: bfloat16
- - epochs: 3
- - learning rate: 2e-5
- - learning rate scheduler type: cosine
- - warmup ratio: 0.04
- - max length: 1024
- - global batch size: 128
-
- - **License:** [Meta Llama-3 License](https://llama.meta.com/llama3/license/).
- - **Finetuned from model:** [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- - **Finetuned on data:** [UltraMedical](https://github.com/TsinghuaC3I/UltraMedical)
-
- ### Usage
-
- #### Chat Template
+ ## Usage
+
+ ### Chat Template

  This model utilizes the Llama-3 default chat template without a system prompt.
  Below, we provide input examples for multi-choice QA, PubMedQA, and open-ended questions.
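A minimal sketch of how such an input is assembled with the tokenizer's built-in Llama-3 template (the repo ID `TsinghuaC3I/Llama-3-8B-UltraMedical` is assumed from context, and the question is an illustrative placeholder, not one of the card's examples):

```python
# Sketch: build a Llama-3-format prompt with no system message.
# The repo ID and the question text below are assumptions, not from the card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TsinghuaC3I/Llama-3-8B-UltraMedical")

messages = [
    {"role": "user", "content": "Which enzyme do statins inhibit? A. HMG-CoA reductase B. Lipoprotein lipase C. ACE D. COX-1"},
]

# tokenize=False returns the formatted string; add_generation_prompt appends
# the assistant header so the model begins its answer.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```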
@@ -78,7 +57,7 @@ Investigate the mechanistic implications of statins, primarily used for lipid mo
  ```


- #### Inference with vLLM
+ ### Inference with vLLM

  ```python
  from transformers import AutoTokenizer
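The hunk above shows only the first line of the card's vLLM example; a self-contained sketch of the same pattern might look like the following (the repo ID and question are assumptions, not recovered from the elided code):

```python
# Sketch: single-prompt inference with vLLM using greedy decoding,
# the default strategy noted in the evaluation section below.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "TsinghuaC3I/Llama-3-8B-UltraMedical"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id)

messages = [{"role": "user", "content": "Explain the mechanism of action of statins."}]  # placeholder
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# temperature=0 makes decoding deterministic (greedy search).
outputs = llm.generate([prompt], SamplingParams(temperature=0, max_tokens=1024))
print(outputs[0].outputs[0].text)
```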
@@ -128,8 +107,34 @@ In the table above:

  - For MedQA, we use the 4 options from the US set. For MedMCQA, we use the Dev split. For PubMedQA, we use the reasoning-required set.

+ - For MMLU, we include Clinical Knowledge (CK), Medical Genetics (MG), Anatomy (An), Professional Medicine (PM), College Biology (CB), and College Medicine (CM) to maintain consistency with previous studies.
+
  - Greedy search is employed as our default decoding strategy. We denote ensemble scores with self-consistency as `(Ensemble)`. In our experiments, we conduct 10 decoding trials, and final decisions are made via majority vote (temperature=0.7, top_p=0.9), as sketched below.

+ - Partial results for 7B pre-trained models are sourced from the [Open Medical-LLM Leaderboard](https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard).
+
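The ensemble bullet above reduces to sample-then-vote. A sketch under the stated decoding settings (`extract_choice` is a hypothetical helper, and the question is a placeholder):

```python
# Sketch: self-consistency ensemble -- 10 sampled completions, majority vote.
import re
from collections import Counter
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "TsinghuaC3I/Llama-3-8B-UltraMedical"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id)

messages = [{"role": "user", "content": "..."}]  # a multi-choice question (placeholder)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

def extract_choice(text):
    """Hypothetical helper: pull a single answer letter (A-D) from a completion."""
    m = re.search(r"\b([A-D])\b", text)
    return m.group(1) if m else None

# n=10 draws the stated number of decoding trials per prompt.
params = SamplingParams(n=10, temperature=0.7, top_p=0.9, max_tokens=1024)
outputs = llm.generate([prompt], params)

votes = Counter(
    letter
    for completion in outputs[0].outputs
    if (letter := extract_choice(completion.text)) is not None
)
final_answer = votes.most_common(1)[0][0]  # majority vote across the 10 trials
```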
 
+ ## Training Details
+
+ This model was trained with full-parameter fine-tuning under the Fully Sharded Data Parallel (FSDP) framework.
+ Training was performed on 8 x A6000 GPUs for about 50 hours.
+
+ Hyperparameters (see the config sketch below):
+
+ - torch dtype: bfloat16
+ - epochs: 3
+ - learning rate: 2e-5
+ - learning rate scheduler type: cosine
+ - warmup ratio: 0.04
+ - max length: 1024
+ - global batch size: 128
+
+ - **License:** [Meta Llama-3 License](https://llama.meta.com/llama3/license/)
+ - **Finetuned from model:** [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
+ - **Finetuned on data:** [UltraMedical](https://github.com/TsinghuaC3I/UltraMedical)
+
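One plausible mapping of the listed hyperparameters onto a Hugging Face `TrainingArguments` setup; the training script itself is not part of this commit, so the output path, batch-size factoring, and FSDP flags below are illustrative assumptions:

```python
# Sketch: the card's hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3-8b-ultramedical",  # placeholder path
    bf16=True,                             # torch dtype: bfloat16
    num_train_epochs=3,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.04,
    per_device_train_batch_size=4,         # 4 per GPU x 8 GPUs x 4 accum steps = 128 global
    gradient_accumulation_steps=4,         # assumed split of the global batch of 128
    fsdp="full_shard auto_wrap",           # Fully Sharded Data Parallel
)
# The max length of 1024 is enforced at tokenization time, not here.
```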
  ## Limitations & Safe Use

  While our model offers promising capabilities, it is crucial to exercise caution when using it in real-world clinical settings due to potential hallucination issues. Hallucinations, where the model generates incorrect or misleading information, can pose significant risks in clinical decision-making. Users are advised to validate the model's outputs with trusted medical sources and expert consultation to ensure safety and accuracy.