Xin Dong committed · Commit 5c73b29 · Parent(s): 6abbf5e

add eval

README.md CHANGED
```

## Evaluation

We use the [`LM Evaluation Harness`](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the model. The evaluation commands are as follows:

```bash
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
git fetch --all --tags
git checkout tags/v0.4.4  # the squad_completion task is not compatible with the latest version
pip install -e .

lm_eval --model hf --model_args pretrained=nvidia/Hymba-1.5B-Base,dtype=bfloat16,trust_remote_code=True \
    --tasks mmlu \
    --num_fewshot 5 \
    --batch_size 1 \
    --output_path ./hymba_HF_base_lm-results \
    --log_samples

lm_eval --model hf --model_args pretrained=nvidia/Hymba-1.5B-Base,dtype=bfloat16,trust_remote_code=True \
    --tasks arc_easy,arc_challenge,piqa,winogrande,hellaswag \
    --num_fewshot 0 \
    --batch_size 1 \
    --output_path ./hymba_HF_base_lm-results \
    --log_samples

lm_eval --model hf --model_args pretrained=nvidia/Hymba-1.5B-Base,dtype=bfloat16,trust_remote_code=True \
    --tasks squad_completion \
    --num_fewshot 1 \
    --batch_size 1 \
    --output_path ./hymba_HF_base_lm-results \
    --log_samples
```
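Each run writes a results JSON under `--output_path`. As a minimal sketch for tabulating scores from those files, assuming the harness's v0.4.x layout (a top-level `"results"` dict mapping each task name to its metric dict; the file name and metric keys below are illustrative):

```python
import json

def summarize_results(path):
    """Return {task: {metric: value}} from an LM Evaluation Harness results JSON.

    Assumes the v0.4.x layout: data["results"] maps each task name to a dict
    of metrics such as {"acc,none": 0.47, "acc_stderr,none": 0.01}.
    """
    with open(path) as f:
        data = json.load(f)
    summary = {}
    for task, metrics in data.get("results", {}).items():
        # Keep only numeric metrics; drop string fields such as "alias".
        summary[task] = {k: v for k, v in metrics.items()
                         if isinstance(v, (int, float))}
    return summary

# Demo with a synthetic file shaped like the harness output:
sample = {"results": {"mmlu": {"acc,none": 0.47, "acc_stderr,none": 0.01,
                               "alias": "mmlu"}}}
with open("demo_results.json", "w") as f:
    json.dump(sample, f)
print(summarize_results("demo_results.json"))
# → {'mmlu': {'acc,none': 0.47, 'acc_stderr,none': 0.01}}
```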

## Limitations