amanrangapur committed
Update README.md
README.md CHANGED
@@ -107,7 +107,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
 <!-- TODO -->
 ## Evaluation
 
-Core model results for OLMo2 7B models are found below
+Core model results for OLMo2 7B models are found below:
 
 | Task | Llama-7b | Llama2-7b | Falcon-7b | Mpt-7b | OLMo-7B | Llama2-13b | OLMo 7B April 2024 | **OLMo2 7B** |
 |-------------------|----------|-----------|-----------|--------|---------|------------|--------------------|-----------------------|
@@ -157,9 +157,9 @@ In contrast to OLMo 1.0, we trained OLMo 7B July with a two-stage curriculum:
 Both stages contribute equally to the final performance of the OLMo model. After the first stage, OLMo 1.7 already outperforms OLMo 1.0. The second stage consistently adds 2 to 3 points of performance on top.
 
 
-### Architecture
+<!-- ### Architecture
 
-
+OLMo2 7B architecture with peer models for comparison.
 
 | | **OLMo2 7B** | [OLMo2 13B](https://huggingface.co/allenai/OLMo2-13B-1124) | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) | PaLM 8B |
 |------------------------|-------------------|-------------------|---------------------|--------------------|--------------------|------------------|
@@ -203,7 +203,7 @@ Optimizer settings comparison with peer models.
 | gradient clipping | global 1.0 | global 1.0 | global 1.0 | global 1.0 | global 1.0 |
 | gradient reduce dtype | FP32 | FP32 | FP32 | FP32 | BF16 |
 | optimizer state dtype | FP32 | FP32 | most likely FP32 | FP32 | FP32 |
-
+-->
 
 
 ## Bias, Risks, and Limitations
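The optimizer rows in the table above (global gradient clipping at 1.0 and FP32 optimizer state) correspond to a fairly standard PyTorch training step. The sketch below is illustrative only, assuming a generic AdamW setup; it is not the OLMo training code, and the placeholder module, learning rate, and loss are assumptions. Only the clipping norm and the FP32 defaults reflect the table.

```python
# Minimal sketch, not the OLMo training code: it only illustrates the optimizer
# settings listed in the table above (global gradient clipping at 1.0 and FP32
# optimizer state). The module, learning rate, and loss below are placeholders.
import torch
from torch import nn

model = nn.Linear(4096, 4096)  # stand-in module; the real model is a transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is illustrative

def training_step(x, y):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # "gradient clipping: global 1.0" -- clip the global norm over all parameters
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    # With FP32 parameters, AdamW keeps its moment buffers ("optimizer state dtype") in FP32
    optimizer.step()
    return loss.item()
```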