Safetensors · English · olmo2
amanrangapur committed (verified) · Commit 67a1620 · 1 parent: 83f13fa

Update README.md

Files changed (1): README.md (+4, -4)
README.md CHANGED
```diff
@@ -107,7 +107,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
 <!-- TODO -->
 ## Evaluation
 
-Core model results for OLMo2 7B models are found below.
+Core model results for OLMo2 7B models are found below:
 
 | Task | Llama-7b | Llama2-7b | Falcon-7b | Mpt-7b | OLMo-7B | Llama2-13b | OLMo 7B April 2024 | **OLMo2 7B** |
 |-------------------|----------|-----------|-----------|--------|---------|------------|--------------------|-----------------------|
@@ -157,9 +157,9 @@ In contrast to OLMo 1.0, we trained OLMo 7B July with a two-stage curriculum:
 Both stages contribute equally to the final performance of the OLMo model. After the first stage, OLMo 1.7 already outperforms OLMo 1.0. The second stage consistently adds 2 to 3 points of performance on top.
 
 
-### Architecture
+<!-- ### Architecture
 
-OLMo 7B architecture with peer models for comparison.
+OLMo2 7B architecture with peer models for comparison.
 
 | | **OLMo2 7B** | [OLMo2 13B](https://huggingface.co/allenai/OLMo2-13B-1124) | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) | PaLM 8B |
 |------------------------|-------------------|-------------------|---------------------|--------------------|--------------------|------------------|
@@ -203,7 +203,7 @@ Optimizer settings comparison with peer models.
 | gradient clipping | global 1.0 | global 1.0 | global 1.0 | global 1.0 | global 1.0 |
 | gradient reduce dtype | FP32 | FP32 | FP32 | FP32 | BF16 |
 | optimizer state dtype | FP32 | FP32 | most likely FP32 | FP32 | FP32 |
-
+-->
 
 
 ## Bias, Risks, and Limitations
```
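
For readers skimming the diff: the (now commented-out) optimizer table above lists global gradient clipping at 1.0 and FP32 gradient-reduce and optimizer-state dtypes for OLMo2 7B. Below is a minimal PyTorch sketch of those two numerics, not the OLMo training code; the AdamW choice, the learning rate, and the toy model are illustrative assumptions, only the global 1.0 clipping threshold and the FP32 state come from the table.

```python
# Minimal sketch of two settings from the optimizer table above:
# global-norm gradient clipping at 1.0 and FP32 optimizer state.
# AdamW, lr, and the toy model are assumptions for illustration.
import torch

model = torch.nn.Linear(64, 64)  # stand-in for the 7B transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # placeholder lr

for step in range(10):
    x = torch.randn(8, 64)
    loss = model(x).pow(2).mean()  # dummy objective
    loss.backward()
    # "gradient clipping: global 1.0" -- the norm is computed over *all*
    # parameters jointly, then every gradient is scaled down together,
    # rather than clipping each tensor separately.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()  # AdamW's moment buffers stay FP32 here because the
                      # params are FP32, matching "optimizer state dtype: FP32"
    optimizer.zero_grad(set_to_none=True)
```

The "gradient reduce dtype: FP32" row only shows up in multi-GPU training: with BF16 mixed precision, up-casting gradients to FP32 before the cross-rank all-reduce is a common stability choice that avoids accumulating rounding error; the sketch above is single-process, so there is nothing to reduce.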