osanseviero committed on
Commit b7ede3a (verified), 1 parent: c647022

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -37,7 +37,7 @@ Model capabilities:

  **Model Developers** Meta

- **Variations** Code Llama comes in three model sizes, and three variants:
+ **Variations** Code Llama comes in four model sizes, and three variants:

  * Code Llama: base models designed for general code synthesis and understanding
  * Code Llama - Python: designed specifically for Python
@@ -52,8 +52,9 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
  **Output** Models generate text only.

  **Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture.
+ **Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant **does not** support long context of up to 100k tokens.

- **Model Dates** Code Llama and its variants have been trained between January 2023 and July 2023.
+ **Model Dates** Code Llama and its variants have been trained between January 2023 and January 2024.

  **Status** This is a static model trained on an offline dataset. Future versions of Code Llama - Instruct will be released as we improve model safety with community feedback.

@@ -69,6 +70,8 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
  ## Hardware and Software
  **Training Factors** We used custom training libraries. The training and fine-tuning of the released models have been performed Meta’s Research Super Cluster.

+ **Carbon Footprint** In aggregate, training all 12 Code Llama models required 1400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 228.55 tCO2eq, 100% of which were offset by Meta’s sustainability program.
+
  ## Evaluation Results

  See evaluations for the main models and detailed ablations in Section 3 and safety evaluations in Section 4 of the research paper.
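The figures added in this commit can be cross-checked with a little arithmetic. The Python sketch below is not part of the commit. It confirms that four sizes times three variants gives the "12 Code Llama models" named in the Carbon Footprint line, and it derives the average emissions per GPU-hour. The third variant name, Code Llama - Instruct, is taken from the Status line because the variants list is cut off in the diff. The functions `energy_kwh` and `implied_grid_intensity` are hypothetical. They assume that every GPU drew its full TDP for every reported hour and that emissions were estimated as energy multiplied by a single grid carbon intensity. The README does not state its estimation method, so treat those two outputs as illustrative only.

```python
# Figures copied from the "+" lines of this commit's README diff.
SIZES = ["7B", "13B", "34B", "70B"]  # "four model sizes"
# Third name taken from the Status line; the bullet list is truncated in the diff.
VARIANTS = ["Code Llama", "Code Llama - Python", "Code Llama - Instruct"]
GPU_HOURS = 1_400_000  # "1400K GPU hours"
EMISSIONS_T = 228.55  # "228.55 tCO2eq"
TDP_RANGE_W = (350, 400)  # "TDP of 350-400W" (A100-80GB)


def model_count() -> int:
    """Every size is released in every variant, so the count is a simple product."""
    return len(SIZES) * len(VARIANTS)


def kg_per_gpu_hour() -> float:
    """Average emissions attributed to one GPU-hour, in kg CO2eq."""
    return EMISSIONS_T * 1000 / GPU_HOURS


def energy_kwh(tdp_w: int) -> float:
    """HYPOTHETICAL: assumes each GPU draws its full TDP for every reported hour."""
    return GPU_HOURS * tdp_w / 1000


def implied_grid_intensity(tdp_w: int) -> float:
    """HYPOTHETICAL: kg CO2eq per kWh implied if emissions = energy * grid intensity."""
    return EMISSIONS_T * 1000 / energy_kwh(tdp_w)


if __name__ == "__main__":
    print(f"models: {model_count()}")
    print(f"kg CO2eq per GPU-hour: {kg_per_gpu_hour():.3f}")
    for watts in TDP_RANGE_W:
        print(
            f"{watts} W: {energy_kwh(watts) / 1000:.0f} MWh, "
            f"implied {implied_grid_intensity(watts):.3f} kg CO2eq/kWh"
        )
```

Running it reports 12 models and about 0.163 kg CO2eq per GPU-hour. Under the full-TDP assumption the energy falls between 490 and 560 MWh, which implies a grid intensity of roughly 0.41 to 0.47 kg CO2eq/kWh. The lower TDP gives the higher implied intensity, because the same total emissions are spread over less energy.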