osanseviero
committed on
Update README.md
README.md CHANGED
@@ -37,7 +37,7 @@ Model capabilities:

**Model Developers** Meta

-**Variations** Code Llama comes in
+**Variations** Code Llama comes in four model sizes, and three variants:

* Code Llama: base models designed for general code synthesis and understanding
* Code Llama - Python: designed specifically for Python
@@ -52,8 +52,9 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
**Output** Models generate text only.

**Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture.
+**Model Architecture** Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant **does not** support long context of up to 100k tokens.

-**Model Dates** Code Llama and its variants have been trained between January 2023 and
+**Model Dates** Code Llama and its variants have been trained between January 2023 and January 2024.

**Status** This is a static model trained on an offline dataset. Future versions of Code Llama - Instruct will be released as we improve model safety with community feedback.

@@ -69,6 +70,8 @@ All variants are available in sizes of 7B, 13B, 34B, and 70B parameters.
## Hardware and Software
**Training Factors** We used custom training libraries. The training and fine-tuning of the released models have been performed Meta’s Research Super Cluster.

+**Carbon Footprint** In aggregate, training all 12 Code Llama models required 1400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 228.55 tCO2eq, 100% of which were offset by Meta’s sustainability program.
+
## Evaluation Results

See evaluations for the main models and detailed ablations in Section 3 and safety evaluations in Section 4 of the research paper.
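The added **Model Architecture** line states that this variant was fine-tuned with up to 16k tokens and does not support the 100k-token long context. A minimal sketch of how a reader could check the configured context window of a released checkpoint is below; the repo id is an assumed example (it is not named in this diff), and the exact field values depend on the published config.

```python
# Minimal sketch: inspect the configured context window of a Code Llama
# checkpoint with the Hugging Face transformers library.
from transformers import AutoConfig

# Assumed example repo id for illustration; substitute the checkpoint you
# actually want to inspect.
repo_id = "codellama/CodeLlama-70b-hf"

config = AutoConfig.from_pretrained(repo_id)

# max_position_embeddings reports the context length the checkpoint is
# configured for; a 16k-token fine-tune would typically show 16384 here.
print(config.model_type, config.max_position_embeddings)
```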
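The new **Carbon Footprint** line bundles three figures (GPU hours, per-GPU TDP, total emissions). A rough back-of-the-envelope check of how they relate is sketched below; it assumes emissions scale with GPU board power alone (no PUE or cooling overhead), so the derived kg CO2eq/kWh value is an implied figure computed from the card's own numbers, not one stated in the card or the paper.

```python
# Rough sanity check of the Carbon Footprint figures quoted in the diff.
gpu_hours = 1_400_000                    # "1400K GPU hours" across all 12 models
tdp_range_kw = (0.350, 0.400)            # A100-80GB TDP of 350-400 W
total_emissions_tco2eq = 228.55          # reported total, 100% offset

for tdp_kw in tdp_range_kw:
    energy_kwh = gpu_hours * tdp_kw      # energy at this constant power draw
    implied_kg_per_kwh = total_emissions_tco2eq * 1000 / energy_kwh
    print(f"At {tdp_kw * 1000:.0f} W: {energy_kwh / 1000:,.0f} MWh, "
          f"implied ~{implied_kg_per_kwh:.2f} kg CO2eq/kWh")
```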