Create README.md
README.md ADDED
@@ -0,0 +1,9 @@
1B-parameter models trained on Python-only datasets. Each branch contains a model trained on a different version of The Stack:
- stack v1
- stack v2 - permissive
- stack v2 - permissive and unlicensed

The models have 24 layers, a hidden size of 2048, and 16 attention heads with multi-query attention.
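As a quick sanity check, these sizes are consistent with the "1B-parameter" label. A back-of-the-envelope count follows; the vocabulary size (49152, from the StarCoder tokenizer) and the 4x MLP expansion are assumptions, since they are not stated here:

```python
# Rough parameter count for the stated architecture (biases and
# layer norms omitted). Vocab size and 4x MLP expansion are assumptions.
n_layer, d_model, n_head = 24, 2048, 16
d_head = d_model // n_head                  # 128
vocab_size, n_positions = 49152, 8192       # assumed: StarCoder tokenizer, 8192-token context

# Token embeddings plus learned absolute positional embeddings.
embeddings = vocab_size * d_model + n_positions * d_model

# Multi-query attention: full-size Q and output projections,
# but K and V are each a single shared head.
attn = 2 * d_model * d_model + 2 * d_model * d_head
mlp = 2 * d_model * (4 * d_model)           # up- and down-projection, 4x expansion
per_layer = attn + mlp

total = embeddings + n_layer * per_layer
print(f"~{total / 1e9:.2f}B parameters")    # ~1.14B
```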
The learning rate is set to $4\times10^{-4}$ after a warmup of $1000$ steps and follows a cosine decay to $4\times10^{-5}$ at the end of training.
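A minimal sketch of this schedule (the warmup shape is not specified here; linear warmup is assumed):

```python
import math

def learning_rate(step: int, max_lr: float = 4e-4, min_lr: float = 4e-5,
                  warmup_steps: int = 1000, total_steps: int = 100_000) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```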
Trained with a batch size of 128 samples of 8192 tokens each, for $100$k iterations, such that the model sees roughly $100$B tokens by the end of training.
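The token budget follows directly from these numbers:

```python
tokens_per_step = 128 * 8192            # batch size x sequence length = 1,048,576 (~1M tokens)
total_tokens = tokens_per_step * 100_000
print(f"{total_tokens / 1e9:.1f}B")     # 104.9B, i.e. the ~100B quoted above
```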
We use a FIM (fill-in-the-middle) rate of $0.5$, the same tokenizer as StarCoder (except for the tokenizer ablations), and learned absolute positional embeddings.
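For reference, a minimal sketch of FIM preprocessing in the prefix-suffix-middle (PSM) format. The sentinel strings are the ones used by the StarCoder tokenizer; the uniform character-level split and the exact formatting below are assumptions, not the actual training code:

```python
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def maybe_fim(sample: str, fim_rate: float = 0.5, rng=random) -> str:
    """With probability fim_rate, rearrange a sample into PSM order."""
    if len(sample) < 2 or rng.random() >= fim_rate:
        return sample                   # leave the rest as ordinary left-to-right text
    # Two random cut points split the text into prefix / middle / suffix.
    i, j = sorted(rng.sample(range(len(sample) + 1), 2))
    prefix, middle, suffix = sample[:i], sample[i:j], sample[j:]
    # PSM: the model sees prefix and suffix, then learns to produce the middle.
    return FIM_PREFIX + prefix + FIM_SUFFIX + suffix + FIM_MIDDLE + middle
```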