Create README.md
README.md ADDED
@@ -0,0 +1,9 @@
1B-parameter models trained on Python-only datasets. Each branch contains a model trained on a different version of The Stack:
- stack v1
- stack v2 - permissive
- stack v2 - permissive and unlicensed

The models have 24 layers, a hidden size of 2048, and 16 attention heads with multi-query attention.
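As a quick sanity check, these sizes are consistent with the "1B-parameter" label. A back-of-the-envelope count follows; the vocabulary size (49152, from the StarCoder tokenizer) and the 4x MLP expansion are assumptions, since they are not stated here:

```python
# Rough parameter count for the stated architecture (biases and
# layer norms omitted). Vocab size and 4x MLP expansion are assumptions.
n_layer, d_model, n_head = 24, 2048, 16
d_head = d_model // n_head                  # 128
vocab_size, n_positions = 49152, 8192       # assumed: StarCoder tokenizer, 8192-token context

# Token embeddings plus learned absolute positional embeddings.
embeddings = vocab_size * d_model + n_positions * d_model

# Multi-query attention: full-size Q and output projections,
# but K and V are each a single shared head.
attn = 2 * d_model * d_model + 2 * d_model * d_head
mlp = 2 * d_model * (4 * d_model)           # up- and down-projection, 4x expansion
per_layer = attn + mlp

total = embeddings + n_layer * per_layer
print(f"~{total / 1e9:.2f}B parameters")    # ~1.14B
```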
The learning rate is set to $4\times10^{-4}$ after a warmup of $1000$ steps and follows a cosine decay to $4\times10^{-5}$ at the end of training.
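A minimal sketch of this schedule (the warmup shape is not specified here; linear warmup is assumed):

```python
import math

def learning_rate(step: int, max_lr: float = 4e-4, min_lr: float = 4e-5,
                  warmup_steps: int = 1000, total_steps: int = 100_000) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```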
Trained with a batch size of 128 samples of 8192 tokens each, for $100$k iterations, such that the model sees roughly $100$B tokens by the end of training.
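The token budget follows directly from these numbers:

```python
tokens_per_step = 128 * 8192            # batch size x sequence length = 1,048,576 (~1M tokens)
total_tokens = tokens_per_step * 100_000
print(f"{total_tokens / 1e9:.1f}B")     # 104.9B, i.e. the ~100B quoted above
```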
We use a FIM (fill-in-the-middle) rate of $0.5$, the same tokenizer as StarCoder (except for the tokenizer ablations), and learned absolute positional embeddings.
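For reference, a minimal sketch of FIM preprocessing in the prefix-suffix-middle (PSM) format. The sentinel strings are the ones used by the StarCoder tokenizer; the uniform character-level split and the exact formatting below are assumptions, not the actual training code:

```python
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def maybe_fim(sample: str, fim_rate: float = 0.5, rng=random) -> str:
    """With probability fim_rate, rearrange a sample into PSM order."""
    if len(sample) < 2 or rng.random() >= fim_rate:
        return sample                   # leave the rest as ordinary left-to-right text
    # Two random cut points split the text into prefix / middle / suffix.
    i, j = sorted(rng.sample(range(len(sample) + 1), 2))
    prefix, middle, suffix = sample[:i], sample[i:j], sample[j:]
    # PSM: the model sees prefix and suffix, then learns to produce the middle.
    return FIM_PREFIX + prefix + FIM_SUFFIX + suffix + FIM_MIDDLE + middle
```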