Update README.md
README.md
@@ -70,7 +70,7 @@ The getting started notebook of WangchanBERTa model can be found at this [Colab

## Training data

-`wangchanberta-base-wiki-spm` model was pretrained on Thai Wikipedia. Specifically we use the Wikipedia dump articles on 20 August 2020 (dumps.wikimedia.org/thwiki/20200820/). We opt out lists, and tables.
+`wangchanberta-base-wiki-spm` model was pretrained on Thai Wikipedia. Specifically, we use the Wikipedia dump articles on 20 August 2020 (dumps.wikimedia.org/thwiki/20200820/). We opt out lists, and tables.

### Preprocessing

@@ -100,7 +100,7 @@ We split sequencially 944,782 sentences for training set, 24,863 sentences for v

**Pretraining**

-The model was trained on 32 V100 GPUs for 31,250 steps with the batch size of 8,192 (16
+The model was trained on 32 V100 GPUs for 31,250 steps with the batch size of 8,192 (16 sequences per device with 16 accumulation steps) and a sequence length of 512 tokens. The optimizer we used is Adam with the learning rate of $7e-4$, $\beta_1 = 0.9$, $\beta_2= 0.98$ and $\epsilon = 1e-6$. The learning rate is warmed up for the first 1250 steps and linearly decayed to zero. The model checkpoint with minimum validation loss will be selected as the best model checkpoint.

<br>
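For reference, the pretraining setup described in the added paragraph maps roughly onto a HuggingFace `TrainingArguments` configuration, sketched below. This is an illustration only, not the project's actual training script: the output directory is a placeholder, model and data setup are omitted, and the 512-token sequence length is a tokenizer/data-collator setting rather than a `TrainingArguments` field. The stated batch size is consistent with the per-device numbers: 16 sequences per device × 16 accumulation steps × 32 GPUs = 8,192.

```python
# Illustrative sketch only: the hyperparameters quoted in the "+" line,
# expressed as HuggingFace TrainingArguments. Not the actual WangchanBERTa
# training code; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="wangchanberta-base-wiki-spm-pretrain",  # placeholder path
    max_steps=31_250,                 # total optimizer steps
    per_device_train_batch_size=16,   # 16 sequences per device
    gradient_accumulation_steps=16,   # 16 accumulation steps
    # effective batch size: 16 * 16 * 32 GPUs = 8,192 sequences per step
    learning_rate=7e-4,               # Adam peak learning rate
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    warmup_steps=1_250,               # warm up for the first 1,250 steps
    lr_scheduler_type="linear",       # then decay linearly to zero
)
```

Selecting the checkpoint with the minimum validation loss would be handled by whatever evaluation and checkpointing logic surrounds the training loop; the diff does not specify that mechanism.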