Update README.md
README.md
@@ -70,7 +70,7 @@ The getting started notebook of WangchanBERTa model can be found at this [Colab

## Training data

-`wangchanberta-base-wiki-spm` model was pretrained on Thai Wikipedia. Specifically we use the Wikipedia dump articles on 20 August 2020 (dumps.wikimedia.org/thwiki/20200820/). We opt out lists, and tables.
+`wangchanberta-base-wiki-spm` model was pretrained on Thai Wikipedia. Specifically, we use the Wikipedia dump articles on 20 August 2020 (dumps.wikimedia.org/thwiki/20200820/). We opt out lists, and tables.

### Preprocessing

@@ -100,7 +100,7 @@ We split sequencially 944,782 sentences for training set, 24,863 sentences for v

**Pretraining**

-The model was trained on 32 V100 GPUs for 31,250 steps with the batch size of 8,192 (16
+The model was trained on 32 V100 GPUs for 31,250 steps with the batch size of 8,192 (16 sequences per device with 16 accumulation steps) and a sequence length of 512 tokens. The optimizer we used is Adam with the learning rate of $7e-4$, $\beta_1 = 0.9$, $\beta_2= 0.98$ and $\epsilon = 1e-6$. The learning rate is warmed up for the first 1250 steps and linearly decayed to zero. The model checkpoint with minimum validation loss will be selected as the best model checkpoint.

<br>
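For reference, the pretraining setup described in the added paragraph maps roughly onto a HuggingFace `TrainingArguments` configuration, sketched below. This is an illustration only, not the project's actual training script: the output directory is a placeholder, model and data setup are omitted, and the 512-token sequence length is a tokenizer/data-collator setting rather than a `TrainingArguments` field. The stated batch size is consistent with the per-device numbers: 16 sequences per device × 16 accumulation steps × 32 GPUs = 8,192.

```python
# Illustrative sketch only: the hyperparameters quoted in the "+" line,
# expressed as HuggingFace TrainingArguments. Not the actual WangchanBERTa
# training code; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="wangchanberta-base-wiki-spm-pretrain",  # placeholder path
    max_steps=31_250,                 # total optimizer steps
    per_device_train_batch_size=16,   # 16 sequences per device
    gradient_accumulation_steps=16,   # 16 accumulation steps
    # effective batch size: 16 * 16 * 32 GPUs = 8,192 sequences per step
    learning_rate=7e-4,               # Adam peak learning rate
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    warmup_steps=1_250,               # warm up for the first 1,250 steps
    lr_scheduler_type="linear",       # then decay linearly to zero
)
```

Selecting the checkpoint with the minimum validation loss would be handled by whatever evaluation and checkpointing logic surrounds the training loop; the diff does not specify that mechanism.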