lalital committed on
Commit 0ae9e5b · 1 Parent(s): fe298ff

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -70,7 +70,7 @@ The getting started notebook of WangchanBERTa model can be found at this [Colab

 ## Training data

-`wangchanberta-base-wiki-spm` model was pretrained on Thai Wikipedia. Specifically we use the Wikipedia dump articles on 20 August 2020 (dumps.wikimedia.org/thwiki/20200820/). We opt out lists, and tables.
+The `wangchanberta-base-wiki-spm` model was pretrained on Thai Wikipedia. Specifically, we use the articles from the Wikipedia dump of 20 August 2020 (dumps.wikimedia.org/thwiki/20200820/), excluding lists and tables.

 ### Preprocessing

@@ -100,7 +100,7 @@ We split sequencially 944,782 sentences for training set, 24,863 sentences for v

 **Pretraining**

-The model was trained on 32 V100 GPUs for 31,250 steps with the batch size of 8,192 (16 mini batches per device with 16 accumulation steps) and a sequence length of 512 tokens. The optimizer we used is Adam with the learning rate of $7e-4$, $\beta_1 = 0.9$, $\beta_2= 0.98$ and $\epsilon = 1e-6$. The learning rate is warmed up for the first 1250 steps and linearly decayed to zero. The model checkpoint with minimum validation loss will be selected as the best model checkpoint.
+The model was trained on 32 V100 GPUs for 31,250 steps with an effective batch size of 8,192 sequences (16 sequences per device with 16 gradient accumulation steps) and a sequence length of 512 tokens. The optimizer is Adam with a learning rate of $7 \times 10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.98$ and $\epsilon = 10^{-6}$. The learning rate is warmed up over the first 1,250 steps and then linearly decayed to zero. The checkpoint with the lowest validation loss is selected as the best model checkpoint.

 <br>
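
The corpus described in the first hunk can be assembled with the Hugging Face `datasets` library. The sketch below is a minimal, illustrative example; it assumes an older `datasets` release that still ships the script-based `wikipedia` builder (which accepts arbitrary `language`/`date` pairs and requires Apache Beam and `mwparserfromhell`), and it is not the authors' actual preprocessing pipeline.

```python
# Illustrative sketch: load the Thai Wikipedia dump referenced above
# (dumps.wikimedia.org/thwiki/20200820/). Assumes an older `datasets`
# release that still ships the script-based "wikipedia" builder, with
# apache_beam and mwparserfromhell installed; this is not the authors'
# actual preprocessing pipeline.
from datasets import load_dataset

thwiki = load_dataset(
    "wikipedia",
    language="th",
    date="20200820",
    beam_runner="DirectRunner",  # local Apache Beam runner; simple but slow
    split="train",
)

print(thwiki)                   # article count and features ("title", "text")
print(thwiki[0]["text"][:200])  # first characters of the first article
```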
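The hyperparameters in the second hunk correspond to a fairly standard masked-language-model pretraining setup. The sketch below works out the effective batch size and shows one way to reproduce the Adam optimizer and the warmup-then-linear-decay schedule with PyTorch and a `transformers` scheduler helper; the stand-in model and variable names are illustrative assumptions, not the authors' training script.

```python
# Illustrative sketch of the pretraining recipe in the second hunk, using
# PyTorch and a transformers scheduler helper; not the authors' training code.
import torch
from transformers import get_linear_schedule_with_warmup

# Effective batch size = per-device batch * gradient accumulation steps * GPUs
per_device_batch = 16
grad_accum_steps = 16
num_gpus = 32
assert per_device_batch * grad_accum_steps * num_gpus == 8192  # as stated above

model = torch.nn.Linear(768, 768)  # hypothetical stand-in for the RoBERTa-base model

optimizer = torch.optim.Adam(      # Adam with the stated hyperparameters
    model.parameters(),
    lr=7e-4,
    betas=(0.9, 0.98),
    eps=1e-6,
)

# Warm up over the first 1,250 steps, then decay linearly to zero at step 31,250.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_250,
    num_training_steps=31_250,
)

for step in range(31_250):
    # forward/backward over `grad_accum_steps` mini-batches would go here
    optimizer.step()    # update after the accumulated gradients
    scheduler.step()    # advance the warmup/decay schedule
    optimizer.zero_grad()
```

Selecting the checkpoint with the lowest validation loss, as the README describes, would happen outside this loop, e.g. in a periodic evaluation callback.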