Commit 1e0215d by Dmitry Chaplinsky
1 Parent(s): 06a5b46

Adding everything

Files changed:
- README.md +58 -0
- best-lm.pt +3 -0
- flair_dictionary.pkl +3 -0
- loss.txt +599 -0
- pipeline.py +22 -0
- requirements.txt +1 -0
README.md
CHANGED
@@ -1,3 +1,61 @@
---
language:
- uk
tags:
- text2text-generation
- flair
library_name: generic
license: mit
metrics:
- perplexity
datasets:
- ubertext2.0
widget:
- text: "Росія зазнає поразки"  # "Russia is being defeated"
- text: "Достеменно відомо, що Україна перемагає"  # "It is known for certain that Ukraine is winning"
---

# Ukrainian flair embeddings (forward, large)

Trained for 10 epochs on texts from ubertext2.0 and a corpus of scraped Ukrainian texts from Stefan Schweter (54 GB in total).
The character dictionary used for training is in the `flair_dictionary.pkl` file.

The model params are:
```python
is_forward_lm=True,
hidden_size=2048,
sequence_length=250,
mini_batch_size=1024,
max_epochs=30
```
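
As a rough sketch of where these values go, the snippet below follows the pattern from flair's language-model training tutorial. The corpus directory, output directory, `nlayers=1`, and `character_level=True` are assumptions that this README does not state. The flair imports are deferred so the parameter dictionaries can be inspected without flair installed.

```python
# Values copied from the parameter list above.
MODEL_PARAMS = {"is_forward_lm": True, "hidden_size": 2048}
TRAIN_PARAMS = {"sequence_length": 250, "mini_batch_size": 1024, "max_epochs": 30}


def train_forward_lm(corpus_dir, dictionary_path="flair_dictionary.pkl", out_dir="lm-out"):
    """Train a character-level forward LM with flair (all paths are placeholders)."""
    # Imported lazily so this module loads even where flair is not installed.
    from flair.data import Dictionary
    from flair.models import LanguageModel
    from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

    dictionary = Dictionary.load_from_file(dictionary_path)
    corpus = TextCorpus(
        corpus_dir, dictionary, MODEL_PARAMS["is_forward_lm"], character_level=True
    )
    # nlayers=1 is an assumption; the README does not state the layer count.
    model = LanguageModel(
        dictionary,
        MODEL_PARAMS["is_forward_lm"],
        hidden_size=MODEL_PARAMS["hidden_size"],
        nlayers=1,
    )
    trainer = LanguageModelTrainer(model, corpus)
    trainer.train(out_dir, **TRAIN_PARAMS)
```

Calling `train_forward_lm("path/to/corpus")` would expect the `train/`, `valid.txt`, and `test.txt` layout that flair's `TextCorpus` reads.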

For more information on flair embeddings, see [the article](https://github.com/flairNLP/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) or the paper below:


```bibtex
@inproceedings{akbik2018coling,
  title = {Contextual String Embeddings for Sequence Labeling},
  author = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages = {1638--1649},
  year = {2018}
}
```
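
A minimal usage sketch, assuming flair is installed and the `best-lm.pt` file from this commit has been downloaded to the working directory. The helper name `embed_text` is hypothetical and not part of this repository.

```python
def embed_text(text, model_path="best-lm.pt"):
    """Return (token, vector) pairs for `text` using the forward character LM."""
    # Imported lazily so the function can be defined without flair installed.
    from flair.data import Sentence
    from flair.embeddings import FlairEmbeddings

    embeddings = FlairEmbeddings(model_path)  # accepts a path to a trained LM file
    sentence = Sentence(text)
    embeddings.embed(sentence)  # attaches one embedding tensor to each token
    return [(token.text, token.embedding) for token in sentence]
```

For example, `embed_text("Достеменно відомо, що Україна перемагає")` would return one contextual vector per token.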

For more information on UberText 2.0 please see:
```bibtex
@inproceedings{chaplynskyi-2023-introducing,
  title = "Introducing {U}ber{T}ext 2.0: A Corpus of {M}odern {U}krainian at Scale",
  author = "Chaplynskyi, Dmytro",
  booktitle = "Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.unlp-1.1",
  pages = "1--10",
  abstract = "This paper addresses the need for massive corpora for a low-resource language and presents the publicly available UberText 2.0 corpus for the Ukrainian language and discusses the methodology of its construction. While the collection and maintenance of such a corpus is more of a data extraction and data engineering task, the corpus itself provides a solid foundation for natural language processing tasks. It can enable the creation of contemporary language models and word embeddings, resulting in a better performance of numerous downstream tasks for the Ukrainian language. In addition, the paper and software developed can be used as a guidance and model solution for other low-resource languages. The resulting corpus is available for download on the project page. It has 3.274 billion tokens, consists of 8.59 million texts and takes up 32 gigabytes of space.",
}
```

Copyright: [Dmytro Chaplynskyi](https://twitter.com/dchaplinsky), [lang-uk](https://lang.org.ua) project, 2023
best-lm.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d05b5d0f1b68ff0bd7a2ad1a852d25d1034de52fd823e4b9304ce5fc1c615ed
size 78734687
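
The three lines above are a Git LFS pointer, not the model weights themselves. As an illustration, the sketch below parses such a pointer into its fields; the function name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, oid, and size."""
    # Each pointer line is "<key> <value>", separated by a single space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6d05b5d0f1b68ff0bd7a2ad1a852d25d1034de52fd823e4b9304ce5fc1c615ed
size 78734687
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['algo']} {info['size'] / 1e6:.1f} MB")  # sha256 78.7 MB
```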
flair_dictionary.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2125c32d2db5fb79676a8a6f087b19e9c3b788cb19b87073423e31e176d1fe24
size 11900
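
Since the `oid` in a Git LFS pointer is the SHA-256 of the real file's contents, a downloaded file can be checked against the pointer. The sketch below shows one way to do that; `matches_pointer` is a hypothetical helper, and the local file path is an assumption.

```python
import hashlib
import os


def matches_pointer(path, expected_sha256, expected_size, chunk_size=1 << 20):
    """Return True if the file at `path` has the given size and SHA-256 digest."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

With the pointer above, the check would be `matches_pointer("flair_dictionary.pkl", "2125c32d2db5fb79676a8a6f087b19e9c3b788cb19b87073423e31e176d1fe24", 11900)`.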
loss.txt
ADDED
@@ -0,0 +1,599 @@
| end of split 1 / 62 | epoch 1 | time: 1603.89s | valid loss 1.4399 | valid ppl 4.2204 | learning rate 20.0000
| end of split 2 / 62 | epoch 1 | time: 1607.81s | valid loss 1.2745 | valid ppl 3.5770 | learning rate 20.0000
| end of split 3 / 62 | epoch 1 | time: 1606.22s | valid loss 1.2037 | valid ppl 3.3323 | learning rate 20.0000
| end of split 4 / 62 | epoch 1 | time: 1606.92s | valid loss 1.1638 | valid ppl 3.2020 | learning rate 20.0000
| end of split 5 / 62 | epoch 1 | time: 1607.10s | valid loss 1.1394 | valid ppl 3.1250 | learning rate 20.0000
| end of split 6 / 62 | epoch 1 | time: 1607.63s | valid loss 1.1180 | valid ppl 3.0588 | learning rate 20.0000
| end of split 7 / 62 | epoch 1 | time: 1608.12s | valid loss 1.1052 | valid ppl 3.0200 | learning rate 20.0000
| end of split 8 / 62 | epoch 1 | time: 1608.18s | valid loss 1.0969 | valid ppl 2.9948 | learning rate 20.0000
| end of split 9 / 62 | epoch 1 | time: 1592.98s | valid loss 1.0812 | valid ppl 2.9482 | learning rate 20.0000
| end of split 10 / 62 | epoch 1 | time: 1597.67s | valid loss 1.0791 | valid ppl 2.9420 | learning rate 20.0000
| end of split 11 / 62 | epoch 1 | time: 1598.41s | valid loss 1.0690 | valid ppl 2.9124 | learning rate 20.0000
| end of split 12 / 62 | epoch 1 | time: 1594.52s | valid loss 1.0625 | valid ppl 2.8937 | learning rate 20.0000
| end of split 13 / 62 | epoch 1 | time: 1595.52s | valid loss 1.0584 | valid ppl 2.8816 | learning rate 20.0000
| end of split 14 / 62 | epoch 1 | time: 1593.63s | valid loss 1.0520 | valid ppl 2.8634 | learning rate 20.0000
| end of split 15 / 62 | epoch 1 | time: 1593.45s | valid loss 1.1233 | valid ppl 3.0750 | learning rate 20.0000
| end of split 16 / 62 | epoch 1 | time: 1594.20s | valid loss 1.0477 | valid ppl 2.8511 | learning rate 20.0000
| end of split 17 / 62 | epoch 1 | time: 1594.12s | valid loss 1.0393 | valid ppl 2.8274 | learning rate 20.0000
| end of split 18 / 62 | epoch 1 | time: 1592.60s | valid loss 1.0382 | valid ppl 2.8242 | learning rate 20.0000
| end of split 19 / 62 | epoch 1 | time: 1591.84s | valid loss 1.0321 | valid ppl 2.8071 | learning rate 20.0000
| end of split 20 / 62 | epoch 1 | time: 1591.25s | valid loss 1.0335 | valid ppl 2.8109 | learning rate 20.0000
| end of split 21 / 62 | epoch 1 | time: 1593.49s | valid loss 1.0276 | valid ppl 2.7944 | learning rate 20.0000
| end of split 22 / 62 | epoch 1 | time: 1590.55s | valid loss 1.0265 | valid ppl 2.7913 | learning rate 20.0000
| end of split 23 / 62 | epoch 1 | time: 1591.47s | valid loss 1.0218 | valid ppl 2.7781 | learning rate 20.0000
| end of split 24 / 62 | epoch 1 | time: 1589.39s | valid loss 1.0218 | valid ppl 2.7781 | learning rate 20.0000
| end of split 25 / 62 | epoch 1 | time: 1591.76s | valid loss 1.0182 | valid ppl 2.7682 | learning rate 20.0000
| end of split 26 / 62 | epoch 1 | time: 1586.71s | valid loss 1.0198 | valid ppl 2.7726 | learning rate 20.0000
| end of split 27 / 62 | epoch 1 | time: 1584.62s | valid loss 1.0144 | valid ppl 2.7578 | learning rate 20.0000
| end of split 28 / 62 | epoch 1 | time: 1586.04s | valid loss 1.0124 | valid ppl 2.7521 | learning rate 20.0000
| end of split 29 / 62 | epoch 1 | time: 1583.84s | valid loss 1.0164 | valid ppl 2.7633 | learning rate 20.0000
| end of split 30 / 62 | epoch 1 | time: 1582.16s | valid loss 1.0126 | valid ppl 2.7527 | learning rate 20.0000
| end of split 31 / 62 | epoch 1 | time: 1582.81s | valid loss 1.0114 | valid ppl 2.7495 | learning rate 20.0000
| end of split 32 / 62 | epoch 1 | time: 1584.10s | valid loss 1.0078 | valid ppl 2.7396 | learning rate 20.0000
| end of split 33 / 62 | epoch 1 | time: 1583.96s | valid loss 1.0067 | valid ppl 2.7367 | learning rate 20.0000
| end of split 34 / 62 | epoch 1 | time: 1584.53s | valid loss 1.0311 | valid ppl 2.8043 | learning rate 20.0000
| end of split 35 / 62 | epoch 1 | time: 1585.34s | valid loss 1.0022 | valid ppl 2.7243 | learning rate 20.0000
| end of split 36 / 62 | epoch 1 | time: 1585.67s | valid loss 1.0017 | valid ppl 2.7229 | learning rate 20.0000
| end of split 37 / 62 | epoch 1 | time: 1583.84s | valid loss 1.0020 | valid ppl 2.7236 | learning rate 20.0000
| end of split 38 / 62 | epoch 1 | time: 1584.28s | valid loss 0.9989 | valid ppl 2.7152 | learning rate 20.0000
| end of split 39 / 62 | epoch 1 | time: 1585.90s | valid loss 1.0254 | valid ppl 2.7882 | learning rate 20.0000
| end of split 40 / 62 | epoch 1 | time: 1588.16s | valid loss 0.9973 | valid ppl 2.7110 | learning rate 20.0000
| end of split 41 / 62 | epoch 1 | time: 1586.15s | valid loss 0.9961 | valid ppl 2.7076 | learning rate 20.0000
| end of split 42 / 62 | epoch 1 | time: 1588.69s | valid loss 0.9963 | valid ppl 2.7083 | learning rate 20.0000
| end of split 43 / 62 | epoch 1 | time: 1588.30s | valid loss 0.9934 | valid ppl 2.7005 | learning rate 20.0000
| end of split 44 / 62 | epoch 1 | time: 1587.86s | valid loss 0.9962 | valid ppl 2.7080 | learning rate 20.0000
| end of split 45 / 62 | epoch 1 | time: 1588.43s | valid loss 0.9921 | valid ppl 2.6970 | learning rate 20.0000
| end of split 46 / 62 | epoch 1 | time: 1591.45s | valid loss 0.9913 | valid ppl 2.6949 | learning rate 20.0000
| end of split 47 / 62 | epoch 1 | time: 1590.01s | valid loss 1.0074 | valid ppl 2.7386 | learning rate 20.0000
| end of split 48 / 62 | epoch 1 | time: 1589.84s | valid loss 0.9891 | valid ppl 2.6889 | learning rate 20.0000
| end of split 49 / 62 | epoch 1 | time: 1591.41s | valid loss 0.9893 | valid ppl 2.6893 | learning rate 20.0000
| end of split 50 / 62 | epoch 1 | time: 1592.88s | valid loss 0.9881 | valid ppl 2.6861 | learning rate 20.0000
| end of split 51 / 62 | epoch 1 | time: 1593.67s | valid loss 0.9872 | valid ppl 2.6836 | learning rate 20.0000
| end of split 52 / 62 | epoch 1 | time: 1593.93s | valid loss 0.9938 | valid ppl 2.7015 | learning rate 20.0000
| end of split 53 / 62 | epoch 1 | time: 1593.15s | valid loss 0.9875 | valid ppl 2.6845 | learning rate 20.0000
| end of split 54 / 62 | epoch 1 | time: 1593.89s | valid loss 0.9844 | valid ppl 2.6763 | learning rate 20.0000
| end of split 55 / 62 | epoch 1 | time: 1594.52s | valid loss 0.9852 | valid ppl 2.6782 | learning rate 20.0000
| end of split 56 / 62 | epoch 1 | time: 1593.26s | valid loss 0.9848 | valid ppl 2.6772 | learning rate 20.0000
| end of split 57 / 62 | epoch 1 | time: 1594.39s | valid loss 0.9827 | valid ppl 2.6717 | learning rate 20.0000
| end of split 58 / 62 | epoch 1 | time: 1593.89s | valid loss 0.9834 | valid ppl 2.6736 | learning rate 20.0000
| end of split 59 / 62 | epoch 1 | time: 1594.99s | valid loss 0.9814 | valid ppl 2.6682 | learning rate 20.0000
| end of split 60 / 62 | epoch 1 | time: 1595.07s | valid loss 0.9885 | valid ppl 2.6871 | learning rate 20.0000
| end of split 61 / 62 | epoch 1 | time: 1593.04s | valid loss 0.9834 | valid ppl 2.6736 | learning rate 20.0000
| end of split 62 / 62 | epoch 1 | time: 850.81s | valid loss 0.9894 | valid ppl 2.6895 | learning rate 20.0000
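
Each log line carries the split index, epoch, wall time, validation loss, validation perplexity, and learning rate. The sketch below parses one line into a record and compares the logged perplexity with `exp(loss)`; in the lines checked here the two agree to within rounding. The regular expression and the `parse_line` name are assumptions made for this example.

```python
import math
import re

LOG_LINE = re.compile(
    r"\|\s*end of split\s+(?P<split>\d+)\s*/\s*(?P<splits>\d+)\s*"
    r"\|\s*epoch\s+(?P<epoch>\d+)\s*"
    r"\|\s*time:\s*(?P<time>[\d.]+)s\s*"
    r"\|\s*valid loss\s+(?P<loss>[\d.]+)\s*"
    r"\|\s*valid ppl\s+(?P<ppl>[\d.]+)\s*"
    r"\|\s*learning rate\s+(?P<lr>[\d.]+)"
)


def parse_line(line):
    """Turn one loss.txt line into a dict, or return None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    return {
        "split": int(match["split"]),
        "epoch": int(match["epoch"]),
        "time_s": float(match["time"]),
        "loss": float(match["loss"]),
        "ppl": float(match["ppl"]),
        "lr": float(match["lr"]),
    }


sample = (
    "| end of split 1 / 62 | epoch 1 | time: 1603.89s | valid loss 1.4399 "
    "| valid ppl 4.2204 | learning rate 20.0000"
)
record = parse_line(sample)
# Logged perplexity next to the value recomputed from the logged loss.
print(record["ppl"], round(math.exp(record["loss"]), 4))
```
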
| end of split 1 / 62 | epoch 2 | time: 1589.43s | valid loss 0.9930 | valid ppl 2.6992 | learning rate 20.0000
| end of split 2 / 62 | epoch 2 | time: 1592.05s | valid loss 0.9823 | valid ppl 2.6706 | learning rate 20.0000
| end of split 3 / 62 | epoch 2 | time: 1591.91s | valid loss 0.9795 | valid ppl 2.6631 | learning rate 20.0000
| end of split 4 / 62 | epoch 2 | time: 1589.81s | valid loss 0.9798 | valid ppl 2.6638 | learning rate 20.0000
| end of split 5 / 62 | epoch 2 | time: 1592.72s | valid loss 0.9863 | valid ppl 2.6812 | learning rate 20.0000
| end of split 6 / 62 | epoch 2 | time: 1591.02s | valid loss 0.9793 | valid ppl 2.6627 | learning rate 20.0000
| end of split 7 / 62 | epoch 2 | time: 1591.96s | valid loss 0.9778 | valid ppl 2.6587 | learning rate 20.0000
| end of split 8 / 62 | epoch 2 | time: 1589.75s | valid loss 0.9770 | valid ppl 2.6565 | learning rate 20.0000
| end of split 9 / 62 | epoch 2 | time: 1589.90s | valid loss 0.9770 | valid ppl 2.6565 | learning rate 20.0000
| end of split 10 / 62 | epoch 2 | time: 1586.76s | valid loss 0.9759 | valid ppl 2.6535 | learning rate 20.0000
| end of split 11 / 62 | epoch 2 | time: 1583.54s | valid loss 0.9783 | valid ppl 2.6600 | learning rate 20.0000
| end of split 12 / 62 | epoch 2 | time: 1585.70s | valid loss 1.0014 | valid ppl 2.7221 | learning rate 20.0000
| end of split 13 / 62 | epoch 2 | time: 1585.88s | valid loss 0.9768 | valid ppl 2.6559 | learning rate 20.0000
| end of split 14 / 62 | epoch 2 | time: 1587.69s | valid loss 0.9754 | valid ppl 2.6523 | learning rate 20.0000
| end of split 15 / 62 | epoch 2 | time: 1586.05s | valid loss 0.9736 | valid ppl 2.6475 | learning rate 20.0000
| end of split 16 / 62 | epoch 2 | time: 1589.38s | valid loss 0.9740 | valid ppl 2.6486 | learning rate 20.0000
| end of split 17 / 62 | epoch 2 | time: 1591.27s | valid loss 0.9756 | valid ppl 2.6527 | learning rate 20.0000
| end of split 18 / 62 | epoch 2 | time: 1590.28s | valid loss 0.9728 | valid ppl 2.6454 | learning rate 20.0000
| end of split 19 / 62 | epoch 2 | time: 1588.81s | valid loss 0.9727 | valid ppl 2.6452 | learning rate 20.0000
| end of split 20 / 62 | epoch 2 | time: 1590.45s | valid loss 0.9723 | valid ppl 2.6440 | learning rate 20.0000
| end of split 21 / 62 | epoch 2 | time: 1587.61s | valid loss 0.9716 | valid ppl 2.6422 | learning rate 20.0000
| end of split 22 / 62 | epoch 2 | time: 1587.52s | valid loss 0.9708 | valid ppl 2.6401 | learning rate 20.0000
| end of split 23 / 62 | epoch 2 | time: 1587.01s | valid loss 0.9709 | valid ppl 2.6402 | learning rate 20.0000
| end of split 24 / 62 | epoch 2 | time: 1587.21s | valid loss 0.9701 | valid ppl 2.6383 | learning rate 20.0000
| end of split 25 / 62 | epoch 2 | time: 1585.58s | valid loss 0.9713 | valid ppl 2.6413 | learning rate 20.0000
| end of split 26 / 62 | epoch 2 | time: 1582.23s | valid loss 0.9920 | valid ppl 2.6967 | learning rate 20.0000
| end of split 27 / 62 | epoch 2 | time: 1584.31s | valid loss 0.9696 | valid ppl 2.6368 | learning rate 20.0000
| end of split 28 / 62 | epoch 2 | time: 1583.27s | valid loss 0.9690 | valid ppl 2.6353 | learning rate 20.0000
| end of split 29 / 62 | epoch 2 | time: 1583.73s | valid loss 0.9685 | valid ppl 2.6339 | learning rate 20.0000
| end of split 30 / 62 | epoch 2 | time: 1582.01s | valid loss 0.9712 | valid ppl 2.6412 | learning rate 20.0000
| end of split 31 / 62 | epoch 2 | time: 1577.61s | valid loss 0.9698 | valid ppl 2.6374 | learning rate 20.0000
| end of split 32 / 62 | epoch 2 | time: 1576.99s | valid loss 0.9677 | valid ppl 2.6318 | learning rate 20.0000
| end of split 33 / 62 | epoch 2 | time: 1576.05s | valid loss 0.9675 | valid ppl 2.6314 | learning rate 20.0000
| end of split 34 / 62 | epoch 2 | time: 1580.30s | valid loss 0.9668 | valid ppl 2.6296 | learning rate 20.0000
| end of split 35 / 62 | epoch 2 | time: 1580.63s | valid loss 0.9663 | valid ppl 2.6282 | learning rate 20.0000
| end of split 36 / 62 | epoch 2 | time: 1581.22s | valid loss 0.9660 | valid ppl 2.6275 | learning rate 20.0000
| end of split 37 / 62 | epoch 2 | time: 1581.83s | valid loss 0.9668 | valid ppl 2.6295 | learning rate 20.0000
| end of split 38 / 62 | epoch 2 | time: 1583.12s | valid loss 0.9663 | valid ppl 2.6283 | learning rate 20.0000
| end of split 39 / 62 | epoch 2 | time: 1584.87s | valid loss 0.9653 | valid ppl 2.6256 | learning rate 20.0000
| end of split 40 / 62 | epoch 2 | time: 847.08s | valid loss 0.9723 | valid ppl 2.6440 | learning rate 20.0000
| end of split 41 / 62 | epoch 2 | time: 1592.30s | valid loss 0.9707 | valid ppl 2.6398 | learning rate 20.0000
| end of split 42 / 62 | epoch 2 | time: 1602.69s | valid loss 0.9655 | valid ppl 2.6262 | learning rate 20.0000
| end of split 43 / 62 | epoch 2 | time: 1608.11s | valid loss 0.9649 | valid ppl 2.6245 | learning rate 20.0000
| end of split 44 / 62 | epoch 2 | time: 1610.00s | valid loss 0.9641 | valid ppl 2.6225 | learning rate 20.0000
| end of split 45 / 62 | epoch 2 | time: 1590.39s | valid loss 1.0062 | valid ppl 2.7352 | learning rate 20.0000
| end of split 46 / 62 | epoch 2 | time: 1569.29s | valid loss 1.5219 | valid ppl 4.5807 | learning rate 20.0000
| end of split 47 / 62 | epoch 2 | time: 1573.04s | valid loss 1.2816 | valid ppl 3.6023 | learning rate 20.0000
| end of split 48 / 62 | epoch 2 | time: 1575.91s | valid loss 1.1161 | valid ppl 3.0529 | learning rate 20.0000
| end of split 49 / 62 | epoch 2 | time: 1573.44s | valid loss 1.0870 | valid ppl 2.9653 | learning rate 20.0000
| end of split 50 / 62 | epoch 2 | time: 1575.89s | valid loss 1.0426 | valid ppl 2.8367 | learning rate 20.0000
| end of split 51 / 62 | epoch 2 | time: 1578.06s | valid loss 1.0085 | valid ppl 2.7415 | learning rate 20.0000
| end of split 52 / 62 | epoch 2 | time: 1583.24s | valid loss 0.9898 | valid ppl 2.6907 | learning rate 20.0000
| end of split 53 / 62 | epoch 2 | time: 1583.39s | valid loss 0.9789 | valid ppl 2.6617 | learning rate 20.0000
| end of split 54 / 62 | epoch 2 | time: 1582.99s | valid loss 0.9752 | valid ppl 2.6516 | learning rate 20.0000
| end of split 55 / 62 | epoch 2 | time: 1584.67s | valid loss 0.9727 | valid ppl 2.6450 | learning rate 20.0000
| end of split 56 / 62 | epoch 2 | time: 1587.32s | valid loss 0.9680 | valid ppl 2.6327 | learning rate 5.0000
| end of split 57 / 62 | epoch 2 | time: 1589.56s | valid loss 0.9671 | valid ppl 2.6303 | learning rate 5.0000
| end of split 58 / 62 | epoch 2 | time: 1590.23s | valid loss 0.9665 | valid ppl 2.6286 | learning rate 5.0000
| end of split 59 / 62 | epoch 2 | time: 1592.84s | valid loss 0.9658 | valid ppl 2.6270 | learning rate 5.0000
| end of split 60 / 62 | epoch 2 | time: 1593.67s | valid loss 0.9652 | valid ppl 2.6253 | learning rate 5.0000
| end of split 61 / 62 | epoch 2 | time: 1593.45s | valid loss 0.9671 | valid ppl 2.6303 | learning rate 5.0000
| end of split 62 / 62 | epoch 2 | time: 1592.63s | valid loss 0.9642 | valid ppl 2.6228 | learning rate 5.0000
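
In epoch 2 the validation loss jumps to 1.5219 at split 46, and the logged learning rate falls from 20 to 5 starting at split 56. That factor of 0.25 is consistent with reduce-on-plateau annealing, although the exact scheduler settings are an assumption here and are not stated in the commit. The sketch below locates such changes in a list of log lines; `lr_changes` is a hypothetical helper.

```python
import re

FIELDS = re.compile(
    r"end of split\s+(\d+)\s*/\s*\d+\s*\|\s*epoch\s+(\d+).*?learning rate\s+([\d.]+)"
)


def lr_changes(lines):
    """Yield (epoch, split, old_lr, new_lr) each time the logged learning rate changes."""
    previous = None
    for line in lines:
        match = FIELDS.search(line)
        if match is None:
            continue  # skip anything that is not a log line
        split, epoch, lr = int(match.group(1)), int(match.group(2)), float(match.group(3))
        if previous is not None and lr != previous:
            yield epoch, split, previous, lr
        previous = lr


lines = [
    "| end of split 55 / 62 | epoch 2 | time: 1584.67s | valid loss 0.9727 | valid ppl 2.6450 | learning rate 20.0000",
    "| end of split 56 / 62 | epoch 2 | time: 1587.32s | valid loss 0.9680 | valid ppl 2.6327 | learning rate 5.0000",
    "| end of split 57 / 62 | epoch 2 | time: 1589.56s | valid loss 0.9671 | valid ppl 2.6303 | learning rate 5.0000",
]
changes = list(lr_changes(lines))
print(changes)  # [(2, 56, 20.0, 5.0)]
```
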
| end of split 1 / 62 | epoch 3 | time: 1588.48s | valid loss 0.9639 | valid ppl 2.6219 | learning rate 5.0000
| end of split 2 / 62 | epoch 3 | time: 1595.00s | valid loss 0.9635 | valid ppl 2.6208 | learning rate 5.0000
| end of split 3 / 62 | epoch 3 | time: 1592.33s | valid loss 0.9631 | valid ppl 2.6197 | learning rate 5.0000
| end of split 4 / 62 | epoch 3 | time: 1592.28s | valid loss 0.9630 | valid ppl 2.6194 | learning rate 5.0000
| end of split 5 / 62 | epoch 3 | time: 1592.85s | valid loss 0.9626 | valid ppl 2.6184 | learning rate 5.0000
| end of split 6 / 62 | epoch 3 | time: 1592.84s | valid loss 0.9622 | valid ppl 2.6173 | learning rate 5.0000
| end of split 7 / 62 | epoch 3 | time: 1592.00s | valid loss 0.9619 | valid ppl 2.6167 | learning rate 5.0000
| end of split 8 / 62 | epoch 3 | time: 1593.04s | valid loss 0.9616 | valid ppl 2.6159 | learning rate 5.0000
| end of split 9 / 62 | epoch 3 | time: 1592.29s | valid loss 0.9615 | valid ppl 2.6155 | learning rate 5.0000
| end of split 10 / 62 | epoch 3 | time: 1590.81s | valid loss 0.9612 | valid ppl 2.6149 | learning rate 5.0000
| end of split 11 / 62 | epoch 3 | time: 1591.61s | valid loss 0.9611 | valid ppl 2.6146 | learning rate 5.0000
| end of split 12 / 62 | epoch 3 | time: 1590.51s | valid loss 0.9609 | valid ppl 2.6141 | learning rate 5.0000
| end of split 13 / 62 | epoch 3 | time: 1590.78s | valid loss 0.9604 | valid ppl 2.6127 | learning rate 5.0000
| end of split 14 / 62 | epoch 3 | time: 1589.97s | valid loss 0.9604 | valid ppl 2.6126 | learning rate 5.0000
| end of split 15 / 62 | epoch 3 | time: 1589.70s | valid loss 0.9600 | valid ppl 2.6117 | learning rate 5.0000
| end of split 16 / 62 | epoch 3 | time: 1589.05s | valid loss 0.9600 | valid ppl 2.6118 | learning rate 5.0000
| end of split 17 / 62 | epoch 3 | time: 1589.99s | valid loss 0.9596 | valid ppl 2.6107 | learning rate 5.0000
| end of split 18 / 62 | epoch 3 | time: 1590.63s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 5.0000
| end of split 19 / 62 | epoch 3 | time: 1588.73s | valid loss 0.9593 | valid ppl 2.6099 | learning rate 5.0000
| end of split 20 / 62 | epoch 3 | time: 1589.71s | valid loss 0.9589 | valid ppl 2.6088 | learning rate 5.0000
| end of split 21 / 62 | epoch 3 | time: 1589.46s | valid loss 0.9588 | valid ppl 2.6086 | learning rate 5.0000
| end of split 22 / 62 | epoch 3 | time: 1589.12s | valid loss 0.9586 | valid ppl 2.6080 | learning rate 5.0000
| end of split 23 / 62 | epoch 3 | time: 1591.71s | valid loss 0.9589 | valid ppl 2.6088 | learning rate 5.0000
| end of split 24 / 62 | epoch 3 | time: 1589.39s | valid loss 0.9582 | valid ppl 2.6070 | learning rate 5.0000
| end of split 25 / 62 | epoch 3 | time: 1590.33s | valid loss 0.9582 | valid ppl 2.6070 | learning rate 5.0000
| end of split 26 / 62 | epoch 3 | time: 1589.33s | valid loss 0.9580 | valid ppl 2.6065 | learning rate 5.0000
| end of split 27 / 62 | epoch 3 | time: 1589.70s | valid loss 0.9580 | valid ppl 2.6066 | learning rate 5.0000
| end of split 28 / 62 | epoch 3 | time: 1589.72s | valid loss 0.9578 | valid ppl 2.6060 | learning rate 5.0000
| end of split 29 / 62 | epoch 3 | time: 849.01s | valid loss 0.9583 | valid ppl 2.6072 | learning rate 5.0000
| end of split 30 / 62 | epoch 3 | time: 1592.01s | valid loss 0.9576 | valid ppl 2.6055 | learning rate 5.0000
| end of split 31 / 62 | epoch 3 | time: 1593.91s | valid loss 0.9574 | valid ppl 2.6048 | learning rate 5.0000
| end of split 32 / 62 | epoch 3 | time: 1593.53s | valid loss 0.9573 | valid ppl 2.6047 | learning rate 5.0000
| end of split 33 / 62 | epoch 3 | time: 1593.28s | valid loss 0.9573 | valid ppl 2.6047 | learning rate 5.0000
| end of split 34 / 62 | epoch 3 | time: 1592.56s | valid loss 0.9571 | valid ppl 2.6040 | learning rate 5.0000
| end of split 35 / 62 | epoch 3 | time: 1594.00s | valid loss 0.9569 | valid ppl 2.6037 | learning rate 5.0000
| end of split 36 / 62 | epoch 3 | time: 1592.16s | valid loss 0.9580 | valid ppl 2.6064 | learning rate 5.0000
| end of split 37 / 62 | epoch 3 | time: 1593.97s | valid loss 0.9569 | valid ppl 2.6037 | learning rate 5.0000
| end of split 38 / 62 | epoch 3 | time: 1595.62s | valid loss 0.9566 | valid ppl 2.6029 | learning rate 5.0000
| end of split 39 / 62 | epoch 3 | time: 1595.26s | valid loss 0.9565 | valid ppl 2.6025 | learning rate 5.0000
| end of split 40 / 62 | epoch 3 | time: 1595.91s | valid loss 0.9565 | valid ppl 2.6025 | learning rate 5.0000
| end of split 41 / 62 | epoch 3 | time: 1597.34s | valid loss 0.9562 | valid ppl 2.6019 | learning rate 5.0000
| end of split 42 / 62 | epoch 3 | time: 1600.88s | valid loss 0.9561 | valid ppl 2.6015 | learning rate 5.0000
| end of split 43 / 62 | epoch 3 | time: 1601.74s | valid loss 0.9559 | valid ppl 2.6010 | learning rate 5.0000
| end of split 44 / 62 | epoch 3 | time: 1603.40s | valid loss 0.9562 | valid ppl 2.6018 | learning rate 5.0000
| end of split 45 / 62 | epoch 3 | time: 1601.88s | valid loss 0.9557 | valid ppl 2.6004 | learning rate 5.0000
| end of split 46 / 62 | epoch 3 | time: 1602.03s | valid loss 0.9556 | valid ppl 2.6002 | learning rate 5.0000
| end of split 47 / 62 | epoch 3 | time: 1601.98s | valid loss 0.9555 | valid ppl 2.5999 | learning rate 5.0000
| end of split 48 / 62 | epoch 3 | time: 1603.86s | valid loss 0.9555 | valid ppl 2.6001 | learning rate 5.0000
| end of split 49 / 62 | epoch 3 | time: 1600.52s | valid loss 0.9556 | valid ppl 2.6002 | learning rate 5.0000
| end of split 50 / 62 | epoch 3 | time: 1597.63s | valid loss 0.9549 | valid ppl 2.5985 | learning rate 5.0000
| end of split 51 / 62 | epoch 3 | time: 1600.65s | valid loss 0.9550 | valid ppl 2.5987 | learning rate 5.0000
| end of split 52 / 62 | epoch 3 | time: 1599.09s | valid loss 0.9549 | valid ppl 2.5984 | learning rate 5.0000
| end of split 53 / 62 | epoch 3 | time: 1599.84s | valid loss 0.9549 | valid ppl 2.5983 | learning rate 5.0000
| end of split 54 / 62 | epoch 3 | time: 1597.92s | valid loss 0.9547 | valid ppl 2.5980 | learning rate 5.0000
| end of split 55 / 62 | epoch 3 | time: 1598.06s | valid loss 0.9546 | valid ppl 2.5976 | learning rate 5.0000
| end of split 56 / 62 | epoch 3 | time: 1597.08s | valid loss 0.9544 | valid ppl 2.5970 | learning rate 5.0000
| end of split 57 / 62 | epoch 3 | time: 1596.42s | valid loss 0.9544 | valid ppl 2.5971 | learning rate 5.0000
| end of split 58 / 62 | epoch 3 | time: 1597.40s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 5.0000
| end of split 59 / 62 | epoch 3 | time: 1596.76s | valid loss 0.9539 | valid ppl 2.5959 | learning rate 5.0000
| end of split 60 / 62 | epoch 3 | time: 1594.38s | valid loss 0.9540 | valid ppl 2.5962 | learning rate 5.0000
| end of split 61 / 62 | epoch 3 | time: 1595.01s | valid loss 0.9550 | valid ppl 2.5988 | learning rate 5.0000
| end of split 62 / 62 | epoch 3 | time: 1596.06s | valid loss 0.9541 | valid ppl 2.5963 | learning rate 5.0000
187 |
+
| end of split 1 / 62 | epoch 4 | time: 1590.51s | valid loss 0.9539 | valid ppl 2.5959 | learning rate 5.0000
|
188 |
+
| end of split 2 / 62 | epoch 4 | time: 1594.92s | valid loss 0.9538 | valid ppl 2.5955 | learning rate 5.0000
|
189 |
+
| end of split 3 / 62 | epoch 4 | time: 1594.53s | valid loss 0.9536 | valid ppl 2.5950 | learning rate 5.0000
|
190 |
+
| end of split 4 / 62 | epoch 4 | time: 1595.50s | valid loss 0.9534 | valid ppl 2.5946 | learning rate 5.0000
|
191 |
+
| end of split 5 / 62 | epoch 4 | time: 1594.79s | valid loss 0.9535 | valid ppl 2.5947 | learning rate 5.0000
|
192 |
+
| end of split 6 / 62 | epoch 4 | time: 1595.23s | valid loss 0.9535 | valid ppl 2.5948 | learning rate 5.0000
|
193 |
+
| end of split 7 / 62 | epoch 4 | time: 1594.51s | valid loss 0.9535 | valid ppl 2.5948 | learning rate 5.0000
|
194 |
+
| end of split 8 / 62 | epoch 4 | time: 1595.67s | valid loss 0.9531 | valid ppl 2.5938 | learning rate 5.0000
|
195 |
+
| end of split 9 / 62 | epoch 4 | time: 1594.19s | valid loss 0.9533 | valid ppl 2.5942 | learning rate 5.0000
|
196 |
+
| end of split 10 / 62 | epoch 4 | time: 1596.43s | valid loss 0.9530 | valid ppl 2.5935 | learning rate 5.0000
|
197 |
+
| end of split 11 / 62 | epoch 4 | time: 1594.75s | valid loss 0.9533 | valid ppl 2.5944 | learning rate 5.0000
|
198 |
+
| end of split 12 / 62 | epoch 4 | time: 1593.83s | valid loss 0.9530 | valid ppl 2.5934 | learning rate 5.0000
|
199 |
+
| end of split 13 / 62 | epoch 4 | time: 1593.87s | valid loss 0.9530 | valid ppl 2.5934 | learning rate 5.0000
|
200 |
+
| end of split 14 / 62 | epoch 4 | time: 1595.57s | valid loss 0.9529 | valid ppl 2.5933 | learning rate 5.0000
|
201 |
+
| end of split 15 / 62 | epoch 4 | time: 1597.27s | valid loss 0.9527 | valid ppl 2.5927 | learning rate 5.0000
|
202 |
+
| end of split 16 / 62 | epoch 4 | time: 1594.24s | valid loss 0.9526 | valid ppl 2.5924 | learning rate 5.0000
|
203 |
+
| end of split 17 / 62 | epoch 4 | time: 1594.23s | valid loss 0.9527 | valid ppl 2.5927 | learning rate 5.0000
|
204 |
+
| end of split 18 / 62 | epoch 4 | time: 1595.12s | valid loss 0.9524 | valid ppl 2.5918 | learning rate 5.0000
|
205 |
+
| end of split 19 / 62 | epoch 4 | time: 1595.95s | valid loss 0.9524 | valid ppl 2.5920 | learning rate 5.0000
|
206 |
+
| end of split 20 / 62 | epoch 4 | time: 1594.70s | valid loss 0.9522 | valid ppl 2.5913 | learning rate 5.0000
|
207 |
+
| end of split 21 / 62 | epoch 4 | time: 1594.57s | valid loss 0.9520 | valid ppl 2.5908 | learning rate 5.0000
|
208 |
+
| end of split 22 / 62 | epoch 4 | time: 1594.91s | valid loss 0.9520 | valid ppl 2.5908 | learning rate 5.0000
|
209 |
+
| end of split 23 / 62 | epoch 4 | time: 1594.17s | valid loss 0.9519 | valid ppl 2.5906 | learning rate 5.0000
|
210 |
+
| end of split 24 / 62 | epoch 4 | time: 1593.85s | valid loss 0.9519 | valid ppl 2.5906 | learning rate 5.0000
| end of split 25 / 62 | epoch 4 | time: 1594.37s | valid loss 0.9519 | valid ppl 2.5907 | learning rate 5.0000
| end of split 26 / 62 | epoch 4 | time: 1595.05s | valid loss 0.9516 | valid ppl 2.5898 | learning rate 5.0000
| end of split 27 / 62 | epoch 4 | time: 1596.66s | valid loss 0.9516 | valid ppl 2.5898 | learning rate 5.0000
| end of split 28 / 62 | epoch 4 | time: 1597.62s | valid loss 0.9522 | valid ppl 2.5915 | learning rate 5.0000
| end of split 29 / 62 | epoch 4 | time: 1596.01s | valid loss 0.9514 | valid ppl 2.5893 | learning rate 5.0000
| end of split 30 / 62 | epoch 4 | time: 1596.94s | valid loss 0.9514 | valid ppl 2.5895 | learning rate 5.0000
| end of split 31 / 62 | epoch 4 | time: 1596.59s | valid loss 0.9515 | valid ppl 2.5895 | learning rate 5.0000
| end of split 32 / 62 | epoch 4 | time: 1594.91s | valid loss 0.9513 | valid ppl 2.5892 | learning rate 5.0000
| end of split 33 / 62 | epoch 4 | time: 1596.39s | valid loss 0.9512 | valid ppl 2.5888 | learning rate 5.0000
| end of split 34 / 62 | epoch 4 | time: 1596.82s | valid loss 0.9512 | valid ppl 2.5888 | learning rate 5.0000
| end of split 35 / 62 | epoch 4 | time: 1597.66s | valid loss 0.9511 | valid ppl 2.5886 | learning rate 5.0000
| end of split 36 / 62 | epoch 4 | time: 1598.20s | valid loss 0.9516 | valid ppl 2.5899 | learning rate 5.0000
| end of split 37 / 62 | epoch 4 | time: 1598.02s | valid loss 0.9510 | valid ppl 2.5883 | learning rate 5.0000
| end of split 38 / 62 | epoch 4 | time: 1597.10s | valid loss 0.9509 | valid ppl 2.5881 | learning rate 5.0000
| end of split 39 / 62 | epoch 4 | time: 1599.56s | valid loss 0.9509 | valid ppl 2.5879 | learning rate 5.0000
| end of split 40 / 62 | epoch 4 | time: 1597.81s | valid loss 0.9510 | valid ppl 2.5882 | learning rate 5.0000
| end of split 41 / 62 | epoch 4 | time: 1598.85s | valid loss 0.9507 | valid ppl 2.5876 | learning rate 5.0000
| end of split 42 / 62 | epoch 4 | time: 1597.13s | valid loss 0.9507 | valid ppl 2.5875 | learning rate 5.0000
| end of split 43 / 62 | epoch 4 | time: 1598.31s | valid loss 0.9508 | valid ppl 2.5877 | learning rate 5.0000
| end of split 44 / 62 | epoch 4 | time: 1597.29s | valid loss 0.9507 | valid ppl 2.5874 | learning rate 5.0000
| end of split 45 / 62 | epoch 4 | time: 1595.76s | valid loss 0.9508 | valid ppl 2.5877 | learning rate 5.0000
| end of split 46 / 62 | epoch 4 | time: 1597.26s | valid loss 0.9506 | valid ppl 2.5872 | learning rate 5.0000
| end of split 47 / 62 | epoch 4 | time: 1596.63s | valid loss 0.9504 | valid ppl 2.5868 | learning rate 5.0000
| end of split 48 / 62 | epoch 4 | time: 1597.06s | valid loss 0.9503 | valid ppl 2.5866 | learning rate 5.0000
| end of split 49 / 62 | epoch 4 | time: 1596.32s | valid loss 0.9501 | valid ppl 2.5860 | learning rate 5.0000
| end of split 50 / 62 | epoch 4 | time: 852.39s | valid loss 0.9507 | valid ppl 2.5876 | learning rate 5.0000
| end of split 51 / 62 | epoch 4 | time: 1596.92s | valid loss 0.9500 | valid ppl 2.5857 | learning rate 5.0000
| end of split 52 / 62 | epoch 4 | time: 1595.75s | valid loss 0.9505 | valid ppl 2.5869 | learning rate 5.0000
| end of split 53 / 62 | epoch 4 | time: 1593.59s | valid loss 0.9501 | valid ppl 2.5858 | learning rate 5.0000
| end of split 54 / 62 | epoch 4 | time: 1594.38s | valid loss 0.9509 | valid ppl 2.5881 | learning rate 5.0000
| end of split 55 / 62 | epoch 4 | time: 1593.89s | valid loss 0.9496 | valid ppl 2.5848 | learning rate 5.0000
| end of split 56 / 62 | epoch 4 | time: 1593.86s | valid loss 0.9499 | valid ppl 2.5854 | learning rate 5.0000
| end of split 57 / 62 | epoch 4 | time: 1592.65s | valid loss 0.9496 | valid ppl 2.5846 | learning rate 5.0000
| end of split 58 / 62 | epoch 4 | time: 1593.43s | valid loss 0.9497 | valid ppl 2.5850 | learning rate 5.0000
| end of split 59 / 62 | epoch 4 | time: 1590.22s | valid loss 0.9496 | valid ppl 2.5846 | learning rate 5.0000
| end of split 60 / 62 | epoch 4 | time: 1592.59s | valid loss 0.9494 | valid ppl 2.5840 | learning rate 5.0000
| end of split 61 / 62 | epoch 4 | time: 1590.49s | valid loss 0.9494 | valid ppl 2.5842 | learning rate 5.0000
| end of split 62 / 62 | epoch 4 | time: 1592.95s | valid loss 0.9494 | valid ppl 2.5841 | learning rate 5.0000
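Each validation line in this log has the same pipe-delimited shape, so it can be read mechanically. The sketch below is one way to do that with a regular expression; the names `LINE_RE` and `parse_line` are hypothetical helpers introduced for illustration and are not part of flair. It also checks the relationship the numbers suggest: the logged `valid ppl` agrees with `exp(valid loss)` up to rounding.

```python
import math
import re

# Hypothetical pattern matching the validation lines as they appear in this log.
LINE_RE = re.compile(
    r"\|\s*end of split\s+(\d+)\s*/\s*(\d+)\s*"
    r"\|\s*epoch\s+(\d+)\s*"
    r"\|\s*time:\s*([\d.]+)s\s*"
    r"\|\s*valid loss\s+([\d.]+)\s*"
    r"\|\s*valid ppl\s+([\d.]+)\s*"
    r"\|\s*learning rate\s+([\d.]+)"
)


def parse_line(line):
    """Return a dict for one validation line, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    split, total, epoch, secs, loss, ppl, lr = m.groups()
    return {
        "split": int(split),
        "total_splits": int(total),
        "epoch": int(epoch),
        "seconds": float(secs),
        "loss": float(loss),
        "ppl": float(ppl),
        "lr": float(lr),
    }


# Sample line copied from the end of epoch 4 above.
sample = (
    "| end of split 62 / 62 | epoch 4 | time: 1592.95s "
    "| valid loss 0.9494 | valid ppl 2.5841 | learning rate 5.0000"
)
record = parse_line(sample)

# Perplexity recomputed from the (rounded) logged loss.
ppl_from_loss = math.exp(record["loss"])
print(record)
print(round(ppl_from_loss, 4), record["ppl"])
```

Because the loss is logged to four decimals, the recomputed perplexity can differ from the logged one in the last digit; comparing with a small tolerance is safer than testing for equality.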
| end of split 1 / 62 | epoch 5 | time: 1588.63s | valid loss 0.9495 | valid ppl 2.5845 | learning rate 5.0000
| end of split 2 / 62 | epoch 5 | time: 1594.59s | valid loss 0.9492 | valid ppl 2.5837 | learning rate 5.0000
| end of split 3 / 62 | epoch 5 | time: 1595.14s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
| end of split 4 / 62 | epoch 5 | time: 1593.00s | valid loss 0.9491 | valid ppl 2.5833 | learning rate 5.0000
| end of split 5 / 62 | epoch 5 | time: 1592.16s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
| end of split 6 / 62 | epoch 5 | time: 1592.38s | valid loss 0.9491 | valid ppl 2.5833 | learning rate 5.0000
| end of split 7 / 62 | epoch 5 | time: 1593.78s | valid loss 0.9490 | valid ppl 2.5832 | learning rate 5.0000
| end of split 8 / 62 | epoch 5 | time: 1594.50s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
| end of split 9 / 62 | epoch 5 | time: 1594.20s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
| end of split 10 / 62 | epoch 5 | time: 1594.41s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
| end of split 11 / 62 | epoch 5 | time: 1592.91s | valid loss 0.9489 | valid ppl 2.5829 | learning rate 5.0000
| end of split 12 / 62 | epoch 5 | time: 1595.00s | valid loss 0.9494 | valid ppl 2.5842 | learning rate 5.0000
| end of split 13 / 62 | epoch 5 | time: 1592.84s | valid loss 0.9486 | valid ppl 2.5822 | learning rate 5.0000
| end of split 14 / 62 | epoch 5 | time: 1593.26s | valid loss 0.9485 | valid ppl 2.5819 | learning rate 5.0000
| end of split 15 / 62 | epoch 5 | time: 1592.76s | valid loss 0.9486 | valid ppl 2.5822 | learning rate 5.0000
| end of split 16 / 62 | epoch 5 | time: 1595.66s | valid loss 0.9483 | valid ppl 2.5814 | learning rate 5.0000
| end of split 17 / 62 | epoch 5 | time: 1596.12s | valid loss 0.9484 | valid ppl 2.5816 | learning rate 5.0000
| end of split 18 / 62 | epoch 5 | time: 1597.15s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
| end of split 19 / 62 | epoch 5 | time: 1595.50s | valid loss 0.9487 | valid ppl 2.5824 | learning rate 5.0000
| end of split 20 / 62 | epoch 5 | time: 1597.42s | valid loss 0.9482 | valid ppl 2.5812 | learning rate 5.0000
| end of split 21 / 62 | epoch 5 | time: 1596.20s | valid loss 0.9483 | valid ppl 2.5814 | learning rate 5.0000
| end of split 22 / 62 | epoch 5 | time: 1597.06s | valid loss 0.9479 | valid ppl 2.5804 | learning rate 5.0000
| end of split 23 / 62 | epoch 5 | time: 1596.92s | valid loss 0.9479 | valid ppl 2.5803 | learning rate 5.0000
| end of split 24 / 62 | epoch 5 | time: 1593.52s | valid loss 0.9481 | valid ppl 2.5807 | learning rate 5.0000
| end of split 25 / 62 | epoch 5 | time: 1595.12s | valid loss 0.9480 | valid ppl 2.5805 | learning rate 5.0000
| end of split 26 / 62 | epoch 5 | time: 1595.25s | valid loss 0.9479 | valid ppl 2.5802 | learning rate 5.0000
| end of split 27 / 62 | epoch 5 | time: 1644.92s | valid loss 0.9477 | valid ppl 2.5799 | learning rate 5.0000
| end of split 28 / 62 | epoch 5 | time: 1595.94s | valid loss 0.9478 | valid ppl 2.5801 | learning rate 5.0000
| end of split 29 / 62 | epoch 5 | time: 1596.39s | valid loss 0.9489 | valid ppl 2.5830 | learning rate 5.0000
| end of split 30 / 62 | epoch 5 | time: 1596.48s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
| end of split 31 / 62 | epoch 5 | time: 1594.94s | valid loss 0.9480 | valid ppl 2.5805 | learning rate 5.0000
| end of split 32 / 62 | epoch 5 | time: 1596.25s | valid loss 0.9477 | valid ppl 2.5799 | learning rate 5.0000
| end of split 33 / 62 | epoch 5 | time: 1595.95s | valid loss 0.9476 | valid ppl 2.5795 | learning rate 5.0000
| end of split 34 / 62 | epoch 5 | time: 1594.31s | valid loss 0.9474 | valid ppl 2.5791 | learning rate 5.0000
| end of split 35 / 62 | epoch 5 | time: 1595.73s | valid loss 0.9475 | valid ppl 2.5792 | learning rate 5.0000
| end of split 36 / 62 | epoch 5 | time: 1593.93s | valid loss 0.9476 | valid ppl 2.5794 | learning rate 5.0000
| end of split 37 / 62 | epoch 5 | time: 1594.50s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
| end of split 38 / 62 | epoch 5 | time: 1592.84s | valid loss 0.9474 | valid ppl 2.5790 | learning rate 5.0000
| end of split 39 / 62 | epoch 5 | time: 1591.33s | valid loss 0.9473 | valid ppl 2.5788 | learning rate 5.0000
| end of split 40 / 62 | epoch 5 | time: 1590.07s | valid loss 0.9471 | valid ppl 2.5783 | learning rate 5.0000
| end of split 41 / 62 | epoch 5 | time: 1591.27s | valid loss 0.9474 | valid ppl 2.5791 | learning rate 5.0000
| end of split 42 / 62 | epoch 5 | time: 1590.29s | valid loss 0.9471 | valid ppl 2.5782 | learning rate 5.0000
| end of split 43 / 62 | epoch 5 | time: 1590.07s | valid loss 0.9470 | valid ppl 2.5780 | learning rate 5.0000
| end of split 44 / 62 | epoch 5 | time: 1590.49s | valid loss 0.9471 | valid ppl 2.5781 | learning rate 5.0000
| end of split 45 / 62 | epoch 5 | time: 1589.80s | valid loss 0.9473 | valid ppl 2.5787 | learning rate 5.0000
| end of split 46 / 62 | epoch 5 | time: 1588.77s | valid loss 0.9470 | valid ppl 2.5779 | learning rate 5.0000
| end of split 47 / 62 | epoch 5 | time: 1589.22s | valid loss 0.9468 | valid ppl 2.5773 | learning rate 5.0000
| end of split 48 / 62 | epoch 5 | time: 1590.14s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 49 / 62 | epoch 5 | time: 1587.40s | valid loss 0.9468 | valid ppl 2.5775 | learning rate 5.0000
| end of split 50 / 62 | epoch 5 | time: 847.83s | valid loss 0.9472 | valid ppl 2.5786 | learning rate 5.0000
| end of split 51 / 62 | epoch 5 | time: 1588.35s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
| end of split 52 / 62 | epoch 5 | time: 1587.80s | valid loss 0.9468 | valid ppl 2.5774 | learning rate 5.0000
| end of split 53 / 62 | epoch 5 | time: 1588.01s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
| end of split 54 / 62 | epoch 5 | time: 1585.93s | valid loss 0.9465 | valid ppl 2.5767 | learning rate 5.0000
| end of split 55 / 62 | epoch 5 | time: 1584.78s | valid loss 0.9463 | valid ppl 2.5763 | learning rate 5.0000
| end of split 56 / 62 | epoch 5 | time: 1585.77s | valid loss 0.9481 | valid ppl 2.5808 | learning rate 5.0000
| end of split 57 / 62 | epoch 5 | time: 1586.16s | valid loss 0.9465 | valid ppl 2.5766 | learning rate 5.0000
| end of split 58 / 62 | epoch 5 | time: 1586.35s | valid loss 0.9464 | valid ppl 2.5765 | learning rate 5.0000
| end of split 59 / 62 | epoch 5 | time: 1585.15s | valid loss 0.9463 | valid ppl 2.5762 | learning rate 5.0000
| end of split 60 / 62 | epoch 5 | time: 1585.41s | valid loss 0.9473 | valid ppl 2.5788 | learning rate 5.0000
| end of split 61 / 62 | epoch 5 | time: 1586.84s | valid loss 0.9462 | valid ppl 2.5760 | learning rate 5.0000
| end of split 62 / 62 | epoch 5 | time: 1585.85s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
| end of split 1 / 62 | epoch 6 | time: 1580.81s | valid loss 0.9461 | valid ppl 2.5755 | learning rate 5.0000
| end of split 2 / 62 | epoch 6 | time: 1585.96s | valid loss 0.9460 | valid ppl 2.5753 | learning rate 5.0000
| end of split 3 / 62 | epoch 6 | time: 1586.43s | valid loss 0.9461 | valid ppl 2.5757 | learning rate 5.0000
| end of split 4 / 62 | epoch 6 | time: 1591.11s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 5 / 62 | epoch 6 | time: 1593.60s | valid loss 0.9458 | valid ppl 2.5749 | learning rate 5.0000
| end of split 6 / 62 | epoch 6 | time: 1594.82s | valid loss 0.9459 | valid ppl 2.5751 | learning rate 5.0000
| end of split 7 / 62 | epoch 6 | time: 1599.91s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
| end of split 8 / 62 | epoch 6 | time: 1601.71s | valid loss 0.9460 | valid ppl 2.5754 | learning rate 5.0000
| end of split 9 / 62 | epoch 6 | time: 1597.62s | valid loss 0.9458 | valid ppl 2.5747 | learning rate 5.0000
| end of split 10 / 62 | epoch 6 | time: 1600.06s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 11 / 62 | epoch 6 | time: 1596.53s | valid loss 0.9455 | valid ppl 2.5740 | learning rate 5.0000
| end of split 12 / 62 | epoch 6 | time: 1599.04s | valid loss 0.9456 | valid ppl 2.5745 | learning rate 5.0000
| end of split 13 / 62 | epoch 6 | time: 1593.55s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
| end of split 14 / 62 | epoch 6 | time: 1596.25s | valid loss 0.9454 | valid ppl 2.5740 | learning rate 5.0000
| end of split 15 / 62 | epoch 6 | time: 1595.15s | valid loss 0.9454 | valid ppl 2.5740 | learning rate 5.0000
| end of split 16 / 62 | epoch 6 | time: 1595.84s | valid loss 0.9454 | valid ppl 2.5738 | learning rate 5.0000
| end of split 17 / 62 | epoch 6 | time: 1597.05s | valid loss 0.9453 | valid ppl 2.5737 | learning rate 5.0000
| end of split 18 / 62 | epoch 6 | time: 1595.68s | valid loss 0.9469 | valid ppl 2.5776 | learning rate 5.0000
| end of split 19 / 62 | epoch 6 | time: 1595.81s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 20 / 62 | epoch 6 | time: 1596.74s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 21 / 62 | epoch 6 | time: 1596.50s | valid loss 0.9452 | valid ppl 2.5734 | learning rate 5.0000
| end of split 22 / 62 | epoch 6 | time: 1596.57s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
| end of split 23 / 62 | epoch 6 | time: 1597.51s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
| end of split 24 / 62 | epoch 6 | time: 1597.85s | valid loss 0.9453 | valid ppl 2.5735 | learning rate 5.0000
| end of split 25 / 62 | epoch 6 | time: 1595.58s | valid loss 0.9452 | valid ppl 2.5733 | learning rate 5.0000
| end of split 26 / 62 | epoch 6 | time: 1599.43s | valid loss 0.9450 | valid ppl 2.5729 | learning rate 5.0000
| end of split 27 / 62 | epoch 6 | time: 1625.16s | valid loss 0.9454 | valid ppl 2.5737 | learning rate 5.0000
| end of split 28 / 62 | epoch 6 | time: 1677.11s | valid loss 0.9456 | valid ppl 2.5744 | learning rate 5.0000
| end of split 29 / 62 | epoch 6 | time: 1664.87s | valid loss 0.9500 | valid ppl 2.5857 | learning rate 5.0000
| end of split 30 / 62 | epoch 6 | time: 1610.42s | valid loss 0.9491 | valid ppl 2.5834 | learning rate 5.0000
| end of split 31 / 62 | epoch 6 | time: 1613.54s | valid loss 0.9478 | valid ppl 2.5800 | learning rate 5.0000
| end of split 32 / 62 | epoch 6 | time: 1616.62s | valid loss 0.9463 | valid ppl 2.5762 | learning rate 5.0000
| end of split 33 / 62 | epoch 6 | time: 1619.63s | valid loss 0.9454 | valid ppl 2.5739 | learning rate 5.0000
| end of split 34 / 62 | epoch 6 | time: 1617.77s | valid loss 0.9452 | valid ppl 2.5735 | learning rate 5.0000
| end of split 35 / 62 | epoch 6 | time: 1616.49s | valid loss 0.9447 | valid ppl 2.5720 | learning rate 1.2500
| end of split 36 / 62 | epoch 6 | time: 1617.61s | valid loss 0.9443 | valid ppl 2.5711 | learning rate 1.2500
| end of split 37 / 62 | epoch 6 | time: 1619.28s | valid loss 0.9440 | valid ppl 2.5703 | learning rate 1.2500
| end of split 38 / 62 | epoch 6 | time: 1620.03s | valid loss 0.9439 | valid ppl 2.5700 | learning rate 1.2500
| end of split 39 / 62 | epoch 6 | time: 1621.32s | valid loss 0.9438 | valid ppl 2.5698 | learning rate 1.2500
| end of split 40 / 62 | epoch 6 | time: 1625.63s | valid loss 0.9437 | valid ppl 2.5695 | learning rate 1.2500
| end of split 41 / 62 | epoch 6 | time: 1625.86s | valid loss 0.9437 | valid ppl 2.5696 | learning rate 1.2500
| end of split 42 / 62 | epoch 6 | time: 1625.70s | valid loss 0.9436 | valid ppl 2.5692 | learning rate 1.2500
| end of split 43 / 62 | epoch 6 | time: 1629.22s | valid loss 0.9436 | valid ppl 2.5691 | learning rate 1.2500
| end of split 44 / 62 | epoch 6 | time: 1628.58s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
| end of split 45 / 62 | epoch 6 | time: 870.27s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
| end of split 46 / 62 | epoch 6 | time: 1629.99s | valid loss 0.9434 | valid ppl 2.5688 | learning rate 1.2500
| end of split 47 / 62 | epoch 6 | time: 1629.90s | valid loss 0.9435 | valid ppl 2.5689 | learning rate 1.2500
| end of split 48 / 62 | epoch 6 | time: 1628.52s | valid loss 0.9435 | valid ppl 2.5690 | learning rate 1.2500
| end of split 49 / 62 | epoch 6 | time: 1631.93s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
| end of split 50 / 62 | epoch 6 | time: 1627.56s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
| end of split 51 / 62 | epoch 6 | time: 1628.79s | valid loss 0.9433 | valid ppl 2.5683 | learning rate 1.2500
| end of split 52 / 62 | epoch 6 | time: 1630.13s | valid loss 0.9434 | valid ppl 2.5686 | learning rate 1.2500
| end of split 53 / 62 | epoch 6 | time: 1630.48s | valid loss 0.9433 | valid ppl 2.5685 | learning rate 1.2500
| end of split 54 / 62 | epoch 6 | time: 1629.97s | valid loss 0.9432 | valid ppl 2.5681 | learning rate 1.2500
| end of split 55 / 62 | epoch 6 | time: 1622.82s | valid loss 0.9432 | valid ppl 2.5682 | learning rate 1.2500
| end of split 56 / 62 | epoch 6 | time: 1624.52s | valid loss 0.9431 | valid ppl 2.5680 | learning rate 1.2500
| end of split 57 / 62 | epoch 6 | time: 1626.41s | valid loss 0.9431 | valid ppl 2.5679 | learning rate 1.2500
| end of split 58 / 62 | epoch 6 | time: 1625.56s | valid loss 0.9434 | valid ppl 2.5686 | learning rate 1.2500
| end of split 59 / 62 | epoch 6 | time: 1627.15s | valid loss 0.9431 | valid ppl 2.5678 | learning rate 1.2500
| end of split 60 / 62 | epoch 6 | time: 1627.44s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 61 / 62 | epoch 6 | time: 1627.57s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
| end of split 62 / 62 | epoch 6 | time: 1625.18s | valid loss 0.9430 | valid ppl 2.5677 | learning rate 1.2500
| end of split 1 / 62 | epoch 7 | time: 1620.40s | valid loss 0.9429 | valid ppl 2.5675 | learning rate 1.2500
| end of split 2 / 62 | epoch 7 | time: 1627.79s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 3 / 62 | epoch 7 | time: 1627.64s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 4 / 62 | epoch 7 | time: 1626.87s | valid loss 0.9430 | valid ppl 2.5676 | learning rate 1.2500
| end of split 5 / 62 | epoch 7 | time: 1628.51s | valid loss 0.9429 | valid ppl 2.5674 | learning rate 1.2500
| end of split 6 / 62 | epoch 7 | time: 1627.38s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 7 / 62 | epoch 7 | time: 1624.51s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 8 / 62 | epoch 7 | time: 1622.62s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 9 / 62 | epoch 7 | time: 1624.24s | valid loss 0.9429 | valid ppl 2.5673 | learning rate 1.2500
| end of split 10 / 62 | epoch 7 | time: 1625.57s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 11 / 62 | epoch 7 | time: 1625.67s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
| end of split 12 / 62 | epoch 7 | time: 1716.44s | valid loss 0.9428 | valid ppl 2.5670 | learning rate 1.2500
| end of split 13 / 62 | epoch 7 | time: 1794.58s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 14 / 62 | epoch 7 | time: 1783.52s | valid loss 0.9428 | valid ppl 2.5672 | learning rate 1.2500
| end of split 15 / 62 | epoch 7 | time: 1769.46s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 16 / 62 | epoch 7 | time: 1775.92s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 17 / 62 | epoch 7 | time: 1777.89s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 18 / 62 | epoch 7 | time: 1783.47s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 19 / 62 | epoch 7 | time: 1779.88s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 20 / 62 | epoch 7 | time: 1763.54s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 21 / 62 | epoch 7 | time: 1772.71s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 22 / 62 | epoch 7 | time: 1775.60s | valid loss 0.9427 | valid ppl 2.5669 | learning rate 1.2500
| end of split 23 / 62 | epoch 7 | time: 1782.51s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 24 / 62 | epoch 7 | time: 1754.16s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 25 / 62 | epoch 7 | time: 941.64s | valid loss 0.9427 | valid ppl 2.5668 | learning rate 1.2500
| end of split 26 / 62 | epoch 7 | time: 1763.95s | valid loss 0.9428 | valid ppl 2.5671 | learning rate 1.2500
| end of split 27 / 62 | epoch 7 | time: 1776.44s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 28 / 62 | epoch 7 | time: 1768.74s | valid loss 0.9426 | valid ppl 2.5665 | learning rate 1.2500
| end of split 29 / 62 | epoch 7 | time: 1800.52s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 30 / 62 | epoch 7 | time: 1815.90s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 31 / 62 | epoch 7 | time: 1745.49s | valid loss 0.9426 | valid ppl 2.5666 | learning rate 1.2500
| end of split 32 / 62 | epoch 7 | time: 1613.56s | valid loss 0.9425 | valid ppl 2.5664 | learning rate 1.2500
| end of split 33 / 62 | epoch 7 | time: 1628.29s | valid loss 0.9425 | valid ppl 2.5665 | learning rate 1.2500
| end of split 34 / 62 | epoch 7 | time: 1624.90s | valid loss 0.9425 | valid ppl 2.5663 | learning rate 1.2500
| end of split 35 / 62 | epoch 7 | time: 1626.26s | valid loss 0.9425 | valid ppl 2.5664 | learning rate 1.2500
| end of split 36 / 62 | epoch 7 | time: 1603.86s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 1.2500
| end of split 37 / 62 | epoch 7 | time: 1605.85s | valid loss 0.9424 | valid ppl 2.5663 | learning rate 1.2500
| end of split 38 / 62 | epoch 7 | time: 1603.91s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 39 / 62 | epoch 7 | time: 1605.22s | valid loss 0.9425 | valid ppl 2.5663 | learning rate 1.2500
| end of split 40 / 62 | epoch 7 | time: 1602.75s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 41 / 62 | epoch 7 | time: 1604.28s | valid loss 0.9424 | valid ppl 2.5660 | learning rate 1.2500
| end of split 42 / 62 | epoch 7 | time: 1603.89s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 43 / 62 | epoch 7 | time: 1603.60s | valid loss 0.9426 | valid ppl 2.5667 | learning rate 1.2500
| end of split 44 / 62 | epoch 7 | time: 1606.62s | valid loss 0.9424 | valid ppl 2.5662 | learning rate 1.2500
| end of split 45 / 62 | epoch 7 | time: 1604.77s | valid loss 0.9424 | valid ppl 2.5661 | learning rate 1.2500
| end of split 46 / 62 | epoch 7 | time: 1603.10s | valid loss 0.9422 | valid ppl 2.5656 | learning rate 0.3125
| end of split 47 / 62 | epoch 7 | time: 1601.62s | valid loss 0.9422 | valid ppl 2.5655 | learning rate 0.3125
| end of split 48 / 62 | epoch 7 | time: 1604.55s | valid loss 0.9421 | valid ppl 2.5655 | learning rate 0.3125
| end of split 49 / 62 | epoch 7 | time: 1604.48s | valid loss 0.9421 | valid ppl 2.5654 | learning rate 0.3125
| end of split 50 / 62 | epoch 7 | time: 1603.34s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
| end of split 51 / 62 | epoch 7 | time: 1600.92s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
| end of split 52 / 62 | epoch 7 | time: 1604.70s | valid loss 0.9421 | valid ppl 2.5653 | learning rate 0.3125
| end of split 53 / 62 | epoch 7 | time: 1603.28s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 54 / 62 | epoch 7 | time: 1610.64s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 55 / 62 | epoch 7 | time: 1605.28s | valid loss 0.9421 | valid ppl 2.5652 | learning rate 0.3125
| end of split 56 / 62 | epoch 7 | time: 1603.78s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 57 / 62 | epoch 7 | time: 1603.91s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 58 / 62 | epoch 7 | time: 1605.53s | valid loss 0.9420 | valid ppl 2.5652 | learning rate 0.3125
| end of split 59 / 62 | epoch 7 | time: 1656.75s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 60 / 62 | epoch 7 | time: 1603.18s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 61 / 62 | epoch 7 | time: 1601.58s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 62 / 62 | epoch 7 | time: 1602.32s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
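Up to this point the learning rate column changes twice: from 5.0000 to 1.2500 at split 35 of epoch 6, and from 1.2500 to 0.3125 at split 46 of epoch 7. Both steps are a factor of 0.25, which is consistent with a plateau-based annealing schedule, although the scheduler settings themselves are not stated in this log. The sketch below shows one way to locate such steps; `lr_changes` is a hypothetical helper, and the `records` list holds only the four boundary rows copied from the log above.

```python
def lr_changes(records):
    """Return (epoch, split, old_lr, new_lr, ratio) for each learning-rate change.

    `records` is an iterable of (epoch, split, learning_rate) tuples in log order.
    """
    changes = []
    prev_lr = None
    for epoch, split, lr in records:
        if prev_lr is not None and lr != prev_lr:
            changes.append((epoch, split, prev_lr, lr, lr / prev_lr))
        prev_lr = lr
    return changes


# Rows on either side of each step, copied from the log (epoch, split, learning rate).
records = [
    (6, 34, 5.0),
    (6, 35, 1.25),
    (7, 45, 1.25),
    (7, 46, 0.3125),
]

changes = lr_changes(records)
for epoch, split, old_lr, new_lr, ratio in changes:
    print(f"epoch {epoch} split {split}: {old_lr} -> {new_lr} (x{ratio})")
```

Feeding the function every parsed line instead of the four boundary rows would list all the steps in the run with the same logic.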
| end of split 1 / 62 | epoch 8 | time: 1599.87s | valid loss 0.9420 | valid ppl 2.5651 | learning rate 0.3125
| end of split 2 / 62 | epoch 8 | time: 1605.15s | valid loss 0.9420 | valid ppl 2.5650 | learning rate 0.3125
| end of split 3 / 62 | epoch 8 | time: 1604.62s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 4 / 62 | epoch 8 | time: 1604.72s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 5 / 62 | epoch 8 | time: 1637.47s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 6 / 62 | epoch 8 | time: 875.65s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 7 / 62 | epoch 8 | time: 1638.44s | valid loss 0.9419 | valid ppl 2.5649 | learning rate 0.0781
| end of split 8 / 62 | epoch 8 | time: 1612.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 9 / 62 | epoch 8 | time: 1621.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 10 / 62 | epoch 8 | time: 1640.27s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 11 / 62 | epoch 8 | time: 1640.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 12 / 62 | epoch 8 | time: 1611.55s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 13 / 62 | epoch 8 | time: 1608.16s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 14 / 62 | epoch 8 | time: 1663.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 15 / 62 | epoch 8 | time: 1668.15s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 16 / 62 | epoch 8 | time: 1652.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 17 / 62 | epoch 8 | time: 1614.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0781
| end of split 18 / 62 | epoch 8 | time: 1617.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 19 / 62 | epoch 8 | time: 1628.04s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 20 / 62 | epoch 8 | time: 1624.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 21 / 62 | epoch 8 | time: 1637.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 22 / 62 | epoch 8 | time: 1634.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 23 / 62 | epoch 8 | time: 1620.99s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 24 / 62 | epoch 8 | time: 1616.31s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 25 / 62 | epoch 8 | time: 1611.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 26 / 62 | epoch 8 | time: 1605.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 27 / 62 | epoch 8 | time: 1607.65s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 28 / 62 | epoch 8 | time: 1608.79s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0195
| end of split 29 / 62 | epoch 8 | time: 1608.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 30 / 62 | epoch 8 | time: 1612.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 31 / 62 | epoch 8 | time: 1612.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 32 / 62 | epoch 8 | time: 1605.76s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 33 / 62 | epoch 8 | time: 1609.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 34 / 62 | epoch 8 | time: 1611.85s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
| end of split 35 / 62 | epoch 8 | time: 1620.65s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 36 / 62 | epoch 8 | time: 1619.16s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 37 / 62 | epoch 8 | time: 1604.30s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
| end of split 38 / 62 | epoch 8 | time: 1605.41s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0049
| end of split 39 / 62 | epoch 8 | time: 1639.13s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049
| end of split 40 / 62 | epoch 8 | time: 1614.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 41 / 62 | epoch 8 | time: 1619.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 42 / 62 | epoch 8 | time: 1655.86s | valid loss 0.9419 | valid ppl 2.5647 | learning rate 0.0012
| end of split 43 / 62 | epoch 8 | time: 1652.46s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 44 / 62 | epoch 8 | time: 1622.20s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 45 / 62 | epoch 8 | time: 1623.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 46 / 62 | epoch 8 | time: 1621.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 47 / 62 | epoch 8 | time: 1619.93s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 48 / 62 | epoch 8 | time: 1626.00s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 49 / 62 | epoch 8 | time: 1619.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 50 / 62 | epoch 8 | time: 1619.02s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0012
| end of split 51 / 62 | epoch 8 | time: 1670.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 52 / 62 | epoch 8 | time: 1671.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 53 / 62 | epoch 8 | time: 1675.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 54 / 62 | epoch 8 | time: 1674.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 55 / 62 | epoch 8 | time: 1662.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 56 / 62 | epoch 8 | time: 1656.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 57 / 62 | epoch 8 | time: 1656.32s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 58 / 62 | epoch 8 | time: 1656.95s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 59 / 62 | epoch 8 | time: 1650.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 60 / 62 | epoch 8 | time: 1621.00s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 61 / 62 | epoch 8 | time: 1621.93s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0003
| end of split 62 / 62 | epoch 8 | time: 1619.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 1 / 62 | epoch 9 | time: 1615.83s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 2 / 62 | epoch 9 | time: 1665.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 3 / 62 | epoch 9 | time: 1619.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 4 / 62 | epoch 9 | time: 1618.02s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 5 / 62 | epoch 9 | time: 1615.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 6 / 62 | epoch 9 | time: 1617.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 7 / 62 | epoch 9 | time: 1613.72s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 8 / 62 | epoch 9 | time: 1617.41s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 9 / 62 | epoch 9 | time: 1609.69s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 10 / 62 | epoch 9 | time: 1608.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0001
| end of split 11 / 62 | epoch 9 | time: 1619.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 12 / 62 | epoch 9 | time: 1616.51s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 13 / 62 | epoch 9 | time: 1611.11s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 14 / 62 | epoch 9 | time: 1609.59s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 15 / 62 | epoch 9 | time: 1609.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 16 / 62 | epoch 9 | time: 1609.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 17 / 62 | epoch 9 | time: 1610.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 18 / 62 | epoch 9 | time: 1605.49s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 19 / 62 | epoch 9 | time: 1609.29s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 20 / 62 | epoch 9 | time: 1610.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 21 / 62 | epoch 9 | time: 1610.08s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 22 / 62 | epoch 9 | time: 1609.18s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 23 / 62 | epoch 9 | time: 1608.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 24 / 62 | epoch 9 | time: 1609.79s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 25 / 62 | epoch 9 | time: 1608.82s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 26 / 62 | epoch 9 | time: 1609.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 27 / 62 | epoch 9 | time: 1611.33s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 28 / 62 | epoch 9 | time: 1612.14s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 29 / 62 | epoch 9 | time: 1611.11s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 30 / 62 | epoch 9 | time: 1612.06s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 31 / 62 | epoch 9 | time: 1609.92s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 32 / 62 | epoch 9 | time: 1606.74s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 33 / 62 | epoch 9 | time: 1609.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 34 / 62 | epoch 9 | time: 1610.44s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 35 / 62 | epoch 9 | time: 1613.24s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 36 / 62 | epoch 9 | time: 1614.50s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 37 / 62 | epoch 9 | time: 1612.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 38 / 62 | epoch 9 | time: 1614.71s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 39 / 62 | epoch 9 | time: 1616.78s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 40 / 62 | epoch 9 | time: 1618.87s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 41 / 62 | epoch 9 | time: 1616.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 42 / 62 | epoch 9 | time: 1590.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 43 / 62 | epoch 9 | time: 1588.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 44 / 62 | epoch 9 | time: 1587.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 45 / 62 | epoch 9 | time: 1588.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 46 / 62 | epoch 9 | time: 1599.34s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 47 / 62 | epoch 9 | time: 1601.18s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 48 / 62 | epoch 9 | time: 1601.25s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 49 / 62 | epoch 9 | time: 1602.68s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 50 / 62 | epoch 9 | time: 1601.60s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 51 / 62 | epoch 9 | time: 855.74s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 52 / 62 | epoch 9 | time: 1601.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 53 / 62 | epoch 9 | time: 1600.52s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 54 / 62 | epoch 9 | time: 1596.97s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 55 / 62 | epoch 9 | time: 1594.84s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 56 / 62 | epoch 9 | time: 1587.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 57 / 62 | epoch 9 | time: 1603.26s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 58 / 62 | epoch 9 | time: 1616.94s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 59 / 62 | epoch 9 | time: 1616.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 60 / 62 | epoch 9 | time: 1618.38s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 61 / 62 | epoch 9 | time: 1617.22s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 62 / 62 | epoch 9 | time: 1618.64s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 1 / 62 | epoch 10 | time: 1611.07s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 2 / 62 | epoch 10 | time: 1613.91s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 3 / 62 | epoch 10 | time: 1612.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 4 / 62 | epoch 10 | time: 1616.37s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 5 / 62 | epoch 10 | time: 1614.61s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 6 / 62 | epoch 10 | time: 1616.88s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 7 / 62 | epoch 10 | time: 1614.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 8 / 62 | epoch 10 | time: 1616.10s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 9 / 62 | epoch 10 | time: 1617.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 10 / 62 | epoch 10 | time: 1616.50s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 11 / 62 | epoch 10 | time: 1614.86s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 12 / 62 | epoch 10 | time: 1616.48s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 13 / 62 | epoch 10 | time: 1614.77s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 14 / 62 | epoch 10 | time: 1616.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 15 / 62 | epoch 10 | time: 1617.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 16 / 62 | epoch 10 | time: 1617.48s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 17 / 62 | epoch 10 | time: 1617.70s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 18 / 62 | epoch 10 | time: 1616.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 19 / 62 | epoch 10 | time: 1615.61s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 20 / 62 | epoch 10 | time: 1616.89s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 21 / 62 | epoch 10 | time: 1617.98s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 22 / 62 | epoch 10 | time: 1615.66s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 23 / 62 | epoch 10 | time: 1617.35s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 24 / 62 | epoch 10 | time: 1619.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 25 / 62 | epoch 10 | time: 1621.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 26 / 62 | epoch 10 | time: 1619.96s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 27 / 62 | epoch 10 | time: 1620.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 28 / 62 | epoch 10 | time: 1622.73s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 29 / 62 | epoch 10 | time: 1624.40s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 30 / 62 | epoch 10 | time: 1625.63s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 31 / 62 | epoch 10 | time: 1621.94s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 32 / 62 | epoch 10 | time: 1628.25s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 33 / 62 | epoch 10 | time: 1629.39s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 34 / 62 | epoch 10 | time: 1630.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 35 / 62 | epoch 10 | time: 1631.43s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 36 / 62 | epoch 10 | time: 1631.67s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 37 / 62 | epoch 10 | time: 1634.08s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 38 / 62 | epoch 10 | time: 1634.51s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 39 / 62 | epoch 10 | time: 1634.03s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
| end of split 40 / 62 | epoch 10 | time: 1631.42s | valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0000
TEST: valid loss 0.9404 | valid ppl 2.5611
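
The log above reports both a validation loss and a validation perplexity on each line. The logged pairs are consistent with perplexity being the exponential of the loss (exp(0.9419) ≈ 2.5648, exp(0.9404) ≈ 2.5610), which can be checked with a small parser. The sketch below is only an illustration: the regular expression and the names `SPLIT_RE`, `parse_split_line`, and `ppl_matches_loss` are assumptions made for this example and are not part of flair or of this repository.

```python
import math
import re

# Hypothetical pattern for the per-split lines of loss.txt as they appear above.
SPLIT_RE = re.compile(
    r"end of split\s+(?P<split>\d+)\s*/\s*(?P<total>\d+)\s*\|"
    r"\s*epoch\s+(?P<epoch>\d+)\s*\|"
    r"\s*time:\s*(?P<time>[\d.]+)s\s*\|"
    r"\s*valid loss\s+(?P<loss>[\d.]+)\s*\|"
    r"\s*valid ppl\s+(?P<ppl>[\d.]+)\s*\|"
    r"\s*learning rate\s+(?P<lr>[\d.]+)"
)


def parse_split_line(line):
    """Return the fields of one 'end of split' line, or None if it does not match."""
    match = SPLIT_RE.search(line)
    if match is None:
        return None
    return {
        "split": int(match["split"]),
        "total": int(match["total"]),
        "epoch": int(match["epoch"]),
        "time_s": float(match["time"]),
        "loss": float(match["loss"]),
        "ppl": float(match["ppl"]),
        "lr": float(match["lr"]),
    }


def ppl_matches_loss(record, tol=1e-3):
    """Check that the logged perplexity equals exp(loss) up to rounding."""
    return abs(math.exp(record["loss"]) - record["ppl"]) < tol


if __name__ == "__main__":
    sample = (
        "| end of split 29 / 62 | epoch 8 | time: 1608.98s "
        "| valid loss 0.9419 | valid ppl 2.5648 | learning rate 0.0049"
    )
    record = parse_split_line(sample)
    print(record, ppl_matches_loss(record))
```

The same records can be used to follow the learning rate, which in this excerpt shrinks by roughly a factor of four at each step (0.0195, 0.0049, 0.0012, 0.0003, 0.0001) while the validation loss stays at 0.9419.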
pipeline.py
ADDED
@@ -0,0 +1,22 @@
from typing import Dict, List

from flair.models.language_model import LanguageModel


class PreTrainedPipeline:
    def __init__(self, path=""):
        # Imported here so that importing this module does not require huggingface_hub.
        from huggingface_hub import hf_hub_download

        # `path` is not used: the checkpoint is always fetched from the Hub repository.
        self.model = LanguageModel.load_language_model(
            hf_hub_download(repo_id="dchaplinsky/flair-uk-forward-large", filename="best-lm.pt")
        )

    def __call__(self, inputs: str) -> List[Dict]:
        """
        Args:
            inputs (:obj:`str`):
                a string containing some text
        Return:
            A :obj:`list` with one :obj:`dict` holding the text under "generated_text"
        """
        inputs = inputs.strip()
        # generate_text returns a (text, likelihood) tuple; only the text is kept.
        return [{"generated_text": self.model.generate_text(inputs, temperature=0.5)[0]}]
requirements.txt
ADDED
@@ -0,0 +1 @@
flair