---

language: pl
tags:
- generated_from_trainer
- text-generation
widget:
- text: "Bolesław Leśmian - polski poeta"
datasets:
- wikipedia
metrics:
- accuracy
model-index:
- name: gpt_neo_pl_125M
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: wikipedia 20220720.pl
      type: wikipedia
      args: 20220720.pl
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.4312838299951148
---



# gpt_neo_pl_125M_v2

This model was trained from scratch on the wikipedia 20220720.pl dataset.
It achieves the following results on the evaluation set:
- Loss: 3.3862
- Accuracy: 0.4313
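
A minimal generation sketch with the `transformers` pipeline is shown below. The Hub id `gpt_neo_pl_125M_v2` is assumed from the card title and may differ from the actual repository path; the prompt is the widget example from the card metadata.

```python
from transformers import pipeline

# Model id assumed from the card title; replace with the actual Hub path.
generator = pipeline("text-generation", model="gpt_neo_pl_125M_v2")

# Prompt taken from the widget example in the card metadata.
result = generator(
    "Bolesław Leśmian - polski poeta",
    max_new_tokens=50,
    do_sample=True,
)
print(result[0]["generated_text"])
```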

## Model description

The card itself does not document the architecture. Judging by the model name, this appears to be a GPT-Neo-style causal language model with roughly 125M parameters, trained for Polish text generation.

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained and evaluated on the Polish Wikipedia dump (the `wikipedia` dataset, config `20220720.pl`) listed in the metadata above; a loading sketch follows.
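
A hypothetical loading call, assuming the `datasets` `wikipedia` builder (the `20220720.pl` dump is not among the pre-processed configs, so it may need to be built locally with Apache Beam):

```python
from datasets import load_dataset

# Hypothetical: "20220720.pl" is not a pre-processed config, so datasets
# falls back to the Apache Beam builder to parse the raw dump locally.
wiki = load_dataset("wikipedia", "20220720.pl", beam_runner="DirectRunner")
print(wiki["train"][0]["text"][:200])
```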

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `TrainingArguments` equivalent is sketched after the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1.0
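
A minimal `TrainingArguments` sketch matching the list above, assuming the standard Hugging Face `Trainer` (the Adam betas and epsilon map to `adam_beta1`/`adam_beta2`/`adam_epsilon`; `output_dir` is illustrative, not taken from the original run):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt_neo_pl_125M_v2",  # illustrative, not from the original run
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=8,  # total train batch size: 1 x 8 = 8
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1.0,
)
```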

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 5.9469        | 0.02  | 1000  | 6.5843          | 0.1435   |
| 4.9953        | 0.05  | 2000  | 5.7709          | 0.1911   |
| 4.3754        | 0.07  | 3000  | 5.2624          | 0.2331   |
| 3.9795        | 0.1   | 4000  | 4.8752          | 0.2731   |
| 3.7099        | 0.12  | 5000  | 4.5927          | 0.3039   |
| 3.4747        | 0.15  | 6000  | 4.3942          | 0.3230   |
| 3.343         | 0.17  | 7000  | 4.2879          | 0.3349   |
| 3.2767        | 0.2   | 8000  | 4.1698          | 0.3459   |
| 3.1852        | 0.22  | 9000  | 4.0925          | 0.3534   |
| 3.0871        | 0.25  | 10000 | 4.0239          | 0.3608   |
| 3.0746        | 0.27  | 11000 | 3.9646          | 0.3664   |
| 2.9473        | 0.3   | 12000 | 3.9245          | 0.3706   |
| 2.9737        | 0.32  | 13000 | 3.8742          | 0.3754   |
| 2.9193        | 0.35  | 14000 | 3.8285          | 0.3796   |
| 2.8833        | 0.37  | 15000 | 3.7952          | 0.3837   |
| 2.8533        | 0.4   | 16000 | 3.7616          | 0.3873   |
| 2.8654        | 0.42  | 17000 | 3.7296          | 0.3907   |
| 2.8196        | 0.44  | 18000 | 3.7049          | 0.3936   |
| 2.7883        | 0.47  | 19000 | 3.6786          | 0.3966   |
| 2.747         | 0.49  | 20000 | 3.6488          | 0.3990   |
| 2.7355        | 0.52  | 21000 | 3.6243          | 0.4021   |
| 2.7355        | 0.54  | 22000 | 3.5982          | 0.4053   |
| 2.6999        | 0.57  | 23000 | 3.5765          | 0.4075   |
| 2.7243        | 0.59  | 24000 | 3.5558          | 0.4101   |
| 2.6526        | 0.62  | 25000 | 3.5371          | 0.4125   |
| 2.641         | 0.64  | 26000 | 3.5150          | 0.4146   |
| 2.6602        | 0.67  | 27000 | 3.4971          | 0.4168   |
| 2.644         | 0.69  | 28000 | 3.4812          | 0.4192   |
| 2.6558        | 0.72  | 29000 | 3.4622          | 0.4215   |
| 2.5664        | 0.74  | 30000 | 3.4504          | 0.4229   |
| 2.5669        | 0.77  | 31000 | 3.4376          | 0.4245   |
| 2.5498        | 0.79  | 32000 | 3.4263          | 0.4263   |
| 2.5874        | 0.82  | 33000 | 3.4169          | 0.4274   |
| 2.5555        | 0.84  | 34000 | 3.4067          | 0.4286   |
| 2.5502        | 0.86  | 35000 | 3.3997          | 0.4298   |
| 2.5232        | 0.89  | 36000 | 3.3946          | 0.4302   |
| 2.5369        | 0.91  | 37000 | 3.3898          | 0.4309   |
| 2.5335        | 0.94  | 38000 | 3.3869          | 0.4313   |
| 2.6032        | 0.96  | 39000 | 3.3853          | 0.4315   |
| 2.5244        | 0.99  | 40000 | 3.3850          | 0.4314   |
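
Assuming the validation loss is mean token-level cross-entropy in nats (the usual `Trainer` convention for causal language modeling), the final loss of 3.3850 corresponds to a perplexity of roughly 29.5:

```python
import math

# Perplexity implied by the final validation loss (cross-entropy in nats).
final_loss = 3.3850  # step 40000 in the table above
print(round(math.exp(final_loss), 1))  # ~29.5
```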


### Framework versions

- Transformers 4.22.0.dev0
- Pytorch 1.12.0
- Datasets 2.4.0
- Tokenizers 0.12.1