gpt_train_6_256

This model is a fine-tuned version of openai-community/gpt2 on the gokuls/wiki_book_corpus_raw_dataset_tiny dataset. It achieves the following results on the evaluation set:

  • Loss: 9.4766
  • Accuracy: 0.0851
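
To put the loss figure in context, it can be converted to perplexity and compared with the loss of a uniform guess over the vocabulary. The sketch below assumes the reported loss is a mean token-level cross-entropy in nats and that the tokenizer is the standard GPT-2 one with 50,257 entries; the card confirms neither, so treat the output as a rough reading rather than a reported metric.

```python
import math

# Figures copied from this model card.
final_eval_loss = 9.4766   # evaluation loss at the last logged step
first_eval_loss = 10.8672  # validation loss at step 1

# Assumption: standard GPT-2 vocabulary size (not stated in the card).
gpt2_vocab_size = 50257


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats to perplexity."""
    return math.exp(cross_entropy_nats)


# Loss obtained by assigning equal probability to every vocabulary entry.
uniform_loss = math.log(gpt2_vocab_size)

print(f"final perplexity:   {perplexity(final_eval_loss):,.0f}")
print(f"initial perplexity: {perplexity(first_eval_loss):,.0f}")
print(f"uniform-guess loss: {uniform_loss:.4f}")
```

Under these assumptions, the first logged validation loss sits close to the uniform-guess value, and the final loss is about 1.39 nats below it.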

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 36
  • eval_batch_size: 36
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
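
As a rough illustration of what `lr_scheduler_type: linear` implies, the sketch below computes a linearly decaying learning rate starting from the listed `learning_rate` of 1e-05. The function name `linear_lr`, the zero warmup, and the `total_steps` value are assumptions made for this example; the card does not list warmup steps or the total number of optimizer steps.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 1e-05,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length, chosen only for illustration.
total_steps = 1000
for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, total_steps))
```

With a schedule like this, the first few hundred steps of a long run are taken at close to the full base learning rate.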

Training results

Training Loss Epoch Step Validation Loss Accuracy
10.8672 0.0001 1 10.8672 0.0045
10.8672 0.0001 2 10.8672 0.0045
10.8672 0.0002 3 10.8672 0.0045
10.8672 0.0002 4 10.8672 0.0045
10.8594 0.0003 5 10.8672 0.0045
10.8672 0.0003 6 10.8672 0.0045
10.8672 0.0004 7 10.8672 0.0045
10.8672 0.0004 8 10.8672 0.0045
10.8672 0.0005 9 10.8672 0.0045
10.8594 0.0005 10 10.8672 0.0045
10.8672 0.0006 11 10.8672 0.0045
10.8672 0.0007 12 10.8672 0.0045
10.8594 0.0007 13 10.8672 0.0045
10.8672 0.0008 14 10.8672 0.0045
10.8594 0.0008 15 10.8672 0.0045
10.8594 0.0009 16 10.8672 0.0045
10.8672 0.0009 17 10.8672 0.0045
10.8672 0.0010 18 10.8359 0.0086
10.8359 0.0010 19 10.8047 0.0108
10.8047 0.0011 20 10.7734 0.0113
10.7891 0.0011 21 10.75 0.0115
10.7578 0.0012 22 10.7266 0.0119
10.7188 0.0013 23 10.7031 0.0129
10.7188 0.0013 24 10.6797 0.0147
10.6953 0.0014 25 10.6641 0.0179
10.6719 0.0014 26 10.6406 0.0231
10.6562 0.0015 27 10.625 0.0286
10.6641 0.0015 28 10.6094 0.0347
10.6328 0.0016 29 10.5938 0.0399
10.6016 0.0016 30 10.5781 0.0436
10.6016 0.0017 31 10.5703 0.0463
10.5938 0.0017 32 10.5547 0.0479
10.5781 0.0018 33 10.5469 0.0484
10.5547 0.0019 34 10.5312 0.0484
10.5469 0.0019 35 10.5234 0.0484
10.5391 0.0020 36 10.5156 0.0482
10.5312 0.0020 37 10.5078 0.0475
10.5312 0.0021 38 10.4922 0.0475
10.4922 0.0021 39 10.4844 0.0476
10.5078 0.0022 40 10.4844 0.0477
10.4922 0.0022 41 10.4766 0.0481
10.4844 0.0023 42 10.4688 0.0486
10.4766 0.0023 43 10.4609 0.0493
10.4844 0.0024 44 10.4531 0.0495
10.4688 0.0025 45 10.4453 0.0503
10.4844 0.0025 46 10.4453 0.0513
10.4609 0.0026 47 10.4375 0.0522
10.4453 0.0026 48 10.4297 0.0526
10.4453 0.0027 49 10.4297 0.0532
10.4297 0.0027 50 10.4219 0.0537
10.4219 0.0028 51 10.4141 0.0544
10.4297 0.0028 52 10.4141 0.0548
10.4375 0.0029 53 10.4062 0.0554
10.4219 0.0029 54 10.4062 0.0558
10.4141 0.0030 55 10.3984 0.0565
10.4141 0.0031 56 10.3906 0.0574
10.4219 0.0031 57 10.3906 0.0583
10.4219 0.0032 58 10.3828 0.0591
10.3984 0.0032 59 10.3828 0.0598
10.3984 0.0033 60 10.375 0.0603
10.3984 0.0033 61 10.375 0.0607
10.3906 0.0034 62 10.3672 0.0611
10.3672 0.0034 63 10.3672 0.0615
10.3906 0.0035 64 10.3594 0.0616
10.3828 0.0035 65 10.3594 0.0615
10.3594 0.0036 66 10.3516 0.0614
10.3516 0.0037 67 10.3438 0.0610
10.3516 0.0037 68 10.3438 0.0609
10.3438 0.0038 69 10.3359 0.0611
10.3594 0.0038 70 10.3359 0.0610
10.3594 0.0039 71 10.3281 0.0610
10.3203 0.0039 72 10.3281 0.0610
10.3516 0.0040 73 10.3203 0.0610
10.3203 0.0040 74 10.3125 0.0611
10.3281 0.0041 75 10.3125 0.0612
10.3438 0.0041 76 10.3047 0.0614
10.2969 0.0042 77 10.3047 0.0618
10.3281 0.0043 78 10.2969 0.0622
10.2891 0.0043 79 10.2969 0.0628
10.3047 0.0044 80 10.2891 0.0632
10.2969 0.0044 81 10.2812 0.0637
10.2891 0.0045 82 10.2812 0.0643
10.3125 0.0045 83 10.2734 0.0649
10.2891 0.0046 84 10.2734 0.0654
10.2812 0.0046 85 10.2656 0.0657
10.3047 0.0047 86 10.2656 0.0659
10.2969 0.0047 87 10.2578 0.0660
10.2578 0.0048 88 10.25 0.0661
10.2812 0.0048 89 10.25 0.0662
10.2734 0.0049 90 10.2422 0.0663
10.2891 0.0050 91 10.2422 0.0664
10.2578 0.0050 92 10.2344 0.0666
10.2734 0.0051 93 10.2344 0.0668
10.2266 0.0051 94 10.2266 0.0671
10.2578 0.0052 95 10.2266 0.0674
10.25 0.0052 96 10.2188 0.0676
10.2266 0.0053 97 10.2188 0.0678
10.2266 0.0053 98 10.2109 0.0679
10.2344 0.0054 99 10.2109 0.0681
10.2422 0.0054 100 10.2031 0.0682
10.2422 0.0055 101 10.2031 0.0683
10.2266 0.0056 102 10.1953 0.0685
10.2188 0.0056 103 10.1953 0.0686
10.2109 0.0057 104 10.1875 0.0687
10.1797 0.0057 105 10.1875 0.0689
10.1797 0.0058 106 10.1797 0.0691
10.1719 0.0058 107 10.1797 0.0693
10.1875 0.0059 108 10.1719 0.0696
10.1797 0.0059 109 10.1719 0.0698
10.1797 0.0060 110 10.1641 0.0700
10.1406 0.0060 111 10.1641 0.0702
10.1719 0.0061 112 10.1641 0.0704
10.1953 0.0062 113 10.1562 0.0706
10.1719 0.0062 114 10.1562 0.0708
10.1641 0.0063 115 10.1484 0.0710
10.1719 0.0063 116 10.1484 0.0712
10.1484 0.0064 117 10.1406 0.0713
10.1562 0.0064 118 10.1406 0.0715
10.1562 0.0065 119 10.1328 0.0716
10.1484 0.0065 120 10.1328 0.0718
10.1406 0.0066 121 10.125 0.0719
10.1328 0.0066 122 10.125 0.0721
10.1641 0.0067 123 10.1172 0.0722
10.1328 0.0068 124 10.1172 0.0723
10.1484 0.0068 125 10.1094 0.0725
10.1406 0.0069 126 10.1094 0.0726
10.1406 0.0069 127 10.1016 0.0728
10.125 0.0070 128 10.1016 0.0729
10.1172 0.0070 129 10.0938 0.0731
10.1016 0.0071 130 10.0938 0.0732
10.1172 0.0071 131 10.0859 0.0733
10.1172 0.0072 132 10.0859 0.0734
10.1172 0.0072 133 10.0859 0.0736
10.0938 0.0073 134 10.0781 0.0737
10.1094 0.0074 135 10.0781 0.0738
10.1094 0.0074 136 10.0703 0.0740
10.0703 0.0075 137 10.0703 0.0742
10.0781 0.0075 138 10.0625 0.0743
10.0781 0.0076 139 10.0625 0.0745
10.0781 0.0076 140 10.0547 0.0746
10.0625 0.0077 141 10.0547 0.0747
10.0781 0.0077 142 10.0469 0.0749
10.0391 0.0078 143 10.0469 0.0750
10.0703 0.0078 144 10.0469 0.0751
10.0391 0.0079 145 10.0391 0.0753
10.0469 0.0080 146 10.0391 0.0754
10.0547 0.0080 147 10.0312 0.0755
10.0703 0.0081 148 10.0312 0.0756
10.0469 0.0081 149 10.0234 0.0757
10.0391 0.0082 150 10.0234 0.0759
10.0391 0.0082 151 10.0156 0.0760
10.0391 0.0083 152 10.0156 0.0761
10.0391 0.0083 153 10.0156 0.0762
10.0469 0.0084 154 10.0078 0.0763
10.0312 0.0084 155 10.0078 0.0765
9.9844 0.0085 156 10.0 0.0766
10.0 0.0086 157 10.0 0.0767
10.0078 0.0086 158 9.9922 0.0768
10.0078 0.0087 159 9.9922 0.0769
10.0234 0.0087 160 9.9922 0.0770
9.9922 0.0088 161 9.9844 0.0771
9.9922 0.0088 162 9.9844 0.0772
9.9766 0.0089 163 9.9766 0.0773
9.9922 0.0089 164 9.9766 0.0773
9.9766 0.0090 165 9.9688 0.0774
9.9844 0.0090 166 9.9688 0.0775
9.9766 0.0091 167 9.9688 0.0776
9.9844 0.0092 168 9.9609 0.0777
9.9609 0.0092 169 9.9609 0.0778
9.9766 0.0093 170 9.9531 0.0778
9.9531 0.0093 171 9.9531 0.0779
9.9922 0.0094 172 9.9531 0.0780
9.9531 0.0094 173 9.9453 0.0781
9.9375 0.0095 174 9.9453 0.0781
9.9688 0.0095 175 9.9375 0.0782
9.9453 0.0096 176 9.9375 0.0783
9.9375 0.0096 177 9.9375 0.0783
9.9375 0.0097 178 9.9297 0.0784
9.9453 0.0098 179 9.9297 0.0785
9.9453 0.0098 180 9.9219 0.0786
9.9297 0.0099 181 9.9219 0.0787
9.9375 0.0099 182 9.9141 0.0787
9.9375 0.0100 183 9.9141 0.0788
9.8984 0.0100 184 9.9141 0.0789
9.9375 0.0101 185 9.9062 0.0790
9.9297 0.0101 186 9.9062 0.0791
9.9297 0.0102 187 9.8984 0.0791
9.9141 0.0102 188 9.8984 0.0792
9.9219 0.0103 189 9.8984 0.0793
9.8984 0.0104 190 9.8906 0.0793
9.8828 0.0104 191 9.8906 0.0794
9.8984 0.0105 192 9.8828 0.0795
9.8906 0.0105 193 9.8828 0.0796
9.9062 0.0106 194 9.8828 0.0797
9.875 0.0106 195 9.875 0.0798
9.8594 0.0107 196 9.875 0.0798
9.8828 0.0107 197 9.875 0.0799
9.8984 0.0108 198 9.8672 0.0800
9.8906 0.0108 199 9.8672 0.0801
9.9062 0.0109 200 9.8594 0.0801
9.8672 0.0110 201 9.8594 0.0802
9.8672 0.0110 202 9.8594 0.0803
9.8906 0.0111 203 9.8516 0.0804
9.8828 0.0111 204 9.8516 0.0804
9.8906 0.0112 205 9.8438 0.0805
9.8828 0.0112 206 9.8438 0.0805
9.8594 0.0113 207 9.8438 0.0806
9.875 0.0113 208 9.8359 0.0806
9.8594 0.0114 209 9.8359 0.0807
9.8516 0.0114 210 9.8281 0.0808
9.8359 0.0115 211 9.8281 0.0809
9.8281 0.0116 212 9.8281 0.0810
9.8516 0.0116 213 9.8203 0.0810
9.8516 0.0117 214 9.8203 0.0811
9.8281 0.0117 215 9.8203 0.0811
9.8438 0.0118 216 9.8125 0.0812
9.8359 0.0118 217 9.8125 0.0813
9.8281 0.0119 218 9.8047 0.0814
9.8281 0.0119 219 9.8047 0.0815
9.8281 0.0120 220 9.8047 0.0815
9.7969 0.0120 221 9.7969 0.0816
9.8281 0.0121 222 9.7969 0.0816
9.8047 0.0122 223 9.7891 0.0817
9.8047 0.0122 224 9.7891 0.0818
9.8047 0.0123 225 9.7891 0.0818
9.8047 0.0123 226 9.7812 0.0819
9.8281 0.0124 227 9.7812 0.0819
9.7812 0.0124 228 9.7812 0.0819
9.7891 0.0125 229 9.7734 0.0820
9.7969 0.0125 230 9.7734 0.0821
9.7578 0.0126 231 9.7656 0.0821
9.8125 0.0126 232 9.7656 0.0822
9.7734 0.0127 233 9.7656 0.0823
9.7656 0.0128 234 9.7578 0.0823
9.7578 0.0128 235 9.7578 0.0824
9.7891 0.0129 236 9.7578 0.0824
9.7812 0.0129 237 9.75 0.0824
9.7656 0.0130 238 9.75 0.0825
9.7969 0.0130 239 9.75 0.0825
9.75 0.0131 240 9.7422 0.0825
9.7734 0.0131 241 9.7422 0.0825
9.7578 0.0132 242 9.7344 0.0825
9.7656 0.0132 243 9.7344 0.0825
9.7266 0.0133 244 9.7344 0.0826
9.75 0.0134 245 9.7266 0.0826
9.7422 0.0134 246 9.7266 0.0827
9.75 0.0135 247 9.7266 0.0827
9.7656 0.0135 248 9.7188 0.0828
9.7266 0.0136 249 9.7188 0.0828
9.75 0.0136 250 9.7109 0.0828
9.7266 0.0137 251 9.7109 0.0829
9.7266 0.0137 252 9.7109 0.0829
9.7266 0.0138 253 9.7031 0.0829
9.7266 0.0138 254 9.7031 0.0829
9.7344 0.0139 255 9.7031 0.0829
9.7109 0.0139 256 9.6953 0.0829
9.7109 0.0140 257 9.6953 0.0829
9.7109 0.0141 258 9.6953 0.0830
9.7031 0.0141 259 9.6875 0.0830
9.7109 0.0142 260 9.6875 0.0831
9.6953 0.0142 261 9.6797 0.0832
9.7031 0.0143 262 9.6797 0.0832
9.6953 0.0143 263 9.6797 0.0832
9.6875 0.0144 264 9.6719 0.0833
9.6719 0.0144 265 9.6719 0.0833
9.6797 0.0145 266 9.6719 0.0832
9.7188 0.0145 267 9.6641 0.0833
9.6953 0.0146 268 9.6641 0.0833
9.6797 0.0147 269 9.6641 0.0833
9.6719 0.0147 270 9.6562 0.0834
9.6875 0.0148 271 9.6562 0.0834
9.6641 0.0148 272 9.6484 0.0835
9.6719 0.0149 273 9.6484 0.0836
9.6719 0.0149 274 9.6484 0.0836
9.6406 0.0150 275 9.6406 0.0837
9.6641 0.0150 276 9.6406 0.0837
9.6328 0.0151 277 9.6406 0.0838
9.6328 0.0151 278 9.6328 0.0838
9.6484 0.0152 279 9.6328 0.0838
9.6484 0.0153 280 9.6328 0.0838
9.6875 0.0153 281 9.625 0.0838
9.6328 0.0154 282 9.625 0.0838
9.6562 0.0154 283 9.6172 0.0838
9.6719 0.0155 284 9.6172 0.0838
9.6641 0.0155 285 9.6172 0.0838
9.6328 0.0156 286 9.6094 0.0838
9.6328 0.0156 287 9.6094 0.0839
9.625 0.0157 288 9.6094 0.0839
9.6328 0.0157 289 9.6016 0.0840
9.6172 0.0158 290 9.6016 0.0840
9.6172 0.0159 291 9.6016 0.0841
9.6094 0.0159 292 9.5938 0.0841
9.6172 0.0160 293 9.5938 0.0842
9.6094 0.0160 294 9.5938 0.0842
9.6328 0.0161 295 9.5859 0.0842
9.5938 0.0161 296 9.5859 0.0842
9.5938 0.0162 297 9.5781 0.0842
9.6016 0.0162 298 9.5781 0.0842
9.5781 0.0163 299 9.5781 0.0842
9.5938 0.0163 300 9.5703 0.0843
9.5938 0.0164 301 9.5703 0.0843
9.6016 0.0165 302 9.5703 0.0844
9.5781 0.0165 303 9.5625 0.0845
9.6016 0.0166 304 9.5625 0.0845
9.5703 0.0166 305 9.5625 0.0845
9.5781 0.0167 306 9.5547 0.0845
9.5938 0.0167 307 9.5547 0.0846
9.5391 0.0168 308 9.5547 0.0846
9.5625 0.0168 309 9.5469 0.0846
9.5547 0.0169 310 9.5469 0.0846
9.5703 0.0169 311 9.5469 0.0846
9.5625 0.0170 312 9.5391 0.0846
9.5469 0.0171 313 9.5391 0.0846
9.5469 0.0171 314 9.5391 0.0846
9.5391 0.0172 315 9.5312 0.0847
9.5781 0.0172 316 9.5312 0.0847
9.5469 0.0173 317 9.5312 0.0847
9.5312 0.0173 318 9.5234 0.0848
9.5703 0.0174 319 9.5234 0.0848
9.5312 0.0174 320 9.5234 0.0848
9.5703 0.0175 321 9.5156 0.0848
9.5312 0.0175 322 9.5156 0.0849
9.5391 0.0176 323 9.5078 0.0849
9.5156 0.0177 324 9.5078 0.0849
9.5234 0.0177 325 9.5078 0.0849
9.5391 0.0178 326 9.5 0.0849
9.5078 0.0178 327 9.5 0.0849
9.5312 0.0179 328 9.5 0.0848
9.5078 0.0179 329 9.4922 0.0848
9.5234 0.0180 330 9.4922 0.0847
9.5078 0.0180 331 9.4922 0.0848
9.4922 0.0181 332 9.4844 0.0848
9.5 0.0181 333 9.4844 0.0849
9.5078 0.0182 334 9.4844 0.0850
9.4766 0.0183 335 9.4766 0.0851
9.5 0.0183 336 9.4766 0.0851
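
Each row of the table above has five whitespace-separated columns, in order: training loss, epoch, step, validation loss, accuracy. A minimal sketch for reading rows into typed records follows; the names `LogRow` and `parse_row` are made up for this example, and only three rows are copied in to keep it short.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    training_loss: float
    epoch: float
    step: int
    validation_loss: float
    accuracy: float


def parse_row(line: str) -> LogRow:
    """Split one whitespace-separated table row into its five columns."""
    train, epoch, step, val, acc = line.split()
    return LogRow(float(train), float(epoch), int(step), float(val), float(acc))


# Three rows copied verbatim from the table above (steps 1, 18 and 336).
sample = """\
10.8672 0.0001 1 10.8672 0.0045
10.8672 0.0010 18 10.8359 0.0086
9.5 0.0183 336 9.4766 0.0851
"""

rows = [parse_row(line) for line in sample.splitlines() if line.strip()]
first, last = rows[0], rows[-1]
drop = first.validation_loss - last.validation_loss
print(f"validation loss fell by {drop:.4f} over {last.step - first.step} steps")
```

The same `parse_row` helper can be applied to every row of the table if the full log is needed, for example to plot validation loss against step.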

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.0a0+32f93b1
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • Parameters: 30.7M
  • Tensor type: FP16 (Safetensors)

Model tree

  • Repository: gokulsrinivasagan/gpt_train_6_256
  • Base model: openai-community/gpt2 (this model is listed as a fine-tune)
  • Training dataset: gokuls/wiki_book_corpus_raw_dataset_tiny
