---
library_name: transformers
tags:
  - generated_from_trainer
datasets:
  - kanishka/babylm2-rewritten-clean-spacy
metrics:
  - accuracy
model-index:
  - name: opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: kanishka/babylm2-rewritten-clean-spacy
          type: kanishka/babylm2-rewritten-clean-spacy
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.47865689612852264
---

opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6880
  • Accuracy: 0.4787
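The evaluation loss can also be read as a perplexity. A minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (PyTorch's cross-entropy uses the natural log): perplexity is `exp(loss)`, which gives roughly 14.7 for the final checkpoint. The helper names `perplexity` and `bits_per_token` are illustrative, not part of any library.

```python
import math

# Copied from the evaluation results above; assumed to be the mean
# per-token cross-entropy in nats.
EVAL_LOSS = 2.6880


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_ce_loss)


def bits_per_token(mean_ce_loss: float) -> float:
    """Convert a natural-log cross-entropy into bits per token."""
    return mean_ce_loss / math.log(2)


if __name__ == "__main__":
    print(f"perplexity:     {perplexity(EVAL_LOSS):.2f}")
    print(f"bits per token: {bits_per_token(EVAL_LOSS):.3f}")
```

The same conversion applies to any row of the validation-loss column in the training results.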

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
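The hyperparameters above describe a linear schedule with 32,000 warmup steps. The sketch below reconstructs it in plain Python using the same shape as the `linear` scheduler type in `transformers` (a linear ramp from 0 to the peak learning rate, then linear decay back to 0). It is not the author's training code. The total step count of 45,140 is an assumption taken from the last row of the training results, and `num_devices=1` is an assumption that makes 32 × 8 match the reported total batch size of 256. The helper names are hypothetical.

```python
# Minimal sketch (not the author's code) of the schedule implied by the
# hyperparameters: linear warmup from 0 to PEAK_LR, then linear decay to 0.

PEAK_LR = 1e-3         # learning_rate
WARMUP_STEPS = 32_000  # lr_scheduler_warmup_steps
TOTAL_STEPS = 45_140   # assumption: final optimizer step in the results table


def effective_batch_size(per_device: int = 32, grad_accum: int = 8,
                         num_devices: int = 1) -> int:
    """Examples per optimizer step; num_devices=1 is an assumption."""
    return per_device * grad_accum * num_devices


def linear_warmup_lr(step: int, peak_lr: float = PEAK_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print(f"warmup fraction: {WARMUP_STEPS / TOTAL_STEPS:.1%}")
    for step in (0, 16_000, 32_000, 38_570, 45_140):
        print(f"step {step:>6}: lr = {linear_warmup_lr(step):.2e}")
```

Under these assumptions, warmup covers roughly 71% of the run: the peak learning rate is reached shortly after the epoch-14 evaluation (step 31,599), and the remaining 13,140 steps decay it to zero.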

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Validation Accuracy |
|---------------|---------|-------|-----------------|---------------------|
| 4.0959        | 0.9999  | 2257  | 3.8126          | 0.3613              |
| 3.4463        | 1.9999  | 4514  | 3.2972          | 0.4099              |
| 3.1228        | 2.9998  | 6771  | 3.0851          | 0.4315              |
| 2.9166        | 3.9998  | 9028  | 2.9807          | 0.4418              |
| 2.8402        | 4.9997  | 11285 | 2.9249          | 0.4476              |
| 2.7832        | 5.9997  | 13542 | 2.8851          | 0.4521              |
| 2.7377        | 6.9996  | 15799 | 2.8602          | 0.4546              |
| 2.7101        | 8.0     | 18057 | 2.8389          | 0.4572              |
| 2.684         | 8.9999  | 20314 | 2.8260          | 0.4586              |
| 2.6654        | 9.9999  | 22571 | 2.8155          | 0.4596              |
| 2.6466        | 10.9998 | 24828 | 2.8077          | 0.4604              |
| 2.6474        | 11.9998 | 27085 | 2.8025          | 0.4615              |
| 2.6366        | 12.9997 | 29342 | 2.7983          | 0.4619              |
| 2.625         | 13.9997 | 31599 | 2.7928          | 0.4626              |
| 2.6109        | 14.9996 | 33856 | 2.7690          | 0.4654              |
| 2.5658        | 16.0    | 36114 | 2.7445          | 0.4686              |
| 2.5185        | 16.9999 | 38371 | 2.7228          | 0.4717              |
| 2.4637        | 17.9999 | 40628 | 2.7043          | 0.4747              |
| 2.3969        | 18.9998 | 42885 | 2.6895          | 0.4774              |
| 2.3245        | 19.9989 | 45140 | 2.6880          | 0.4787              |

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.0