metadata

library_name: transformers
tags:
  - generated_from_trainer
datasets:
  - kanishka/babylm2-rewritten-clean-spacy
metrics:
  - accuracy
model-index:
  - name: opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: kanishka/babylm2-rewritten-clean-spacy
          type: kanishka/babylm2-rewritten-clean-spacy
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.47865689612852264

opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset. At the end of training (step 45140, the last row of the training results), it achieves the following results on the evaluation set:

Loss: 2.6880
Accuracy: 0.4787
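
For a causal language model, the evaluation loss is usually the mean per-token cross-entropy in nats, in which case it converts to perplexity by exponentiation. The card does not state the loss convention, so the sketch below treats it as an assumption; the variable names are illustrative.

```python
import math

# Reported evaluation loss, ASSUMED to be mean per-token cross-entropy in nats.
eval_loss = 2.6880

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption, the reported loss corresponds to a perplexity of roughly 14.7.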

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

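Training and evaluation both used the kanishka/babylm2-rewritten-clean-spacy dataset listed in the metadata. More information needed.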

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
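
Two of the values above follow from the others, and a small sketch may make that concrete. It assumes a single training device (so the total batch size is train_batch_size times gradient_accumulation_steps) and a schedule that warms up linearly from 0 and then decays linearly to 0. The total of 45140 steps is taken from the last row of the training results, and `lr_at` is a hypothetical helper written for illustration, not part of any library.

```python
def lr_at(step, base_lr=0.001, warmup_steps=32000, total_steps=45140):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps.

    total_steps is ASSUMED to equal the final step in the results table.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Effective batch size per optimizer update (ASSUMES a single device).
train_batch_size = 32
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print("sequences per update:", total_train_batch_size)

# Learning rate at a few points: start, mid-warmup, peak, mid-decay, end.
for step in (0, 16000, 32000, 38570, 45140):
    print(step, lr_at(step))
```

With these settings the learning rate would reach its peak of 0.001 only at step 32000, about 71% of the way through training, and fall to zero over the remaining 13140 steps.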

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.0959	0.9999	2257	3.8126	0.3613
3.4463	1.9999	4514	3.2972	0.4099
3.1228	2.9998	6771	3.0851	0.4315
2.9166	3.9998	9028	2.9807	0.4418
2.8402	4.9997	11285	2.9249	0.4476
2.7832	5.9997	13542	2.8851	0.4521
2.7377	6.9996	15799	2.8602	0.4546
2.7101	8.0	18057	2.8389	0.4572
2.684	8.9999	20314	2.8260	0.4586
2.6654	9.9999	22571	2.8155	0.4596
2.6466	10.9998	24828	2.8077	0.4604
2.6474	11.9998	27085	2.8025	0.4615
2.6366	12.9997	29342	2.7983	0.4619
2.625	13.9997	31599	2.7928	0.4626
2.6109	14.9996	33856	2.7690	0.4654
2.5658	16.0	36114	2.7445	0.4686
2.5185	16.9999	38371	2.7228	0.4717
2.4637	17.9999	40628	2.7043	0.4747
2.3969	18.9998	42885	2.6895	0.4774
2.3245	19.9989	45140	2.6880	0.4787
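
The per-epoch improvement in validation loss is easier to see as differences between consecutive rows. The sketch below copies six rows from the table (epochs 12 to 17) and labels each by whether its step falls before or after the 32000 warmup steps listed in the hyperparameters. `loss_drops` is a hypothetical helper, and the link between the schedule and the change in improvement rate is one possible reading, not something the card states.

```python
# (step, validation_loss) pairs copied from the results table, epochs 12 to 17.
rows = [
    (27085, 2.8025),
    (29342, 2.7983),
    (31599, 2.7928),
    (33856, 2.7690),
    (36114, 2.7445),
    (38371, 2.7228),
]
WARMUP_STEPS = 32000  # lr_scheduler_warmup_steps from the hyperparameters


def loss_drops(rows):
    """Return (step, drop) pairs: validation-loss decrease since the previous row."""
    return [
        (step, round(prev_loss - loss, 4))
        for (_, prev_loss), (step, loss) in zip(rows, rows[1:])
    ]


for step, drop in loss_drops(rows):
    phase = "after warmup" if step > WARMUP_STEPS else "during warmup"
    print(f"step {step}: drop {drop:.4f} ({phase})")
```

In these rows the per-epoch drop is about 0.004 to 0.006 while warmup is still running and about 0.022 to 0.025 in the epochs that follow.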

Framework versions

Transformers 4.45.1
Pytorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.20.0