---
library_name: transformers
tags:
- generated_from_trainer
datasets:
- kanishka/babylm2-rewritten-clean-spacy
metrics:
- accuracy
model-index:
- name: opt-babylm2-rewritten-clean-spacy-32k-earlystop-40epochs_seed-42_1e-3
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: kanishka/babylm2-rewritten-clean-spacy
      type: kanishka/babylm2-rewritten-clean-spacy
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.42334742212654364
---


# opt-babylm2-rewritten-clean-spacy-32k-earlystop-40epochs_seed-42_1e-3

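This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset.
It achieves the following results on the evaluation set, matching the step 48293 (epoch 25) row of the training results table, which has the lowest validation loss: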
- Loss: 2.9600
- Accuracy: 0.4233
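
For readers who prefer perplexity, the loss above can be converted with a one-line calculation. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats; that is the usual convention for causal language modeling with the Transformers `Trainer`, but this card does not state it.

```python
import math

# Evaluation loss reported above. Assumption: this is the mean
# per-token cross-entropy, measured in nats.
eval_loss = 2.9600

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")  # perplexity = 19.30
```

Perplexity values are only comparable between models that share a tokenizer and evaluation set.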

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

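Training and evaluation both used the kanishka/babylm2-rewritten-clean-spacy dataset. More information needed on the splits and preprocessing.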

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (`num_epochs` is the configured maximum; the results table ends at epoch 28, step 54089):
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 40.0
- mixed_precision_training: Native AMP
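
The batch size and learning-rate settings above interact, so a small sketch may help. It reproduces the reported total batch size and evaluates a linear warmup-then-decay schedule at a few steps. Two values are assumptions that this card does not record: a single training device (`num_devices = 1`), and a planned schedule length of 40 epochs at about 7727 / 4 optimizer steps per epoch (`planned_total_steps`), estimated from the results table. The function `linear_lr` is a hypothetical helper written for illustration, not the code used for training.

```python
# Values copied from the hyperparameter list above.
learning_rate = 1e-3
train_batch_size = 32
gradient_accumulation_steps = 8
warmup_steps = 32_000

# Assumption: one device, which makes 32 * 8 match the reported
# total_train_batch_size of 256.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: the schedule was planned for the full 40 epochs, at about
# 7727 / 4 optimizer steps per epoch (step 7727 is logged at epoch 4.0).
planned_total_steps = round(40 * 7727 / 4)


def linear_lr(step: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = planned_total_steps - step
    return learning_rate * max(0.0, remaining / (planned_total_steps - warmup_steps))


print("total_train_batch_size:", total_train_batch_size)
for step in (1_931, 16_000, 32_000, 54_089):
    print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the peak learning rate is reached only at step 32000, around epoch 16.6, so more than half of the 28 logged epochs fall inside the warmup phase.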

### Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 5.9216        | 0.9996  | 1931  | 4.0134          | 0.3253   |
| 3.7977        | 1.9997  | 3863  | 3.5448          | 0.3639   |
| 3.3887        | 2.9999  | 5795  | 3.3242          | 0.3841   |
| 3.1805        | 4.0     | 7727  | 3.2082          | 0.3949   |
| 3.0632        | 4.9996  | 9658  | 3.1432          | 0.4012   |
| 2.9865        | 5.9997  | 11590 | 3.1010          | 0.4056   |
| 2.9347        | 6.9999  | 13522 | 3.0715          | 0.4087   |
| 2.8953        | 8.0     | 15454 | 3.0539          | 0.4108   |
| 2.8689        | 8.9996  | 17385 | 3.0392          | 0.4122   |
| 2.8456        | 9.9997  | 19317 | 3.0310          | 0.4134   |
| 2.8298        | 10.9999 | 21249 | 3.0251          | 0.4144   |
| 2.817         | 12.0    | 23181 | 3.0175          | 0.4152   |
| 2.8069        | 12.9996 | 25112 | 3.0119          | 0.4158   |
| 2.7996        | 13.9997 | 27044 | 3.0060          | 0.4163   |
| 2.7615        | 14.9999 | 28976 | 3.0038          | 0.4171   |
| 2.7575        | 16.0    | 30908 | 3.0022          | 0.4169   |
| 2.7573        | 16.9996 | 32839 | 2.9962          | 0.4179   |
| 2.7451        | 17.9997 | 34771 | 2.9867          | 0.4189   |
| 2.7275        | 18.9999 | 36703 | 2.9804          | 0.4201   |
| 2.7099        | 20.0    | 38635 | 2.9760          | 0.4208   |
| 2.693         | 20.9996 | 40566 | 2.9683          | 0.4216   |
| 2.6785        | 21.9997 | 42498 | 2.9666          | 0.4221   |
| 2.6628        | 22.9999 | 44430 | 2.9646          | 0.4227   |
| 2.6501        | 24.0    | 46362 | 2.9626          | 0.4228   |
| 2.6343        | 24.9996 | 48293 | 2.9600          | 0.4233   |
| 2.6198        | 25.9997 | 50225 | 2.9638          | 0.4236   |
| 2.604         | 26.9999 | 52157 | 2.9604          | 0.4240   |
| 2.5876        | 28.0    | 54089 | 2.9601          | 0.4245   |
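
The table can be checked for where validation loss bottoms out. The sketch below copies the last six rows and finds the lowest validation loss and the number of evaluations that follow it. The reading that training stopped after three evaluations without improvement is an assumption based on the table and the `earlystop` tag in the model name; this card does not record the early-stopping patience.

```python
# (epoch, step, validation_loss) for the last six rows of the table above.
# Every earlier row has a higher validation loss than these.
rows = [
    (22.9999, 44430, 2.9646),
    (24.0, 46362, 2.9626),
    (24.9996, 48293, 2.9600),
    (25.9997, 50225, 2.9638),
    (26.9999, 52157, 2.9604),
    (28.0, 54089, 2.9601),
]

# Index of the row with the lowest validation loss.
best_index = min(range(len(rows)), key=lambda i: rows[i][2])
best_epoch, best_step, best_loss = rows[best_index]

# Number of evaluations logged after the best one.
evals_since_best = len(rows) - 1 - best_index

print(f"best step: {best_step} (epoch {best_epoch:.0f}), loss {best_loss:.4f}")
print(f"evaluations after the best one: {evals_since_best}")
```

Accuracy keeps rising slightly after step 48293 (0.4245 at step 54089), so selecting a checkpoint by accuracy instead of loss would pick a later row.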


### Framework versions

- Transformers 4.45.1
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0