---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: llama-gsm-real-and-synthetic-sftsd2
  results: []
---

# llama-gsm-real-and-synthetic-sftsd2

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9739
- Num Input Tokens Seen: 3582416

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.5881          | 0                 |
| 1.2436        | 0.0429 | 5    | 1.4398          | 152424            |
| 1.0386        | 0.0857 | 10   | 1.2453          | 312080            |
| 0.9477        | 0.1286 | 15   | 1.1613          | 467744            |
| 0.9171        | 0.1715 | 20   | 1.1153          | 624840            |
| 0.9103        | 0.2144 | 25   | 1.0952          | 780224            |
| 0.8238        | 0.2572 | 30   | 1.0748          | 936608            |
| 0.8472        | 0.3001 | 35   | 1.0563          | 1092128           |
| 0.8196        | 0.3430 | 40   | 1.0417          | 1250736           |
| 0.7769        | 0.3859 | 45   | 1.0217          | 1400344           |
| 0.7825        | 0.4287 | 50   | 1.0084          | 1552728           |
| 0.768         | 0.4716 | 55   | 1.0008          | 1700648           |
| 0.7492        | 0.5145 | 60   | 0.9968          | 1850360           |
| 0.8147        | 0.5573 | 65   | 0.9917          | 2002688           |
| 0.766         | 0.6002 | 70   | 0.9894          | 2161608           |
| 0.7926        | 0.6431 | 75   | 0.9865          | 2318744           |
| 0.7766        | 0.6860 | 80   | 0.9862          | 2477088           |
| 0.7827        | 0.7288 | 85   | 0.9799          | 2632344           |
| 0.7605        | 0.7717 | 90   | 0.9819          | 2784768           |
| 0.7443        | 0.8146 | 95   | 0.9775          | 2938072           |
| 0.7146        | 0.8574 | 100  | 0.9778          | 3095408           |
| 0.7503        | 0.9003 | 105  | 0.9770          | 3250064           |
| 0.7265        | 0.9432 | 110  | 0.9759          | 3400968           |
| 0.8001        | 0.9861 | 115  | 0.9747          | 3553016           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
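
## Training configuration sketch

The `trl` and `sft` tags indicate this run used TRL's SFT trainer. Below is a minimal, hypothetical sketch of how the hyperparameters above map onto `SFTConfig`/`SFTTrainer`. The actual training data is not documented in this card, so the `train.jsonl`/`eval.jsonl` paths are placeholders, and the exact TRL version used is not recorded.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters above.
# The real dataset is undocumented; "train.jsonl"/"eval.jsonl" are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

data = load_dataset("json", data_files={"train": "train.jsonl", "eval": "eval.jsonl"})

args = SFTConfig(
    output_dir="llama-gsm-real-and-synthetic-sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=2,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-3B-Instruct",  # base checkpoint named above
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["eval"],
)
trainer.train()
```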
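
## How to use

No usage guidance is provided above, so the following is a minimal inference sketch. It assumes the checkpoint keeps the base Llama 3.2 Instruct chat template; `model_id` is a placeholder to replace with the actual Hub id or local path. The GSM8K-style prompt simply reflects the "gsm" hint in the model name.

```python
# Minimal inference sketch; model_id is a placeholder for the actual checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llama-gsm-real-and-synthetic-sftsd2"  # placeholder repo id / local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A GSM8K-style word problem, matching the "gsm" hint in the model name.
messages = [
    {
        "role": "user",
        "content": "Natalia sold clips to 48 of her friends in April, and then "
                   "she sold half as many clips in May. How many clips did "
                   "Natalia sell altogether in April and May?",
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```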