---
library_name: transformers
base_model: kajuma/falcon3_1b_patch
tags:
- generated_from_trainer
datasets:
- kajuma/training_12-23_token
model-index:
- name: results
  results: []
---

# results

This model is a fine-tuned version of [kajuma/falcon3_1b_patch](https://huggingface.co/kajuma/falcon3_1b_patch) on the kajuma/training_12-23_token dataset.
It achieves the following results on the evaluation set:
- Loss: 3.2373

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the framework versions below):
- learning_rate: 0.0002
- train_batch_size: 28
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 9
- total_train_batch_size: 252
- optimizer: schedule_free_radam with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 1.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.014         | 0.0515 | 500  | 2.1112          |
| 1.8567        | 0.1029 | 1000 | 2.2158          |
| 1.8408        | 0.1544 | 1500 | 2.2958          |
| 1.8394        | 0.2059 | 2000 | 2.5333          |
| 1.7843        | 0.2573 | 2500 | 2.5528          |
| 1.7623        | 0.3088 | 3000 | 2.6474          |
| 1.7908        | 0.3603 | 3500 | 2.7576          |
| 1.7743        | 0.4117 | 4000 | 2.8830          |
| 1.7513        | 0.4632 | 4500 | 2.9958          |
| 1.7205        | 0.5147 | 5000 | 3.1096          |
| 1.7321        | 0.5661 | 5500 | 3.2373          |

### Framework versions

- Transformers 4.48.0.dev0
- Pytorch 2.5.1
- Datasets 3.2.0
- Tokenizers 0.21.0
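
### Training configuration (sketch)

The exact training script is not included in this card. As a rough, non-authoritative sketch of how the hyperparameters above map onto `TrainingArguments`, assuming Transformers ≥ 4.48 (where schedule-free RAdam is selected via `optim="schedule_free_radam"` and requires the `schedulefree` package at train time) and assuming evaluation ran every 500 steps as the results table suggests:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported configuration; the output
# directory name and evaluation cadence are assumptions, not the author's
# actual script.
args = TrainingArguments(
    output_dir="results",
    learning_rate=2e-4,               # 0.0002
    per_device_train_batch_size=28,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=9,    # 28 x 9 = 252 total train batch size
    num_train_epochs=1.0,
    seed=42,
    lr_scheduler_type="constant",     # schedule-free optimizers use a flat LR
    optim="schedule_free_radam",      # needs `pip install schedulefree`
    eval_strategy="steps",
    eval_steps=500,
    logging_steps=500,
)
```

These arguments would then be passed to a `Trainer` together with the base model (`kajuma/falcon3_1b_patch`) and tokenized splits of `kajuma/training_12-23_token`.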
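
## How to use

A minimal inference sketch, assuming the fine-tuned weights are available locally in the Trainer output directory (named `results` above) or under a published Hub repository id; the checkpoint path below is a placeholder, and the model is assumed to load as a causal language model like its Falcon3-based parent:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "./results"  # placeholder: local output dir or a Hub repo id

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Illustrative prompt only; intended uses are not documented in this card.
inputs = tokenizer("Example prompt:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```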