fblgit committed · Commit 6390403 · verified · 1 Parent(s): e68d693

Create README.md

Files changed (1): README.md (+81, -0)
README.md ADDED
---
library_name: peft
license: other
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- generated_from_trainer
model-index:
- name: pancho-v1-qw25-3B-UNAMGS
  results: []
datasets:
- Magpie-Align/Magpie-Pro-MT-300K-v0.1
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
language:
- en
---

# pancho-v1-qw25-3B-UNAMGS

This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).
It achieves the following results on the evaluation set:
- Loss: 0.6555

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

## Model description
Trained with Magpie:
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- Magpie-Align/Magpie-Pro-MT-300K-v0.1

UNA on MLPs `4, 10, 16, 22, 28`

MGS on 3 Scales.

Following the findings of https://arxiv.org/abs/2410.21228.

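Since the adapter is published as a PEFT artifact on top of Qwen/Qwen2.5-3B-Instruct, it can be loaded with `peft` and `transformers`. The snippet below is a minimal inference sketch, not part of the original card; the adapter repo id `fblgit/pancho-v1-qw25-3B-UNAMGS` is assumed from the model name and may differ.

```python
# Minimal inference sketch: load the base model, then apply the PEFT adapter.
# The adapter repo id below is an assumption based on the model name.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-3B-Instruct"
adapter_id = "fblgit/pancho-v1-qw25-3B-UNAMGS"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Qwen2.5-Instruct expects its chat template; build the prompt through the tokenizer.
messages = [{"role": "user", "content": "Briefly explain what a LoRA adapter is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, the adapter can also be folded into the base weights with `model.merge_and_unload()` before saving.
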
### Training hyperparameters

The following hyperparameters were used during training (see the sketch after the list):
- learning_rate: 2e-05
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1

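For orientation only, the listed values map roughly onto `transformers.TrainingArguments` as sketched below. The run itself was driven by Axolotl; the per-device batch size / gradient-accumulation split, precision, and scheduler are not stated on the card and are assumptions.

```python
# Rough mapping of the card's hyperparameters onto TrainingArguments.
# The per-device/accumulation split and precision are assumptions.
from transformers import TrainingArguments

num_devices = 8
per_device_train_batch_size = 8  # assumed split of the global batch
gradient_accumulation_steps = 256 // (num_devices * per_device_train_batch_size)  # -> 4

args = TrainingArguments(
    output_dir="pancho-v1-qw25-3B-UNAMGS",
    learning_rate=2e-5,
    seed=42,
    num_train_epochs=1,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    per_device_eval_batch_size=16 // num_devices,  # total_eval_batch_size = 16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, not stated on the card
)
```
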
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2127        | 0.0015 | 1    | 0.8711          |
| 0.9905        | 0.0509 | 35   | 0.7338          |
| 0.9685        | 0.1019 | 70   | 0.7114          |
| 0.9554        | 0.1528 | 105  | 0.6994          |
| 0.9077        | 0.2037 | 140  | 0.6915          |
| 0.9149        | 0.2547 | 175  | 0.6859          |
| 0.9363        | 0.3056 | 210  | 0.6795          |
| 0.8975        | 0.3566 | 245  | 0.6745          |
| 0.9095        | 0.4075 | 280  | 0.6709          |
| 0.9216        | 0.4584 | 315  | 0.6681          |
| 0.9143        | 0.5094 | 350  | 0.6666          |
| 0.8879        | 0.5603 | 385  | 0.6645          |
| 0.9194        | 0.6112 | 420  | 0.6625          |
| 0.9123        | 0.6622 | 455  | 0.6615          |
| 0.9056        | 0.7131 | 490  | 0.6591          |
| 0.9172        | 0.7641 | 525  | 0.6578          |
| 0.886         | 0.8150 | 560  | 0.6566          |
| 0.9155        | 0.8659 | 595  | 0.6568          |
| 0.9029        | 0.9169 | 630  | 0.6560          |
| 0.8942        | 0.9678 | 665  | 0.6555          |

### Framework versions

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1