---
license: mit
base_model: ZhangShenao/SELM-Zephyr-7B-iter-2
tags:
- alignment-handbook
- dpo
- trl
- selm
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: SELM-Zephyr-7B-iter-3
results: []
---
# SELM-Zephyr-7B-iter-3

This model is the third SELM iteration from [Self-Exploring Language Models: Active Preference Elicitation for Online Alignment](https://arxiv.org/abs/2405.19332). It is a fine-tuned version of [ZhangShenao/SELM-Zephyr-7B-iter-2](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-2), trained using synthetic data based on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
## Model description
- Model type: A 7B-parameter, Zephyr-based Self-Exploring Language Model (SELM).
- License: MIT
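A minimal usage sketch with the `transformers` text-generation pipeline, following the usual Zephyr chat-template pattern (the example prompt, sampling settings, and dtype are illustrative assumptions, not values taken from this card):

```python
# Illustrative sketch: load SELM-Zephyr-7B-iter-3 and generate a chat response.
# Prompt and sampling settings are assumptions, not specified on the card.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ZhangShenao/SELM-Zephyr-7B-iter-3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what preference optimization does in one paragraph."},
]

# Render the conversation with the model's chat template, then sample a completion.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```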
## Results
| Model | AlpacaEval 2.0 (LC Win Rate, %) | MT-Bench (Avg. Score) |
|-------|---------------------------------|-----------------------|
| [SELM-Zephyr-7B-iter-3](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-3) | 24.00 | 7.48 |
| [SELM-Zephyr-7B-iter-2](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-2) | 23.40 | 7.72 |
| [SELM-Zephyr-7B-iter-1](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-1) | 20.28 | 7.42 |
| [DPO-Zephyr-7B](https://huggingface.co/ZhangShenao/DPO-Zephyr-7B) | 14.45 | 7.28 |

Our model also ranks highly on [WildBench](https://huggingface.co/spaces/allenai/WildBench)! 🔥
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- alpha: 0.001
- beta: 0.01
- train_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
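The total batch size follows from the per-device batch size, the number of GPUs, and gradient accumulation: 8 × 8 × 4 = 256. A minimal sketch of how the listed values map onto a `transformers` `TrainingArguments` object (the SELM-specific `alpha` and the DPO `beta` belong to the SELM/TRL trainer rather than to `TrainingArguments`, and values not listed on the card, such as the learning rate, are omitted):

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# alpha and beta are handled by the SELM/TRL trainer and are not shown here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="selm-zephyr-7b-iter-3",   # placeholder output path
    per_device_train_batch_size=8,        # train_batch_size
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    seed=42,
    adam_beta1=0.9,                       # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Effective batch size across 8 GPUs (distributed_type: multi-GPU):
# 8 per device * 8 devices * 4 accumulation steps = 256 (total_train_batch_size)
print(args.per_device_train_batch_size * 8 * args.gradient_accumulation_steps)  # 256
```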
### Framework versions
- Transformers 4.40.2
- PyTorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1