notHumpback-Myx

This model follows the Humpback architecture, proposed in the paper Self-Alignment with Instruction Backtranslation by Li et al.

It is the "backward model", which generates instructions from web texts. The web texts themselves are treated as candidate model outputs for the generated instructions.
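A minimal sketch of how the backward model could be queried, assuming the checkpoint loads as a standard causal language model under the repository ID `Alepach/notHumpback-Myx`. The prompt format is not documented here, so passing the raw text as the prompt is an assumption, and `load_backward_generator` and `generate_instruction` are hypothetical helper names.

```python
from typing import Callable


def load_backward_generator(
    model_id: str = "Alepach/notHumpback-Myx",
) -> Callable[[str], str]:
    """Return a function that maps a text to a generated instruction.

    Assumes the checkpoint works with the standard text-generation pipeline.
    """
    # Imported lazily so generate_instruction can be used without transformers.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id)

    def generate(text: str) -> str:
        # return_full_text=False drops the prompt from the returned string.
        result = pipe(text, max_new_tokens=64, return_full_text=False)
        return result[0]["generated_text"]

    return generate


def generate_instruction(text: str, generator: Callable[[str], str]) -> str:
    """Treat `text` as a candidate output and ask for a matching instruction."""
    # Assumption: the raw text is used as the prompt, without a template.
    return generator(text.strip()).strip()


# Example (downloads the model weights on first use):
#   generator = load_backward_generator()
#   print(generate_instruction("Some paragraph of web text.", generator))
```

Passing the generator in as a plain callable keeps the prompt handling separate from model loading, so a different prompt template can be tried without changing the loading code.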

Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation), creating a richer dataset for fine-tuning models without additional manual annotation. The model then iteratively curates the generated dataset by scoring each pair for quality, and is fine-tuned on the subset of pairs that receive the highest score (self-curation).
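The two steps can be sketched as one round of a small pipeline. This is an illustration only, not the paper's implementation: `backward` and `score` stand in for model calls, all function names are hypothetical, and the default `keep_score=5` assumes a 5-point quality scale.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (instruction, output)


def self_augment(texts: List[str], backward: Callable[[str], str]) -> List[Pair]:
    """Generate one candidate instruction per text; the text is the output."""
    return [(backward(text), text) for text in texts]


def self_curate(
    pairs: List[Pair],
    score: Callable[[str, str], int],
    keep_score: int = 5,  # assumption: top value of a 5-point scale
) -> List[Pair]:
    """Keep only the pairs whose quality score reaches the top value."""
    return [pair for pair in pairs if score(*pair) >= keep_score]


def one_round(
    texts: List[str],
    backward: Callable[[str], str],
    score: Callable[[str, str], int],
) -> List[Pair]:
    """Run augmentation followed by curation and return the training subset."""
    return self_curate(self_augment(texts, backward), score)
```

In an iterative setup, the model fine-tuned on the returned subset would supply the `score` function for the next round.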

Unlike the model in the original paper, this model is a fine-tuned version of meta-llama/Llama-3.2-3B. It was trained using TRL.

The training data was sampled from the oasst1 dataset. To obtain the "backward" structure, the model is trained on output-input pairs: the response serves as the prompt and the instruction as the target.
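A hedged sketch of how such output-input pairs could be built from oasst1-style message rows. It relies on the `message_id`, `parent_id`, `role`, and `text` fields of oasst1. The function name and the `prompt`/`completion` keys are illustrative assumptions, and the sampling actually used for this model is not reproduced.

```python
from typing import Dict, Iterable, List


def build_backward_pairs(messages: Iterable[Dict]) -> List[Dict[str, str]]:
    """Pair each assistant reply with the prompter message it answers,
    then swap the roles so the reply becomes the prompt."""
    rows = list(messages)
    by_id = {row["message_id"]: row for row in rows}
    pairs = []
    for row in rows:
        if row["role"] != "assistant":
            continue
        parent = by_id.get(row["parent_id"])
        # Skip replies whose parent is missing or is not a user message.
        if parent is None or parent["role"] != "prompter":
            continue
        # Backward direction: output (reply) first, input (instruction) second.
        pairs.append({"prompt": row["text"], "completion": parent["text"]})
    return pairs
```

The rows could come from `load_dataset("OpenAssistant/oasst1", split="train")` in the `datasets` library; any iterable of dictionaries with the same fields works.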

Framework versions

TRL: 0.12.1
Transformers: 4.46.3
PyTorch: 2.5.1
Datasets: 3.1.0
Tokenizers: 0.20.3

Citations

Original paper:

@misc{li2023selfalignment,
    title={Self-Alignment with Instruction Backtranslation},
    author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
    year={2023},
    eprint={2308.06259},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
