---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- princeton-nlp/llama3-ultrafeedback
license: mit
---
Fine-tuned from `meta-llama/Meta-Llama-3-8B-Instruct` with a SimPO-like DPO method on the SimPO training data (`princeton-nlp/llama3-ultrafeedback`).

AlpacaEval: 44.8 (+2)
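For orientation, here is a minimal sketch of the two objectives that "SimPO-like DPO" refers to: the standard DPO loss, which uses a reference model, and the SimPO loss, which uses a length-normalized reward with a target margin and no reference model. This is not this model's training code. The card does not specify the exact variant used. The function names and the `beta`/`gamma` defaults are illustrative assumptions, not the hyperparameters used to train this model. Inputs are assumed to be summed token log-probabilities of each full response.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    # DPO: the implicit reward is the policy/reference log-ratio of each response.
    # beta=0.1 is an illustrative default (assumption).
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def simpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
               chosen_len: int, rejected_len: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    # SimPO: the reward is the length-normalized (average per-token) log-prob,
    # no reference model is needed, and gamma is a target reward margin.
    # beta=2.0 and gamma=1.0 are illustrative defaults (assumption).
    chosen_reward = beta * policy_chosen_logp / chosen_len
    rejected_reward = beta * policy_rejected_logp / rejected_len
    return -log_sigmoid(chosen_reward - rejected_reward - gamma)


if __name__ == "__main__":
    # Toy numbers, not real model outputs.
    print(dpo_loss(-8.0, -12.0, -10.0, -12.0))
    print(simpo_loss(-20.0, -40.0, chosen_len=20, rejected_len=20))
```

Because SimPO divides by response length, scaling a response pair's log-probs and lengths together leaves the loss unchanged. This length normalization is the main design difference from DPO's summed log-ratios.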