metadata
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- princeton-nlp/llama3-ultrafeedback
license: mit
A SimPO-like DPO method, trained on the SimPO data. AlpacaEval: 44.8 (+2).