---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- princeton-nlp/llama3-ultrafeedback
license: mit
---
A SimPO-like DPO method, trained on the SimPO preference data (`princeton-nlp/llama3-ultrafeedback`).

AlpacaEval: 44.8 (+2)
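
This card does not spell out the exact training objective, so the following is only a rough sketch of the two published losses that the description refers to: the standard DPO loss and the SimPO loss. It is not this model's actual objective. The function names, argument names, and the default `beta` and `gamma` values are illustrative assumptions, and the inputs are assumed to be summed token log-probabilities of each response.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    The implicit reward is the log-ratio between the policy and a frozen
    reference model; beta (illustrative default) scales the reward margin.
    """
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_reward - rejected_reward))


def simpo_loss(chosen_logp: float, rejected_logp: float,
               chosen_len: int, rejected_len: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO loss for one preference pair.

    No reference model is used: the reward is the length-normalized
    log-probability, and gamma is a target reward margin.
    beta and gamma defaults here are illustrative, not this model's settings.
    """
    chosen_reward = beta * chosen_logp / chosen_len
    rejected_reward = beta * rejected_logp / rejected_len
    return -log_sigmoid(chosen_reward - rejected_reward - gamma)
```

The main differences are that SimPO drops the reference model, normalizes by response length, and adds a margin term. A "SimPO-like DPO" variant would presumably combine elements of both, but which elements are combined here is not stated in this card.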