base_model:
  - meta-llama/Meta-Llama-3-8B-Instruct
datasets:
  - princeton-nlp/llama3-ultrafeedback
license: mit

A SimPO-like DPO method, trained on the SimPO data (`princeton-nlp/llama3-ultrafeedback`).

AlpacaEval: 44.8 (+2)
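The exact training objective for this model is not spelled out in this card. For orientation only, here is a minimal sketch of the two published losses the description refers to: standard DPO (reference-model log-ratio) and SimPO (length-normalized log-probability with a target margin). The function names, argument names, and default values of `beta` and `gamma` are illustrative assumptions, not this model's training code or hyperparameters.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w: float, pol_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    pol_* / ref_* are summed token log-probabilities of the chosen (w)
    and rejected (l) responses under the policy and the reference model.
    beta=0.1 is an assumed placeholder value.
    """
    margin = beta * ((pol_w - ref_w) - (pol_l - ref_l))
    return -log_sigmoid(margin)


def simpo_loss(pol_w: float, pol_l: float, len_w: int, len_l: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO loss for one preference pair.

    Uses the average per-token log-probability as the implicit reward
    (no reference model) and subtracts a target margin gamma.
    beta and gamma defaults are assumed placeholder values.
    """
    margin = beta * (pol_w / len_w - pol_l / len_l) - gamma
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Toy numbers: the chosen response is more likely than the rejected one.
    print(dpo_loss(pol_w=-20.0, pol_l=-30.0, ref_w=-25.0, ref_l=-25.0))
    print(simpo_loss(pol_w=-20.0, pol_l=-30.0, len_w=20, len_l=20))
```

Both losses shrink as the policy assigns relatively more probability to the chosen response; SimPO differs in dropping the reference model and normalizing by response length.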