Quantization made by Richard Erkhov.
Qwen2-7B-SFT-Step-DPO - GGUF
- Model creator: https://huggingface.co/xinlai/
- Original model: https://huggingface.co/xinlai/Qwen2-7B-SFT-Step-DPO/
Original model description:
license: apache-2.0
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Code | Data | Paper
This repo contains the Qwen2-7B-SFT-Step-DPO model. It is obtained by performing Step-DPO on Qwen2-7B-SFT.
Step-DPO is a simple, effective, and data-efficient method for boosting the mathematical reasoning ability of LLMs. Notably, when applied to Qwen2-72B-Instruct, Step-DPO achieves 70.8% on the MATH test set and 94.0% on the GSM8K test set without bells and whistles, surpassing a series of closed-source models, including GPT-4-1106, Claude-3-Opus, and Gemini-1.5-Pro.
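As a rough illustration of the "step-wise" idea, here is a minimal sketch. It assumes that Step-DPO applies the standard DPO objective to a single reasoning step instead of a whole answer. Under that assumption, a preferred step and a dispreferred step share the same prompt and the same correct preceding steps, and each step is scored by its summed token log-probability under the trained policy and under a frozen reference model. The function names (`neg_log_sigmoid`, `step_dpo_loss`, `batch_loss`) and the default `beta=0.1` are hypothetical choices made for this example. They are not taken from the Step-DPO code release.

```python
import math
from typing import Sequence


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For negative z, rewrite to avoid overflow in exp(-z).
    return -z + math.log1p(math.exp(z))


def step_dpo_loss(
    policy_logp_win: float,
    policy_logp_lose: float,
    ref_logp_win: float,
    ref_logp_lose: float,
    beta: float = 0.1,  # hypothetical default, for illustration only
) -> float:
    """DPO-style loss for one (preferred step, dispreferred step) pair.

    Each argument is the summed token log-probability of a single reasoning
    step, conditioned on the prompt plus the shared correct preceding steps
    (an assumption of this sketch).
    """
    # Implicit reward margin: how much more the policy, relative to the
    # frozen reference, favors the preferred step over the dispreferred one.
    margin = beta * (
        (policy_logp_win - ref_logp_win) - (policy_logp_lose - ref_logp_lose)
    )
    return neg_log_sigmoid(margin)


def batch_loss(
    pairs: Sequence[tuple[float, float, float, float]], beta: float = 0.1
) -> float:
    """Mean loss over a batch of step-level preference pairs."""
    return sum(step_dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss drops below log 2 as the policy shifts probability toward the preferred step, and rises above it otherwise. Scoring only the step where the reasoning diverges, rather than the full solution, is what would make the preference signal step-wise in this sketch.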
Contact