dataset Instruction


datasets:
- mxz/CValues_DPO
language:
- zh
- en
metrics:
- perplexity
pipeline_tag:
- text-generation
tags:
- DPO
- finetune
- alignment
- LoRA
- Llama-3

About mxz-llama-3-8B-sft

This model was trained with SFT and PPO.

It supports coding, reasoning, and Chinese QA.
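
The tags and the dataset name (mxz/CValues_DPO) point to DPO-style preference data. As a rough illustration only, and not the author's training code, the standard DPO objective for one preference pair can be sketched as below. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are hypothetical choices made for this example; the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) pair.

    All four inputs are sequence log-probabilities (assumed natural log).
    beta is a hypothetical default, not a value taken from this model card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between chosen and rejected.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable form for both signs of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy prefers the chosen response more than the reference does,
    # so the loss falls below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; the loss shrinks as the policy shifts probability toward the chosen response relative to the reference.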

Evaluation

Result:

| Model | MMLU | C-EVAL | C-MMLU |
| --- | --- | --- | --- |
| Llama-3-8B | 55.5 | 47.0 | 48.0 |
| Llama-3-8B-Instruct | 60.1 | 49.7 | 49.3 |
| Llama-3-8B-dpo | 62.2 | 49.9 | 49.4 |
Safetensors · Model size: 8.03B params · Tensor type: F32, BF16