---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
---
This is an FP8-Dynamic quantization of [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).

DeepSeek's Qwen-distilled models are compact reasoning models derived from DeepSeek-R1, achieving exceptional performance by distilling the reasoning patterns of larger models into smaller architectures. Spanning 1.5B to 70B parameters, the models are based on Qwen2.5 and Llama3, with the standout DeepSeek-R1-Distill-Qwen-32B outperforming OpenAI-o1-mini and setting new state-of-the-art results for dense models. By combining reinforcement learning (RL) and supervised fine-tuning (SFT), these open-source models provide a powerful resource for advancing research and practical applications.
## Evaluations

This model achieves an accuracy recovery of 100.04% (an average benchmark score of 74.06 versus 74.03 for the unquantized baseline).

| __English__ | __[DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)__ | __[DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic (this)](https://huggingface.co/cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic)__ |
|:------------|------:|------:|
| Avg.        | 74.03 | 74.06 |
| ARC         | 68.2  | 68.9  |
| Hellaswag   | 74    | 73.7  |
| MMLU        | 79.88 | 79.57 |

We did not check for data contamination.

Evaluation was done using [Eval. Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000`.
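For reference, a comparable run with the harness CLI might look like the following sketch; the vLLM backend and the exact task names are assumptions rather than the settings used for the table above:

```
lm_eval --model vllm \
    --model_args pretrained=cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic \
    --tasks arc_challenge,hellaswag,mmlu \
    --limit 1000
```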
## Usage

Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server).
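vLLM is distributed on PyPI, so a typical installation (assuming a Linux host with a CUDA-capable GPU) is:

```
pip install vllm
```

Then start the OpenAI-compatible server: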
```
python -m vllm.entrypoints.openai.api_server --model cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic
```
Access the model:

```
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic",
    "prompt": "San Francisco is a"
}'
```
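Because DeepSeek-R1-Distill-Qwen-32B is a chat-style reasoning model, the OpenAI-compatible chat endpoint can also be used; a minimal sketch (the example prompt is illustrative only):

```
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic",
    "messages": [{"role": "user", "content": "Explain FP8 quantization in one paragraph."}]
}'
```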