---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
---

This is a quantization of [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).

DeepSeek's Qwen-distilled models are compact reasoning models derived from DeepSeek-R1. They achieve strong performance by distilling the reasoning patterns of the larger model into smaller architectures. Spanning 1.5B to 70B parameters, the models are based on Qwen2.5 and Llama3, with the standout DeepSeek-R1-Distill-Qwen-32B outperforming OpenAI-o1-mini and setting new benchmarks for dense models. By combining reinforcement learning (RL) and supervised fine-tuning (SFT), these open-source models provide a powerful resource for research and practical applications.

## Evaluations

This model achieves an accuracy recovery of 100.04% relative to the original model.

| __English__ | __[DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)__ | __[DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic (this)](https://huggingface.co/cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic)__ |
|:------------|------:|------:|
| Avg.        | 74.03 | 74.06 |
| ARC         | 68.2  | 68.9  |
| Hellaswag   | 74.0  | 73.7  |
| MMLU        | 79.88 | 79.57 |

Evaluations were run with the [Eval. Harness](https://github.com/EleutherAI/lm-evaluation-harness) using `limit=1000`. We did not check for data contamination.

## Usage

Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):

```
python -m vllm.entrypoints.openai.api_server --model cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic
```

Access the model:

```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
```
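
Because the server is OpenAI-compatible, it can also be queried from Python with the official `openai` client instead of curl. This is a minimal sketch, assuming the server above is running on the default port `8000` and the `openai` package is installed (`pip install openai`); the `api_key` value is a placeholder, since the local server does not check it unless authentication is configured:

```
# Query the local vLLM OpenAI-compatible server from Python.
# Assumes the server started above is listening on localhost:8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; not validated by the local server by default
)

completion = client.completions.create(
    model="cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic",
    prompt="San Francisco is a",
    max_tokens=64,
)
print(completion.choices[0].text)
```

For multi-turn use, the same client exposes `client.chat.completions.create(...)`, which the server handles at `/v1/chat/completions`.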