distilexp
A collection of distillation experiments (4 items).
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains a distilled 0.5B Qwen2 language model.