---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.8
    results: []
---

distily_bench_obj_cross_v2.8

This student model was distilled from the teacher model roneneldan/TinyStories-33M; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 198.4185
  • eval_frwikippl: 100815.4219
  • eval_zhwikippl: 1470416.875
  • eval_tinystoriesppl: 10.3978
  • eval_loss: 1.2095
  • eval_runtime: 6.4972
  • eval_samples_per_second: 76.957
  • eval_steps_per_second: 9.697
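The eval_*ppl figures are perplexities: the exponential of the mean per-token negative log-likelihood (in nats) on each corpus. Note that eval_loss here is the distillation objective, not a language-modeling loss, so it does not exponentiate to any of the perplexity figures. A minimal sketch of the relationship:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood
    (cross-entropy in nats) over an evaluation corpus."""
    return math.exp(mean_nll)

# A mean NLL of 0 nats means the model is certain of every token.
print(perplexity(0.0))  # → 1.0
# Higher mean NLL maps to exponentially higher perplexity.
print(perplexity(2.0) > perplexity(1.0))  # → True
```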

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.004
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
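The distillation objective above puts all its weight on the logits component with loss_fn=kl, i.e. a KL divergence between the teacher's and the student's output distributions, with the hidden-state and attention components disabled (weight 0). A minimal pure-Python sketch of that loss for a single token position (Distily computes this over batched tensors; the temperature parameter here is illustrative, not taken from the configuration above):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_logits_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary distribution at one
    token position -- the weight-1 logits loss component of the objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits incur zero loss; diverging logits incur positive loss.
print(kl_logits_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
print(kl_logits_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # → True
```

Minimizing this loss pushes the student's next-token distribution toward the teacher's at every position, which is why train_embeddings and the learning-rate schedule above are the only other knobs that matter for this run.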

Resource Usage

Peak GPU Memory: 6.6047 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 11624.0693 | 60523.7891 | 5.8780 | 6.5064 | 76.847 | 9.683 | 4963.9985 | 59716.3516 |
| 5000 | 0.1010 | 198.4185 | 100815.4219 | 1.2095 | 6.4972 | 76.957 | 9.697 | 10.3978 | 1470416.875 |
| 10000 | 0.2020 | 197.7357 | 97196.9141 | 1.2093 | 6.4818 | 77.139 | 9.72 | 10.2892 | 1447451.75 |
| 15000 | 0.3030 | 197.2080 | 98104.7109 | 1.2096 | 6.504 | 76.876 | 9.686 | 10.4405 | 1471986.875 |
| 20000 | 0.4040 | 195.2773 | 97005.4062 | 1.2094 | 6.4752 | 77.218 | 9.729 | 10.1995 | 1422946.875 |
| 25000 | 0.5051 | 196.9104 | 100058.5391 | 1.2099 | 6.4866 | 77.082 | 9.712 | 10.2727 | 1497336.25 |
| 30000 | 0.6061 | 195.8986 | 94337.375 | 1.2099 | 6.4887 | 77.057 | 9.709 | 10.3785 | 1435147.0 |
| 35000 | 0.7071 | 196.9180 | 96216.1797 | 1.2091 | 6.5251 | 76.627 | 9.655 | 10.3306 | 1444365.625 |
| 40000 | 0.8081 | 197.7434 | 97635.9688 | 1.2091 | 6.5177 | 76.715 | 9.666 | 10.3669 | 1472771.75 |
| 45000 | 0.9091 | 198.4416 | 98575.6953 | 1.2093 | 6.4898 | 77.044 | 9.708 | 10.3216 | 1521499.0 |
| 49500 | 1.0 | 197.6898 | 97801.2031 | 1.2091 | 6.4817 | 77.141 | 9.72 | 10.3293 | 1486192.625 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0