distily_bench_obj_cross_v2.7

This student model is distilled from the teacher model roneneldan/TinyStories-33M. The training dataset is not specified.

The Distily library was used for this distillation.
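Under Training hyperparameters, the distillation_objective entry lists a logits loss component with loss_fn=kl and weight 1. The hidden-state (hs) and attention (attn) components have weight 0, so only the logits term contributes. The following is a minimal pure-Python sketch of such a logits-level KL objective. It is not Distily's implementation: the KL direction (teacher || student), the absence of a temperature, and the mean over token positions are all assumptions.

```python
import math

def log_softmax(logits):
    # Log-sum-exp with the max subtracted, for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at one token position, in nats (assumed direction)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def distillation_loss(teacher_seq, student_seq,
                      logits_weight=1.0, hs_weight=0.0, attn_weight=0.0,
                      hs_loss=0.0, attn_loss=0.0):
    """Weighted sum of loss components, using the weights listed in the card.

    hs_loss and attn_loss are hypothetical placeholders; with weight 0 they
    have no effect, matching the hs and attn components in this run.
    """
    kls = [token_kl(t, s) for t, s in zip(teacher_seq, student_seq)]
    logits_loss = sum(kls) / len(kls)  # mean over token positions (assumed)
    return logits_weight * logits_loss + hs_weight * hs_loss + attn_weight * attn_loss

# Two token positions: the first differs between teacher and student, the second matches.
teacher = [[0.0, 0.0], [2.0, -1.0]]
student = [[0.0, math.log(3.0)], [2.0, -1.0]]
print(distillation_loss(teacher, student))  # about 0.0719, i.e. 0.5 * ln(4/3) / 2
```

The loss is zero only when the student's next-token distribution matches the teacher's at every position. The zero weights mean that this run does not align hidden states or attention maps.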

It achieves the following results on the evaluation set. These values are identical to the step 9500 row of the Eval-Phase Metrics table:

  • eval_enwikippl: 3887.9170
  • eval_frwikippl: 50974.8398
  • eval_zhwikippl: 83822.7812
  • eval_tinystoriesppl: 1011.5359
  • eval_loss: 4.8822
  • eval_runtime: 6.5175 s
  • eval_samples_per_second: 76.716
  • eval_steps_per_second: 9.666
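The *ppl metrics are perplexities. Judging by their names, they are measured on English Wikipedia, French Wikipedia, Chinese Wikipedia and TinyStories text. This is inferred from the metric names and is not stated in the card. Perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. The following is a minimal sketch, assuming natural-log token probabilities and a simple average over tokens. Distily's actual batching, striding and averaging may differ.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over tokens (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that spreads probability uniformly over 50 tokens has perplexity of about 50.
print(perplexity([math.log(1 / 50)] * 10))
```

Under this reading, a tinystoriesppl of about 1011.5 means the student is roughly as uncertain as a uniform choice among about 1,000 tokens. The teacher scores about 4 on the same metric.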

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0004
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
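The optimizer settings above can be illustrated with a single Adam update driven by a linear learning-rate schedule. This is a minimal scalar sketch, not Distily's or PyTorch's code. It makes three assumptions:

  • No warmup, since none is listed.
  • No weight decay, since none is listed.
  • total_steps = 49,500, inferred from the Eval-Phase Metrics table, where epoch reaches 1.0 at step 49500 with train_batch_size 1.

```python
import math

def linear_lr(step, base_lr=4e-4, total_steps=49_500, warmup_steps=0):
    """Linear warmup (if any), then linear decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One bias-corrected Adam update for a scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First optimizer step (scheduler step 0) on a toy parameter.
p, m, v = adam_step(1.0, grad=2.0, m=0.0, v=0.0, t=1, lr=linear_lr(0))
print(round(p, 6))          # 0.9996
print(linear_lr(24_750))    # about 2e-4, halfway through training
```

With bias correction, the first Adam step moves the parameter by almost exactly the current learning rate, whatever the gradient's magnitude. The schedule then shrinks that step size linearly to zero by the end of the single epoch.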

Resource Usage

Peak GPU Memory: 6.6058 GB

Eval-Phase Metrics

step epoch enwikippl frwikippl loss runtime samples_per_second steps_per_second tinystoriesppl zhwikippl
teacher eval - 169.9865 47377.9414 - - - - 3.9789 4998.1294
0 0 23232.2363 111004.0469 6.4068 6.5373 76.484 9.637 9550.5166 102446.0156
500 0.0101 3972.8616 51053.8984 4.8833 6.5063 76.848 9.683 1038.9088 84361.2188
1000 0.0202 3908.7549 50974.8398 4.8822 6.5275 76.599 9.651 1016.2294 83867.4844
1500 0.0303 3887.6167 50974.8398 4.8822 6.5158 76.737 9.669 1010.1989 83867.4844
2000 0.0404 3873.1880 50989.1836 4.8822 6.5149 76.747 9.67 1005.8657 83822.7812
2500 0.0505 3897.8667 50974.8398 4.8822 6.5124 76.777 9.674 1014.4666 83822.7812
3000 0.0606 3906.9363 50974.8398 4.8822 6.5204 76.682 9.662 1016.0612 83822.7812
3500 0.0707 3893.6423 50974.8398 4.8822 6.5097 76.809 9.678 1012.0377 83822.7812
4000 0.0808 3894.2476 50974.8398 4.8822 6.511 76.793 9.676 1013.2932 83822.7812
4500 0.0909 3917.8462 51010.7305 4.8822 6.5167 76.726 9.668 1019.4268 83867.4844
5000 0.1010 3912.3875 51003.5820 4.8822 6.5075 76.834 9.681 1017.7424 83867.4844
5500 0.1111 3903.9119 50974.8398 4.8822 6.5226 76.656 9.659 1015.6412 83822.7812
6000 0.1212 3912.3875 51003.5820 4.8822 6.5212 76.673 9.661 1017.4061 83867.4844
6500 0.1313 3936.0994 51068.2656 4.8822 6.4984 76.942 9.695 1025.1731 83957.0312
7000 0.1414 3922.1003 51053.8984 4.8822 6.5149 76.747 9.67 1020.9445 83912.2812
7500 0.1515 3914.2085 51010.7305 4.8822 6.5141 76.757 9.671 1018.9213 83867.4844
8000 0.1616 3907.5435 50974.8398 4.8822 6.5148 76.748 9.67 1016.2294 83867.4844
8500 0.1717 3913.6001 51003.5820 4.8822 6.5271 76.603 9.652 1018.0792 83867.4844
9000 0.1818 3897.8667 50974.8398 4.8822 6.5537 76.293 9.613 1014.1313 83822.7812
9500 0.1919 3887.9170 50974.8398 4.8822 6.5175 76.716 9.666 1011.5359 83822.7812
10000 0.2020 3882.1963 50960.4531 4.8822 6.5134 76.765 9.672 1008.7803 83822.7812
10500 0.2121 3912.3875 51003.5820 4.8822 6.5102 76.802 9.677 1017.5745 83867.4844
11000 0.2222 3938.5400 51068.2656 4.8822 6.4952 76.98 9.699 1025.5123 83957.0312
11500 0.2323 3939.7610 51068.2656 4.8822 6.4945 76.988 9.7 1028.0588 84136.4141
12000 0.2424 3951.9873 51097.0547 4.8822 6.5097 76.809 9.678 1029.9302 84136.4141
12500 0.2525 3922.1003 51053.8984 4.8822 6.5036 76.881 9.687 1021.7891 83912.2812
13000 0.2626 3897.8667 50974.8398 4.8822 6.5248 76.631 9.655 1014.6344 83822.7812
13500 0.2727 3887.6167 50974.8398 4.8822 6.5036 76.881 9.687 1010.3655 83822.7812
14000 0.2828 3867.7913 50989.1836 4.8822 6.5118 76.783 9.675 1001.6337 83778.0312
14500 0.2929 3860.6052 50989.1836 4.8822 6.5064 76.847 9.683 999.4833 83778.0312
15000 0.3030 3873.1880 50960.4531 4.8822 6.5104 76.8 9.677 1005.8657 83822.7812
15500 0.3131 3884.0034 50960.4531 4.8822 6.5186 76.704 9.665 1009.1969 83822.7812
16000 0.3232 3876.7874 50989.1836 4.8822 6.5138 76.76 9.672 1006.6141 83822.7812
16500 0.3333 3902.7021 50974.8398 4.8822 6.5538 76.291 9.613 1015.1377 83822.7812
17000 0.3434 3894.2476 50974.8398 4.8822 6.5508 76.326 9.617 1013.2932 83822.7812
17500 0.3535 3895.4548 50974.8398 4.8822 6.5145 76.752 9.671 1013.4609 83822.7812
18000 0.3636 3892.4358 50974.8398 4.8822 6.5208 76.678 9.661 1011.7029 83822.7812
18500 0.3737 3895.4548 50974.8398 4.8822 6.506 76.852 9.683 1013.4609 83822.7812
19000 0.3838 3922.1003 51053.8984 4.8822 6.4973 76.955 9.696 1021.7891 83912.2812
19500 0.3939 3931.2227 51068.2656 4.8822 6.4889 77.055 9.709 1023.6490 83957.0312
20000 0.4040 3926.9622 51068.2656 4.8822 6.4907 77.034 9.706 1023.3103 83912.2812
20500 0.4141 3922.1003 51053.8984 4.8822 6.4861 77.088 9.713 1021.7891 83912.2812
21000 0.4242 3922.1003 51053.8984 4.8822 6.4983 76.943 9.695 1021.7891 83912.2812
21500 0.4343 3931.2227 51068.2656 4.8822 6.509 76.817 9.679 1024.8344 83957.0312
22000 0.4444 3938.5400 51068.2656 4.8822 6.5034 76.883 9.687 1026.6998 84091.5703
22500 0.4545 3931.2227 51068.2656 4.8822 6.5235 76.646 9.657 1024.8344 83957.0312
23000 0.4646 3922.1003 51053.8984 4.8822 6.5476 76.363 9.622 1021.4509 83867.4844
23500 0.4747 3920.8850 51053.8984 4.8822 6.4853 77.098 9.714 1020.6072 83867.4844
24000 0.4848 3919.6699 51053.8984 4.8822 6.4956 76.975 9.699 1020.4384 83867.4844
24500 0.4949 3921.4907 51053.8984 4.8822 6.5036 76.88 9.687 1020.7761 83867.4844
25000 0.5051 3919.6699 51053.8984 4.8822 6.4891 77.052 9.709 1020.6072 83867.4844
25500 0.5152 3922.1003 51053.8984 4.8822 6.4927 77.01 9.703 1020.9445 83867.4844
26000 0.5253 3926.9622 51068.2656 4.8822 6.4956 76.975 9.699 1022.4650 83912.2812
26500 0.5354 3931.2227 51068.2656 4.8822 6.5008 76.914 9.691 1024.1567 83957.0312
27000 0.5455 3926.9622 51068.2656 4.8822 6.5048 76.867 9.685 1022.2959 83912.2812
27500 0.5556 3922.1003 51053.8984 4.8822 6.526 76.617 9.654 1021.7891 83912.2812
28000 0.5657 3921.4907 51053.8984 4.8822 6.5337 76.527 9.642 1020.7761 83867.4844
28500 0.5758 3917.8462 51053.8984 4.8822 6.4962 76.969 9.698 1019.5950 83867.4844
29000 0.5859 3917.8462 51039.4883 4.8822 6.4943 76.99 9.701 1019.4268 83867.4844
29500 0.5960 3919.6699 51053.8984 4.8822 6.5069 76.841 9.682 1020.4384 83867.4844
30000 0.6061 3919.6699 51053.8984 4.8822 6.5033 76.884 9.687 1020.4384 83867.4844
30500 0.6162 3917.8462 51039.4883 4.8822 6.4958 76.972 9.699 1019.4268 83867.4844
31000 0.6263 3917.8462 51039.4883 4.8822 6.4827 77.129 9.718 1019.4268 83867.4844
31500 0.6364 3917.8462 51039.4883 4.8822 6.495 76.982 9.7 1019.4268 83867.4844
32000 0.6465 3918.4551 51053.8984 4.8822 6.5193 76.696 9.664 1019.5950 83867.4844
32500 0.6566 3921.4907 51053.8984 4.8822 6.5213 76.671 9.661 1020.7761 83867.4844
33000 0.6667 3922.1003 51053.8984 4.8822 6.5 76.923 9.692 1021.4509 83867.4844
33500 0.6768 3922.1003 51053.8984 4.8822 6.5038 76.878 9.687 1021.7891 83912.2812
34000 0.6869 3922.1003 51053.8984 4.8822 6.5965 75.798 9.551 1021.6200 83867.4844
34500 0.6970 3922.1003 51053.8984 4.8822 6.4926 77.01 9.703 1020.7761 83867.4844
35000 0.7071 3921.4907 51053.8984 4.8822 6.5061 76.851 9.683 1020.7761 83867.4844
35500 0.7172 3921.4907 51053.8984 4.8822 6.5289 76.582 9.649 1020.7761 83867.4844
36000 0.7273 3922.1003 51053.8984 4.8822 6.5582 76.24 9.606 1021.4509 83867.4844
36500 0.7374 3922.1003 51053.8984 4.8822 6.5354 76.506 9.64 1021.7891 83912.2812
37000 0.7475 3924.5286 51053.8984 4.8822 6.5215 76.669 9.66 1021.7891 83912.2812
37500 0.7576 3926.9622 51053.8984 4.8822 6.5007 76.915 9.691 1021.7891 83912.2812
38000 0.7677 3924.5286 51053.8984 4.8822 6.5068 76.842 9.682 1021.7891 83912.2812
38500 0.7778 3922.1003 51053.8984 4.8822 6.5229 76.653 9.658 1021.6200 83867.4844
39000 0.7879 3922.1003 51053.8984 4.8822 6.5165 76.728 9.668 1020.9445 83867.4844
39500 0.7980 3920.8850 51053.8984 4.8822 6.5119 76.782 9.675 1020.6072 83867.4844
40000 0.8081 3920.8850 51053.8984 4.8822 6.5092 76.814 9.679 1020.6072 83867.4844
40500 0.8182 3920.8850 51053.8984 4.8822 6.5191 76.697 9.664 1020.7761 83867.4844
41000 0.8283 3920.8850 51053.8984 4.8822 6.5316 76.551 9.645 1020.7761 83867.4844
41500 0.8384 3920.8850 51053.8984 4.8822 6.5046 76.869 9.685 1020.7761 83867.4844
42000 0.8485 3920.8850 51053.8984 4.8822 6.5038 76.878 9.687 1020.6072 83867.4844
42500 0.8586 3920.8850 51053.8984 4.8822 6.5215 76.669 9.66 1020.6072 83867.4844
43000 0.8687 3920.8850 51053.8984 4.8822 6.5049 76.865 9.685 1020.6072 83867.4844
43500 0.8788 3920.8850 51053.8984 4.8822 6.5074 76.836 9.681 1020.6072 83867.4844
44000 0.8889 3920.8850 51053.8984 4.8822 6.4973 76.956 9.696 1020.6072 83867.4844
44500 0.8990 3920.8850 51053.8984 4.8822 6.529 76.581 9.649 1020.6072 83867.4844
45000 0.9091 3920.8850 51053.8984 4.8822 6.5231 76.651 9.658 1020.6072 83867.4844
45500 0.9192 3920.8850 51053.8984 4.8822 6.5386 76.469 9.635 1020.6072 83867.4844
46000 0.9293 3920.8850 51053.8984 4.8822 6.5266 76.61 9.653 1020.6072 83867.4844
46500 0.9394 3920.8850 51053.8984 4.8822 6.4999 76.924 9.692 1020.6072 83867.4844
47000 0.9495 3920.8850 51053.8984 4.8822 6.5248 76.63 9.655 1020.6072 83867.4844
47500 0.9596 3920.8850 51053.8984 4.8822 6.5217 76.668 9.66 1020.6072 83867.4844
48000 0.9697 3920.8850 51053.8984 4.8822 6.5043 76.872 9.686 1020.6072 83867.4844
48500 0.9798 3920.8850 51053.8984 4.8822 6.4865 77.084 9.713 1020.6072 83867.4844
49000 0.9899 3920.8850 51053.8984 4.8822 6.5203 76.684 9.662 1020.6072 83867.4844
49500 1.0 3920.8850 51053.8984 4.8822 6.6387 75.315 9.49 1020.6072 83867.4844
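The runtime and throughput columns are consistent with a fixed evaluation set. This assumes they come from the Hugging Face Trainer's speed metrics, which divide a count by the runtime in seconds and round the result. Multiplying back recovers roughly 500 samples in 63 batches, which matches eval_batch_size 8. The following is a small sketch of that arithmetic. infer_eval_counts is a hypothetical helper, not part of any library.

```python
import math

def infer_eval_counts(runtime_s, samples_per_second, steps_per_second):
    """Back out sample and step counts from rounded throughput metrics."""
    n_samples = round(runtime_s * samples_per_second)
    n_steps = round(runtime_s * steps_per_second)
    return n_samples, n_steps

# Step-9500 row: runtime 6.5175 s, 76.716 samples/s, 9.666 steps/s.
samples, steps = infer_eval_counts(6.5175, 76.716, 9.666)
print(samples, steps)                    # 500 63
print(math.ceil(samples / 8) == steps)   # True: 500 samples at batch size 8
```

The same calculation on the step 0 and step 49500 rows also gives 500 samples and 63 steps. This suggests that the small runtime differences between rows are timing noise rather than changes to the evaluation set.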

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0
Model details

  • Weights format: Safetensors
  • Model size: 68.5M params
  • Tensor type: BF16

Hub repository: lapp0/distily_bench_obj_cross_v2.7