lapp0 committed (verified) · Commit 99ac316 · Parent: 5570049

End of training

README.md CHANGED
@@ -77,7 +77,7 @@ LlamaForCausalLM(
 
 # Resource Usage
 
- - Max Train VRAM Use: 13.1110 GB
+ - Max Train VRAM Use: 19.6182 GB
 - Available VRAM: 23.4329 GB
 - GPUs:
   - 1x NVIDIA GeForce RTX 4090
@@ -114,9 +114,9 @@ LlamaForCausalLM(
 <br/>
 
 # Train Dataset
- Trained on 687,248,443 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
+ Trained on 385,611,117 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
- - Num Samples: `1,996,000`
+ - Num Samples: `499,000`
 - Subset: `20231101.en`
 - Split: `train`
 
@@ -145,12 +145,11 @@ The following hyperparameters were used during training:
 <summary>Expand</summary>
 
 - learning_rate: `0.0002`
- - train_batch_size: `16`
+ - train_batch_size: `4`
 - eval_batch_size: `2`
 - seed: `42`
 - optimizer: `Adam with betas=(0.9,0.999) and epsilon=1e-08`
 - lr_scheduler_type: `polynomial`
- - lr_scheduler_warmup_ratio: `0.1`
 - num_epochs: `1.0`
 - distillation_objective: `DistillationObjective(
     logits_loss_component=LossComponent(
@@ -164,7 +163,7 @@ The following hyperparameters were used during training:
         weight=0
     )
 )`
- - lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x76ca0d527850>`
+ - lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x76c84721bdc0>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `None`
 - student_model_config: `{'num_hidden_layers': 15}`
@@ -179,8 +178,8 @@ The following hyperparameters were used during training:
 - dataset_subset: `20231101.en`
 - dataset_split: `train`
 - dataset_column_name: `text`
- - dataset_sample_size: `2000000`
- - dataset_max_seq_length: `512`
+ - dataset_sample_size: `500000`
+ - dataset_max_seq_length: `2048`
 - dataset_test_size: `0.002`
 - dataset_shuffle: `False`
 - dataset_shuffle_seed: `42`
@@ -188,7 +187,7 @@ The following hyperparameters were used during training:
 - gradient_accumulation_steps: `1`
 - weight_decay: `0.0`
 - max_grad_norm: `1.0`
- - warmup_ratio: `0.1`
+ - warmup_ratio: `0.0`
 - warmup_steps: `0`
 - gradient_checkpointing: `True`
 
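Taken together, the new hyperparameters describe a smaller but longer-context run: 499,000 train samples at up to 2,048 tokens each (385.6M tokens, roughly 773 tokens per sample on average) instead of 1,996,000 samples capped at 512 tokens. Below is a minimal sketch of how the logged optimizer and scheduler settings fit together; it is not the repo's training script, and the stand-in model and derived step count are assumptions:

```python
import torch
from transformers import get_polynomial_decay_schedule_with_warmup

# Stand-in module; the real student is a 15-layer Llama
# (student_model_config: {'num_hidden_layers': 15}).
student = torch.nn.Linear(512, 512)

# optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08,
# learning_rate 0.0002, weight_decay 0.0 -- all from the README.
optimizer = torch.optim.Adam(
    student.parameters(),
    lr=2e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
)

# Derived, not logged: 499,000 train samples / train_batch_size 4
# / gradient_accumulation_steps 1 = 124,750 optimizer steps at num_epochs 1.0.
num_training_steps = 124_750

# warmup_ratio 0.0 and warmup_steps 0 leave the pure polynomial decay.
# This helper returns a torch.optim.lr_scheduler.LambdaLR, which is why
# the README logs the scheduler as a LambdaLR object.
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)
```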
logs/dataset_max_seq_length=2048, dataset_sample_size=500000, per_device_train_batch_size=4/events.out.tfevents.1726453475.1c1a426a2fee ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65b94c02ab5a68b91f35d0c71ac0136bf01049efc17ec6e83ab5f5d2c03cdbdc
+ size 529
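The new log lands as a Git LFS pointer rather than raw bytes: three `key value` lines giving the LFS spec version, the sha256 of the real TensorBoard event file, and its size in bytes. As a sketch, here is how such a pointer can be inspected in a clone taken without `git lfs pull` (the path is the one added above):

```python
from pathlib import Path

# Path of the pointer file added by this commit (the spaces are part of
# the directory name, as logged).
pointer = Path(
    "logs/dataset_max_seq_length=2048, dataset_sample_size=500000, "
    "per_device_train_batch_size=4/"
    "events.out.tfevents.1726453475.1c1a426a2fee"
)

# Each pointer line is "key value"; split once on the first space.
fields = dict(line.split(" ", 1) for line in pointer.read_text().splitlines())
print(fields["version"])  # https://git-lfs.github.com/spec/v1
print(fields["oid"])      # sha256:65b94c02ab5a...
print(fields["size"])     # 529 -- byte size of the real event file
```

Fetching the repo with `git lfs pull`, or downloading through `huggingface_hub`, replaces the stub with the actual 529-byte event file whose sha256 matches the `oid`.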