metadata
license: mit
base_model: microsoft/speecht5_vc
tags:
  - generated_from_trainer
datasets:
  - audiofolder
model-index:
  - name: SpeechT5_finetuned_kha
    results: []

SpeechT5_finetuned_kha

This model is a fine-tuned version of microsoft/speecht5_vc on the audiofolder dataset. At the final logged checkpoint (step 8000) it achieves the following result on the evaluation set:

  • Loss: 0.4733

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 512
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 300
  • mixed_precision_training: Native AMP
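
The relationship between the listed batch settings and the learning-rate schedule can be checked with a small sketch. It assumes a single training device (the card does not state the device count) and uses the standard shape of a linear schedule with warmup: a ramp from 0 to the base rate over the warmup steps, then a linear decay to 0. The function names and the `total_steps` value of 8000 (the last logged step) are illustrative assumptions, not taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 1000


def effective_batch_size(per_step_batch, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer update.

    num_devices=1 is an assumption; the card does not state the device count.
    """
    return per_step_batch * accumulation_steps * num_devices


def linear_schedule_lr(step, total_steps, base_lr=LEARNING_RATE,
                       warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    # total_steps=8000 is a hypothetical value used only for illustration.
    for step in (0, 500, 1000, 4500, 8000):
        print(step, linear_schedule_lr(step, total_steps=8000))
```

Under the single-device assumption, 32 × 16 matches the reported total_train_batch_size of 512.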

Training results

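| Training Loss | Epoch    | Step | Validation Loss |
|---------------|----------|------|-----------------|
| 0.544         | 36.8664  | 1000 | 0.5145          |
| 0.5013        | 73.7327  | 2000 | 0.4800          |
| 0.4754        | 110.5991 | 3000 | 0.4705          |
| 0.4651        | 147.4654 | 4000 | 0.4710          |
| 0.456         | 184.3318 | 5000 | 0.4699          |
| 0.446         | 221.1982 | 6000 | 0.4702          |
| 0.443         | 258.0645 | 7000 | 0.4714          |
| 0.4437        | 294.9309 | 8000 | 0.4733          |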
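
The training results above can be read programmatically to see where validation loss bottoms out and how many optimizer updates make up one epoch. The sketch below uses only the logged values; the helper names are illustrative, and the interpretation of the step-to-epoch ratio as "updates per epoch" is an assumption about how the trainer reports fractional epochs.

```python
# (training_loss, epoch, step, validation_loss), copied from the results above.
ROWS = [
    (0.5440, 36.8664, 1000, 0.5145),
    (0.5013, 73.7327, 2000, 0.4800),
    (0.4754, 110.5991, 3000, 0.4705),
    (0.4651, 147.4654, 4000, 0.4710),
    (0.4560, 184.3318, 5000, 0.4699),
    (0.4460, 221.1982, 6000, 0.4702),
    (0.4430, 258.0645, 7000, 0.4714),
    (0.4437, 294.9309, 8000, 0.4733),
]


def best_checkpoint(rows):
    """Return the logged row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def updates_per_epoch(rows):
    """Ratio of optimizer steps to epochs for each logged row."""
    return [step / epoch for _, epoch, step, _ in rows]


if __name__ == "__main__":
    train_loss, epoch, step, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")
    print([round(ratio, 3) for ratio in updates_per_epoch(ROWS)])
```

Among the logged checkpoints, validation loss is lowest at step 5000 (0.4699) and rises slightly to 0.4733 by step 8000 while training loss keeps falling, so an earlier checkpoint may generalize marginally better than the final one.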

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.4.0
  • Datasets 3.0.1
  • Tokenizers 0.19.1