SpeechT5 TTS Igbo Yoruba

This model is a fine-tuned version of microsoft/speecht5_tts on the all_tts_v2_processed_with_speaker_embeddings dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4111
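
A minimal inference sketch using the standard transformers SpeechT5 API is shown below. The zero speaker-embedding vector and the example sentence are placeholders (assumptions): the fine-tuning dataset bundles its own speaker embeddings, so substitute an x-vector matching a training speaker for usable audio.

```python
import torch
import soundfile as sf
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

model_id = "ccibeekeoc42/speecht5_finetuned_naija_ig_yo_2025-01-20_O2"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Kedu ka i mere?", return_tensors="pt")  # placeholder Igbo text

# Placeholder 512-dim speaker embedding; replace with an x-vector from one of
# the training speakers (the dataset name indicates it ships speaker embeddings).
speaker_embeddings = torch.zeros(1, 512)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
```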

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 6
  • eval_batch_size: 6
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 12
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 18000
  • mixed_precision_training: Native AMP
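
For orientation, these settings map onto transformers `Seq2SeqTrainingArguments` roughly as in the sketch below. This is a reconstruction, not the original training script; `output_dir` is an assumed name, and the evaluation cadence is inferred from the 250-step intervals in the results table.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_finetuned_naija_ig_yo",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    gradient_accumulation_steps=2,   # effective train batch size: 12
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=18000,
    seed=42,
    fp16=True,                       # native AMP mixed-precision training
    eval_strategy="steps",           # inferred: table below evaluates every 250 steps
    eval_steps=250,
)
```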

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.6671        | 0.0526 | 250   | 0.5571          |
| 0.5767        | 0.1052 | 500   | 0.4814          |
| 0.5233        | 0.1577 | 750   | 0.4562          |
| 0.5045        | 0.2103 | 1000  | 0.4461          |
| 0.4917        | 0.2629 | 1250  | 0.4440          |
| 0.4908        | 0.3155 | 1500  | 0.4398          |
| 0.4881        | 0.3680 | 1750  | 0.4346          |
| 0.4855        | 0.4206 | 2000  | 0.4361          |
| 0.4785        | 0.4732 | 2250  | 0.4343          |
| 0.4753        | 0.5258 | 2500  | 0.4310          |
| 0.4767        | 0.5783 | 2750  | 0.4309          |
| 0.4707        | 0.6309 | 3000  | 0.4280          |
| 0.4724        | 0.6835 | 3250  | 0.4278          |
| 0.4694        | 0.7361 | 3500  | 0.4264          |
| 0.4674        | 0.7886 | 3750  | 0.4259          |
| 0.4659        | 0.8412 | 4000  | 0.4263          |
| 0.4631        | 0.8938 | 4250  | 0.4243          |
| 0.4644        | 0.9464 | 4500  | 0.4232          |
| 0.4619        | 0.9989 | 4750  | 0.4221          |
| 0.4662        | 1.0515 | 5000  | 0.4244          |
| 0.4602        | 1.1041 | 5250  | 0.4217          |
| 0.4616        | 1.1567 | 5500  | 0.4211          |
| 0.4610        | 1.2093 | 5750  | 0.4201          |
| 0.4576        | 1.2618 | 6000  | 0.4212          |
| 0.4573        | 1.3144 | 6250  | 0.4187          |
| 0.4598        | 1.3670 | 6500  | 0.4186          |
| 0.4551        | 1.4196 | 6750  | 0.4200          |
| 0.4599        | 1.4721 | 7000  | 0.4175          |
| 0.4576        | 1.5247 | 7250  | 0.4169          |
| 0.4569        | 1.5773 | 7500  | 0.4180          |
| 0.4539        | 1.6299 | 7750  | 0.4175          |
| 0.4552        | 1.6824 | 8000  | 0.4158          |
| 0.4554        | 1.7350 | 8250  | 0.4163          |
| 0.4510        | 1.7876 | 8500  | 0.4171          |
| 0.4558        | 1.8402 | 8750  | 0.4163          |
| 0.4539        | 1.8927 | 9000  | 0.4153          |
| 0.4537        | 1.9453 | 9250  | 0.4160          |
| 0.4530        | 1.9979 | 9500  | 0.4164          |
| 0.4539        | 2.0505 | 9750  | 0.4157          |
| 0.4561        | 2.1030 | 10000 | 0.4143          |
| 0.4513        | 2.1556 | 10250 | 0.4144          |
| 0.4525        | 2.2082 | 10500 | 0.4145          |
| 0.4532        | 2.2608 | 10750 | 0.4149          |
| 0.4483        | 2.3134 | 11000 | 0.4140          |
| 0.4496        | 2.3659 | 11250 | 0.4142          |
| 0.4513        | 2.4185 | 11500 | 0.4131          |
| 0.4492        | 2.4711 | 11750 | 0.4134          |
| 0.4504        | 2.5237 | 12000 | 0.4130          |
| 0.4484        | 2.5762 | 12250 | 0.4131          |
| 0.4522        | 2.6288 | 12500 | 0.4132          |
| 0.4467        | 2.6814 | 12750 | 0.4124          |
| 0.4487        | 2.7340 | 13000 | 0.4125          |
| 0.4462        | 2.7865 | 13250 | 0.4117          |
| 0.4459        | 2.8391 | 13500 | 0.4119          |
| 0.4485        | 2.8917 | 13750 | 0.4121          |
| 0.4467        | 2.9443 | 14000 | 0.4121          |
| 0.4495        | 2.9968 | 14250 | 0.4124          |
| 0.4473        | 3.0494 | 14500 | 0.4111          |
| 0.4462        | 3.1020 | 14750 | 0.4112          |
| 0.4450        | 3.1546 | 15000 | 0.4119          |
| 0.4497        | 3.2072 | 15250 | 0.4133          |
| 0.4488        | 3.2597 | 15500 | 0.4116          |
| 0.4451        | 3.3123 | 15750 | 0.4115          |
| 0.4473        | 3.3649 | 16000 | 0.4115          |
| 0.4416        | 3.4175 | 16250 | 0.4116          |
| 0.4454        | 3.4700 | 16500 | 0.4106          |
| 0.4491        | 3.5226 | 16750 | 0.4112          |
| 0.4502        | 3.5752 | 17000 | 0.4108          |
| 0.4488        | 3.6278 | 17250 | 0.4111          |
| 0.4474        | 3.6803 | 17500 | 0.4109          |
| 0.4478        | 3.7329 | 17750 | 0.4110          |
| 0.4468        | 3.7855 | 18000 | 0.4111          |

Framework versions

  • Transformers 4.48.1
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0
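
To check that a local environment matches these versions before attempting to reproduce results, a small convenience snippet (not from the original repository):

```python
# Print installed versions to compare against the list above.
import transformers, torch, datasets, tokenizers

for name, mod in [("Transformers", transformers), ("Pytorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```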

Model tree for ccibeekeoc42/speecht5_finetuned_naija_ig_yo_2025-01-20_O2

  • Base model: microsoft/speecht5_tts