SpeechT5 TTS Igbo Yoruba

This model is a fine-tuned version of microsoft/speecht5_tts on the all_tts_v2_processed_with_speaker_embeddings dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4111
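
A minimal inference sketch using the standard transformers SpeechT5 API is shown below. The zero speaker-embedding vector and the example sentence are placeholders (assumptions): the fine-tuning dataset bundles its own speaker embeddings, so substitute an x-vector matching a training speaker for usable audio.

```python
import torch
import soundfile as sf
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

model_id = "ccibeekeoc42/speecht5_finetuned_naija_ig_yo_2025-01-20_O2"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Kedu ka i mere?", return_tensors="pt")  # placeholder Igbo text

# Placeholder 512-dim speaker embedding; replace with an x-vector from one of
# the training speakers (the dataset name indicates it ships speaker embeddings).
speaker_embeddings = torch.zeros(1, 512)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
```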

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 6
  • eval_batch_size: 6
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 12
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 18000
  • mixed_precision_training: Native AMP
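
For orientation, these settings map onto transformers `Seq2SeqTrainingArguments` roughly as in the sketch below. This is a reconstruction, not the original training script; `output_dir` is an assumed name, and the evaluation cadence is inferred from the 250-step intervals in the results table.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_finetuned_naija_ig_yo",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    gradient_accumulation_steps=2,   # effective train batch size: 12
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=18000,
    seed=42,
    fp16=True,                       # native AMP mixed-precision training
    eval_strategy="steps",           # inferred: table below evaluates every 250 steps
    eval_steps=250,
)
```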

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.6671        | 0.0526 | 250   | 0.5571          |
| 0.5767        | 0.1052 | 500   | 0.4814          |
| 0.5233        | 0.1577 | 750   | 0.4562          |
| 0.5045        | 0.2103 | 1000  | 0.4461          |
| 0.4917        | 0.2629 | 1250  | 0.4440          |
| 0.4908        | 0.3155 | 1500  | 0.4398          |
| 0.4881        | 0.3680 | 1750  | 0.4346          |
| 0.4855        | 0.4206 | 2000  | 0.4361          |
| 0.4785        | 0.4732 | 2250  | 0.4343          |
| 0.4753        | 0.5258 | 2500  | 0.4310          |
| 0.4767        | 0.5783 | 2750  | 0.4309          |
| 0.4707        | 0.6309 | 3000  | 0.4280          |
| 0.4724        | 0.6835 | 3250  | 0.4278          |
| 0.4694        | 0.7361 | 3500  | 0.4264          |
| 0.4674        | 0.7886 | 3750  | 0.4259          |
| 0.4659        | 0.8412 | 4000  | 0.4263          |
| 0.4631        | 0.8938 | 4250  | 0.4243          |
| 0.4644        | 0.9464 | 4500  | 0.4232          |
| 0.4619        | 0.9989 | 4750  | 0.4221          |
| 0.4662        | 1.0515 | 5000  | 0.4244          |
| 0.4602        | 1.1041 | 5250  | 0.4217          |
| 0.4616        | 1.1567 | 5500  | 0.4211          |
| 0.4610        | 1.2093 | 5750  | 0.4201          |
| 0.4576        | 1.2618 | 6000  | 0.4212          |
| 0.4573        | 1.3144 | 6250  | 0.4187          |
| 0.4598        | 1.3670 | 6500  | 0.4186          |
| 0.4551        | 1.4196 | 6750  | 0.4200          |
| 0.4599        | 1.4721 | 7000  | 0.4175          |
| 0.4576        | 1.5247 | 7250  | 0.4169          |
| 0.4569        | 1.5773 | 7500  | 0.4180          |
| 0.4539        | 1.6299 | 7750  | 0.4175          |
| 0.4552        | 1.6824 | 8000  | 0.4158          |
| 0.4554        | 1.7350 | 8250  | 0.4163          |
| 0.4510        | 1.7876 | 8500  | 0.4171          |
| 0.4558        | 1.8402 | 8750  | 0.4163          |
| 0.4539        | 1.8927 | 9000  | 0.4153          |
| 0.4537        | 1.9453 | 9250  | 0.4160          |
| 0.4530        | 1.9979 | 9500  | 0.4164          |
| 0.4539        | 2.0505 | 9750  | 0.4157          |
| 0.4561        | 2.1030 | 10000 | 0.4143          |
| 0.4513        | 2.1556 | 10250 | 0.4144          |
| 0.4525        | 2.2082 | 10500 | 0.4145          |
| 0.4532        | 2.2608 | 10750 | 0.4149          |
| 0.4483        | 2.3134 | 11000 | 0.4140          |
| 0.4496        | 2.3659 | 11250 | 0.4142          |
| 0.4513        | 2.4185 | 11500 | 0.4131          |
| 0.4492        | 2.4711 | 11750 | 0.4134          |
| 0.4504        | 2.5237 | 12000 | 0.4130          |
| 0.4484        | 2.5762 | 12250 | 0.4131          |
| 0.4522        | 2.6288 | 12500 | 0.4132          |
| 0.4467        | 2.6814 | 12750 | 0.4124          |
| 0.4487        | 2.7340 | 13000 | 0.4125          |
| 0.4462        | 2.7865 | 13250 | 0.4117          |
| 0.4459        | 2.8391 | 13500 | 0.4119          |
| 0.4485        | 2.8917 | 13750 | 0.4121          |
| 0.4467        | 2.9443 | 14000 | 0.4121          |
| 0.4495        | 2.9968 | 14250 | 0.4124          |
| 0.4473        | 3.0494 | 14500 | 0.4111          |
| 0.4462        | 3.1020 | 14750 | 0.4112          |
| 0.4450        | 3.1546 | 15000 | 0.4119          |
| 0.4497        | 3.2072 | 15250 | 0.4133          |
| 0.4488        | 3.2597 | 15500 | 0.4116          |
| 0.4451        | 3.3123 | 15750 | 0.4115          |
| 0.4473        | 3.3649 | 16000 | 0.4115          |
| 0.4416        | 3.4175 | 16250 | 0.4116          |
| 0.4454        | 3.4700 | 16500 | 0.4106          |
| 0.4491        | 3.5226 | 16750 | 0.4112          |
| 0.4502        | 3.5752 | 17000 | 0.4108          |
| 0.4488        | 3.6278 | 17250 | 0.4111          |
| 0.4474        | 3.6803 | 17500 | 0.4109          |
| 0.4478        | 3.7329 | 17750 | 0.4110          |
| 0.4468        | 3.7855 | 18000 | 0.4111          |

Framework versions

  • Transformers 4.48.1
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0
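
To check that a local environment matches these versions before attempting to reproduce results, a small convenience snippet (not from the original repository):

```python
# Print installed versions to compare against the list above.
import transformers, torch, datasets, tokenizers

for name, mod in [("Transformers", transformers), ("Pytorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```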

Model tree for ccibeekeoc42/speecht5_finetuned_naija_ig_yo_2025-01-20_O2

  • Base model: microsoft/speecht5_tts