	---
	language: en
	tags:
	- text-to-speech
	- StyleTTS2
	- speech-synthesis
	license: mit
	pipeline_tag: text-to-speech
	---

	# StyleTTS2 Fine-tuned Model

	This repository contains a StyleTTS2 model fine-tuned from the StyleTTS2-LibriTTS checkpoint, together with all components needed for inference.

	## Model Details
	- Base Model: StyleTTS2-LibriTTS
	- Architecture: StyleTTS2
	- Task: Text-to-Speech
	- Last Checkpoint: epoch_2nd_00014.pth

	## Training Details
	- Total Epochs: 30
	- Completed Epochs: 14
	- Total Iterations: 1169
	- Batch Size: 2
	- Max Length: 120
	- Learning Rate: 0.0001
	- Final Validation Loss: 0.418901

	## Model Components
	The repository includes all necessary components for inference:

	### Main Model Components:
	- bert.pth
	- bert_encoder.pth
	- predictor.pth
	- decoder.pth
	- text_encoder.pth
	- predictor_encoder.pth
	- style_encoder.pth
	- diffusion.pth
	- text_aligner.pth
	- pitch_extractor.pth
	- mpd.pth
	- msd.pth
	- wd.pth

	### Utility Components:
	- ASR (Automatic Speech Recognition)
	  - epoch_00080.pth
	  - config.yml
	  - models.py
	  - layers.py
	- JDC (F0 Prediction)
	  - bst.t7
	  - model.py
	- PLBERT
	  - step_1000000.t7
	  - config.yml
	  - util.py

	### Additional Files:
	- text_utils.py: Text preprocessing utilities
	- models.py: Model architecture definitions
	- utils.py: Utility functions
	- config.yml: Model configuration
	- config.json: Detailed configuration and training metrics

	## Training Metrics
	A visualization of the training metrics is available in training_metrics.png.

	## Directory Structure
	├── Utils/
	│ ├── ASR/
	│ ├── JDC/
	│ └── PLBERT/
	├── model_components/
	└── configs/
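
Before running inference, it can help to confirm that the utility files are where the loader expects them. The sketch below checks for the files named under "Utility Components". The helper name `missing_utility_files` is hypothetical, and placing each file under `Utils/<NAME>/` is an assumption drawn from the directory tree rather than something the model card states outright.

```python
from pathlib import Path

# Expected utility files, taken from the "Utility Components" list.
# Assumption: each group lives in Utils/<NAME>/, as the directory tree suggests.
EXPECTED_UTILS = {
    "ASR": ["epoch_00080.pth", "config.yml", "models.py", "layers.py"],
    "JDC": ["bst.t7", "model.py"],
    "PLBERT": ["step_1000000.t7", "config.yml", "util.py"],
}


def missing_utility_files(root):
    """Return POSIX-style relative paths of expected utility files absent under root."""
    root = Path(root)
    missing = []
    for subdir, filenames in EXPECTED_UTILS.items():
        for name in filenames:
            relative = Path("Utils") / subdir / name
            if not (root / relative).is_file():
                missing.append(relative.as_posix())
    return missing
```

Calling `missing_utility_files(".")` from the repository root returns an empty list when every listed file is present, and otherwise the relative paths that still need to be downloaded.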

	## Usage Instructions
	1. Load the model using the provided config.yml
	2. Ensure the utility components (ASR, JDC, PLBERT) are in their respective directories under Utils/
	3. Use text_utils.py for text preprocessing
	4. Follow the inference example in the StyleTTS2 documentation
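
As a rough illustration of gathering the component weights before handing them to the StyleTTS2 inference code, the sketch below maps each name from "Main Model Components" to its file and loads it on CPU. It rests on several assumptions: the files sit in one directory (for example `model_components/`), each was written with `torch.save` and holds that component's weights, and PyTorch is installed. The helper names `component_paths` and `load_component_weights` are hypothetical. Building the actual modules and synthesizing audio should still follow the StyleTTS2 inference example.

```python
from pathlib import Path

# Component names from the "Main Model Components" list.
COMPONENTS = (
    "bert", "bert_encoder", "predictor", "decoder", "text_encoder",
    "predictor_encoder", "style_encoder", "diffusion", "text_aligner",
    "pitch_extractor", "mpd", "msd", "wd",
)


def component_paths(weights_dir):
    """Map each component name to its .pth file, skipping files that are absent."""
    weights_dir = Path(weights_dir)
    return {
        name: weights_dir / f"{name}.pth"
        for name in COMPONENTS
        if (weights_dir / f"{name}.pth").is_file()
    }


def load_component_weights(weights_dir, device="cpu"):
    """Load every available component file with torch.load.

    Assumption: each file was saved with torch.save and contains that
    component's weights; the model card does not confirm the exact format.
    """
    import torch  # imported lazily so component_paths works without PyTorch

    return {
        name: torch.load(path, map_location=device)
        for name, path in component_paths(weights_dir).items()
    }
```

For example, `load_component_weights("model_components")` would return a dictionary keyed by component name, which can then be passed to whatever module-construction code the StyleTTS2 example uses.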