---
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: Hibiki_ASR_Phonemizer
  results: []
language:
- ja
---

# Hibiki ASR Phonemizer

This model is a phoneme-level speech recognition network: a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) trained on a mixture of different Japanese datasets. In addition to ordinary speech, it can detect and transcribe:

- non-speech sounds such as gasps, erotic moans, laughter, etc.
- punctuation, added more faithfully.

Don't use this model without the post-processing functions I wrote, or you'll get less than ideal performance; check the notebook.

__________________________________

To reverse the process and get the graphemes back, use [this model](https://huggingface.co/Respair/Japanese_Phoneme_to_Grapheme_LLM).

__________________________________

## How to use

Check here -> [Notebook](https://colab.research.google.com/drive/13tx8WKzkvePFdtKU4WUE_iYyYCqTY8dZ#scrollTo=5XqUs-sPdT79). A minimal inference sketch is also included at the end of this card.

## Intended uses & limitations

No restrictions are imposed by me, but proceed at your own risk: you, the user, are entirely responsible for your actions.

## Training and evaluation data

- [Japanese Common Voice 17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
- [ehehe Corpus](https://huggingface.co/datasets/litagin/ehehe-corpus)
- a manually cleaned and annotated custom game and anime dataset (around 8 hours)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `Seq2SeqTrainingArguments` appears at the end of this card):

- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000

### Compute and duration

- 1x A100 (40GB)
- 64GB RAM
- BF16
- ~14 hours

### Framework versions

- Transformers 4.41.1
- Pytorch 2.4.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
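
### Training arguments sketch

For reference, a minimal sketch of how the hyperparameters listed above would map onto `transformers`' `Seq2SeqTrainingArguments`. This is a reconstruction, not the exact training script: `output_dir` is a placeholder, and the Adam betas and epsilon listed above are already the library defaults, so they are left implicit.

```python
from transformers import Seq2SeqTrainingArguments

# Reconstruction of the values under "Training hyperparameters".
# output_dir is a placeholder; Adam betas=(0.9, 0.999) and epsilon=1e-08
# are the transformers defaults, so they are not set explicitly.
training_args = Seq2SeqTrainingArguments(
    output_dir="./hibiki_asr_phonemizer",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    bf16=True,  # matches the BF16 note under "Compute and duration"
)
```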
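
## Inference sketch

A minimal sketch for loading the model with the `transformers` ASR pipeline, assuming the repo id `Respair/Hibiki_ASR_Phonemizer` (inferred from this card's name; adjust if it differs). The post-processing functions this card insists on live only in the linked notebook and are not reproduced here.

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="Respair/Hibiki_ASR_Phonemizer",  # assumed repo id; use this card's actual id
    torch_dtype=torch.bfloat16 if device != "cpu" else torch.float32,
    device=device,
)

# Whisper checkpoints take language/task hints through generate_kwargs.
result = asr(
    "sample.wav",
    chunk_length_s=30,  # Whisper's native window size
    generate_kwargs={"language": "japanese", "task": "transcribe"},
)

# The raw output is a phoneme-level transcription; run the notebook's
# post-processing functions on result["text"] before using it.
print(result["text"])
```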