Pre-trained checkpoints for speech representation in Japanese

The models in this repository were pre-trained via self-supervised learning (SSL) for speech representation. The SSL models were built on the fairseq toolkit.

  • wav2vec2_base_csj.pt
    • fairseq checkpoint of a wav2vec 2.0 model with the Base architecture, pre-trained on 16 kHz speech from the Corpus of Spontaneous Japanese (CSJ)
  • wav2vec2_base_csj_hf
    • wav2vec2_base_csj.pt converted with this tool to be compatible with the Hugging Face interface
  • hubert_base_csj.pt
    • fairseq checkpoint of a HuBERT model with the Base architecture, pre-trained on 16 kHz speech from the Corpus of Spontaneous Japanese (CSJ)
  • hubert_base_csj_hf
    • hubert_base_csj.pt converted with this tool to be compatible with the Hugging Face interface
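
The converted checkpoints can probably be loaded with the transformers library along the lines below. This is a minimal sketch, not the authors' own usage: it assumes that wav2vec2_base_csj_hf and hubert_base_csj_hf are local directories in the standard transformers layout, that the models use the default Base convolutional feature encoder, and that the input is mono audio already sampled at 16 kHz. The helper names (expected_num_frames, encode) and the normalize flag are hypothetical. Whether the waveform should be normalized depends on how the checkpoints were trained, which this README does not state, so the flag defaults to off.

```python
def expected_num_frames(num_samples: int) -> int:
    """Number of frames the default Base feature encoder yields for a waveform.

    Assumption: the checkpoints keep the standard Base configuration of seven
    convolutional layers with the (kernel, stride) pairs listed below.
    """
    conv_layers = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
    length = num_samples
    for kernel, stride in conv_layers:
        length = (length - kernel) // stride + 1
    return length


def encode(model_dir: str, waveform, use_hubert: bool = False, normalize: bool = False):
    """Return frame-level representations for one 16 kHz mono waveform.

    model_dir is a hypothetical local path to a converted checkpoint,
    for example "wav2vec2_base_csj_hf" or "hubert_base_csj_hf".
    """
    # Imported here so the pure-Python helper above works without these packages.
    import torch
    from transformers import HubertModel, Wav2Vec2Model

    model_cls = HubertModel if use_hubert else Wav2Vec2Model
    model = model_cls.from_pretrained(model_dir).eval()

    x = torch.as_tensor(waveform, dtype=torch.float32)
    if normalize:
        # Optional zero-mean, unit-variance scaling of the raw waveform.
        x = (x - x.mean()) / torch.sqrt(x.var(unbiased=False) + 1e-7)

    with torch.no_grad():
        out = model(x.unsqueeze(0))  # add a batch dimension: (1, num_samples)
    return out.last_hidden_state     # shape: (1, num_frames, hidden_size)


# Example call (requires torch, transformers, and the converted checkpoint):
#   features = encode("hubert_base_csj_hf", waveform, use_hubert=True)
#   features.shape[1] should match expected_num_frames(len(waveform))
```

Under these assumptions, one second of audio (16000 samples) maps to 49 frames, and the Base architecture produces 768-dimensional vectors per frame.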

If you find this helpful, please consider citing the following paper.

@INPROCEEDINGS{ashihara_icassp23,
  author={Takanori Ashihara and Takafumi Moriya and Kohei Matsuura and Tomohiro Tanaka},
  title={Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023}
}