MERaLiON/Multitask-National-Speech-Corpus-v1
datatrove
for all things web-scale data preparation: https://github.com/huggingface/datatrove
nanotron
for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
lighteval
for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval