Pretraining Using HF Tokenizers and Transformers

#36
by akhooli

I looked for an end-to-end example of pretraining a fresh ModernBERT model, including the tokenizer (e.g. for a new language), or of fine-tuning an existing checkpoint (e.g. ModernBERT-Base) with a custom tokenizer (to account for the different vocabulary of another language family).
A Hugging Face implementation is preferred (I saw this, but the current code is not working).
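For anyone landing here, the tokenizer half of this is doable with the `tokenizers` library on its own. Below is a minimal sketch, assuming a single-file monolingual corpus: it trains a fresh byte-level BPE tokenizer and wraps it as a `PreTrainedTokenizerFast` so Transformers models and data collators can use it. The file path, vocabulary size, and special-token choices are placeholders, not the exact ModernBERT recipe.

```python
# Minimal sketch: train a fresh BPE tokenizer for a new language and wrap it
# for use with Hugging Face Transformers. Paths and hyperparameters are
# placeholders, not the official ModernBERT tokenizer setup.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import PreTrainedTokenizerFast

special_tokens = ["[CLS]", "[SEP]", "[PAD]", "[UNK]", "[MASK]"]

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=50_368,            # ModernBERT-sized vocab; adjust to your corpus
    special_tokens=special_tokens,
)

# corpus.txt: one document (or line) per row in the target language
def corpus_iterator(path="corpus.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.strip()

tokenizer.train_from_iterator(corpus_iterator(), trainer=trainer)

# Wrap the raw tokenizer so it can be saved/loaded like any HF tokenizer
hf_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    cls_token="[CLS]",
    sep_token="[SEP]",
    pad_token="[PAD]",
    unk_token="[UNK]",
    mask_token="[MASK]",
)
hf_tokenizer.save_pretrained("my-new-language-tokenizer")
```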

Hello,

The pre-training codebase should do the trick; that is its main purpose, and it is optimized for it. While it uses Composer, you should still be able to leverage HF models and tokenizers.
For continued pre-training, someone reported an issue with loading ModernBERT's weights, so we will investigate and potentially release Composer checkpoints alongside the HF ones when we release all the pre-training checkpoints (which, as stated in the issue, should be better starting points than the post-decay ones).
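If you would rather stay inside plain Transformers than use the Composer-based codebase, a rough MLM loop like the sketch below should get you started. It deliberately omits the codebase's optimizations (sequence packing, learning-rate schedule, etc.), and the dataset file, tokenizer directory, and hyperparameters are placeholders; the `ModernBertConfig` / `ModernBertForMaskedLM` classes ship with recent Transformers releases (v4.48+).

```python
# Minimal sketch: masked-language-model training of ModernBERT with plain
# Transformers (no Composer). This is NOT the optimized pre-training codebase;
# dataset, tokenizer path, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    ModernBertConfig,
    ModernBertForMaskedLM,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("my-new-language-tokenizer")

# Option A: a fresh model sized to the new tokenizer
config = ModernBertConfig(
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,  # other special-token ids may also need adjusting
)
model = ModernBertForMaskedLM(config)

# Option B: continued pre-training from the released checkpoint with a new
# tokenizer (resize the embeddings to match the new vocabulary)
# model = ModernBertForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
# model.resize_token_embeddings(len(tokenizer))

raw = load_dataset("text", data_files={"train": "corpus.txt"})["train"]

def tokenize(batch):
    # 1024 keeps the sketch cheap; ModernBERT supports sequences up to 8192
    return tokenizer(batch["text"], truncation=True, max_length=1024)

dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# 30% masking rate, as used in the ModernBERT paper
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.3)

args = TrainingArguments(
    output_dir="modernbert-new-language",
    per_device_train_batch_size=8,
    learning_rate=5e-4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collator,
).train()
```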
