Pretraining Using HF Tokenizers and Transformers

#36
by akhooli

I looked for an end-to-end example of pretraining a fresh ModernBERT model, including the tokenizer (e.g. for a new language), or of fine-tuning an existing checkpoint (e.g. ModernBERT-Base) with a custom tokenizer (to account for the different vocabulary of another language family).
A Hugging Face implementation is preferred (I saw this, but the current code is not working).
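For anyone landing here, the tokenizer half of this is doable with the `tokenizers` library on its own. Below is a minimal sketch, assuming a single-file monolingual corpus: it trains a fresh byte-level BPE tokenizer and wraps it as a `PreTrainedTokenizerFast` so Transformers models and data collators can use it. The file path, vocabulary size, and special-token choices are placeholders, not the exact ModernBERT recipe.

```python
# Minimal sketch: train a fresh BPE tokenizer for a new language and wrap it
# for use with Hugging Face Transformers. Paths and hyperparameters are
# placeholders, not the official ModernBERT tokenizer setup.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import PreTrainedTokenizerFast

special_tokens = ["[CLS]", "[SEP]", "[PAD]", "[UNK]", "[MASK]"]

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=50_368,            # ModernBERT-sized vocab; adjust to your corpus
    special_tokens=special_tokens,
)

# corpus.txt: one document (or line) per row in the target language
def corpus_iterator(path="corpus.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.strip()

tokenizer.train_from_iterator(corpus_iterator(), trainer=trainer)

# Wrap the raw tokenizer so it can be saved/loaded like any HF tokenizer
hf_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    cls_token="[CLS]",
    sep_token="[SEP]",
    pad_token="[PAD]",
    unk_token="[UNK]",
    mask_token="[MASK]",
)
hf_tokenizer.save_pretrained("my-new-language-tokenizer")
```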

Hello,

The pre-training codebase should do the trick; that is its main purpose, and it is optimized for it. While it uses Composer, you should still be able to leverage HF models and tokenizers.
For continued pre-training, someone reported an issue with loading ModernBERT's weights, so we will investigate and potentially release Composer checkpoints alongside the HF ones when we release all the pre-training checkpoints (which, as stated in the issue, should be better starting points than the post-decay ones).
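If you would rather stay inside plain Transformers than use the Composer-based codebase, a rough MLM loop like the sketch below should get you started. It deliberately omits the codebase's optimizations (sequence packing, learning-rate schedule, etc.), and the dataset file, tokenizer directory, and hyperparameters are placeholders; the `ModernBertConfig` / `ModernBertForMaskedLM` classes ship with recent Transformers releases (v4.48+).

```python
# Minimal sketch: masked-language-model training of ModernBERT with plain
# Transformers (no Composer). This is NOT the optimized pre-training codebase;
# dataset, tokenizer path, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    ModernBertConfig,
    ModernBertForMaskedLM,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("my-new-language-tokenizer")

# Option A: a fresh model sized to the new tokenizer
config = ModernBertConfig(
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,  # other special-token ids may also need adjusting
)
model = ModernBertForMaskedLM(config)

# Option B: continued pre-training from the released checkpoint with a new
# tokenizer (resize the embeddings to match the new vocabulary)
# model = ModernBertForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
# model.resize_token_embeddings(len(tokenizer))

raw = load_dataset("text", data_files={"train": "corpus.txt"})["train"]

def tokenize(batch):
    # 1024 keeps the sketch cheap; ModernBERT supports sequences up to 8192
    return tokenizer(batch["text"], truncation=True, max_length=1024)

dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# 30% masking rate, as used in the ModernBERT paper
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.3)

args = TrainingArguments(
    output_dir="modernbert-new-language",
    per_device_train_batch_size=8,
    learning_rate=5e-4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collator,
).train()
```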
