I would like to extend this to contiguous masking of longer token spans (e.g. [mask-5], [mask-8]). I have started looking into writing a custom DataCollator for this, but I suspect I will also need to make some changes to the model.
Has anyone looked into this who could point me to some resources?
I’ve seen that SpanBERT models are on the hub, but we haven’t added the model itself yet to the library.
This would actually be a great project:
Contribute SpanBERT to HuggingFace Transformers, based on the modeling file. This should be relatively easy, as the authors already used HuggingFace’s implementation of BERT and tweaked it a little bit. The only difference is this class. We could then call the model SpanBertModel in the library, and add a SpanBertForPreTraining, similar to BertForPreTraining, that includes the heads necessary for pre-training.
Add a script to the examples directory, which could be called run_span_mlm.py (similar to run_mlm.py). This can be based on the files defined here (Facebook open-sourced everything!).
If anyone is interested in contributing, let me know!
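To give a rough idea of what the masking step in such a collator or script might look like, here is a minimal sketch in plain Python. It is not the SpanBERT authors' code and not an existing Transformers API: the names `mask_spans` and `sample_span_length` are hypothetical. It assumes span lengths drawn from a geometric distribution (p = 0.2, capped at 10 tokens) and a 15% masking budget, as described in the SpanBERT paper, and it simplifies things by always substituting the mask token and by working on tokens rather than whole words.

```python
import random

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def sample_span_length(rng, p=0.2, max_len=10):
    """Draw a span length from a geometric distribution, capped at max_len."""
    length = 1
    while length < max_len and rng.random() > p:
        length += 1
    return length


def mask_spans(token_ids, mask_token_id, special_ids=(), mask_ratio=0.15, seed=None):
    """Return (input_ids, labels) with contiguous spans replaced by mask_token_id.

    Hypothetical helper: labels hold the original id at masked positions and
    IGNORE_INDEX everywhere else, so only masked tokens contribute to the loss.
    """
    rng = random.Random(seed)
    token_ids = list(token_ids)
    input_ids = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)

    # Positions that are allowed to be masked (everything except special tokens).
    candidates = [i for i, t in enumerate(token_ids) if t not in special_ids]
    budget = max(1, round(len(candidates) * mask_ratio)) if candidates else 0

    masked = set()
    attempts = 0
    # Bound the number of attempts so the loop always terminates.
    while len(masked) < budget and attempts < 10 * len(token_ids):
        attempts += 1
        length = min(sample_span_length(rng), budget - len(masked))
        start = rng.choice(candidates)
        span = range(start, start + length)
        # Skip spans that run off the end, hit a special token, or overlap an earlier span.
        if span.stop > len(token_ids):
            continue
        if any(i in masked or token_ids[i] in special_ids for i in span):
            continue
        for i in span:
            labels[i] = token_ids[i]
            input_ids[i] = mask_token_id
            masked.add(i)
    return input_ids, labels
```

A custom DataCollator could call something like this on each example before padding and batching, passing the tokenizer's mask token id and special token ids. The span boundary objective from the paper would still need the model-side changes mentioned above.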
@nielsr I am also interested in fine-tuning BERT (or any BERT-like pre-trained model) using span masking. Is this supported in the transformers library? If so, could you point me to any available resources?