Max doc token length
Hi,
Thank you for releasing the model! I am just wondering what max_doc size I should use for this SPLADE v3 model? 256, like the previous ones?
Thanks again,
Yuchen
Hi Yuchen,
In practice, you can use BERT's max length (so, 512). The models have been trained with lower values (128 or 256), but they should still work the same if you increase (or decrease) this value at inference.
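For reference, a minimal sketch of what that looks like when encoding a document, assuming the Hugging Face checkpoint name "naver/splade-v3" and standard transformers usage (the pooling shown is the usual SPLADE log-saturated max pooling):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "naver/splade-v3"  # assumed checkpoint name, check the model card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

doc = "some long document ..."
# Use the full BERT max length at inference, even if training used 128/256
inputs = tokenizer(doc, max_length=512, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# SPLADE pooling: max over the sequence of log(1 + relu(logits)), masked on padding
weights = torch.max(
    torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1),
    dim=1,
).values  # (1, vocab_size) sparse term weights
```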
Hope it helps
Thibault
Hi Thibault,
Thank you for the answer and have a nice day!
Best,
Yuchen
Oh by the way, I have another question.
Now I have built a ClueWeb22-B index using the GitHub code, and I found that this line, https://github.com/naver/splade/blob/main/splade/indexing/inverted_index.py#L32, takes a long time to load the index into memory, which makes the whole SPLADE retrieval process take about 20 minutes.
Since SPLADE is a sparse retrieval model, I am thinking there should be some way to accelerate the search process. Do you know of any efforts of this kind in the community?
Thanks again,
Yuchen
Hi Yuchen,
You can have a look here: https://github.com/TusKANNy/seismic
They have super fast retrieval algorithms for neural sparse retrievers!
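The core idea those methods exploit is that you only need to touch the posting lists of the query's nonzero terms, and you can prune low-weight query terms. A rough, generic illustration of that idea (this is not Seismic's actual algorithm nor the splade repo's code; all names here are made up for the sketch):

```python
from collections import defaultdict
from heapq import nlargest

def search(inverted_index, query_weights, k=10, query_cut=20):
    """inverted_index: {term_id: [(doc_id, doc_weight), ...]}
    query_weights:  {term_id: query_weight}  (the SPLADE query vector)."""
    # Keep only the top-weighted query terms to shorten the traversal ("query cut")
    top_terms = nlargest(query_cut, query_weights.items(), key=lambda kv: kv[1])

    scores = defaultdict(float)
    for term_id, q_w in top_terms:
        # Accumulate dot-product contributions from matching posting lists only
        for doc_id, d_w in inverted_index.get(term_id, []):
            scores[doc_id] += q_w * d_w

    return nlargest(k, scores.items(), key=lambda kv: kv[1])
```

Seismic goes much further than this (approximate traversal, static pruning, clustered posting lists), which is where the big speedups come from.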
Thibault