Embeddings dimension
Hello,
I want first to thank you for the amazing project and for releasing the weights of the model.
I have a question.
If I run the example code:
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
)  # trust_remote_code is needed to use the encode method
embeddings = model.encode(
    ["How is the weather today?", "What is the current weather like today?"]
)
print(embeddings.shape)
I get an output with shape (2, 768), while on the MTEB leaderboard the embeddings dimension is 512.
Connected with this, is there a reason why the pooling layer size is different from the hidden size of the model?
Thanks in advance for the help
It's because your pooling config declares a word_embedding_dimension of 512. I'm not sure why. It doesn't appear to actually make the output 512-dimensional, but it does seem to affect whatever automated system records the model info.
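To illustrate why a declared dimension might not change the output, here is a minimal sketch. It assumes the model uses plain mean pooling over token embeddings, which this thread does not confirm. Mean pooling averages over the token axis, so the result keeps the width of the token vectors no matter what a config file declares. The `mean_pool` helper and the dummy data are hypothetical, not taken from the model's code.

```python
HIDDEN_SIZE = 768  # width observed in the thread's output shape (2, 768)
SEQ_LEN = 5        # arbitrary padded sequence length for the dummy data


def mean_pool(token_embeddings, attention_mask):
    """Average each sentence's token vectors, skipping padded positions."""
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        kept = [vec for vec, keep in zip(tokens, mask) if keep]
        # zip(*kept) walks the hidden dimensions one column at a time.
        pooled.append([sum(column) / len(kept) for column in zip(*kept)])
    return pooled


# Dummy token embeddings: 2 sentences x SEQ_LEN tokens x HIDDEN_SIZE values.
token_embeddings = [
    [[float(s + t + d) for d in range(HIDDEN_SIZE)] for t in range(SEQ_LEN)]
    for s in range(2)
]
# The first sentence has 3 real tokens and 2 padding positions.
attention_mask = [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print((len(sentence_embeddings), len(sentence_embeddings[0])))  # (2, 768)
```

Nothing in this computation reads a declared dimension, which would be consistent with the 512 value showing up only in recorded metadata.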
Thanks for pointing that out, @ttronrud :)
Should be fixed by:
https://huggingface.co/jinaai/jina-embeddings-v2-base-en/commit/7aef14b0840b7dded6c7e4ce28ff87f16071284d
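For anyone who wants to check their own copy of the configs, a rough sketch of the comparison follows. It assumes the model config exposes the width as `hidden_size` (common for BERT-style configs, but an assumption here) and that the pooling config uses the `word_embedding_dimension` key mentioned above. The `dimension_mismatch` helper is hypothetical, and the "after" values are illustrative rather than read from the commit.

```python
from typing import Optional


def dimension_mismatch(model_config: dict, pooling_config: dict) -> Optional[str]:
    """Return a description of the mismatch, or None if the values agree."""
    hidden = model_config["hidden_size"]                    # assumed key name
    declared = pooling_config["word_embedding_dimension"]   # key named in the thread
    if hidden == declared:
        return None
    return f"pooling declares {declared}, model hidden size is {hidden}"


# Values as reported in this thread, before the fix.
before = dimension_mismatch({"hidden_size": 768}, {"word_embedding_dimension": 512})
# Illustrative values for a consistent config (not read from the commit).
after = dimension_mismatch({"hidden_size": 768}, {"word_embedding_dimension": 768})

print(before)  # pooling declares 512, model hidden size is 768
print(after)   # None
```

In practice the two dicts would be loaded from the repository's JSON config files with `json.load` instead of being written inline.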