Which loss function was used to fine-tune this model? Euclidean distance or cosine similarity?
We used cosine similarity, computed on the normalized output of the model (last-token pooling).
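For anyone who wants to reproduce the similarity computation, here is a rough sketch of last-token pooling followed by L2 normalization, written in NumPy. It is an illustration only, not the actual training code: the function names are hypothetical, and it assumes right-padded batches and an attention mask of 0/1 integers. The exact loss built on top of the cosine scores is not shown because it isn't specified above.

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last non-padding token per sequence.

    hidden_states: (batch, seq_len, dim) array
    attention_mask: (batch, seq_len) array of 0/1 ints
    Assumption: sequences are right-padded.
    """
    last_idx = attention_mask.sum(axis=1) - 1
    batch_idx = np.arange(hidden_states.shape[0])
    return hidden_states[batch_idx, last_idx]

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against division by zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

# Toy stand-in for model outputs: 2 sequences, 4 tokens, 8 dimensions.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],   # last real token at position 2
                 [1, 1, 1, 1]])  # last real token at position 3

emb = l2_normalize(last_token_pool(hidden, mask))

# With unit-length rows, the dot product equals cosine similarity.
sims = emb @ emb.T
```

Since the embeddings are unit-length, cosine similarity and dot product give the same ranking, so either works at inference time.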