bi-cse is an XLM-R model fine-tuned with contrastive learning on Chinese and English STS and NLI corpora.
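The card does not include the training code; the sketch below only illustrates the kind of SimCSE-style in-batch contrastive (InfoNCE) objective that this sort of fine-tuning usually optimizes. The temperature value, the pair construction, and the function name `info_nce_loss` are assumptions for illustration, not details reported for this model.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor_emb, positive_emb, temperature=0.05):
    """SimCSE-style in-batch contrastive loss (illustrative sketch).

    anchor_emb, positive_emb: (batch, dim) embeddings of paired sentences,
    e.g. NLI entailment pairs or STS paraphrases. Row i of each tensor is a
    positive pair; every other row in the batch acts as a negative.
    temperature=0.05 is a common choice, not a value reported for bi-cse.
    """
    anchor = F.normalize(anchor_emb, p=2, dim=1)
    positive = F.normalize(positive_emb, p=2, dim=1)
    # Cosine similarity between every anchor and every positive in the batch.
    sim = anchor @ positive.T / temperature          # (batch, batch)
    labels = torch.arange(sim.size(0), device=sim.device)
    # Cross-entropy pushes the diagonal (true pairs) above the in-batch negatives.
    return F.cross_entropy(sim, labels)
```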
Usage with HuggingFace Transformers:
from transformers import AutoTokenizer, AutoModel
import torch
# Sentences we want sentence embeddings for; the Chinese samples below mean "Sample data 1" and "Sample data 2"
sentences = ["样例数据-1", "样例数据-2"]
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('zhou-xl/bi-cse')
model = AutoModel.from_pretrained('zhou-xl/bi-cse')
model.eval()
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform pooling. In this case, CLS pooling: take the last hidden state of the first ([CLS]) token.
sentence_embeddings = model_output[0][:, 0]
# normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
Evaluation results

All scores are self-reported MTEB results; a sketch of how such numbers can be reproduced follows the table.

| Task (MTEB) | Split | Metric | Score |
|---|---|---|---|
| AFQMC | validation | cos_sim_pearson | 42.010 |
| AFQMC | validation | cos_sim_spearman | 43.449 |
| AFQMC | validation | euclidean_pearson | 41.933 |
| AFQMC | validation | euclidean_spearman | 43.457 |
| AFQMC | validation | manhattan_pearson | 41.930 |
| AFQMC | validation | manhattan_spearman | 43.445 |
| ATEC | test | cos_sim_pearson | 47.484 |
| ATEC | test | cos_sim_spearman | 48.010 |
| BIOSSES | test | cos_sim_pearson | 70.066 |
| BIOSSES | test | cos_sim_spearman | 70.564 |
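The metrics above correspond to MTEB/C-MTEB tasks. A minimal sketch of reproducing them with the mteb evaluation harness is shown below, assuming the checkpoint loads through sentence-transformers with the intended CLS pooling; if it does not, wrap the Transformers code above in a custom encoder class. The task names mirror the table.

```python
# A minimal sketch, assuming the checkpoint works directly with
# sentence-transformers; verify the pooling matches the CLS pooling shown earlier.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("zhou-xl/bi-cse")
evaluation = MTEB(tasks=["AFQMC", "ATEC", "BIOSSES"])  # task names as in the table above
evaluation.run(model, output_folder="results/bi-cse")
```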