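News
[2024-04-06] Open-sourced the puff series of models, built specifically for retrieval and semantic matching tasks, with more emphasis on generalization and on results on private general-purpose test sets. Variable vector dimension, Chinese-English bilingual.
[2024-02-27] Open-sourced the stella-mrl-large-zh-v3.5-1792d model, which supports variable vector dimensions.
[2024-02-17] Open-sourced the stella v3 series, the dialogue encoding model, and the related training data.
[2023-10-19] Open-sourced stella-base-en-v2. Simple to use, no prefix text required.
[2023-10-12] Open-sourced stella-base-zh-v2 and stella-large-zh-v2. Better results and simple to use, no prefix text required.
[2023-09-11] Open-sourced stella-base-zh and stella-large-zh.
You are welcome to visit my homepage for the latest models and to share your feedback!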
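1 Open-source model
This release open-sources the stella-mrl-large-zh-v3.5-1792d model, which was trained from stella-large-zh-v3-1792d using the MRL method. Its main feature is a variable vector dimension.
2 Usage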
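The model card does not describe the training objective, so the following is only a rough illustration of the general idea behind MRL-style training: the same loss is evaluated on several leading prefixes of the embedding and the results are combined, so that each prefix is useful on its own. The in-batch contrastive loss, the temperature, the prefix lengths, and the function names `info_nce` and `matryoshka_loss` are all assumptions made for this sketch, not the author's actual recipe.

```python
import numpy as np


def info_nce(queries: np.ndarray, docs: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss: row i of `queries` should match row i of `docs`."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def matryoshka_loss(queries: np.ndarray, docs: np.ndarray, dims=(128, 256, 512)) -> float:
    """Average the same loss over several leading prefixes of the embeddings."""
    # dims are hypothetical prefix lengths chosen for illustration only.
    return float(np.mean([info_nce(queries[:, :k], docs[:, :k]) for k in dims]))


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 512))          # stand-in embeddings, not real model output
matched = matryoshka_loss(q, q)        # every query paired with its own vector
mismatched = matryoshka_loss(q, q[::-1])  # every query paired with a different vector
print(matched, mismatched)
```

In real training the loss would be backpropagated through the encoder; here only the forward computation is shown.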
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import normalize
model = SentenceTransformer("infgrad/stella-mrl-large-zh-v3.5-1792d")
# Note: do not normalize yet! Take the first n dimensions, then normalize
vectors = model.encode(["text1", "text2"], normalize_embeddings=False)
print(vectors.shape) # shape is [2,1792]
# Larger n_dims gives better results but costs more time and memory. Multiples of 128 are recommended, because that is how the model was trained
n_dims = 768
cut_vecs = normalize(vectors[:, :n_dims])
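To see why the order matters (truncate first, then normalize), here is a small NumPy-only sketch. It uses random vectors as a stand-in for `model.encode(...)` output so it runs without downloading the model; the helper name `truncate_and_normalize` is hypothetical and not part of any library.

```python
import numpy as np


def truncate_and_normalize(vectors: np.ndarray, n_dims: int) -> np.ndarray:
    """Keep the first n_dims columns, then scale each row to unit L2 norm."""
    cut = vectors[:, :n_dims]
    norms = np.linalg.norm(cut, axis=1, keepdims=True)
    return cut / np.clip(norms, 1e-12, None)  # guard against division by zero


# Random stand-in for the [2, 1792] output of model.encode (assumption).
rng = np.random.default_rng(0)
fake_vectors = rng.normal(size=(2, 1792)).astype(np.float32)

cut_vecs = truncate_and_normalize(fake_vectors, 768)
# For unit-length rows, the dot product is the cosine similarity.
similarity = cut_vecs @ cut_vecs.T

# Wrong order: normalizing the full vector first leaves prefixes shorter than 1,
# so dot products of the truncated vectors are no longer cosine similarities.
full_unit = truncate_and_normalize(fake_vectors, 1792)
prefix_norms = np.linalg.norm(full_unit[:, :768], axis=1)
print(np.linalg.norm(cut_vecs, axis=1), prefix_norms)
```

With real embeddings, `fake_vectors` would simply be replaced by the un-normalized output of `model.encode`.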
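3 CMTEB scores at different vector dimensions
stella-mrl-large-zh-v3.5-1792d_1024 means that the first 1024 dimensions are used. The overall trend is that larger dimensions score better, although the gains become small beyond roughly 768 dimensions.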
Model | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | CMTEB-Score |
---|---|---|---|---|---|---|---|
stella-mrl-large-zh-v3.5-1792d_128 | 70.01 | 62.17 | 87.99 | 70.67 | 66.77 | 53.55 | 67.16 |
stella-mrl-large-zh-v3.5-1792d_256 | 72.19 | 62.41 | 88.09 | 71.22 | 68.32 | 53.38 | 68.02 |
stella-mrl-large-zh-v3.5-1792d_384 | 72.77 | 62.43 | 88.26 | 71.34 | 68.31 | 53.87 | 68.25 |
stella-mrl-large-zh-v3.5-1792d_512 | 73.11 | 62.45 | 88.16 | 71.46 | 68.32 | 53.28 | 68.29 |
stella-mrl-large-zh-v3.5-1792d_640 | 73.27 | 62.49 | 88.21 | 71.46 | 68.69 | 53.63 | 68.42 |
stella-mrl-large-zh-v3.5-1792d_768 | 73.38 | 62.5 | 88.19 | 71.49 | 68.64 | 53.77 | 68.47 |
stella-mrl-large-zh-v3.5-1792d_896 | 73.37 | 62.5 | 88.14 | 71.51 | 68.44 | 54.13 | 68.49 |
stella-mrl-large-zh-v3.5-1792d_1024 | 73.43 | 62.51 | 88.16 | 71.52 | 68.59 | 53.43 | 68.44 |
stella-mrl-large-zh-v3.5-1792d_1152 | 73.46 | 62.49 | 88.16 | 71.57 | 68.55 | 53.67 | 68.49 |
stella-mrl-large-zh-v3.5-1792d_1280 | 73.48 | 62.51 | 88.12 | 71.55 | 68.44 | 53.74 | 68.48 |
stella-mrl-large-zh-v3.5-1792d_1408 | 73.48 | 62.51 | 88.14 | 71.58 | 68.46 | 53.69 | 68.48 |
stella-mrl-large-zh-v3.5-1792d_1536 | 73.49 | 62.5 | 88.11 | 71.55 | 68.5 | 54.06 | 68.52 |
stella-mrl-large-zh-v3.5-1792d_1664 | 73.56 | 62.49 | 88.06 | 71.56 | 68.47 | 54.28 | 68.56 |
stella-mrl-large-zh-v3.5-1792d_1792 | 73.51 | 62.48 | 88.09 | 71.56 | 68.45 | 54.39 | 68.56 |
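In the table above, stella-mrl-large-zh-v3.5-1792d_1792 scores 68.56, which differs from the leaderboard score of 68.55. The discrepancy is related to the weight type; please ignore this small difference.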
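As a rough aid for choosing a dimension, the sketch below combines the CMTEB-Score column from the table with a simple storage estimate. The storage figures assume raw float32 vectors and one million stored embeddings, both of which are assumptions for illustration; index overhead is ignored. The helper name `storage_bytes` is hypothetical.

```python
# CMTEB-Score column copied from the table above, keyed by number of dimensions.
CMTEB_SCORE = {
    128: 67.16, 256: 68.02, 384: 68.25, 512: 68.29, 640: 68.42,
    768: 68.47, 896: 68.49, 1024: 68.44, 1152: 68.49, 1280: 68.48,
    1408: 68.48, 1536: 68.52, 1664: 68.56, 1792: 68.56,
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def storage_bytes(n_vectors: int, n_dims: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage for n_vectors embeddings of n_dims values each."""
    return n_vectors * n_dims * bytes_per_value


full_score = CMTEB_SCORE[1792]
for n_dims, score in CMTEB_SCORE.items():
    retained = score / full_score
    gigabytes = storage_bytes(1_000_000, n_dims) / 1e9
    print(f"{n_dims:>5} dims: {retained:.2%} of full score, {gigabytes:.3f} GB per million vectors")
```

For example, the first 128 dimensions keep about 98% of the full-dimension CMTEB score while using 1/14 of the storage.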
Evaluation results
- cos_sim_pearson on MTEB AFQMC (validation set, self-reported): 54.338
- cos_sim_spearman on MTEB AFQMC (validation set, self-reported): 58.855
- euclidean_pearson on MTEB AFQMC (validation set, self-reported): 57.570
- euclidean_spearman on MTEB AFQMC (validation set, self-reported): 58.855
- manhattan_pearson on MTEB AFQMC (validation set, self-reported): 57.559
- manhattan_spearman on MTEB AFQMC (validation set, self-reported): 58.845
- cos_sim_pearson on MTEB ATEC (test set, self-reported): 54.220
- cos_sim_spearman on MTEB ATEC (test set, self-reported): 58.080
- euclidean_pearson on MTEB ATEC (test set, self-reported): 61.646
- euclidean_spearman on MTEB ATEC (test set, self-reported): 58.080