SentenceTransformer based on cointegrated/LaBSE-en-ru

This is a sentence-transformers model finetuned from cointegrated/LaBSE-en-ru. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cointegrated/LaBSE-en-ru
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
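As a rough illustration of what modules (1) to (3) do after the BERT encoder, the sketch below applies CLS pooling, a dense layer with tanh, and L2 normalization to a toy input. The hidden size of 4, the identity weight matrix, and the zero bias are made-up stand-ins for the real 768-dimensional learned parameters; this is not the library's implementation.

```python
import math

def cls_pool(token_embeddings):
    # pooling_mode_cls_token=True: keep only the first ([CLS]) token vector
    return token_embeddings[0]

def dense_tanh(vec, weight, bias):
    # Dense layer with Tanh activation: y = tanh(W x + b)
    return [math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weight, bias)]

def l2_normalize(vec):
    # Normalize(): scale the vector to unit Euclidean length
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy encoder output: 3 tokens, hidden size 4 (assumption; the real model uses 768)
tokens = [[0.5, -1.0, 0.25, 2.0],
          [0.1, 0.2, 0.3, 0.4],
          [9.0, 9.0, 9.0, 9.0]]
# Hypothetical dense parameters: identity weights and zero bias
weight = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
bias = [0.0] * 4

embedding = l2_normalize(dense_tanh(cls_pool(tokens), weight, bias))
norm = math.sqrt(sum(x * x for x in embedding))
print(len(embedding), round(norm, 6))
```

Only the first token's vector reaches the dense layer, and the final embedding always has unit length, whatever the input.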

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
# Run inference
sentences = [
    'See Name section.',
    'Ms. Packard is the voice of the female blood elf in the video game World of Warcraft.',
    'Yeah, people who might not be hungry.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Semantic Similarity (dataset: sts-dev)

Metric Value
pearson_cosine 0.5305
spearman_cosine 0.6347
pearson_manhattan 0.5553
spearman_manhattan 0.6389
pearson_euclidean 0.55
spearman_euclidean 0.6347
pearson_dot 0.5305
spearman_dot 0.6347
pearson_max 0.5553
spearman_max 0.6389
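The cosine and dot-product rows above are identical, and spearman_euclidean matches spearman_cosine. That is what one would expect from a model whose last module is Normalize(): for unit-length vectors the dot product equals the cosine similarity, and the squared Euclidean distance is 2 - 2·cos, so it ranks pairs in exactly the reverse order of cosine similarity. A minimal sketch with made-up 3-dimensional vectors:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors (assumption; real embeddings have 768 dimensions)
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([1.0, 2.0, 2.0])

cos_ab = cosine(a, b)
dot_ab = dot(a, b)
dist2_ab = squared_euclidean(a, b)

print(abs(cos_ab - dot_ab) < 1e-12)               # dot product equals cosine
print(abs(dist2_ab - (2 - 2 * cos_ab)) < 1e-12)   # distance is a function of cosine
```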

Knowledge Distillation

Metric Value
negative_mse -0.0063
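For reference, negative_mse is the negated mean squared error between the student's embeddings and the teacher's target embeddings, so values closer to zero are better. The sketch below shows the computation on toy vectors. The scale factor of 100 reflects my understanding of how the Sentence Transformers MSE evaluator reports the value and should be treated as an assumption; the function name and inputs are hypothetical.

```python
def negative_mse(student, teacher, scale=100.0):
    # Mean of squared differences over every element of every vector,
    # multiplied by `scale` (assumed to be 100) and negated so that higher is better.
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student, teacher):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return -scale * total / count

# Toy 2-dimensional embeddings (assumption; the real targets have 768 elements)
student = [[0.1, 0.2], [0.3, 0.4]]
teacher = [[0.1, 0.0], [0.3, 0.5]]

score = negative_mse(student, teacher)
print(round(score, 4))
```

The same quantity without the scale factor and the sign flip is what MSELoss minimizes during training.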

Semantic Similarity (dataset: sts-test)

Metric Value
pearson_cosine 0.5043
spearman_cosine 0.5986
pearson_manhattan 0.5227
spearman_manhattan 0.5984
pearson_euclidean 0.5227
spearman_euclidean 0.5986
pearson_dot 0.5043
spearman_dot 0.5986
pearson_max 0.5227
spearman_max 0.5986

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,975,066 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence (type: string): min 6 tokens, mean 26.93 tokens, max 139 tokens
    • label (type: list): size 768 elements
  • Samples:
    sentence label
    It is based on the Java Persistence API (JPA), but it does not strictly follow the JSR 338 Specification, as it implements different design patterns and technologies. [-0.012331949546933174, -0.04570527374744415, -0.024963658303022385, -0.03620213270187378, 0.022556383162736893, ...]
    Покупаем вторичное сырье в Каунасе (Переработка вторичного сырья) - Алфенас АНД КО, ЗАО на Bizorg. [-0.07498518377542496, -0.01913534104824066, -0.01797042042016983, 0.048263177275657654, -0.00016611881437711418, ...]
    At the Equal Justice Conference ( EJC ) held in March 2001 in San Diego , LSC and the Project for the Future of Equal Justice held the second Case Management Software pre-conference . [0.03870972990989685, -0.0638347640633583, -0.01696585863828659, -0.043612319976091385, -0.048241738229990005, ...]
  • Loss: MSELoss

Evaluation Dataset

Unnamed Dataset

  • Size: 10,000 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence (type: string): min 5 tokens, mean 24.18 tokens, max 111 tokens
    • label (type: list): size 768 elements
  • Samples:
    sentence label
    The Canadian Canoe Museum is a museum dedicated to canoes located in Peterborough, Ontario, Canada. [-0.05444105342030525, -0.03650881350040436, -0.041163671761751175, -0.010616903193295002, -0.04094529151916504, ...]
    И мне нравилось, что я одновременно зарабатываю и смотрю бои». [-0.03404555842280388, 0.028203096240758896, -0.056121889501810074, -0.0591997392475605, -0.05523117259144783, ...]
    Ну, а на следующий день, разумеется, Президент Кеннеди объявил блокаду Кубы, и наши корабли остановили у кубинских берегов направлявшийся на Кубу российский корабль, и у него на борту нашли ракеты. [-0.008193841204047203, 0.00694894278421998, -0.03027420863509178, -0.03290146216750145, 0.01425305474549532, ...]
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0001
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
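To make these settings concrete, the sketch below works out the number of optimizer steps in the single epoch and the learning rate at a given step under a linear schedule with warmup. It assumes a single device, no gradient accumulation, and that the warmup step count is the ceiling of warmup_ratio times the total steps; the helper names are hypothetical, not part of any library.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # The last, partially filled batch still counts as a step
    return math.ceil(num_samples / batch_size)

def linear_lr(step, total_steps, base_lr, warmup_ratio):
    # Linear ramp from 0 to base_lr during warmup, then linear decay to 0
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = steps_per_epoch(10_975_066, 64)
warmup = math.ceil(total * 0.1)
print(total)    # 171486
print(warmup)   # 17149

peak_lr = linear_lr(warmup, total, 1e-4, 0.1)   # learning rate at the end of warmup
final_lr = linear_lr(total, total, 1e-4, 0.1)   # learning rate at the last step
print(peak_lr, final_lr)
```

With 10,975,066 training samples and a batch size of 64, the step count comes out at 171,486, so the learning rate peaks at 0.0001 after roughly the first tenth of the epoch.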

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step training_loss validation_loss negative_mse sts-dev_spearman_cosine sts-test_spearman_cosine
0 0 - - -0.2381 0.4206 -
0.0058 1000 0.0014 - - - -
0.0117 2000 0.0009 - - - -
0.0175 3000 0.0007 - - - -
0.0233 4000 0.0006 - - - -
0.0292 5000 0.0005 0.0004 -0.0363 0.6393 -
0.0350 6000 0.0004 - - - -
0.0408 7000 0.0004 - - - -
0.0467 8000 0.0003 - - - -
0.0525 9000 0.0003 - - - -
0.0583 10000 0.0003 0.0002 -0.0207 0.6350 -
0.0641 11000 0.0003 - - - -
0.0700 12000 0.0003 - - - -
0.0758 13000 0.0002 - - - -
0.0816 14000 0.0002 - - - -
0.0875 15000 0.0002 0.0002 -0.0157 0.6328 -
0.0933 16000 0.0002 - - - -
0.0991 17000 0.0002 - - - -
0.1050 18000 0.0002 - - - -
0.1108 19000 0.0002 - - - -
0.1166 20000 0.0002 0.0001 -0.0132 0.6317 -
0.1225 21000 0.0002 - - - -
0.1283 22000 0.0002 - - - -
0.1341 23000 0.0002 - - - -
0.1400 24000 0.0002 - - - -
0.1458 25000 0.0002 0.0001 -0.0118 0.6251 -
0.1516 26000 0.0002 - - - -
0.1574 27000 0.0002 - - - -
0.1633 28000 0.0002 - - - -
0.1691 29000 0.0002 - - - -
0.1749 30000 0.0002 0.0001 -0.0109 0.6304 -
0.1808 31000 0.0002 - - - -
0.1866 32000 0.0002 - - - -
0.1924 33000 0.0002 - - - -
0.1983 34000 0.0001 - - - -
0.2041 35000 0.0001 0.0001 -0.0102 0.6280 -
0.2099 36000 0.0001 - - - -
0.2158 37000 0.0001 - - - -
0.2216 38000 0.0001 - - - -
0.2274 39000 0.0001 - - - -
0.2333 40000 0.0001 0.0001 -0.0098 0.6272 -
0.2391 41000 0.0001 - - - -
0.2449 42000 0.0001 - - - -
0.2507 43000 0.0001 - - - -
0.2566 44000 0.0001 - - - -
0.2624 45000 0.0001 0.0001 -0.0093 0.6378 -
0.2682 46000 0.0001 - - - -
0.2741 47000 0.0001 - - - -
0.2799 48000 0.0001 - - - -
0.2857 49000 0.0001 - - - -
0.2916 50000 0.0001 0.0001 -0.0089 0.6325 -
0.2974 51000 0.0001 - - - -
0.3032 52000 0.0001 - - - -
0.3091 53000 0.0001 - - - -
0.3149 54000 0.0001 - - - -
0.3207 55000 0.0001 0.0001 -0.0087 0.6328 -
0.3266 56000 0.0001 - - - -
0.3324 57000 0.0001 - - - -
0.3382 58000 0.0001 - - - -
0.3441 59000 0.0001 - - - -
0.3499 60000 0.0001 0.0001 -0.0085 0.6357 -
0.3557 61000 0.0001 - - - -
0.3615 62000 0.0001 - - - -
0.3674 63000 0.0001 - - - -
0.3732 64000 0.0001 - - - -
0.3790 65000 0.0001 0.0001 -0.0083 0.6366 -
0.3849 66000 0.0001 - - - -
0.3907 67000 0.0001 - - - -
0.3965 68000 0.0001 - - - -
0.4024 69000 0.0001 - - - -
0.4082 70000 0.0001 0.0001 -0.0080 0.6325 -
0.4140 71000 0.0001 - - - -
0.4199 72000 0.0001 - - - -
0.4257 73000 0.0001 - - - -
0.4315 74000 0.0001 - - - -
0.4374 75000 0.0001 0.0001 -0.0078 0.6351 -
0.4432 76000 0.0001 - - - -
0.4490 77000 0.0001 - - - -
0.4548 78000 0.0001 - - - -
0.4607 79000 0.0001 - - - -
0.4665 80000 0.0001 0.0001 -0.0077 0.6323 -
0.4723 81000 0.0001 - - - -
0.4782 82000 0.0001 - - - -
0.4840 83000 0.0001 - - - -
0.4898 84000 0.0001 - - - -
0.4957 85000 0.0001 0.0001 -0.0076 0.6316 -
0.5015 86000 0.0001 - - - -
0.5073 87000 0.0001 - - - -
0.5132 88000 0.0001 - - - -
0.5190 89000 0.0001 - - - -
0.5248 90000 0.0001 0.0001 -0.0074 0.6306 -
0.5307 91000 0.0001 - - - -
0.5365 92000 0.0001 - - - -
0.5423 93000 0.0001 - - - -
0.5481 94000 0.0001 - - - -
0.5540 95000 0.0001 0.0001 -0.0073 0.6305 -
0.5598 96000 0.0001 - - - -
0.5656 97000 0.0001 - - - -
0.5715 98000 0.0001 - - - -
0.5773 99000 0.0001 - - - -
0.5831 100000 0.0001 0.0001 -0.0072 0.6333 -
0.5890 101000 0.0001 - - - -
0.5948 102000 0.0001 - - - -
0.6006 103000 0.0001 - - - -
0.6065 104000 0.0001 - - - -
0.6123 105000 0.0001 0.0001 -0.0071 0.6351 -
0.6181 106000 0.0001 - - - -
0.6240 107000 0.0001 - - - -
0.6298 108000 0.0001 - - - -
0.6356 109000 0.0001 - - - -
0.6415 110000 0.0001 0.0001 -0.0070 0.6330 -
0.6473 111000 0.0001 - - - -
0.6531 112000 0.0001 - - - -
0.6589 113000 0.0001 - - - -
0.6648 114000 0.0001 - - - -
0.6706 115000 0.0001 0.0001 -0.0070 0.6336 -
0.6764 116000 0.0001 - - - -
0.6823 117000 0.0001 - - - -
0.6881 118000 0.0001 - - - -
0.6939 119000 0.0001 - - - -
0.6998 120000 0.0001 0.0001 -0.0069 0.6305 -
0.7056 121000 0.0001 - - - -
0.7114 122000 0.0001 - - - -
0.7173 123000 0.0001 - - - -
0.7231 124000 0.0001 - - - -
0.7289 125000 0.0001 0.0001 -0.0068 0.6362 -
0.7348 126000 0.0001 - - - -
0.7406 127000 0.0001 - - - -
0.7464 128000 0.0001 - - - -
0.7522 129000 0.0001 - - - -
0.7581 130000 0.0001 0.0001 -0.0067 0.6340 -
0.7639 131000 0.0001 - - - -
0.7697 132000 0.0001 - - - -
0.7756 133000 0.0001 - - - -
0.7814 134000 0.0001 - - - -
0.7872 135000 0.0001 0.0001 -0.0067 0.6365 -
0.7931 136000 0.0001 - - - -
0.7989 137000 0.0001 - - - -
0.8047 138000 0.0001 - - - -
0.8106 139000 0.0001 - - - -
0.8164 140000 0.0001 0.0001 -0.0066 0.6339 -
0.8222 141000 0.0001 - - - -
0.8281 142000 0.0001 - - - -
0.8339 143000 0.0001 - - - -
0.8397 144000 0.0001 - - - -
0.8456 145000 0.0001 0.0001 -0.0066 0.6352 -
0.8514 146000 0.0001 - - - -
0.8572 147000 0.0001 - - - -
0.8630 148000 0.0001 - - - -
0.8689 149000 0.0001 - - - -
0.8747 150000 0.0001 0.0001 -0.0065 0.6357 -
0.8805 151000 0.0001 - - - -
0.8864 152000 0.0001 - - - -
0.8922 153000 0.0001 - - - -
0.8980 154000 0.0001 - - - -
0.9039 155000 0.0001 0.0001 -0.0065 0.6336 -
0.9097 156000 0.0001 - - - -
0.9155 157000 0.0001 - - - -
0.9214 158000 0.0001 - - - -
0.9272 159000 0.0001 - - - -
0.9330 160000 0.0001 0.0001 -0.0064 0.6334 -
0.9389 161000 0.0001 - - - -
0.9447 162000 0.0001 - - - -
0.9505 163000 0.0001 - - - -
0.9563 164000 0.0001 - - - -
0.9622 165000 0.0001 0.0001 -0.0064 0.6337 -
0.9680 166000 0.0001 - - - -
0.9738 167000 0.0001 - - - -
0.9797 168000 0.0001 - - - -
0.9855 169000 0.0001 - - - -
0.9913 170000 0.0001 0.0001 -0.0063 0.6347 -
0.9972 171000 0.0001 - - - -
1.0 171486 - - - - 0.5986
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}

Model size: 78.7M params (Safetensors, tensor type F32)