SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
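
For reference, here is a minimal sketch (not the original training code) of how an equivalent module pipeline could be assembled by hand with sentence_transformers.models, assuming the base checkpoint named above:

from sentence_transformers import SentenceTransformer, models

# Token-level encoder: the base checkpoint this model was finetuned from.
word_embedding = models.Transformer(
    "sentence-transformers/all-distilroberta-v1",
    max_seq_length=512,
    do_lower_case=False,
)
# Mean pooling over token embeddings, giving one 768-dimensional vector per input.
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 768
    pooling_mode_mean_tokens=True,
)
# L2-normalize so that dot product equals cosine similarity.
normalize = models.Normalize()

model = SentenceTransformer(modules=[word_embedding, pooling, normalize])
print(model)  # Transformer -> Pooling -> Normalize, matching the listing above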

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml")
# Run inference
sentences = [
    'Mathlib.AlgebraicGeometry.Noetherian#22',
    'AlgebraicGeometry.of_affine_open_cover',
    'pow_lt_pow_right_of_lt_one₀',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
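
Beyond pairwise similarities, the embeddings can be used for premise retrieval: encode a corpus of premise names once, then rank it against a query proof-state name by cosine similarity. The sketch below is illustrative; the corpus and query are placeholders, not a prescribed dataset.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml")

# Hypothetical premise corpus; substitute your own premise names.
premises = [
    "AlgebraicGeometry.of_affine_open_cover",
    "pow_lt_pow_right_of_lt_one₀",
    "mul_assoc",
]
query = "Mathlib.AlgebraicGeometry.Noetherian#22"

premise_embeddings = model.encode(premises)
query_embedding = model.encode([query])

# model.similarity uses cosine similarity for this model.
scores = model.similarity(query_embedding, premise_embeddings)  # shape [1, len(premises)]
best = scores.argmax().item()
print(premises[best], scores[0, best].item())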

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,232,571 training samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 11 tokens, mean: 16.91 tokens, max: 28 tokens
    • premise_name: string; min: 3 tokens, mean: 10.27 tokens, max: 27 tokens
  • Samples (state_name → premise_name):
    • Mathlib.Algebra.Group.Subgroup.Pointwise#27 → Set.mul_subgroupClosure
    • Mathlib.Algebra.Group.Subgroup.Pointwise#27 → pow_succ
    • Mathlib.Algebra.Group.Subgroup.Pointwise#27 → mul_assoc
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss (a custom loss from this repository; a hedged stand-in sketch follows this list) with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
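
MaskedCachedMultipleNegativesRankingLoss is defined in this repository's loss module rather than being a built-in Sentence Transformers loss. As a rough stand-in, the sketch below configures the library's built-in CachedMultipleNegativesRankingLoss with the same scale and similarity function; the masking behaviour of the custom loss is not reproduced here.

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")
loss = CachedMultipleNegativesRankingLoss(
    model,
    scale=20.0,                   # logit scale (temperature) from the parameters above
    similarity_fct=util.cos_sim,  # cosine similarity, as listed above
)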
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,648 evaluation samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 12 tokens, mean: 17.34 tokens, max: 26 tokens
    • premise_name: string; min: 3 tokens, mean: 10.9 tokens, max: 34 tokens
  • Samples (state_name → premise_name):
    • Mathlib.Algebra.BigOperators.Associated#0 → Prime.dvd_or_dvd
    • Mathlib.Algebra.BigOperators.Associated#0 → Multiset.induction_on
    • Mathlib.Algebra.BigOperators.Associated#0 → Multiset.mem_cons_of_mem
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0002
  • num_train_epochs: 1.0
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • bf16: True
  • dataloader_num_workers: 4
  • batch_sampler: no_duplicates
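
A hedged sketch of passing the non-default hyperparameters above to the Sentence Transformers 3.x trainer API follows. The output directory, datasets, and loss object are placeholders, not taken from this repository.

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")

args = SentenceTransformerTrainingArguments(
    output_dir="all-distilroberta-v1-premise-selection",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=1024,
    per_device_eval_batch_size=64,
    learning_rate=2e-4,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

# trainer = SentenceTransformerTrainer(
#     model=model, args=args,
#     train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss,
# )
# trainer.train()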

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss   Validation Loss
(A dash means the corresponding loss was not logged at that step.)
0.0024 10 6.4577 -
0.0048 20 6.011 -
0.0073 30 5.6038 -
0.0097 40 5.3306 -
0.0102 42 - 1.8049
0.0121 50 5.139 -
0.0145 60 5.0408 -
0.0169 70 4.9269 -
0.0194 80 4.8676 -
0.0203 84 - 1.6211
0.0218 90 4.7792 -
0.0242 100 4.7427 -
0.0266 110 4.6929 -
0.0290 120 4.6701 -
0.0305 126 - 1.4521
0.0314 130 4.5866 -
0.0339 140 4.5066 -
0.0363 150 4.5189 -
0.0387 160 4.4494 -
0.0406 168 - 1.4517
0.0411 170 4.4117 -
0.0435 180 4.3827 -
0.0460 190 4.2533 -
0.0484 200 4.2634 -
0.0508 210 4.2472 1.3644
0.0532 220 4.1949 -
0.0556 230 4.1769 -
0.0581 240 4.1372 -
0.0605 250 4.0943 -
0.0610 252 - 1.3161
0.0629 260 4.1049 -
0.0653 270 4.1018 -
0.0677 280 4.078 -
0.0701 290 4.0355 -
0.0711 294 - 1.2026
0.0726 300 4.0104 -
0.0750 310 3.9392 -
0.0774 320 3.9519 -
0.0798 330 3.9671 -
0.0813 336 - 1.1869
0.0822 340 3.9297 -
0.0847 350 3.9435 -
0.0871 360 3.9317 -
0.0895 370 3.8544 -
0.0914 378 - 1.1943
0.0919 380 3.9131 -
0.0943 390 3.8758 -
0.0968 400 3.7628 -
0.0992 410 3.8589 -
0.1016 420 3.8057 1.1280
0.1040 430 3.7792 -
0.1064 440 3.8011 -
0.1089 450 3.7708 -
0.1113 460 3.7248 -
0.1118 462 - 1.1578
0.1137 470 3.6717 -
0.1161 480 3.643 -
0.1185 490 3.6564 -
0.1209 500 3.6266 -
0.1219 504 - 1.1440
0.1234 510 3.6275 -
0.1258 520 3.6675 -
0.1282 530 3.6608 -
0.1306 540 3.6002 -
0.1321 546 - 1.1416
0.1330 550 3.6128 -
0.1355 560 3.6028 -
0.1379 570 3.5061 -
0.1403 580 3.5551 -
0.1422 588 - 1.0684
0.1427 590 3.5213 -
0.1451 600 3.495 -
0.1476 610 3.5169 -
0.1500 620 3.4666 -
0.1524 630 3.4942 1.0657
0.1548 640 3.4864 -
0.1572 650 3.4139 -
0.1597 660 3.3886 -
0.1621 670 3.3498 -
0.1626 672 - 1.0647
0.1645 680 3.3646 -
0.1669 690 3.3792 -
0.1693 700 3.3803 -
0.1717 710 3.3244 -
0.1727 714 - 1.0366
0.1742 720 3.3935 -
0.1766 730 3.4148 -
0.1790 740 3.3258 -
0.1814 750 3.3057 -
0.1829 756 - 0.9969
0.1838 760 3.3044 -
0.1863 770 3.3046 -
0.1887 780 3.2663 -
0.1911 790 3.2622 -
0.1930 798 - 0.9886
0.1935 800 3.3027 -
0.1959 810 3.3228 -
0.1984 820 3.2329 -
0.2008 830 3.2792 -
0.2032 840 3.2124 0.9268
0.2056 850 3.1746 -
0.2080 860 3.1745 -
0.2104 870 3.1741 -
0.2129 880 3.242 -
0.2134 882 - 0.9676
0.2153 890 3.2074 -
0.2177 900 3.0812 -
0.2201 910 3.1686 -
0.2225 920 3.1844 -
0.2235 924 - 0.9905
0.2250 930 3.1659 -
0.2274 940 3.0974 -
0.2298 950 3.1673 -
0.2322 960 3.1398 -
0.2337 966 - 0.9434
0.2346 970 3.1269 -
0.2371 980 3.0904 -
0.2395 990 3.0663 -
0.2419 1000 3.0815 -
0.2438 1008 - 0.9529
0.2443 1010 2.9928 -
0.2467 1020 3.0058 -
0.2492 1030 3.0084 -
0.2516 1040 3.0597 -
0.2540 1050 3.0111 0.9823
0.2564 1060 2.9955 -
0.2588 1070 2.9575 -
0.2612 1080 2.9818 -
0.2637 1090 3.0291 -
0.2642 1092 - 0.9308
0.2661 1100 3.0057 -
0.2685 1110 2.9912 -
0.2709 1120 2.9504 -
0.2733 1130 2.971 -
0.2743 1134 - 0.9150
0.2758 1140 2.9252 -
0.2782 1150 2.9444 -
0.2806 1160 2.9667 -
0.2830 1170 2.9109 -
0.2845 1176 - 0.9648
0.2854 1180 2.8874 -
0.2879 1190 2.9271 -
0.2903 1200 2.8456 -
0.2927 1210 2.8096 -
0.2946 1218 - 0.9288
0.2951 1220 2.8143 -
0.2975 1230 2.8275 -
0.3000 1240 2.7645 -
0.3024 1250 2.8012 -
0.3048 1260 2.8237 0.9021
0.3072 1270 2.8388 -
0.3096 1280 2.8354 -
0.3120 1290 2.8441 -
0.3145 1300 2.7928 -
0.3149 1302 - 0.8679
0.3169 1310 2.7765 -
0.3193 1320 2.7912 -
0.3217 1330 2.8062 -
0.3241 1340 2.8296 -
0.3251 1344 - 0.8739
0.3266 1350 2.7594 -
0.3290 1360 2.7772 -
0.3314 1370 2.7557 -
0.3338 1380 2.7978 -
0.3353 1386 - 0.8085
0.3362 1390 2.7711 -
0.3387 1400 2.7239 -
0.3411 1410 2.7382 -
0.3435 1420 2.7235 -
0.3454 1428 - 0.8075
0.3459 1430 2.7126 -
0.3483 1440 2.7319 -
0.3507 1450 2.7015 -
0.3532 1460 2.7161 -
0.3556 1470 2.6951 0.7942
0.3580 1480 2.6832 -
0.3604 1490 2.7305 -
0.3628 1500 2.6417 -
0.3653 1510 2.6772 -
0.3657 1512 - 0.8244
0.3677 1520 2.6933 -
0.3701 1530 2.6397 -
0.3725 1540 2.6323 -
0.3749 1550 2.6216 -
0.3759 1554 - 0.8660
0.3774 1560 2.6384 -
0.3798 1570 2.669 -
0.3822 1580 2.6828 -
0.3846 1590 2.6789 -
0.3861 1596 - 0.8344
0.3870 1600 2.6774 -
0.3895 1610 2.6501 -
0.3919 1620 2.63 -
0.3943 1630 2.6474 -
0.3962 1638 - 0.7953
0.3967 1640 2.6595 -
0.3991 1650 2.7007 -
0.4015 1660 2.639 -
0.4040 1670 2.6418 -
0.4064 1680 2.6044 0.7789
0.4088 1690 2.6058 -
0.4112 1700 2.564 -
0.4136 1710 2.5331 -
0.4161 1720 2.5746 -
0.4165 1722 - 0.8096
0.4185 1730 2.5725 -
0.4209 1740 2.5796 -
0.4233 1750 2.5675 -
0.4257 1760 2.558 -
0.4267 1764 - 0.7845
0.4282 1770 2.5968 -
0.4306 1780 2.5798 -
0.4330 1790 2.4829 -
0.4354 1800 2.4951 -
0.4369 1806 - 0.7755
0.4378 1810 2.519 -
0.4403 1820 2.4864 -
0.4427 1830 2.5012 -
0.4451 1840 2.5165 -
0.4470 1848 - 0.7455
0.4475 1850 2.5074 -
0.4499 1860 2.4461 -
0.4523 1870 2.452 -
0.4548 1880 2.5045 -
0.4572 1890 2.4821 0.7466
0.4596 1900 2.5006 -
0.4620 1910 2.4616 -
0.4644 1920 2.4638 -
0.4669 1930 2.4698 -
0.4673 1932 - 0.7377
0.4693 1940 2.5035 -
0.4717 1950 2.4711 -
0.4741 1960 2.5317 -
0.4765 1970 2.472 -
0.4775 1974 - 0.7255
0.4790 1980 2.438 -
0.4814 1990 2.432 -
0.4838 2000 2.3946 -
0.4862 2010 2.3805 -
0.4877 2016 - 0.7449
0.4886 2020 2.4001 -
0.4910 2030 2.418 -
0.4935 2040 2.3911 -
0.4959 2050 2.4212 -
0.4978 2058 - 0.7663
0.4983 2060 2.3855 -
0.5007 2070 2.3713 -
0.5031 2080 2.4021 -
0.5056 2090 2.3537 -
0.5080 2100 2.4182 0.7588
0.5104 2110 2.413 -
0.5128 2120 2.3741 -
0.5152 2130 2.4061 -
0.5177 2140 2.4137 -
0.5181 2142 - 0.7185
0.5201 2150 2.3823 -
0.5225 2160 2.3781 -
0.5249 2170 2.3621 -
0.5273 2180 2.3601 -
0.5283 2184 - 0.7088
0.5298 2190 2.4113 -
0.5322 2200 2.2813 -
0.5346 2210 2.3359 -
0.5370 2220 2.3571 -
0.5385 2226 - 0.7379
0.5394 2230 2.3492 -
0.5418 2240 2.366 -
0.5443 2250 2.3369 -
0.5467 2260 2.2976 -
0.5486 2268 - 0.7122
0.5491 2270 2.322 -
0.5515 2280 2.3378 -
0.5539 2290 2.3309 -
0.5564 2300 2.3335 -
0.5588 2310 2.3072 0.7062
0.5612 2320 2.3204 -
0.5636 2330 2.3422 -
0.5660 2340 2.3745 -
0.5685 2350 2.357 -
0.5689 2352 - 0.6977
0.5709 2360 2.3391 -
0.5733 2370 2.2945 -
0.5757 2380 2.2974 -
0.5781 2390 2.2967 -
0.5791 2394 - 0.6999
0.5806 2400 2.3177 -
0.5830 2410 2.3384 -
0.5854 2420 2.2601 -
0.5878 2430 2.2544 -
0.5893 2436 - 0.6774
0.5902 2440 2.2491 -
0.5926 2450 2.2732 -
0.5951 2460 2.2231 -
0.5975 2470 2.2812 -
0.5994 2478 - 0.6634
0.5999 2480 2.2717 -
0.6023 2490 2.2238 -
0.6047 2500 2.2699 -
0.6072 2510 2.2256 -
0.6096 2520 2.2547 0.6635
0.6120 2530 2.224 -
0.6144 2540 2.2645 -
0.6168 2550 2.2098 -
0.6193 2560 2.1807 -
0.6197 2562 - 0.6813
0.6217 2570 2.2292 -
0.6241 2580 2.1626 -
0.6265 2590 2.17 -
0.6289 2600 2.1772 -
0.6299 2604 - 0.6646
0.6313 2610 2.2138 -
0.6338 2620 2.2005 -
0.6362 2630 2.1698 -
0.6386 2640 2.1521 -
0.6401 2646 - 0.6704
0.6410 2650 2.2262 -
0.6434 2660 2.2312 -
0.6459 2670 2.187 -
0.6483 2680 2.1775 -
0.6502 2688 - 0.6599
0.6507 2690 2.1486 -
0.6531 2700 2.175 -
0.6555 2710 2.187 -
0.6580 2720 2.1859 -
0.6604 2730 2.1693 0.6518
0.6628 2740 2.1661 -
0.6652 2750 2.1916 -
0.6676 2760 2.1953 -
0.6701 2770 2.1674 -
0.6705 2772 - 0.6670
0.6725 2780 2.1716 -
0.6749 2790 2.189 -
0.6773 2800 2.1499 -
0.6797 2810 2.198 -
0.6807 2814 - 0.6443
0.6821 2820 2.1888 -
0.6846 2830 2.182 -
0.6870 2840 2.1553 -
0.6894 2850 2.1383 -
0.6909 2856 - 0.6478
0.6918 2860 2.1612 -
0.6942 2870 2.1143 -
0.6967 2880 2.1486 -
0.6991 2890 2.1399 -
0.7010 2898 - 0.6526
0.7015 2900 2.1102 -
0.7039 2910 2.1406 -
0.7063 2920 2.1497 -
0.7088 2930 2.1516 -
0.7112 2940 2.157 0.6488
0.7136 2950 2.1253 -
0.7160 2960 2.1263 -
0.7184 2970 2.1494 -
0.7209 2980 2.1852 -
0.7213 2982 - 0.6403
0.7233 2990 2.1337 -
0.7257 3000 2.0886 -
0.7281 3010 2.1446 -
0.7305 3020 2.1968 -
0.7315 3024 - 0.6295
0.7329 3030 2.1591 -
0.7354 3040 2.2047 -
0.7378 3050 2.1976 -
0.7402 3060 2.1879 -
0.7417 3066 - 0.6194
0.7426 3070 2.1718 -
0.7450 3080 2.1308 -
0.7475 3090 2.1689 -
0.7499 3100 2.1403 -
0.7518 3108 - 0.6232
0.7523 3110 2.1289 -
0.7547 3120 2.1357 -
0.7571 3130 2.0794 -
0.7596 3140 2.0682 -
0.7620 3150 2.0474 0.6240
0.7644 3160 2.0671 -
0.7668 3170 2.102 -
0.7692 3180 2.1298 -
0.7716 3190 2.1423 -
0.7721 3192 - 0.6201
0.7741 3200 2.1402 -
0.7765 3210 2.0642 -
0.7789 3220 2.1015 -
0.7813 3230 2.0943 -
0.7823 3234 - 0.6179
0.7837 3240 2.0712 -
0.7862 3250 2.0815 -
0.7886 3260 2.1121 -
0.7910 3270 2.0644 -
0.7925 3276 - 0.6156
0.7934 3280 2.0557 -
0.7958 3290 2.1012 -
0.7983 3300 2.052 -
0.8007 3310 2.0757 -
0.8026 3318 - 0.6016
0.8031 3320 2.0778 -
0.8055 3330 2.0894 -
0.8079 3340 2.0869 -
0.8104 3350 2.02 -
0.8128 3360 2.0559 0.6053
0.8152 3370 2.0366 -
0.8176 3380 2.04 -
0.8200 3390 2.1044 -
0.8224 3400 2.0686 -
0.8229 3402 - 0.6000
0.8249 3410 2.0828 -
0.8273 3420 2.0871 -
0.8297 3430 2.0887 -
0.8321 3440 2.1046 -
0.8331 3444 - 0.6045
0.8345 3450 2.0854 -
0.8370 3460 2.0727 -
0.8394 3470 2.0631 -
0.8418 3480 1.9793 -
0.8433 3486 - 0.5937
0.8442 3490 2.0554 -
0.8466 3500 2.0813 -
0.8491 3510 2.0382 -
0.8515 3520 2.0452 -
0.8534 3528 - 0.5968
0.8539 3530 2.0577 -
0.8563 3540 2.036 -
0.8587 3550 2.0794 -
0.8612 3560 2.0635 -
0.8636 3570 2.0277 0.5926
0.8660 3580 2.0952 -
0.8684 3590 2.0965 -
0.8708 3600 2.029 -
0.8732 3610 2.061 -
0.8737 3612 - 0.5937
0.8757 3620 1.9961 -
0.8781 3630 1.6592 -
0.8805 3640 1.506 -
0.8829 3650 1.6058 -
0.8839 3654 - 0.5780
0.8853 3660 1.7033 -
0.8878 3670 1.8416 -
0.8902 3680 1.9193 -
0.8926 3690 2.0024 -
0.8940 3696 - 0.6375
0.8950 3700 1.9548 -
0.8974 3710 1.9862 -
0.8999 3720 2.0547 -
0.9023 3730 2.0142 -
0.9042 3738 - 0.6825
0.9047 3740 1.992 -
0.9071 3750 1.9453 -
0.9095 3760 1.9988 -
0.9119 3770 1.9175 -
0.9144 3780 1.964 0.7054
0.9168 3790 2.0087 -
0.9192 3800 2.0223 -
0.9216 3810 1.9337 -
0.9240 3820 1.9478 -
0.9245 3822 - 0.7357
0.9265 3830 1.9026 -
0.9289 3840 2.0058 -
0.9313 3850 1.9698 -
0.9337 3860 1.9783 -
0.9347 3864 - 0.7518
0.9361 3870 2.0335 -
0.9386 3880 1.9112 -
0.9410 3890 1.9733 -
0.9434 3900 1.9693 -
0.9448 3906 - 0.7665
0.9458 3910 1.9911 -
0.9482 3920 1.8972 -
0.9507 3930 1.9521 -
0.9531 3940 1.9827 -
0.9550 3948 - 0.7700
0.9555 3950 2.0008 -
0.9579 3960 1.9525 -
0.9603 3970 2.0095 -
0.9627 3980 2.018 -
0.9652 3990 1.9514 0.7782
0.9676 4000 1.878 -
0.9700 4010 1.9244 -
0.9724 4020 1.9141 -
0.9748 4030 1.8425 -
0.9753 4032 - 0.7829
0.9773 4040 1.899 -
0.9797 4050 2.0281 -
0.9821 4060 1.9944 -
0.9845 4070 2.0086 -
0.9855 4074 - 0.7848
0.9869 4080 1.8952 -
0.9894 4090 1.9491 -
0.9918 4100 1.9953 -
0.9942 4110 1.9592 -
0.9956 4116 - 0.7852
0.9966 4120 1.8991 -
0.9990 4130 1.9578 -

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MaskedCachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}