SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
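The Pooling and Normalize modules above can be sketched in NumPy. Mean pooling (pooling_mode_mean_tokens: True) averages each sentence's token vectors and skips padding positions. Normalize() then rescales each pooled vector to unit L2 length. This is an illustrative reimplementation, not the library's code. The random array stands in for the BertModel output, and the helper names mean_pool and l2_normalize are hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) output of the transformer
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length (hypothetical stand-in for Normalize())."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

# Toy stand-in for BertModel output: 2 sentences, 5 token slots, 384 dims
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384)).astype(np.float32)
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 1, 0, 0]])  # second sentence has 2 padding slots

embeddings = l2_normalize(mean_pool(tokens, mask))
print(embeddings.shape)                    # (2, 384)
print(np.linalg.norm(embeddings, axis=1))  # both ~1.0
```

Every output vector has unit length. As a result, the cosine similarity listed under Model Description reduces to a plain dot product for this model.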

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LeoChiuu/all-MiniLM-L6-v2-negations")
# Run inference
sentences = [
    'He published a history of Cornwall, New York in 1873.',
    'He failed to publish a history of Cornwall, New York in 1873.',
    "Salafis assert that reliance on taqlid has led to Islam 's decline.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
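The NumPy sketch below makes the 3 x 3 similarity result concrete. It computes a pairwise cosine similarity matrix from the definition of the Cosine Similarity function. The random vectors are placeholders, not real outputs of this model. cosine_similarity_matrix is a hypothetical helper, not part of the Sentence Transformers API.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d) -> (n, m)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Placeholder vectors standing in for model.encode(sentences); NOT real model output
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(3, 384))

similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # (3, 3)

# The diagonal compares each sentence with itself, so it is 1 up to rounding.
print(np.allclose(np.diag(similarities), 1.0))  # True

# For already-normalized vectors, cosine similarity equals the dot product.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(np.allclose(unit @ unit.T, similarities))  # True
```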

Training Details

Training Dataset

Unnamed Dataset

  • Size: 77,376 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 6, mean 16.2, max 57 tokens
    • sentence_1 (string): min 5, mean 16.32, max 56 tokens
    • label (int): 0 in ~53.20% of samples, 1 in ~46.80%
  • Samples (sentence_0 | sentence_1 | label):
    • The situation in Yemen was already much better than it was in Bahrain. | The situation in Yemen was not much better than Bahrain. | 0
    • She was a member of the Gamma Theta Upsilon honour society of geography. | She was denied membership of the Gamma Theta Upsilon honour society of mathematics. | 0
    • Which aren't small and not worth the price. | Which are small and not worth the price. | 0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
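As we understand CosineSimilarityLoss, it computes the cosine similarity between the embeddings of sentence_0 and sentence_1. It then regresses that score onto the label using the configured loss_fct, here MSELoss. Under that reading, training pulls pairs labeled 0 toward a cosine of 0 and pairs labeled 1 toward a cosine of 1. The label-0 pairs include a sentence and its negation, as in the samples above. The NumPy function below is a hypothetical illustration of that computation, not the library implementation.

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared error between row-wise cosine(u_i, v_i) and float labels.

    u, v:   (batch, dim) embeddings of sentence_0 and sentence_1
    labels: (batch,) targets, here 0 or 1
    """
    cos = (u * v).sum(axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.mean((cos - labels.astype(np.float64)) ** 2))

# Hand-picked 2-D vectors so the arithmetic is easy to follow
u = np.array([[1.0, 0.0], [1.0, 0.0]])
v = np.array([[1.0, 0.0], [0.0, 1.0]])  # row-wise cosines: 1.0 and 0.0

print(cosine_similarity_loss(u, v, np.array([1, 0])))  # 0.0: both cosines match their labels
print(cosine_similarity_loss(u, v, np.array([0, 0])))  # 0.5: first pair is parallel but labeled 0
```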
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
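The settings lr_scheduler_type: linear, warmup_steps: 0 and learning_rate: 5e-05 should make the learning rate fall linearly from 5e-05 to 0 over the full run. The sketch below assumes a single training device, because the card lists only per-device batch sizes. Under that assumption, one optimizer step consumes 16 samples. An epoch then has ceil(77,376 / 16) = 4,836 steps, and 10 epochs come to 48,360 steps. The formula follows the common linear-with-warmup schedule used by Hugging Face Transformers. linear_lr is a hypothetical helper.

```python
import math

LEARNING_RATE = 5e-05
TRAIN_SAMPLES = 77_376
BATCH_SIZE = 16   # per_device_train_batch_size; assumes a single device
GRAD_ACCUM = 1    # gradient_accumulation_steps
EPOCHS = 10

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE) // GRAD_ACCUM
total_steps = steps_per_epoch * EPOCHS

def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step for a linear schedule with optional warmup."""
    if step < warmup_steps:
        return LEARNING_RATE * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * (remaining / max(1, total_steps - warmup_steps))

print(steps_per_epoch, total_steps)  # 4836 48360
print(linear_lr(0))                  # 5e-05
print(linear_lr(total_steps // 2))   # 2.5e-05
```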

Training Logs

Epoch Step Training Loss
0.1034 500 0.3382
0.2068 1000 0.2112
0.3102 1500 0.1649
0.4136 2000 0.1454
0.5170 2500 0.1244
0.6203 3000 0.1081
0.7237 3500 0.0962
0.8271 4000 0.0924
0.9305 4500 0.0852
1.0339 5000 0.0812
1.1373 5500 0.0833
1.2407 6000 0.0736
1.3441 6500 0.0756
1.4475 7000 0.0665
1.5509 7500 0.0661
1.6543 8000 0.0625
1.7577 8500 0.0621
1.8610 9000 0.0593
1.9644 9500 0.054
2.0678 10000 0.0569
2.1712 10500 0.0566
2.2746 11000 0.0502
2.3780 11500 0.0516
2.4814 12000 0.0455
2.5848 12500 0.0454
2.6882 13000 0.0424
2.7916 13500 0.044
2.8950 14000 0.0376
2.9983 14500 0.0386
3.1017 15000 0.0392
3.2051 15500 0.0344
3.3085 16000 0.0348
3.4119 16500 0.0343
3.5153 17000 0.0322
3.6187 17500 0.0324
3.7221 18000 0.0278
3.8255 18500 0.0294
3.9289 19000 0.0292
4.0323 19500 0.0276
4.1356 20000 0.0285
4.2390 20500 0.026
4.3424 21000 0.0271
4.4458 21500 0.0248
4.5492 22000 0.0245
4.6526 22500 0.0253
4.7560 23000 0.022
4.8594 23500 0.0219
4.9628 24000 0.0207
5.0662 24500 0.0212
5.1696 25000 0.0218
5.2730 25500 0.0192
5.3763 26000 0.0198
5.4797 26500 0.0183
5.5831 27000 0.02
5.6865 27500 0.0176
5.7899 28000 0.0184
5.8933 28500 0.0157
5.9967 29000 0.0175
6.1001 29500 0.0175
6.2035 30000 0.0163
6.3069 30500 0.0173
6.4103 31000 0.0165
6.5136 31500 0.0152
6.6170 32000 0.0155
6.7204 32500 0.0132
6.8238 33000 0.0147
6.9272 33500 0.0145
7.0306 34000 0.014
7.1340 34500 0.0147
7.2374 35000 0.0126
7.3408 35500 0.0141
7.4442 36000 0.0127
7.5476 36500 0.0132
7.6510 37000 0.0125
7.7543 37500 0.0111
7.8577 38000 0.011
7.9611 38500 0.0125
8.0645 39000 0.0128
8.1679 39500 0.013
8.2713 40000 0.0115
8.3747 40500 0.0111
8.4781 41000 0.0108
8.5815 41500 0.012
8.6849 42000 0.0108
8.7883 42500 0.0105
8.8916 43000 0.0092
8.9950 43500 0.0115
9.0984 44000 0.0112
9.2018 44500 0.0096
9.3052 45000 0.0106
9.4086 45500 0.011
9.5120 46000 0.01
9.6154 46500 0.011
9.7188 47000 0.0097
9.8222 47500 0.0096
9.9256 48000 0.0102
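The Epoch column in this log equals the step count divided by the number of steps per epoch. The snippet below recomputes a few logged rows. It assumes 4,836 steps per epoch, based on 77,376 samples, a batch size of 16, one device, and no gradient accumulation. epoch_at is a hypothetical helper.

```python
import math

TRAIN_SAMPLES = 77_376
BATCH_SIZE = 16  # per_device_train_batch_size; assumes one device, no gradient accumulation
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 4836

def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps, rounded like the log."""
    return round(step / steps_per_epoch, 4)

# (step, epoch) pairs copied from the log above
logged = [(500, 0.1034), (5000, 1.0339), (24000, 4.9628), (48000, 9.9256)]
for step, epoch in logged:
    print(step, epoch_at(step), epoch_at(step) == epoch)  # each line ends with True
```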

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.0.1
  • Transformers: 4.40.2
  • PyTorch: 2.3.0+cpu
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 22.7M params (Safetensors, tensor type F32)