SentenceTransformer based on google-bert/bert-base-uncased

This is a sentence-transformers model finetuned from google-bert/bert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
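
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is an average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the model itself produces 768-dimensional vectors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]


# Toy input: three token positions with 2-dimensional vectors.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]  # the last position is padding and is ignored
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

Padding positions are excluded through the mask, which is why the large values in the third vector do not affect the result.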

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hcy5561/distilroberta-base-sentence-transformer-triplets")
# Run inference
sentences = [
    'How did Halloween Originate? What country did it originate on?',
    'In what country did Halloween originate?',
    'What was Halloween like in the 1990s?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
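
Because the card lists cosine similarity as the similarity function, the scores above are presumably pairwise cosine similarities. The sketch below shows that computation on toy vectors in pure Python; the helper names `cosine_similarity` and `similarity_matrix` are hypothetical and are not part of the Sentence Transformers API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 2-dimensional "embeddings" standing in for the 768-dimensional ones.
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(score, 4) for score in row])
```

The matrix is symmetric with ones on the diagonal (up to floating-point rounding), which matches the 3 x 3 shape reported for three input sentences.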

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9878
dot_accuracy 0.0124
manhattan_accuracy 0.9874
euclidean_accuracy 0.9878
max_accuracy 0.9878
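
A triplet accuracy of this kind is usually the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under a given distance. The sketch below illustrates that definition for three of the listed metrics on toy vectors. It is an assumption about what the numbers mean, not the evaluator's code, and all function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_accuracy(triplets, distance):
    """Share of triplets where distance(anchor, positive) < distance(anchor, negative)."""
    correct = sum(1 for a, p, n in triplets if distance(a, p) < distance(a, n))
    return correct / len(triplets)


# Two toy triplets: the first is ranked correctly, the second is not.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),
]
for name, fn in [("cosine", cosine_distance), ("manhattan", manhattan), ("euclidean", euclidean)]:
    print(name, triplet_accuracy(triplets, fn))  # 0.5 for each metric
```

In the table above, max_accuracy equals the highest of the four per-metric values.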

Training Details

Training Dataset

Unnamed Dataset

  • Size: 91,585 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6 tokens, mean 13.95 tokens, max 50 tokens
    • positive (string): min 6 tokens, mean 14.02 tokens, max 52 tokens
    • negative (string): min 6 tokens, mean 14.68 tokens, max 60 tokens
  • Samples:
    • anchor: How can I overcome a bad mood?
      positive: How do I break out of a bad mood?
      negative: The world around me seems so austere and gloomy because of my mood. It's depressing me considerably. What can I do?
    • anchor: What are symptoms of mild schizophrenia?
      positive: What are some symptoms of when you become schizophrenic?
      negative: Is confusion another symptom of being schizophrenic?
    • anchor: What are some ideas which transformed ordinary people into millionaires?
      positive: What are some things ordinary people know but millionaires don't?
      negative: What can billionaires do that millionaire cannot do?
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
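
With a Euclidean distance metric and a margin of 5, the triplet loss for a single triplet is max(d(anchor, positive) - d(anchor, negative) + 5, 0). The sketch below evaluates that formula on toy 2-dimensional vectors; the function names are illustrative, and the real loss operates on batches of 768-dimensional embeddings and averages the per-triplet values.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]       # distance 5 from the anchor
far_negative = [6.0, 8.0]   # distance 10 from the anchor
near_negative = [0.0, 1.0]  # distance 1 from the anchor

print(triplet_loss(anchor, positive, far_negative))   # 0.0
print(triplet_loss(anchor, positive, near_negative))  # 9.0
```

The loss drops to zero only once the negative is at least the margin farther from the anchor than the positive is, so a large margin such as 5 keeps pushing negatives away even after the ranking is already correct.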
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,088 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6 tokens, mean 14.14 tokens, max 44 tokens
    • positive (string): min 6 tokens, mean 13.96 tokens, max 49 tokens
    • negative (string): min 6 tokens, mean 14.8 tokens, max 60 tokens
  • Samples:
    • anchor: Why do I see the exact same questions in my feed all the time?
      positive: Why are too many questions repeating in my feed sometimes?
      negative: Why does this "question" keep showing up in the Unorganized Questions global_feed? (see description for screenshot)
    • anchor: Can we expect time travel to become a reality?
      positive: Can we time travel anyhow?
      negative: What do you hAve to say about time travel (I am not science student but I read it on net and its so exciting topic but still no clear idea that is it possible or it's just a rumour)?
    • anchor: Is it too late to start medical school at 32?
      positive: Is it too late to go to medical school at 24?
      negative: As a 14 year old girl who wants to go to medical school, should I work extremely hard and study a lot now to be ready for it? What should I do?
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
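
To make the warmup setting concrete, the sketch below works out the learning rate at a few steps under a linear schedule with a 10% warmup. It uses the learning rate of 5e-05 and the linear scheduler type from the full hyperparameter list in this card. It assumes a single device and no gradient accumulation, and the function `linear_schedule_lr` is a hypothetical re-derivation, not the Transformers scheduler itself.

```python
import math


def linear_schedule_lr(step, total_steps, warmup_steps, base_lr=5e-05):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: one device, so one optimizer step per batch of 32 samples.
steps_per_epoch = math.ceil(91_585 / 32)     # 2863
total_steps = steps_per_epoch * 4            # 11452
warmup_steps = math.ceil(total_steps * 0.1)  # 1146

for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_schedule_lr(step, total_steps, warmup_steps))
```

Under these assumptions the learning rate peaks at step 1146 and then decays linearly to zero by the final step.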

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss  Validation Loss  QQP-nli-dev_max_accuracy
0       0      -              -                0.8783
0.1746  500    2.3079         0.8664           0.9581
0.3493  1000   0.9367         0.5027           0.9737
0.5239  1500   0.6747         0.4471           0.9743
0.6986  2000   0.5323         0.3740           0.9776
0.8732  2500   0.4765         0.3178           0.9825
1.0479  3000   0.4104         0.2809           0.9866
1.2225  3500   0.3266         0.2633           0.9870
1.3971  4000   0.2129         0.2566           0.9862
1.5718  4500   0.1559         0.2542           0.9858
1.7464  5000   0.1432         0.2482           0.9853
1.9211  5500   0.1361         0.2370           0.9845
2.0957  6000   0.1179         0.2102           0.9880
2.2703  6500   0.0921         0.2201           0.9870
2.4450  7000   0.0656         0.2075           0.9878
2.6196  7500   0.0497         0.2011           0.9876
2.7943  8000   0.0455         0.1960           0.9878
2.9689  8500   0.0422         0.1973           0.9872
3.1436  9000   0.0349         0.1863           0.9890
3.3182  9500   0.0319         0.1850           0.9882
3.4928  10000  0.02           0.1854           0.9882
3.6675  10500  0.0184         0.1849           0.9884
3.8421  11000  0.0178         0.1828           0.9878
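
The fractional epoch values in the log can be reproduced from the training set size and batch size given earlier in this card. The short sketch below does that arithmetic, assuming every batch holds 32 samples except possibly the last one of each epoch; `epoch_at_step` is a hypothetical helper name.

```python
import math


def epoch_at_step(step, dataset_size=91_585, batch_size=32):
    """Fraction of epochs completed after a given number of optimizer steps."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return step / steps_per_epoch


for step in (500, 3000, 11000):
    print(step, round(epoch_at_step(step), 4))
# 500 0.1746
# 3000 1.0479
# 11000 3.8421
```

These values agree with the logged rows, which suggests roughly 2863 optimizer steps per epoch.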

Framework Versions

  • Python: 3.10.6
  • Sentence Transformers: 3.0.1
  • Transformers: 4.39.3
  • PyTorch: 2.2.2+cu118
  • Accelerate: 0.28.0
  • Datasets: 2.20.0
  • Tokenizers: 0.15.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}