SentenceTransformer based on manuel-couto-pintos/roberta_erisk

This is a sentence-transformers model fine-tuned from manuel-couto-pintos/roberta_erisk. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: manuel-couto-pintos/roberta_erisk
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
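
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. As a rough illustration (a pure-Python sketch on toy 2-dimensional vectors, not the library's implementation; the function name mean_pool is hypothetical), masked mean pooling can be written as:

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # Guard against an all-padding input (assumption: return zeros in that case).
    count = max(count, 1)
    return [t / count for t in total]


# Toy input: two real tokens and one padding token that must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model each token vector has 768 entries, so the pooled sentence embedding is also 768-dimensional.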

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("manuel-couto-pintos/roberta_erisk_sts")
# Run inference
sentences = [
    'Which is the best affiliate program?',
    'What are the best affiliate programs?',
    'What are the best affiliate networks in the UK?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
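
The model's similarity function is cosine similarity, so each entry of the 3 x 3 matrix above is the cosine of the angle between two embeddings. The following is a minimal sketch of that computation on hypothetical 2-dimensional vectors, using only the standard library; it is meant to show the arithmetic, not to replace model.similarity.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for real 768-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sims = similarity_matrix(toy_embeddings)
print(round(sims[0][1], 4))  # 0.7071
print(round(sims[0][2], 4))  # 0.0
```

The matrix is symmetric with values close to 1 on the diagonal, since every vector is maximally similar to itself.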

Training Details

Training Dataset

Unnamed Dataset

  • Size: 50,881 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

               sentence_0      sentence_1      sentence_2
    type       string          string          string
    min        6 tokens        6 tokens        6 tokens
    mean       13.77 tokens    13.82 tokens    14.96 tokens
    max        42 tokens       57 tokens       59 tokens
  • Samples:

    Sample 1
      sentence_0: What is a good definition of Quora?
      sentence_1: What is the best definition of Quora?
      sentence_2: What is Quora address?

    Sample 2
      sentence_0: How can I make myself appear offline on facebook?
      sentence_1: How do you make sure to appear as offline on Facebook?
      sentence_2: How can I get Facebook to remember to keep chat offline?

    Sample 3
      sentence_0: How do I gain some healthy weight?
      sentence_1: What is the best way for underweight to gain weight?
      sentence_2: My boyfriend doesn't eat a lot. What are some ways to help him gain weight fast? He's 5'7 120lbs
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
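
To make these parameters concrete: with a Euclidean distance metric and a margin of 5, triplet loss penalises a triplet until the negative is at least 5 units farther from the anchor than the positive is. Below is a small sketch of the per-triplet formula, assuming sentence_0 is the anchor, sentence_1 the positive, and sentence_2 the negative (the vectors and the function name triplet_loss are hypothetical, and the real loss is averaged over a batch of 768-dimensional embeddings).

```python
import math
from typing import List


def triplet_loss(anchor: List[float], positive: List[float],
                 negative: List[float], margin: float = 5.0) -> float:
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0) with Euclidean d."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]       # distance 5 from the anchor
far_negative = [6.0, 8.0]   # distance 10: exactly margin farther than the positive
near_negative = [0.0, 1.0]  # distance 1: closer than the positive

print(triplet_loss(anchor, positive, far_negative))   # 0.0
print(triplet_loss(anchor, positive, near_negative))  # 9.0
```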
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
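
With lr_scheduler_type: linear, warmup_steps: 0, and learning_rate: 5e-05, the learning rate decays in a straight line from 5e-05 to 0 over the run. The sketch below reproduces that schedule in plain Python. The total of 50,890 optimizer steps is an assumption, derived from 50,881 samples, batch size 10, 10 epochs, and a single training device; the function name linear_lr is hypothetical.

```python
BASE_LR = 5e-05
WARMUP_STEPS = 0
TOTAL_STEPS = 50_890  # assumption: ceil(50_881 / 10) * 10 epochs on one device


def linear_lr(step: int) -> float:
    """Linear warmup (none here) followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 25_445, 50_890):
    print(step, linear_lr(step))
```

Under these assumptions the rate is 5e-05 at step 0, half of that at the midpoint, and 0 at the final step.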

Training Logs

Epoch Step Training Loss
0.0983 500 4.3807
0.1965 1000 2.5872
0.2948 1500 1.7484
0.3930 2000 1.2649
0.4913 2500 1.0219
0.5895 3000 0.8703
0.6878 3500 0.771
0.7860 4000 0.655
0.8843 4500 0.6547
0.9825 5000 0.5772
1.0808 5500 0.5628
1.1790 6000 0.5163
1.2773 6500 0.4871
1.3755 7000 0.4842
1.4738 7500 0.4316
1.5720 8000 0.4199
1.6703 8500 0.3554
1.7685 9000 0.3467
1.8668 9500 0.3591
1.9650 10000 0.3356
2.0633 10500 0.3281
2.1615 11000 0.3149
2.2598 11500 0.2767
2.3580 12000 0.2849
2.4563 12500 0.244
2.5545 13000 0.2416
2.6528 13500 0.2008
2.7510 14000 0.1718
2.8493 14500 0.188
2.9475 15000 0.1656
3.0458 15500 0.1522
3.1440 16000 0.144
3.2423 16500 0.1329
3.3405 17000 0.1431
3.4388 17500 0.128
3.5370 18000 0.1251
3.6353 18500 0.0921
3.7335 19000 0.0882
3.8318 19500 0.1087
3.9300 20000 0.0819
4.0283 20500 0.0916
4.1265 21000 0.0837
4.2248 21500 0.0855
4.3230 22000 0.0727
4.4213 22500 0.0772
4.5196 23000 0.0676
4.6178 23500 0.0597
4.7161 24000 0.0555
4.8143 24500 0.0613
4.9126 25000 0.0589
5.0108 25500 0.0503
5.1091 26000 0.0546
5.2073 26500 0.0446
5.3056 27000 0.0591
5.4038 27500 0.0431
5.5021 28000 0.0402
5.6003 28500 0.0354
5.6986 29000 0.0405
5.7968 29500 0.0308
5.8951 30000 0.0363
5.9933 30500 0.0365
6.0916 31000 0.0333
6.1898 31500 0.0238
6.2881 32000 0.0372
6.3863 32500 0.0331
6.4846 33000 0.0253
6.5828 33500 0.0315
6.6811 34000 0.0193
6.7793 34500 0.0239
6.8776 35000 0.0201
6.9758 35500 0.0213
7.0741 36000 0.0187
7.1723 36500 0.0125
7.2706 37000 0.0151
7.3688 37500 0.0208
7.4671 38000 0.0101
7.5653 38500 0.0191
7.6636 39000 0.0125
7.7618 39500 0.0136
7.8601 40000 0.0135
7.9583 40500 0.0118
8.0566 41000 0.012
8.1548 41500 0.0079
8.2531 42000 0.0105
8.3513 42500 0.0094
8.4496 43000 0.0079
8.5478 43500 0.0118
8.6461 44000 0.0105
8.7444 44500 0.0058
8.8426 45000 0.013
8.9409 45500 0.0065
9.0391 46000 0.0089
9.1374 46500 0.0031
9.2356 47000 0.008
9.3339 47500 0.0065
9.4321 48000 0.0052
9.5304 48500 0.0066
9.6286 49000 0.0039
9.7269 49500 0.004
9.8251 50000 0.0051
9.9234 50500 0.003
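
The Epoch column in the log follows from the Step column and the dataset size. As a quick check, assuming a single training device (gradient_accumulation_steps is 1 and dataloader_drop_last is False, as listed above), the relationship can be worked out as:

```python
import math

TRAIN_SAMPLES = 50_881
BATCH_SIZE = 10
EPOCHS = 10

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after the given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)   # 5089
print(total_steps)       # 50890
print(epoch_at(500))     # 0.0983
print(epoch_at(50_500))  # 9.9234
```

These values match the first and last rows of the log, which suggests logging every 500 steps with roughly 5,089 steps per epoch.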

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.0.1+cu117
  • Accelerate: 0.32.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}