SentenceTransformer based on dbourget/pb-ds1-48K

This is a sentence-transformers model finetuned from dbourget/pb-ds1-48K. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dbourget/pb-ds1-48K
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
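
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal NumPy sketch of that idea, not the library's own implementation: the function name mean_pool and the toy 4-dimensional vectors are illustrative assumptions (the real model produces 768-dimensional token vectors).

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the hidden size.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for a fully padded sequence.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy batch: 2 sequences, 3 token slots, 4-dimensional vectors (hypothetical values).
tokens = np.array([
    [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]],
    [[2.0, 2.0, 2.0, 2.0], [4.0, 4.0, 4.0, 4.0], [6.0, 6.0, 6.0, 6.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])  # last slot of the first sequence is padding

pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```

The padded slot in the first sequence does not contribute to its average, which is why the attention mask is part of the computation.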

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/pb-ds1-48K-philsim")
# Run inference
sentences = [
    'This essay explores the historical and modern perspectives on the Gettier problem, highlighting the connections between this issue, skepticism, and relevance. Through methods such as historical analysis, induction, and deduction, it is found that while contextual theories and varying definitions of knowledge do not fully address skeptical challenges, they can help clarify our understanding of knowledge. Ultimately, embracing subjectivity and intuition can provide insight into what it truly means to claim knowledge.',
    'Objective: In this essay,  I will try to track some historical and modern stages of the discussion on the Gettier problem, and point out the interrelations of the questions that this problem raises for epistemologists, with sceptical arguments, and a so-called problem of relevance. Methods: historical analysis, induction, generalization, deduction, discourse, intuition results: Albeit the contextual theories of knowledge, the use of different definitions of knowledge, and the different ways of the uses of knowledge do not resolve all the issues that the sceptic can put forward, but they can be productive in giving clarity to a concept of knowledge for us. On the other hand, our knowledge will always have an element of intuition and subjectivity, however not equating to epistemic luck and probability.  Significance novelty: the approach to the context in general, not giving up being a Subject may give us a clarity about the sense of what it means to say – “I know”.',
    "Teaching competency in bioethics has been a concern since the field's inception. The first report on the teaching of contemporary bioethics was published in 1976 by The Hastings Center, which concluded that graduate programs were not necessary at the time. However, the report speculated that future developments may require new academic structures for graduate education in bioethics. The creation of a terminal degree in bioethics has its critics, with scholars debating whether bioethics is a discipline with its own methods and theoretical grounding, a multidisciplinary field, or something else entirely. Despite these debates, new bioethics training programs have emerged at all postsecondary levels in the U.S. This essay examines the number and types of programs and degrees in this growing field.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
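
Since the model's similarity function is cosine similarity, the score matrix can also be reproduced by hand. The sketch below uses NumPy on random placeholder vectors rather than real model output, so it runs without downloading anything; cosine_similarity_matrix and fake_embeddings are hypothetical names introduced for illustration.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Placeholder vectors standing in for model.encode(sentences); not real embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(3, 768))

scores = cosine_similarity_matrix(fake_embeddings, fake_embeddings)
print(scores.shape)  # (3, 3)

# Rank all items by similarity to the first one, most similar first.
query_index = 0
ranking = np.argsort(-scores[query_index])
print(ranking)
```

With real embeddings, the same ranking step gives a simple semantic search: the query itself comes first with a score of 1, followed by its nearest neighbours.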

Evaluation

Metrics

Semantic Similarity

Metric              Value
pearson_cosine      0.9378
spearman_cosine     0.8943
pearson_manhattan   0.971
spearman_manhattan  0.8969
pearson_euclidean   0.9711
spearman_euclidean  0.8966
pearson_dot         0.942
spearman_dot        0.8551
pearson_max         0.9711
spearman_max        0.8969
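
Each metric name pairs a correlation type (Pearson or Spearman) with the function used to score sentence pairs. As a rough illustration of how the two correlations differ, the sketch below computes both with NumPy on made-up scores; the gold and predicted arrays are invented for the example and are not taken from the evaluation set, and the rank helper assumes there are no ties.

```python
import numpy as np

def pearson(x, y):
    """Linear correlation between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """Ordinal ranks (0 = smallest). Assumes no tied values."""
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(len(values))
    return result

def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical gold similarity labels and model scores for five sentence pairs.
gold = np.array([0.10, 0.35, 0.50, 0.80, 0.95])
predicted = gold ** 3  # same ordering as gold, but a nonlinear relationship

p = pearson(gold, predicted)
s = spearman(gold, predicted)
print(f"pearson={p:.4f} spearman={s:.4f}")
```

Because the predicted scores preserve the ordering of the gold labels, the Spearman value is 1 while the Pearson value is lower. This is why the two columns in a table like the one above can diverge for the same scoring function.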

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 190
  • per_device_eval_batch_size: 190
  • learning_rate: 5e-06
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
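
To make learning_rate, warmup_ratio, and the linear scheduler concrete, here is a plain-Python sketch of the learning rate over the run. It is an approximation written for illustration, not the trainer's code. The total of 1126 steps is an assumption inferred from the training logs (roughly 563 steps per epoch over 2 epochs), and linear_schedule_lr is a hypothetical helper name.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: about 563 steps per epoch x 2 epochs, estimated from the logged epoch fractions.
TOTAL_STEPS = 1126

for step in (0, 50, 113, 600, 1126):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 5e-06 around step 113 and then decays linearly to zero at the last step.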

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 190
  • per_device_eval_batch_size: 190
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts-dev_spearman_cosine
0 0 - - 0.8229
0.0178 10 0.0545 - -
0.0355 20 0.0556 - -
0.0533 30 0.0502 - -
0.0710 40 0.0497 - -
0.0888 50 0.0413 - -
0.1066 60 0.0334 - -
0.1243 70 0.0238 - -
0.1421 80 0.0206 - -
0.1599 90 0.0167 - -
0.1776 100 0.0146 0.0725 0.8788
0.1954 110 0.0127 - -
0.2131 120 0.0125 - -
0.2309 130 0.0115 - -
0.2487 140 0.0116 - -
0.2664 150 0.0111 - -
0.2842 160 0.0107 - -
0.3020 170 0.0113 - -
0.3197 180 0.0106 - -
0.3375 190 0.0099 - -
0.3552 200 0.0092 0.0207 0.8856
0.3730 210 0.0097 - -
0.3908 220 0.0099 - -
0.4085 230 0.0087 - -
0.4263 240 0.0087 - -
0.4440 250 0.0082 - -
0.4618 260 0.0083 - -
0.4796 270 0.0089 - -
0.4973 280 0.0082 - -
0.5151 290 0.0078 - -
0.5329 300 0.0081 0.0078 0.8891
0.5506 310 0.0081 - -
0.5684 320 0.0072 - -
0.5861 330 0.0084 - -
0.6039 340 0.0083 - -
0.6217 350 0.0078 - -
0.6394 360 0.0077 - -
0.6572 370 0.008 - -
0.6750 380 0.0073 - -
0.6927 390 0.008 - -
0.7105 400 0.0073 0.0058 0.8890
0.7282 410 0.0075 - -
0.7460 420 0.0077 - -
0.7638 430 0.0074 - -
0.7815 440 0.0073 - -
0.7993 450 0.007 - -
0.8171 460 0.0043 - -
0.8348 470 0.0052 - -
0.8526 480 0.0046 - -
0.8703 490 0.0073 - -
0.8881 500 0.0056 0.0069 0.8922
0.9059 510 0.0059 - -
0.9236 520 0.0045 - -
0.9414 530 0.0033 - -
0.9591 540 0.0058 - -
0.9769 550 0.0056 - -
0.9947 560 0.0046 - -
1.0124 570 0.003 - -
1.0302 580 0.0039 - -
1.0480 590 0.0032 - -
1.0657 600 0.0031 0.0029 0.8931
1.0835 610 0.0046 - -
1.1012 620 0.003 - -
1.1190 630 0.0021 - -
1.1368 640 0.0031 - -
1.1545 650 0.0035 - -
1.1723 660 0.0033 - -
1.1901 670 0.0024 - -
1.2078 680 0.0012 - -
1.2256 690 0.0075 - -
1.2433 700 0.0028 0.0036 0.8945
1.2611 710 0.0033 - -
1.2789 720 0.0023 - -
1.2966 730 0.0034 - -
1.3144 740 0.0018 - -
1.3321 750 0.0016 - -
1.3499 760 0.0025 - -
1.3677 770 0.002 - -
1.3854 780 0.0016 - -
1.4032 790 0.0018 - -
1.4210 800 0.003 0.0027 0.8944
1.4387 810 0.0018 - -
1.4565 820 0.0008 - -
1.4742 830 0.0014 - -
1.4920 840 0.0025 - -
1.5098 850 0.0026 - -
1.5275 860 0.0012 - -
1.5453 870 0.001 - -
1.5631 880 0.001 - -
1.5808 890 0.0012 - -
1.5986 900 0.0021 0.0021 0.8952
1.6163 910 0.0016 - -
1.6341 920 0.0008 - -
1.6519 930 0.0008 - -
1.6696 940 0.0009 - -
1.6874 950 0.0004 - -
1.7052 960 0.0003 - -
1.7229 970 0.0007 - -
1.7407 980 0.0007 - -
1.7584 990 0.0011 - -
1.7762 1000 0.0007 0.0029 0.8952
1.7940 1010 0.0008 - -
1.8117 1020 0.001 - -
1.8295 1030 0.0006 - -
1.8472 1040 0.0006 - -
1.8650 1050 0.0015 - -
1.8828 1060 0.0009 - -
1.9005 1070 0.0005 - -
1.9183 1080 0.0006 - -
1.9361 1090 0.0021 - -
1.9538 1100 0.0009 0.0023 0.8943
1.9716 1110 0.0007 - -
1.9893 1120 0.0003 - -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model Size

  • Parameters: 109M
  • Tensor type: F32 (Safetensors)