SentenceTransformer based on TaylorAI/bge-micro-v2

This is a sentence-transformers model finetuned from TaylorAI/bge-micro-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: TaylorAI/bge-micro-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
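
For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below computes a pairwise cosine-similarity matrix with NumPy. The helper name `cosine_similarity_matrix` and the toy 4-dimensional vectors are illustrative assumptions, not part of this model's API; real embeddings from this model have 384 dimensions.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Toy stand-ins for embeddings (hypothetical values).
toy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],  # same direction as row 0, different length
    [0.0, 3.0, 0.0, 0.0],  # orthogonal to rows 0 and 1
])
sims = cosine_similarity_matrix(toy, toy)
print(sims.shape)  # (3, 3)
```

Rows 0 and 1 score 1.0 against each other because cosine similarity ignores vector length, while row 2 scores 0.0 against both.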

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
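
The Pooling module above enables only `pooling_mode_mean_tokens`, so a sentence embedding is the average of its token embeddings. The following is a NumPy sketch of masked mean pooling, assuming padded positions are excluded through an attention mask; the function name `mean_pool` and the toy inputs are hypothetical, and this is not the library's actual code.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Hypothetical batch: one sequence, three token slots (the last is padding), dim 2.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool(tokens, mask)
print(pooled)  # [[2. 3.]]
```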

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (repository ID as shown on this model's Hub page)
model = SentenceTransformer("yhyhy3/training")
# Run inference
sentences = [
    'Carbuncle, unspecified',
    'Cutaneous abscess, furuncle and carbuncle, unspecified',
    'Furuncle of neck',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
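
For semantic search, the same similarity scores can be used to rank candidates for a query. The sketch below shows one way to do this with NumPy on precomputed vectors. The function name `top_k` and the 2-dimensional toy vectors are assumptions for illustration; in practice the vectors would come from `model.encode`.

```python
import numpy as np

def top_k(query_vec: np.ndarray, corpus_vecs: np.ndarray, k: int = 2):
    """Return (index, cosine score) pairs for the k most similar corpus rows."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical corpus and query vectors.
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.9, 0.1])
ranked = top_k(query, corpus, k=2)
print([index for index, _ in ranked])  # [0, 1]
```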

Training Details

Training Dataset

Unnamed Dataset

  • Size: 160,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (type: string): min 4 tokens, mean 15.92 tokens, max 47 tokens
    • positive (type: string): min 4 tokens, mean 15.81 tokens, max 41 tokens
    • negative (type: string): min 3 tokens, mean 15.75 tokens, max 45 tokens
  • Samples (anchor | positive | negative):
    • Sudden visual loss, right eye | Sudden visual loss | Visual distortions of shape and size
    • Drug/chem diab with mild nonp rtnop without mclr edema, unsp | Drug or chemical Drug/chem diab with mod nonp rtnop with macular edema, bi Drug or | Hypostatic pneumonia, unspecified organism
    • Bronchiectasis with (acute) exacerbation | Bronchiectasis | Gestatnl htn w/o significant proteinuria, second trimester
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
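
With the Euclidean distance metric and a margin of 5, the triplet objective for one (anchor, positive, negative) triple is max(‖a − p‖ − ‖a − n‖ + 5, 0), averaged over the batch. The following NumPy sketch illustrates that formula under these settings; the function name `triplet_loss` and the toy 2-dimensional vectors are assumptions, and this is not the training code used for this model.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Mean hinge loss over a batch of triplets using Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

# Two hypothetical triplets.
a = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[3.0, 4.0], [0.0, 1.0]])   # anchor-positive distances: 5 and 1
n = np.array([[0.0, 2.0], [0.0, 10.0]])  # anchor-negative distances: 2 and 10
loss = triplet_loss(a, p, n)
print(loss)  # 4.0
```

The first triplet contributes 5 − 2 + 5 = 8; the second gives 1 − 10 + 5 = −4, which is clipped to 0 because the negative is already more than the margin farther away than the positive.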
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • max_steps: 10000
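
These two settings determine how much data the run saw. When `max_steps` is positive it takes precedence over `num_train_epochs`, so training stops after 10,000 optimizer steps. The arithmetic below is a sketch that assumes a single device (the device count is not stated in this card) and uses the dataset size and `gradient_accumulation_steps` reported elsewhere in the card.

```python
batch_size = 16          # per_device_train_batch_size
max_steps = 10_000       # max_steps
grad_accum = 1           # gradient_accumulation_steps
dataset_size = 160_000   # training samples
num_devices = 1          # assumption: not stated in the card

samples_seen = batch_size * grad_accum * num_devices * max_steps
epochs = samples_seen / dataset_size
print(samples_seen, epochs)  # 160000 1.0
```

Under that assumption the run covers exactly one pass over the training set, which matches the final epoch value of 1.0 at step 10000 in the training logs.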

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 10000
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
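
With `lr_scheduler_type: linear`, `warmup_steps: 0`, `learning_rate: 5e-05`, and `max_steps: 10000`, the learning rate decays in a straight line from 5e-05 at step 0 to 0 at step 10000. The function below is a plain-Python sketch of that schedule; the name `linear_lr` is hypothetical and this is not the scheduler implementation itself.

```python
def linear_lr(step, base_lr=5e-5, total_steps=10_000, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for s in (0, 5_000, 10_000):
    print(s, linear_lr(s))
```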

Training Logs

Epoch Step Training Loss
0.005 50 3.9819
0.01 100 3.8181
0.015 150 3.7244
0.02 200 3.6362
0.025 250 3.5459
0.03 300 3.4653
0.035 350 3.4066
0.04 400 3.3441
0.045 450 3.3497
0.05 500 3.2625
0.055 550 3.1359
0.06 600 3.1542
0.065 650 3.1528
0.07 700 3.1634
0.075 750 3.0737
0.08 800 3.1022
0.085 850 3.0288
0.09 900 2.9434
0.095 950 2.9014
0.1 1000 3.0412
0.105 1050 2.9844
0.11 1100 2.845
0.115 1150 2.9053
0.12 1200 2.8447
0.125 1250 2.8222
0.13 1300 2.8545
0.135 1350 2.7114
0.14 1400 2.7586
0.145 1450 2.6997
0.15 1500 2.5484
0.155 1550 2.7853
0.16 1600 2.6711
0.165 1650 2.7364
0.17 1700 2.8237
0.175 1750 2.737
0.18 1800 2.7059
0.185 1850 2.6577
0.19 1900 2.777
0.195 1950 2.7369
0.2 2000 2.6317
0.205 2050 2.6678
0.21 2100 2.6889
0.215 2150 2.5734
0.22 2200 2.7214
0.225 2250 2.5059
0.23 2300 2.623
0.235 2350 2.6761
0.24 2400 2.5663
0.245 2450 2.6678
0.25 2500 2.5856
0.255 2550 2.5436
0.26 2600 2.6359
0.265 2650 2.6266
0.27 2700 2.5698
0.275 2750 2.5611
0.28 2800 2.6306
0.285 2850 2.658
0.29 2900 2.5878
0.295 2950 2.553
0.3 3000 2.5295
0.305 3050 2.5211
0.31 3100 2.6489
0.315 3150 2.6131
0.32 3200 2.7298
0.325 3250 2.5931
0.33 3300 2.5927
0.335 3350 2.5403
0.34 3400 2.4497
0.345 3450 2.6764
0.35 3500 2.5673
0.355 3550 2.6134
0.36 3600 2.6298
0.365 3650 2.5747
0.37 3700 2.6245
0.375 3750 2.5275
0.38 3800 2.5541
0.385 3850 2.5469
0.39 3900 2.452
0.395 3950 2.483
0.4 4000 2.5592
0.405 4050 2.4209
0.41 4100 2.6014
0.415 4150 2.3952
0.42 4200 2.5131
0.425 4250 2.4455
0.43 4300 2.5441
0.435 4350 2.5412
0.44 4400 2.3887
0.445 4450 2.5183
0.45 4500 2.4578
0.455 4550 2.5733
0.46 4600 2.6645
0.465 4650 2.5156
0.47 4700 2.4689
0.475 4750 2.4995
0.48 4800 2.6219
0.485 4850 2.605
0.49 4900 2.4358
0.495 4950 2.6028
0.5 5000 2.5858
0.505 5050 2.3894
0.51 5100 2.6398
0.515 5150 2.4805
0.52 5200 2.5322
0.525 5250 2.4
0.53 5300 2.4541
0.535 5350 2.5067
0.54 5400 2.5244
0.545 5450 2.5514
0.55 5500 2.4608
0.555 5550 2.5884
0.56 5600 2.4291
0.565 5650 2.6395
0.57 5700 2.3873
0.575 5750 2.652
0.58 5800 2.5328
0.585 5850 2.5713
0.59 5900 2.4961
0.595 5950 2.4438
0.6 6000 2.5537
0.605 6050 2.6323
0.61 6100 2.6427
0.615 6150 2.5648
0.62 6200 2.4444
0.625 6250 2.6298
0.63 6300 2.583
0.635 6350 2.6873
0.64 6400 2.5556
0.645 6450 2.5652
0.65 6500 2.618
0.655 6550 2.4977
0.66 6600 2.5805
0.665 6650 2.4989
0.67 6700 2.5527
0.675 6750 2.5616
0.68 6800 2.5378
0.685 6850 2.5159
0.69 6900 2.6366
0.695 6950 2.5066
0.7 7000 2.498
0.705 7050 2.5416
0.71 7100 2.5362
0.715 7150 2.5541
0.72 7200 2.5598
0.725 7250 2.4584
0.73 7300 2.6006
0.735 7350 2.5072
0.74 7400 2.4681
0.745 7450 2.4808
0.75 7500 2.5695
0.755 7550 2.5131
0.76 7600 2.5227
0.765 7650 2.5553
0.77 7700 2.4966
0.775 7750 2.4811
0.78 7800 2.5081
0.785 7850 2.5916
0.79 7900 2.4911
0.795 7950 2.5778
0.8 8000 2.5111
0.805 8050 2.5094
0.81 8100 2.5456
0.815 8150 2.5445
0.82 8200 2.5531
0.825 8250 2.6358
0.83 8300 2.5247
0.835 8350 2.4117
0.84 8400 2.5442
0.845 8450 2.537
0.85 8500 2.4553
0.855 8550 2.6114
0.86 8600 2.4397
0.865 8650 2.5667
0.87 8700 2.5281
0.875 8750 2.4894
0.88 8800 2.5723
0.885 8850 2.5952
0.89 8900 2.4053
0.895 8950 2.4827
0.9 9000 2.5784
0.905 9050 2.4545
0.91 9100 2.527
0.915 9150 2.5998
0.92 9200 2.4528
0.925 9250 2.5195
0.93 9300 2.5508
0.935 9350 2.5952
0.94 9400 2.607
0.945 9450 2.5086
0.95 9500 2.4972
0.955 9550 2.4919
0.96 9600 2.5147
0.965 9650 2.4523
0.97 9700 2.6027
0.975 9750 2.4286
0.98 9800 2.5617
0.985 9850 2.4994
0.99 9900 2.6527
0.995 9950 2.538
1.0 10000 2.4506
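
One rough way to summarize the table is to compare the average of the first few logged losses with the average of the last few. The sketch below does this for the first five and last five rows, with values copied from the log above; the window size of five is an arbitrary choice for illustration.

```python
first_five = [3.9819, 3.8181, 3.7244, 3.6362, 3.5459]  # steps 50 to 250
last_five = [2.5617, 2.4994, 2.6527, 2.538, 2.4506]    # steps 9800 to 10000

def mean(values):
    return sum(values) / len(values)

start_avg = mean(first_five)
end_avg = mean(last_five)
drop = start_avg - end_avg
print(round(start_avg, 4), round(end_avg, 4), round(drop, 4))
```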

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1
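
To compare a local environment against these versions, the standard library's `importlib.metadata` can report what is installed. This is a minimal sketch; the distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumed PyPI names, and the helper `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "sentence-transformers": "3.0.1",
    "transformers": "4.44.2",
    "torch": "2.4.0",
    "accelerate": "0.33.0",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}

def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

report = installed_versions(EXPECTED)
for name, expected in EXPECTED.items():
    print(f"{name}: installed={report[name]}, card={expected}")
```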

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model size: 17.4M params (Safetensors, tensor type BF16)

Model tree for yhyhy3/training: fine-tuned from TaylorAI/bge-micro-v2