SentenceTransformer based on ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02

This is a sentence-transformers model finetuned from ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (transformer): Transformer(
    (auto_model): XLMRobertaLoRA(
      (roberta): XLMRobertaModel(
        (embeddings): XLMRobertaEmbeddings(
          (word_embeddings): ParametrizedEmbedding(
            250002, 1024, padding_idx=1
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (token_type_embeddings): ParametrizedEmbedding(
            1, 1024
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
        )
        (emb_drop): Dropout(p=0.1, inplace=False)
        (emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): XLMRobertaEncoder(
          (layers): ModuleList(
            (0-23): 24 x Block(
              (mixer): MHA(
                (rotary_emb): RotaryEmbedding()
                (Wqkv): ParametrizedLinearResidual(
                  in_features=1024, out_features=3072, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (inner_attn): FlashSelfAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (inner_cross_attn): FlashCrossAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (out_proj): ParametrizedLinear(
                  in_features=1024, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout1): Dropout(p=0.1, inplace=False)
              (drop_path1): StochasticDepth(p=0.0, mode=row)
              (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): ParametrizedLinear(
                  in_features=1024, out_features=4096, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (fc2): ParametrizedLinear(
                  in_features=4096, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout2): Dropout(p=0.1, inplace=False)
              (drop_path2): StochasticDepth(p=0.0, mode=row)
              (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (pooler): XLMRobertaPooler(
          (dense): ParametrizedLinear(
            in_features=1024, out_features=1024, bias=True
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (activation): Tanh()
        )
      )
    )
  )
  (pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (normalizer): Normalize()
)
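
The trailing Pooling and Normalize modules in the architecture above perform mean pooling over token embeddings (pooling_mode_mean_tokens: True) followed by L2 normalization. The following is a minimal sketch of that step in PyTorch; the function and tensor names are illustrative, not part of the library:

import torch

def mean_pool_and_normalize(token_embeddings: torch.Tensor,
                            attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: [batch, seq_len, 1024] transformer outputs
    # attention_mask:   [batch, seq_len], 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()    # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(dim=1)  # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)       # avoid division by zero
    pooled = summed / counts                       # mean pooling
    return torch.nn.functional.normalize(pooled, p=2, dim=1)  # unit length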

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# (trust_remote_code=True is assumed necessary for the custom Jina v3 architecture)
model = SentenceTransformer(
    "ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_03",
    trust_remote_code=True,
)
# Run inference
sentences = [
    'rose  does anything actually happen in this episode. it introduces our two leads and a slow witted grinning idiot of a doctor and an utterly un interesting companion. there no plot to speak of and childish humour and mixed with some extremely bad pacing and incidental music. what else is there to say and really. the end of the world  a marginal improvement and in that we see our first outer space scenario. subsequently brought down by poor contemporary humour and paper thin logic and very poor pacing and and tired sf clichés. the unquiet dead  best episode to date showing what can happen when someone knows how to structure an episode and write interesting character dialogue and and integrate an intriguing plot. let down solely by the doctor and rose. aliens of london or world war three  doctor who degenerates into farce. what more can be said. penelope wilton brings the proceedings a little gravity and trying her best in dire circumstances. some poorly written and and out of place soap opera elements come to the fore in these two episodes and and a return to poor pacing and bad plotting and cringe worthy humour or satire. dalek  not great and however still far above the rtd fare to date. the pacing and script are all fine (though the doctor and rose still irritate). the effects and menace of the dalek are introduced well. the finale and however and took an interesting premise that reduced the doctor most notorious foe and into a cuddly touchy feely mess and and turning a previously un seen menace and to a blue rubber squid that looked like a child toy. the long game  the first rtd script to show any plot and even if it was in a clichéd 80s style. still and it was marred somewhat by his usual over reliance on juvenile jokes and placing it too far in the future to make logical sense and and again poor pacing. not as bad as his previous efforts and but instantly forgettable. father day  the initial premise could have been vaguely interesting and but common sense and logic abandon this episode from the very beginning. also and we are treated to a whole episode of soap opera. before you start thinking this is all about characterization and remember and there a big difference between lame soap opera and characterization. on the plus side and it does prove rtd isn not the worst script writer so far. the empty child or the doctor dances  this started off in a mediocre way and with some cringe worthy moments and and some illogical mistakes that even a primary school pupil wouldn not make (well lit windows in a blackout and anyone. ). after this and the first part takes a more interesting and sinister turn. florence hoath truly steals these episodes and showing us what an interesting companion could have been like. she could also act. instead we get the annoying and politically correct captain jack as the new companion. the conclusion was a little hasty and but sufficient. the pacing and script improved with a reasonably good storyline and making these two episodes quite atmospheric and intriguing. boom town  i have to be honest and except for a few examples and i had been so disillusioned by the current series and that upon seeing the trailer for another slitheen episode and i gave up and do not subject myself to the torture. bad wolf  reality tv and arguably the worst facet of the modern media and is basically used as the premise. there no subtlety whatsoever. do we get any interesting social commentary as in the likes of the running man or truman show. no and of course not. 
this in an rtd episode and so theyre basically here to cynically try and pull in the audience of said shows. once again and logic goes out the window and as were placed 200 and 000 something years in the future. rtd tries pointlessly to shoe horn in some over arcing story here and with no relevance other than it own existence and when the villains are revealed at the end. they make empty threats and and the doctor grins once more like an idiot for the climax. faster paced for the most part and than rtd other efforts and this has one or two interesting moments. otherwise and another lacklustre instalment. the parting of the ways  the big finale. more of a damp squid and literally. all of the dalek menace set up in dalek is brought crashing down and as they become rather pathetic. so many plot holes riddle this episode and with typically poor contrivances. daleks want to harvest humans as daleks and but then vaporize entire continents. dalek can vaporize said continents and but not destroy the tardis in space. the tardis is now indestructible and can land anywhere and even over people so they can be saved in it. this ability can not be used to easily destroy the dalek god. the daleks can vaporize entire continents and but do not just nuke satellite 5 to destroy the doctor and and instead let him play around. the doctor is a pathetic coward without the conviction of his actions and after eradicating his whole species to try and eliminate the daleks. these and many other holes aside and we are treated to the lamest dues ex machina solution ever conceived and joined with a near pointless story arc. so what can we say about the new series and all in all. would this have gained a second series if it were anything other than doctor who and with rtd behind it. would most of the episodes have been seen as anything other than un original and forgettable and if they were anything other than doctor who and and had rtd name attached. i think not. some people would have us think we can not say anything against rtd and since we owe him for bringing doctor who back to our screens. however and this at the expense of good characters and stories. personally and i would rather not have a poorly planned and ill conceived product and churned out at that price. i would rather wait till someone could come along and make a genuine effort. for the most part and this is the kind of puerile rubbish that gives sf a bad name and marring what is otherwise the most creative genre. ',
    'positive positive positive positive positive',
    'negative negative negative negative negative',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
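
Because the final Normalize module produces unit-length vectors, cosine similarity reduces to a dot product, and the embeddings can be used directly for semantic search. Below is a minimal sketch using sentence_transformers.util; the query string is illustrative, and trust_remote_code is assumed to be required by the custom architecture:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(
    "ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_03",
    trust_remote_code=True,  # assumed necessary for the custom Jina v3 code
)

corpus = [
    "positive positive positive positive positive",
    "negative negative negative negative negative",
]
queries = ["the acting was wooden and the plot made no sense"]

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(queries, convert_to_tensor=True)

# Rank every corpus entry against each query by cosine similarity
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))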

Training Details

Training Dataset

Unnamed Dataset

  • Size: 50,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 15 tokens, mean 309.26 tokens, max 1377 tokens
    • sentence2: string; min 7 tokens, mean 7.0 tokens, max 7 tokens
    • label: float; min 0.0, mean 0.5, max 1.0
  • Samples:
    • sentence1: i love sci fi and am willing to put up with a lot. sci fi movies or tv are usually underfunded and under appreciated and misunderstood. i tried to like this and i really did and but it is to good tv sci fi as babylon 5 is to star trek (the original). silly prosthetics and cheap cardboard sets and stilted dialogues and cg that doesn not match the background and and painfully one dimensional characters cannot be overcome with a ci fi setting. (i am sure there are those of you out there who think babylon 5 is good sci fi tv. it not. it clichéd and uninspiring. ) while us viewers might like emotion and character development and sci fi is a genre that does not take itself seriously (cf. star trek). it may treat important issues and yet not as a serious philosophy. it really difficult to care about the characters here as they are not simply foolish and just missing a spark of life. their actions and reactions are wooden and predictable and often painful to watch. the makers of earth know it rubbish as they have to always say gene roddenberry earth. otherwise people would not continue watching. roddenberry ashes must be turning in their orbit as this dull and cheap and poorly edited (watching it without advert breaks really brings this home) trudging trabant of a show lumbers into space. spoiler. so and kill off a main character. and then bring him back as another actor. jeeez. dallas all over again.
      sentence2: negative negative negative negative negative
      label: 1.0
    • sentence1: i love sci fi and am willing to put up with a lot. sci fi movies or tv are usually underfunded and under appreciated and misunderstood. i tried to like this and i really did and but it is to good tv sci fi as babylon 5 is to star trek (the original). silly prosthetics and cheap cardboard sets and stilted dialogues and cg that doesn not match the background and and painfully one dimensional characters cannot be overcome with a ci fi setting. (i am sure there are those of you out there who think babylon 5 is good sci fi tv. it not. it clichéd and uninspiring. ) while us viewers might like emotion and character development and sci fi is a genre that does not take itself seriously (cf. star trek). it may treat important issues and yet not as a serious philosophy. it really difficult to care about the characters here as they are not simply foolish and just missing a spark of life. their actions and reactions are wooden and predictable and often painful to watch. the makers of earth know it rubbish as they have to always say gene roddenberry earth. otherwise people would not continue watching. roddenberry ashes must be turning in their orbit as this dull and cheap and poorly edited (watching it without advert breaks really brings this home) trudging trabant of a show lumbers into space. spoiler. so and kill off a main character. and then bring him back as another actor. jeeez. dallas all over again.
      sentence2: positive positive positive positive positive
      label: 0.0
    • sentence1: worth the entertainment value of a rental and especially if you like action movies. this one features the usual car chases and fights with the great van damme kick style and shooting battles with the 40 shell load shotgun and and even terrorist style bombs. all of this is entertaining and competently handled but there is nothing that really blows you away if you have seen your share before. the plot is made interesting by the inclusion of a rabbit and which is clever but hardly profound. many of the characters are heavily stereotyped the angry veterans and the terrified illegal aliens and the crooked cops and the indifferent feds and the bitchy tough lady station head and the crooked politician and the fat federale who looks like he was typecast as the mexican in a hollywood movie from the 1940s. all passably acted but again nothing special. i thought the main villains were pretty well done and fairly well acted. by the end of the movie you certainly knew who the good guys were and weren not. there was an emotional lift as the really bad ones got their just deserts. very simplistic and but then you weren not expecting hamlet and right. the only thing i found really annoying was the constant cuts to vds daughter during the last fight scene. not bad. not good. passable 4.
      sentence2: negative negative negative negative negative
      label: 1.0
  • Loss: OnlineContrastiveLoss (a training sketch follows below)
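
Putting the dataset format and loss together, fine-tuning a checkpoint like this one could look roughly like the following sketch. The dataset rows are toy placeholders for the 50,000 (sentence1, sentence2, label) pairs, which are not published with the card; output_dir and the trust_remote_code assumption are illustrative:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss

# Base model this checkpoint was fine-tuned from
model = SentenceTransformer(
    "ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02",
    trust_remote_code=True,  # assumed necessary for the custom Jina v3 code
)

# Toy stand-in for the training data: label 1.0 marks a matching pair, 0.0 a mismatch
train_dataset = Dataset.from_dict({
    "sentence1": ["a long movie review ...", "another long movie review ..."],
    "sentence2": [
        "negative negative negative negative negative",
        "positive positive positive positive positive",
    ],
    "label": [1.0, 0.0],
})

# OnlineContrastiveLoss only backpropagates through hard positives/negatives
loss = OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="jina_v3_online_contrastive_03",  # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=64,  # the non-default values from this card
    per_device_eval_batch_size=64,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()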

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step   Training Loss
0.6394   500    0.5881
1.2788   1000   0.5958
1.9182   1500   0.5797
2.5575   2000   0.5847

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3
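
To approximate this environment for reproduction (versions taken from the list above; the exact PyTorch CUDA build may differ by machine):

pip install sentence-transformers==3.1.1 transformers==4.45.2 accelerate==1.1.1 datasets==3.1.0 tokenizers==0.20.3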

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}