SentenceTransformer based on ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02

This is a sentence-transformers model finetuned from ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (transformer): Transformer(
    (auto_model): XLMRobertaLoRA(
      (roberta): XLMRobertaModel(
        (embeddings): XLMRobertaEmbeddings(
          (word_embeddings): ParametrizedEmbedding(
            250002, 1024, padding_idx=1
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (token_type_embeddings): ParametrizedEmbedding(
            1, 1024
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
        )
        (emb_drop): Dropout(p=0.1, inplace=False)
        (emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): XLMRobertaEncoder(
          (layers): ModuleList(
            (0-23): 24 x Block(
              (mixer): MHA(
                (rotary_emb): RotaryEmbedding()
                (Wqkv): ParametrizedLinearResidual(
                  in_features=1024, out_features=3072, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (inner_attn): FlashSelfAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (inner_cross_attn): FlashCrossAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (out_proj): ParametrizedLinear(
                  in_features=1024, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout1): Dropout(p=0.1, inplace=False)
              (drop_path1): StochasticDepth(p=0.0, mode=row)
              (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): ParametrizedLinear(
                  in_features=1024, out_features=4096, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (fc2): ParametrizedLinear(
                  in_features=4096, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout2): Dropout(p=0.1, inplace=False)
              (drop_path2): StochasticDepth(p=0.0, mode=row)
              (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (pooler): XLMRobertaPooler(
          (dense): ParametrizedLinear(
            in_features=1024, out_features=1024, bias=True
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (activation): Tanh()
        )
      )
    )
  )
  (pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (normalizer): Normalize()
)
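
The trailing Pooling and Normalize modules in the architecture above perform mean pooling over token embeddings (pooling_mode_mean_tokens: True) followed by L2 normalization. The following is a minimal sketch of that step in PyTorch; the function and tensor names are illustrative, not part of the library:

import torch

def mean_pool_and_normalize(token_embeddings: torch.Tensor,
                            attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: [batch, seq_len, 1024] transformer outputs
    # attention_mask:   [batch, seq_len], 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()    # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(dim=1)  # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)       # avoid division by zero
    pooled = summed / counts                       # mean pooling
    return torch.nn.functional.normalize(pooled, p=2, dim=1)  # unit length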

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# (trust_remote_code=True is assumed necessary for the custom Jina v3 architecture)
model = SentenceTransformer(
    "ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_03",
    trust_remote_code=True,
)
# Run inference
sentences = [
    'rose  does anything actually happen in this episode. it introduces our two leads and a slow witted grinning idiot of a doctor and an utterly un interesting companion. there no plot to speak of and childish humour and mixed with some extremely bad pacing and incidental music. what else is there to say and really. the end of the world  a marginal improvement and in that we see our first outer space scenario. subsequently brought down by poor contemporary humour and paper thin logic and very poor pacing and and tired sf clichés. the unquiet dead  best episode to date showing what can happen when someone knows how to structure an episode and write interesting character dialogue and and integrate an intriguing plot. let down solely by the doctor and rose. aliens of london or world war three  doctor who degenerates into farce. what more can be said. penelope wilton brings the proceedings a little gravity and trying her best in dire circumstances. some poorly written and and out of place soap opera elements come to the fore in these two episodes and and a return to poor pacing and bad plotting and cringe worthy humour or satire. dalek  not great and however still far above the rtd fare to date. the pacing and script are all fine (though the doctor and rose still irritate). the effects and menace of the dalek are introduced well. the finale and however and took an interesting premise that reduced the doctor most notorious foe and into a cuddly touchy feely mess and and turning a previously un seen menace and to a blue rubber squid that looked like a child toy. the long game  the first rtd script to show any plot and even if it was in a clichéd 80s style. still and it was marred somewhat by his usual over reliance on juvenile jokes and placing it too far in the future to make logical sense and and again poor pacing. not as bad as his previous efforts and but instantly forgettable. father day  the initial premise could have been vaguely interesting and but common sense and logic abandon this episode from the very beginning. also and we are treated to a whole episode of soap opera. before you start thinking this is all about characterization and remember and there a big difference between lame soap opera and characterization. on the plus side and it does prove rtd isn not the worst script writer so far. the empty child or the doctor dances  this started off in a mediocre way and with some cringe worthy moments and and some illogical mistakes that even a primary school pupil wouldn not make (well lit windows in a blackout and anyone. ). after this and the first part takes a more interesting and sinister turn. florence hoath truly steals these episodes and showing us what an interesting companion could have been like. she could also act. instead we get the annoying and politically correct captain jack as the new companion. the conclusion was a little hasty and but sufficient. the pacing and script improved with a reasonably good storyline and making these two episodes quite atmospheric and intriguing. boom town  i have to be honest and except for a few examples and i had been so disillusioned by the current series and that upon seeing the trailer for another slitheen episode and i gave up and do not subject myself to the torture. bad wolf  reality tv and arguably the worst facet of the modern media and is basically used as the premise. there no subtlety whatsoever. do we get any interesting social commentary as in the likes of the running man or truman show. no and of course not. 
this in an rtd episode and so theyre basically here to cynically try and pull in the audience of said shows. once again and logic goes out the window and as were placed 200 and 000 something years in the future. rtd tries pointlessly to shoe horn in some over arcing story here and with no relevance other than it own existence and when the villains are revealed at the end. they make empty threats and and the doctor grins once more like an idiot for the climax. faster paced for the most part and than rtd other efforts and this has one or two interesting moments. otherwise and another lacklustre instalment. the parting of the ways  the big finale. more of a damp squid and literally. all of the dalek menace set up in dalek is brought crashing down and as they become rather pathetic. so many plot holes riddle this episode and with typically poor contrivances. daleks want to harvest humans as daleks and but then vaporize entire continents. dalek can vaporize said continents and but not destroy the tardis in space. the tardis is now indestructible and can land anywhere and even over people so they can be saved in it. this ability can not be used to easily destroy the dalek god. the daleks can vaporize entire continents and but do not just nuke satellite 5 to destroy the doctor and and instead let him play around. the doctor is a pathetic coward without the conviction of his actions and after eradicating his whole species to try and eliminate the daleks. these and many other holes aside and we are treated to the lamest dues ex machina solution ever conceived and joined with a near pointless story arc. so what can we say about the new series and all in all. would this have gained a second series if it were anything other than doctor who and with rtd behind it. would most of the episodes have been seen as anything other than un original and forgettable and if they were anything other than doctor who and and had rtd name attached. i think not. some people would have us think we can not say anything against rtd and since we owe him for bringing doctor who back to our screens. however and this at the expense of good characters and stories. personally and i would rather not have a poorly planned and ill conceived product and churned out at that price. i would rather wait till someone could come along and make a genuine effort. for the most part and this is the kind of puerile rubbish that gives sf a bad name and marring what is otherwise the most creative genre. ',
    'positive positive positive positive positive',
    'negative negative negative negative negative',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
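
Because the final Normalize module produces unit-length vectors, cosine similarity reduces to a dot product, and the embeddings can be used directly for semantic search. Below is a minimal sketch using sentence_transformers.util; the query string is illustrative, and trust_remote_code is assumed to be required by the custom architecture:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(
    "ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_03",
    trust_remote_code=True,  # assumed necessary for the custom Jina v3 code
)

corpus = [
    "positive positive positive positive positive",
    "negative negative negative negative negative",
]
queries = ["the acting was wooden and the plot made no sense"]

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(queries, convert_to_tensor=True)

# Rank every corpus entry against each query by cosine similarity
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))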

Training Details

Training Dataset

Unnamed Dataset

  • Size: 50,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 15 tokens, mean 309.26 tokens, max 1377 tokens
    • sentence2: string; min 7 tokens, mean 7.0 tokens, max 7 tokens
    • label: float; min 0.0, mean 0.5, max 1.0
  • Samples:
    • sentence1: i love sci fi and am willing to put up with a lot. sci fi movies or tv are usually underfunded and under appreciated and misunderstood. i tried to like this and i really did and but it is to good tv sci fi as babylon 5 is to star trek (the original). silly prosthetics and cheap cardboard sets and stilted dialogues and cg that doesn not match the background and and painfully one dimensional characters cannot be overcome with a ci fi setting. (i am sure there are those of you out there who think babylon 5 is good sci fi tv. it not. it clichéd and uninspiring. ) while us viewers might like emotion and character development and sci fi is a genre that does not take itself seriously (cf. star trek). it may treat important issues and yet not as a serious philosophy. it really difficult to care about the characters here as they are not simply foolish and just missing a spark of life. their actions and reactions are wooden and predictable and often painful to watch. the makers of earth know it rubbish as they have to always say gene roddenberry earth. otherwise people would not continue watching. roddenberry ashes must be turning in their orbit as this dull and cheap and poorly edited (watching it without advert breaks really brings this home) trudging trabant of a show lumbers into space. spoiler. so and kill off a main character. and then bring him back as another actor. jeeez. dallas all over again.
      sentence2: negative negative negative negative negative
      label: 1.0
    • sentence1: i love sci fi and am willing to put up with a lot. sci fi movies or tv are usually underfunded and under appreciated and misunderstood. i tried to like this and i really did and but it is to good tv sci fi as babylon 5 is to star trek (the original). silly prosthetics and cheap cardboard sets and stilted dialogues and cg that doesn not match the background and and painfully one dimensional characters cannot be overcome with a ci fi setting. (i am sure there are those of you out there who think babylon 5 is good sci fi tv. it not. it clichéd and uninspiring. ) while us viewers might like emotion and character development and sci fi is a genre that does not take itself seriously (cf. star trek). it may treat important issues and yet not as a serious philosophy. it really difficult to care about the characters here as they are not simply foolish and just missing a spark of life. their actions and reactions are wooden and predictable and often painful to watch. the makers of earth know it rubbish as they have to always say gene roddenberry earth. otherwise people would not continue watching. roddenberry ashes must be turning in their orbit as this dull and cheap and poorly edited (watching it without advert breaks really brings this home) trudging trabant of a show lumbers into space. spoiler. so and kill off a main character. and then bring him back as another actor. jeeez. dallas all over again.
      sentence2: positive positive positive positive positive
      label: 0.0
    • sentence1: worth the entertainment value of a rental and especially if you like action movies. this one features the usual car chases and fights with the great van damme kick style and shooting battles with the 40 shell load shotgun and and even terrorist style bombs. all of this is entertaining and competently handled but there is nothing that really blows you away if you have seen your share before. the plot is made interesting by the inclusion of a rabbit and which is clever but hardly profound. many of the characters are heavily stereotyped the angry veterans and the terrified illegal aliens and the crooked cops and the indifferent feds and the bitchy tough lady station head and the crooked politician and the fat federale who looks like he was typecast as the mexican in a hollywood movie from the 1940s. all passably acted but again nothing special. i thought the main villains were pretty well done and fairly well acted. by the end of the movie you certainly knew who the good guys were and weren not. there was an emotional lift as the really bad ones got their just deserts. very simplistic and but then you weren not expecting hamlet and right. the only thing i found really annoying was the constant cuts to vds daughter during the last fight scene. not bad. not good. passable 4.
      sentence2: negative negative negative negative negative
      label: 1.0
  • Loss: OnlineContrastiveLoss (a training sketch follows below)
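
Putting the dataset format and loss together, fine-tuning a checkpoint like this one could look roughly like the following sketch. The dataset rows are toy placeholders for the 50,000 (sentence1, sentence2, label) pairs, which are not published with the card; output_dir and the trust_remote_code assumption are illustrative:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss

# Base model this checkpoint was fine-tuned from
model = SentenceTransformer(
    "ELVISIO/jina_embeddings_v3_finetuned_online_contrastive_02",
    trust_remote_code=True,  # assumed necessary for the custom Jina v3 code
)

# Toy stand-in for the training data: label 1.0 marks a matching pair, 0.0 a mismatch
train_dataset = Dataset.from_dict({
    "sentence1": ["a long movie review ...", "another long movie review ..."],
    "sentence2": [
        "negative negative negative negative negative",
        "positive positive positive positive positive",
    ],
    "label": [1.0, 0.0],
})

# OnlineContrastiveLoss only backpropagates through hard positives/negatives
loss = OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="jina_v3_online_contrastive_03",  # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=64,  # the non-default values from this card
    per_device_eval_batch_size=64,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()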

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step   Training Loss
0.6394   500    0.5881
1.2788   1000   0.5958
1.9182   1500   0.5797
2.5575   2000   0.5847

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3
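
To approximate this environment for reproduction (versions taken from the list above; the exact PyTorch CUDA build may differ by machine):

pip install sentence-transformers==3.1.1 transformers==4.45.2 accelerate==1.1.1 datasets==3.1.0 tokenizers==0.20.3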

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}