metadata
base_model: BAAI/bge-small-en-v1.5
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:29545
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
How should a Trust Service Provider keep the Regulator informed about the
status of its professional indemnity insurance?
sentences:
- "DocumentID: 3 | PassageID: 17.4.1 | Passage: An Authorised Person conducting a Regulated Activity in relation to Virtual Assets, where applicable, should consider any reporting obligations in relation to, among other things –\n(a)\tFATCA, as set out in the Guidance Notes on the requirements of the Intergovernmental Agreement between the United Arab Emirates and the United States, issued by the UAE Ministry of Finance in 2015 and as amended from time to time; and\n(b)\tCommon Reporting Standards, set out in the ADGM Common Reporting Standard Regulations 2017."
- "DocumentID: 3 | PassageID: 5.6.2 | Passage: A Trust Service Provider must:\n(a)\tprovide the Regulator with a copy of its professional indemnity insurance cover; and\n(b)\tnotify the Regulator of any changes to the cover including termination and renewal."
- >+
DocumentID: 34 | PassageID: 70) | Passage: REGULATORY REQUIREMENTS -
SPOT COMMODITY ACTIVITIES
Market Abuse / Market Surveillance
MTFs are required to operate an effective market surveillance program to
identify, monitor, detect and prevent conduct amounting to market
misconduct and/or Financial Crime. Given the significant risks within
Spot Commodity markets, an MTF’s or OTF’s surveillance system will need
to be robust, and regularly reviewed and enhanced.
- source_sentence: >-
- Paragraphs 162-166 of the Virtual Assets Guidance address stablecoins –
can you elaborate on the specific regulatory requirements that an entity
must meet to use stablecoins in conjunction with digital securities?
sentences:
- "DocumentID: 13 | PassageID: APP2.A2.1.12.(2) | Passage: Positions arising from internal hedges are eligible for Trading Book capital treatment, provided that they meet the criteria for trading intent specified in Rule A2.1.5 and the following criteria on prudent valuation:\n(a)\tthe internal hedge is not primarily intended to avoid or reduce Capital Requirements which the Authorised Person would be otherwise required to maintain;\n(b)\tthe internal hedge is properly documented and subject to specific internal approval and audit procedures;\n(c)\tthe internal hedge is dealt with at market conditions;\n(d)\tthe bulk of the Market Risk which is generated by the internal hedge is dynamically managed in the Trading Book within the limits approved by senior management; and\n(e)\tthe internal hedge is carefully monitored with adequate procedures."
- "DocumentID: 19 | PassageID: 166).e) | Passage: MTF (using Virtual Assets): using third-party issued fiat tokens as a payment/transaction mechanism:\n\ni.\tIn the context of using third party fiat tokens, the Authorised Person must directly meet the requirements of the Accepted Virtual Assets, Technology Governance and AML/CFT sections of this Guidance.\n\nii.\tFor the related fiat currency custody activities, FSRA preference is to have the MTF utilise a Virtual Asset/Fiat Custodian authorised on the basis of paragraphs 139 - 145 or 166(b) above.\n\niii.\tIn relation to the issuance of the related fiat token, in circumstances where the issuer is not authorised under paragraph 166(a) above, it is expected that the Authorised Person undertake the same due diligence as that it would apply for the purposes of determining Accepted Virtual Assets (focusing on Technology Governance requirements, the seven factors used to determine an Accepted Virtual Asset, and requirements relating to reporting and reconciliation).\n"
- >+
DocumentID: 33 | PassageID: 117) | Passage: DIGITAL SECURITIES –
SPECIFIC REGULATORY CONSIDERATIONS
Islamic Finance Rules
FSRA’s Islamic Finance Rules (IFR) apply to a number of entities that
can operate within ADGM, including Authorised Persons and a Person
making an Offer of Securities. As IFR is linked to the use of
‘Specified Investments’, including (Digital) Securities, IFR can apply
to Authorised Persons Conducting Islamic Financial Business or
offering/distributing Shari’a-compliant Securities.
- source_sentence: >-
How does the FSRA define a "suitably senior level" within a Mining
Reporting Entity for the sign-off of Production Targets, and what
qualifications or experience is required for individuals at this level?
sentences:
- >-
DocumentID: 6 | PassageID: PART 5.13A.1.1 | Passage: Chapter 13A applies
in its entirety to the Fund Manager and, if appointed, the Trustee of a
Private Credit Fund, unless otherwise expressly provided for in this
Chapter.
- >-
DocumentID: 11 | PassageID: 2.7.4.Guidance.1. | Passage: A Listed Entity
should provide the Regulator with at least ten Business Days in which to
review a proposal for the purchase of its own Shares. The more complex a
proposal, the more time that will be required by the Regulator to review
and approve the proposal.
- >
DocumentID: 30 | PassageID: 67) | Passage: PRODUCTION TARGETS .
Rule 11.8 sets out the requirements for disclosing certain types of
Production Targets. The FSRA emphasises that Production Targets are
forward looking statements. A Production Target must, therefore, be
based on reasonable grounds or it will otherwise be deemed misleading.
An appropriate level of due diligence must, as a result, be applied to
the preparation of a Production Target. The assumptions and underlying
figures used in preparing a Production Target need to be carefully
vetted and signed off at a suitably senior level within the Mining
Reporting Entity before it is disclosed.
- source_sentence: >-
In managing PSIAs, what specific prudential requirements must be adhered
to in relation to Trading Book and Non-Trading Book activities to ensure
compliance with the PRU Rule 1.3?
sentences:
- "DocumentID: 13 | PassageID: APP11.A11.1.Guidance.11. | Passage: Guidance on risks to be covered as part of the IRAP. An Authorised Person should consider the following risks, where relevant, in its IRAP:\na.\tCredit Risk, including Large Exposures and concentration risks;\nb.\tMarket Risk;\nc.\tLiquidity Risk;\nd.\tfor Islamic Financial Business involving PSIAs, displaced commercial risk;\ne.\tinterest rate risk in the Non Trading Book;\nf.\tOperational Risk;\ng.\tinternal controls and systems; and\nh.\treputational risk."
- >-
DocumentID: 1 | PassageID: 7.2.4.Guidance on Restricted Scope
Companies.2. | Passage: Relevant Persons will know that Restricted Scope
Companies are subject to less onerous corporate disclosure requirements
than other forms of corporate entities due to the requirement to have
"(Restricted)" in a company's name. Given that only the constitution and
details of the registered office of a Restricted Scope Company will be
available in a public register, a Relevant Person will be required to
have a bilateral dialogue with the Restricted Scope Company, in
accordance with the RBA, to obtain any other relevant information which
it needs to assess the money laundering risks to which it is exposed.
- "DocumentID: 12 | PassageID: 2.3.3 | Passage: An Insurer must develop, implement and maintain a risk management system to identify the operational risks faced by the Insurer, including but not limited to:\n(a)\ttechnology risk (including processing risks);\n(b)\treputational risk;\n(c)\tfraud and other fiduciary risks;\n(d)\tcompliance risk;\n(e)\toutsourcing risk;\n(f)\tbusiness continuity planning risk;\n(g)\tlegal risk; and\n(h)\tkey person risk."
- source_sentence: >-
Can a Captive Insurer's concentration positions be considered a reason for
establishing reserves for less liquid positions?
sentences:
- >
DocumentID: 19 | PassageID: 23) | Passage: REGULATORY REQUIREMENTS FOR
AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES IN RELATION TO
VIRTUAL ASSETS
Conducting a Regulated Activity in relation to Virtual Assets
Chapter 17 of COBS applies to all Authorised Persons conducting a
Regulated Activity in relation to Virtual Assets, requiring compliance
with all requirements set out in COBS Rules 17.1 – 17.6. Authorised
Persons that are Operating a Multilateral Trading Facility or Providing
Custody in relation to Virtual Assets are also required to comply with
the additional requirements set out in COBS Rules 17.7 or 17.8
respectively.
- >-
DocumentID: 2 | PassageID: 6.8.3 | Passage: A Captive Insurer must
consider the need for establishing reserves for less liquid positions
and, on an on-going basis, review their continued appropriateness in
accordance with the requirements set out in this Rule. Less liquid
positions could arise from both market events and institution-related
situations e.g. concentration positions and/or stale positions.
- "DocumentID: 3 | PassageID: 22.3.2 | Passage: An Authorised Person must –\n(a)\thave arrangements in place to ensure that it, and its market participants, are certified as compliant with:\n(i) \tISO 14001 (Environmental Management Systems (EMS));\n(ii)\tOHSAS 18001 / ISO 45001 (Health & Safety Management); or\n(iii)\tequivalent certification standards; and\n(b)\tensure its arrangements are aligned with the OECD’s Due Diligence Guidance for Responsible Mineral Supply Chains (as applicable)."
SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5 on the csv dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- csv
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
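The pooling and normalization stages listed above can be illustrated with a small sketch. It uses plain Python and made-up token vectors rather than the real BertModel outputs; the helper names `cls_pool` and `l2_normalize` are hypothetical and are not part of the Sentence Transformers API.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: take the first token's vector as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit length (what the Normalize() stage does conceptually)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy "token embeddings" for one sentence: 3 tokens, 2 dimensions each.
# Real outputs of this model would have 384 dimensions per token.
tokens = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]

sentence_vec = l2_normalize(cls_pool(tokens))
print(sentence_vec)
```

Because the final vectors are unit length, the cosine similarity between two embeddings reduces to their dot product.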
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jebish7/MedEmbed-small-v0.1_MNR_5_Det")
# Run inference
sentences = [
"Can a Captive Insurer's concentration positions be considered a reason for establishing reserves for less liquid positions?",
'DocumentID: 2 | PassageID: 6.8.3 | Passage: A Captive Insurer must consider the need for establishing reserves for less liquid positions and, on an on-going basis, review their continued appropriateness in accordance with the requirements set out in this Rule. Less liquid positions could arise from both market events and institution-related situations e.g. concentration positions and/or stale positions.',
'DocumentID: 19 | PassageID: 23) | Passage: REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES IN RELATION TO VIRTUAL ASSETS\nConducting a Regulated Activity in relation to Virtual Assets\nChapter 17 of COBS applies to all Authorised Persons conducting a Regulated Activity in relation to Virtual Assets, requiring compliance with all requirements set out in COBS Rules 17.1 – 17.6. Authorised Persons that are Operating a Multilateral Trading Facility or Providing Custody in relation to Virtual Assets are also required to comply with the additional requirements set out in COBS Rules 17.7 or 17.8 respectively.\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
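For retrieval, the similarity scores are typically used to rank candidate passages against a query. The following is a minimal sketch of that ranking step, assuming the embeddings are already unit-normalized so that the dot product equals cosine similarity. The toy 2-dimensional vectors and the helper `rank_passages` are illustrative assumptions, not output of this model.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors standing in for one query and three passages.
query = [0.6, 0.8]
passages = [[0.8, 0.6], [0.6, 0.8], [-0.6, -0.8]]

ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 4))
```

In practice the vectors would come from `model.encode(...)`, with the first row of `sentences` treated as the query and the remaining rows as candidate passages.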
Training Details
Training Dataset
csv
- Dataset: csv
- Size: 29,545 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 18 tokens; mean: 34.86 tokens; max: 61 tokens | min: 20 tokens; mean: 131.72 tokens; max: 512 tokens |
- Samples:
anchor | positive |
---|---|
What is the threshold decline in the economic value of a firm, as a result of changes in interest rates, that necessitates immediate notification to the Regulator according to Rule 7.2.2? | DocumentID: 13 |
What level of board and senior management involvement does the ADGM expect in the oversight of the incorporation of climate-related financial risks into capital and liquidity adequacy processes? | DocumentID: 36 |
Can you provide guidance on the specific indicators or factors that should be considered by a Relevant Person when conducting a risk assessment to identify higher money laundering risks within the framework of the ADGM's RBA? | DocumentID: 1 |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
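To make the loss parameters concrete, here is a minimal sketch of how a multiple-negatives ranking objective is commonly computed: each anchor is scored against every positive in the batch with scaled cosine similarity, and cross-entropy pushes the matching pair (the diagonal) above the in-batch negatives. This is a plain-Python illustration with toy vectors, not the library's implementation; `cos_sim` and `mnr_loss` are hypothetical helper names.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch: two anchors with well-separated positives.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each positive matches its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each positive matches the other anchor

print(mnr_loss(anchors, aligned))
print(mnr_loss(anchors, swapped))
```

The `scale` of 20.0 sharpens the softmax over cosine scores, which lie in [-1, 1]; correctly aligned pairs yield a loss near zero, while mismatched pairs are penalized heavily.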
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 64
- learning_rate: 2e-05
- num_train_epochs: 5
- warmup_ratio: 0.1
- batch_sampler: no_duplicates
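The interaction of `learning_rate`, `warmup_ratio`, and the linear scheduler can be sketched as follows. This is an illustrative reimplementation of the usual linear-warmup-then-linear-decay shape, not the trainer's own code. The function name `linear_warmup_lr` and the total of 1000 steps are assumptions chosen for the example; the actual number of optimizer steps for this run is not stated in this card.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a given step: linear ramp up, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Hypothetical run length, used only to show the shape of the schedule.
TOTAL_STEPS = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

With a warmup ratio of 0.1, the first tenth of training ramps the rate from zero up to 2e-05, after which it decays linearly back to zero by the final step.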
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.4329 | 100 | 1.743 |
0.8658 | 200 | 1.2012 |
1.0346 | 300 | 0.5543 |
1.4675 | 400 | 1.1161 |
1.9004 | 500 | 1.0257 |
2.0693 | 600 | 0.4671 |
2.5022 | 700 | 0.998 |
2.9351 | 800 | 0.973 |
3.1039 | 900 | 0.4108 |
3.5368 | 1000 | 0.9453 |
3.9697 | 1100 | 0.9343 |
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.4.0
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}