---
base_model: BAAI/bge-small-en-v1.5
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:29545
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: How should a Trust Service Provider keep the Regulator informed about the status of its professional indemnity insurance?
  sentences:
  - "DocumentID: 3 | PassageID: 17.4.1 | Passage: An Authorised Person conducting a Regulated Activity in relation to Virtual Assets, where applicable, should consider any reporting obligations in relation to, among other things –\n(a)\tFATCA, as set out in the Guidance Notes on the requirements of the Intergovernmental Agreement between the United Arab Emirates and the United States, issued by the UAE Ministry of Finance in 2015 and as amended from time to time; and\n(b)\tCommon Reporting Standards, set out in the ADGM Common Reporting Standard Regulations 2017."
  - "DocumentID: 3 | PassageID: 5.6.2 | Passage: A Trust Service Provider must:\n(a)\tprovide the Regulator with a copy of its professional indemnity insurance cover; and\n(b)\tnotify the Regulator of any changes to the cover including termination and renewal."
  - 'DocumentID: 34 | PassageID: 70) | Passage: REGULATORY REQUIREMENTS - SPOT COMMODITY ACTIVITIES Market Abuse / Market Surveillance MTFs are required to operate an effective market surveillance program to identify, monitor, detect and prevent conduct amounting to market misconduct and/or Financial Crime. Given the significant risks within Spot Commodity markets, an MTF’s or OTF’s surveillance system will need to be robust, and regularly reviewed and enhanced. '
- source_sentence: '- Paragraphs 162-166 of the Virtual Assets Guidance address stablecoins – can you elaborate on the specific regulatory requirements that an entity must meet to use stablecoins in conjunction with digital securities?'
  sentences:
  - "DocumentID: 13 | PassageID: APP2.A2.1.12.(2) | Passage: Positions arising from internal hedges are eligible for Trading Book capital treatment, provided that they meet the criteria for trading intent specified in Rule A2.1.5 and the following criteria on prudent valuation:\n(a)\tthe internal hedge is not primarily intended to avoid or reduce Capital Requirements which the Authorised Person would be otherwise required to maintain;\n(b)\tthe internal hedge is properly documented and subject to specific internal approval and audit procedures;\n(c)\tthe internal hedge is dealt with at market conditions;\n(d)\tthe bulk of the Market Risk which is generated by the internal hedge is dynamically managed in the Trading Book within the limits approved by senior management; and\n(e)\tthe internal hedge is carefully monitored with adequate procedures."
  - "DocumentID: 19 | PassageID: 166).e) | Passage: MTF (using Virtual Assets): using third-party issued fiat tokens as a payment/transaction mechanism:\n\ni.\tIn the context of using third party fiat tokens, the Authorised Person must directly meet the requirements of the Accepted Virtual Assets, Technology Governance and AML/CFT sections of this Guidance.\n\nii.\tFor the related fiat currency custody activities, FSRA preference is to have the MTF utilise a Virtual Asset/Fiat Custodian authorised on the basis of paragraphs 139 - 145 or 166(b) above.\n\niii.\tIn relation to the issuance of the related fiat token, in circumstances where the issuer is not authorised under paragraph 166(a) above, it is expected that the Authorised Person undertake the same due diligence as that it would apply for the purposes of determining Accepted Virtual Assets (focusing on Technology Governance requirements, the seven factors used to determine an Accepted Virtual Asset, and requirements relating to reporting and reconciliation).\n"
  - 'DocumentID: 33 | PassageID: 117) | Passage: DIGITAL SECURITIES – SPECIFIC REGULATORY CONSIDERATIONS Islamic Finance Rules FSRA’s Islamic Finance Rules (IFR) apply to a number of entities that can operate within ADGM, including Authorised Persons and a Person making an Offer of Securities. As IFR is linked to the use of ‘Specified Investments’, including (Digital) Securities, IFR can apply to Authorised Persons Conducting Islamic Financial Business or offering/distributing Shari’a-compliant Securities. '
- source_sentence: How does the FSRA define a "suitably senior level" within a Mining Reporting Entity for the sign-off of Production Targets, and what qualifications or experience is required for individuals at this level?
  sentences:
  - 'DocumentID: 6 | PassageID: PART 5.13A.1.1 | Passage: Chapter 13A applies in its entirety to the Fund Manager and, if appointed, the Trustee of a Private Credit Fund, unless otherwise expressly provided for in this Chapter.'
  - 'DocumentID: 11 | PassageID: 2.7.4.Guidance.1. | Passage: A Listed Entity should provide the Regulator with at least ten Business Days in which to review a proposal for the purchase of its own Shares. The more complex a proposal, the more time that will be required by the Regulator to review and approve the proposal.'
  - 'DocumentID: 30 | PassageID: 67) | Passage: PRODUCTION TARGETS . Rule 11.8 sets out the requirements for disclosing certain types of Production Targets. The FSRA emphasises that Production Targets are forward looking statements. A Production Target must, therefore, be based on reasonable grounds or it will otherwise be deemed misleading. An appropriate level of due diligence must, as a result, be applied to the preparation of a Production Target. The assumptions and underlying figures used in preparing a Production Target need to be carefully vetted and signed off at a suitably senior level within the Mining Reporting Entity before it is disclosed. '
- source_sentence: In managing PSIAs, what specific prudential requirements must be adhered to in relation to Trading Book and Non-Trading Book activities to ensure compliance with the PRU Rule 1.3?
  sentences:
  - "DocumentID: 13 | PassageID: APP11.A11.1.Guidance.11. | Passage: Guidance on risks to be covered as part of the IRAP. An Authorised Person should consider the following risks, where relevant, in its IRAP:\na.\tCredit Risk, including Large Exposures and concentration risks;\nb.\tMarket Risk;\nc.\tLiquidity Risk;\nd.\tfor Islamic Financial Business involving PSIAs, displaced commercial risk;\ne.\tinterest rate risk in the Non Trading Book;\nf.\tOperational Risk;\ng.\tinternal controls and systems; and\nh.\treputational risk."
  - 'DocumentID: 1 | PassageID: 7.2.4.Guidance on Restricted Scope Companies.2. | Passage: Relevant Persons will know that Restricted Scope Companies are subject to less onerous corporate disclosure requirements than other forms of corporate entities due to the requirement to have "(Restricted)" in a company''s name. Given that only the constitution and details of the registered office of a Restricted Scope Company will be available in a public register, a Relevant Person will be required to have a bilateral dialogue with the Restricted Scope Company, in accordance with the RBA, to obtain any other relevant information which it needs to assess the money laundering risks to which it is exposed.'
  - "DocumentID: 12 | PassageID: 2.3.3 | Passage: An Insurer must develop, implement and maintain a risk management system to identify the operational risks faced by the Insurer, including but not limited to:\n(a)\ttechnology risk (including processing risks);\n(b)\treputational risk;\n(c)\tfraud and other fiduciary risks;\n(d)\tcompliance risk;\n(e)\toutsourcing risk;\n(f)\tbusiness continuity planning risk;\n(g)\tlegal risk; and\n(h)\tkey person risk."
- source_sentence: Can a Captive Insurer's concentration positions be considered a reason for establishing reserves for less liquid positions?
  sentences:
  - 'DocumentID: 19 | PassageID: 23) | Passage: REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES IN RELATION TO VIRTUAL ASSETS Conducting a Regulated Activity in relation to Virtual Assets Chapter 17 of COBS applies to all Authorised Persons conducting a Regulated Activity in relation to Virtual Assets, requiring compliance with all requirements set out in COBS Rules 17.1 – 17.6. Authorised Persons that are Operating a Multilateral Trading Facility or Providing Custody in relation to Virtual Assets are also required to comply with the additional requirements set out in COBS Rules 17.7 or 17.8 respectively. '
  - 'DocumentID: 2 | PassageID: 6.8.3 | Passage: A Captive Insurer must consider the need for establishing reserves for less liquid positions and, on an on-going basis, review their continued appropriateness in accordance with the requirements set out in this Rule. Less liquid positions could arise from both market events and institution-related situations e.g. concentration positions and/or stale positions.'
  - "DocumentID: 3 | PassageID: 22.3.2 | Passage: An Authorised Person must –\n(a)\thave arrangements in place to ensure that it, and its market participants, are certified as compliant with:\n(i) \tISO 14001 (Environmental Management Systems (EMS));\n(ii)\tOHSAS 18001 / ISO 45001 (Health & Safety Management); or\n(iii)\tequivalent certification standards; and\n(b)\tensure its arrangements are aligned with the OECD’s Due Diligence Guidance for Responsible Mineral Supply Chains (as applicable)."
---

# SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) on the csv dataset.
It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - csv

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jebish7/MedEmbed-small-v0.1_MNR_5_Det")
# Run inference
sentences = [
    "Can a Captive Insurer's concentration positions be considered a reason for establishing reserves for less liquid positions?",
    'DocumentID: 2 | PassageID: 6.8.3 | Passage: A Captive Insurer must consider the need for establishing reserves for less liquid positions and, on an on-going basis, review their continued appropriateness in accordance with the requirements set out in this Rule. Less liquid positions could arise from both market events and institution-related situations e.g. concentration positions and/or stale positions.',
    'DocumentID: 19 | PassageID: 23) | Passage: REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES IN RELATION TO VIRTUAL ASSETS\nConducting a Regulated Activity in relation to Virtual Assets\nChapter 17 of COBS applies to all Authorised Persons conducting a Regulated Activity in relation to Virtual Assets, requiring compliance with all requirements set out in COBS Rules 17.1 – 17.6. Authorised Persons that are Operating a Multilateral Trading Facility or Providing Custody in relation to Virtual Assets are also required to comply with the additional requirements set out in COBS Rules 17.7 or 17.8 respectively.\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Training Details

### Training Dataset

#### csv

* Dataset: csv
* Size: 29,545 training samples
* Columns: anchor and positive
* Approximate statistics based on the first 1000 samples:

  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
  | details |        |          |

* Samples:

  | anchor | positive |
  |:-------|:---------|
  | What is the threshold decline in the economic value of a firm, as a result of changes in interest rates, that necessitates immediate notification to the Regulator according to Rule 7.2.2? | DocumentID: 13 \| PassageID: 7.2.3 \| Passage: An Authorised Person must immediately notify the Regulator if any evaluation under this Section suggests that, as a result of the change in interest rates described in Rule 7.2.2, the economic value of the firm would decline by more than 20% of its Capital Resources. |
  | What level of board and senior management involvement does the ADGM expect in the oversight of the incorporation of climate-related financial risks into capital and liquidity adequacy processes? | DocumentID: 36 \| PassageID: D.6. \| Passage: Principle 6 – Incorporation of climate-related financial risks into capital and liquidity adequacy processes. Relevant financial firms should incorporate material climate-related financial risks in their internal capital and liquidity adequacy assessment processes. |
  | Can you provide guidance on the specific indicators or factors that should be considered by a Relevant Person when conducting a risk assessment to identify higher money laundering risks within the framework of the ADGM's RBA? | DocumentID: 1 \| PassageID: 5.1.1.Guidance.4. \| Passage: In adopting an RBA, a Relevant Person should continue to meet the requirements that are mandated under the AML Rulebook including:<br>(a) assessing the relevant money laundering risks in accordance with Chapter 6 or Chapter 7 of AML (as applicable);<br>(b) undertaking CDD in accordance with Rule 8.3.1;<br>(c) undertaking Enhanced CDD pursuant to Rule 8.1.1(3) in accordance with Rule 8.4.1; and<br>(d) undertaking Simplified CDD in accordance with Rule 8.5.1 where permissible pursuant to Rule 8.1.1(4). |

* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 64
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
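
As a rough illustration of the training objective named in this card (`MultipleNegativesRankingLoss` with `scale` 20.0 and `cos_sim`), the sketch below computes an in-batch ranking loss in plain Python. It is not the library's implementation: the helper names `cos_sim` and `mnr_loss` and the toy 2-dimensional vectors are assumptions made for illustration. Each anchor is scored against every positive in the batch, the scores are multiplied by the scale, and cross-entropy is taken with the matching positive as the target, so the other positives in the batch act as negatives.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over scaled similarities with in-batch negatives.

    Hypothetical helper: positives[i] is the target for anchors[i], and
    every other positive in the batch is treated as a negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of logits.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch (illustrative values only, not model outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives_aligned = [[0.9, 0.1], [0.1, 0.9]]
positives_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = mnr_loss(anchors, positives_aligned)
loss_swapped = mnr_loss(anchors, positives_swapped)
print(loss_aligned, loss_swapped)
```

With this objective, larger batches supply more negatives per anchor, which is one reason a `no_duplicates` batch sampler is commonly paired with it: a duplicated positive inside a batch would be wrongly treated as a negative.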

### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.4329 | 100  | 1.743         |
| 0.8658 | 200  | 1.2012        |
| 1.0346 | 300  | 0.5543        |
| 1.4675 | 400  | 1.1161        |
| 1.9004 | 500  | 1.0257        |
| 2.0693 | 600  | 0.4671        |
| 2.5022 | 700  | 0.998         |
| 2.9351 | 800  | 0.973         |
| 3.1039 | 900  | 0.4108        |
| 3.5368 | 1000 | 0.9453        |
| 3.9697 | 1100 | 0.9343        |

### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.4.0
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```