metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:29545
- loss:MultipleNegativesSymmetricRankingLoss
base_model: dunzhang/stella_en_400M_v5
widget:
- source_sentence: >-
In the context of the risk-based assessment of customers and business
relationships, how should the overlap between customer risk assessment and
CDD be managed to ensure both are completed effectively and in compliance
with ADGM regulations?
sentences:
- >
DocumentID: 36 | PassageID: D.7. | Passage: Principle 7 – Scenario
analysis of climate-related financial risks. Where appropriate, relevant
financial firms should develop and implement climate-related scenario
analysis frameworks, including stress testing, in a manner commensurate
with their size, complexity, risk profile and nature of activities.
- >-
DocumentID: 1 | PassageID: 7.Guidance.4. | Passage: The risk-based
assessment of the customer and the proposed business relationship,
Transaction or product required under this Chapter is required to be
undertaken prior to the establishment of a business relationship with a
customer. Because the risk rating assigned to a customer resulting from
this assessment determines the level of CDD that must be undertaken for
that customer, this process must be completed before the CDD is
completed for the customer. The Regulator is aware that in practice
there will often be some degree of overlap between the customer risk
assessment and CDD. For example, a Relevant Person may undertake some
aspects of CDD, such as identifying Beneficial Owners, when it performs
a risk assessment of the customer. Conversely, a Relevant Person may
also obtain relevant information as part of CDD which has an impact on
its customer risk assessment. Where information obtained as part of CDD
of a customer affects the risk rating of a customer, the change in risk
rating should be reflected in the degree of CDD undertaken.
- >-
DocumentID: 1 | PassageID: 9.1.2.Guidance.4. | Passage: Where the
legislative framework of a jurisdiction (such as secrecy or data
protection legislation) prevents a Relevant Person from having access to
CDD information upon request without delay as referred to in Rule
9.1.1(3)(b), the Relevant Person should undertake the relevant CDD
itself and should not seek to rely on the relevant third party.
- source_sentence: >-
Can you clarify the responsibilities of the Governing Body of a Relevant
Person in establishing and maintaining AML/TFS policies and procedures,
and how these should be documented and reviewed?
sentences:
- >+
DocumentID: 28 | PassageID: 193) | Passage: SUPERVISION BY LISTING
AUTHORITY
Complaints or allegations of non-compliance by Reporting Entities
If, as a result of the enquiry, the Listing Authority forms the view
that the information is accurate, is Inside Information, and is not
within exemption from Disclosure provided by Rule 7.2.2, the Listing
Authority will ask the Reporting Entity to make a Disclosure about the
matter under Rule 7.2.1. If the information should have been Disclosed
earlier, the Listing Authority may issue an ‘aware letter’ (see
paragraphs 187 to 189 above), or take other relevant action.
- "DocumentID: 17 | PassageID: Part 13.165.(2) | Passage: The Regulator shall not approve a Non Abu Dhabi Global Market Clearing House unless it is satisfied—\n(a)\tthat the rules and practices of the body, together with the law of the country in which the body's head office is situated, provide adequate procedures for dealing with the default of persons party to contracts connected with the body; and\n(b)\tthat it is otherwise appropriate to approve the body;\ntogether being the “Relevant Requirements” for this Part."
- "DocumentID: 1 | PassageID: 4.3.1 | Passage: A Relevant Person which is part of a Group must ensure that it:\n(a)\thas developed and implemented policies and procedures for the sharing of information between Group entities, including the sharing of information relating to CDD and money laundering risks;\n(b)\thas in place adequate safeguards on the confidentiality and use of information exchanged between Group entities, including consideration of relevant data protection legislation;\n(c)\tremains aware of the money laundering risks of the Group as a whole and of its exposure to the Group and takes active steps to mitigate such risks;\n(d)\tcontributes to a Group-wide risk assessment to identify and assess money laundering risks for the Group; and\n(e)\tprovides its Group-wide compliance, audit and AML/TFS functions with customer account and Transaction information from its Branches and Subsidiaries when necessary for AML/TFS purposes."
- source_sentence: >-
What specific accounting standards and practices are we required to follow
when valuing positions in our Trading and Non-Trading Books to ensure
compliance with ADGM regulations?
sentences:
- >
DocumentID: 7 | PassageID: 8.10.1.(2).Guidance.3. | Passage: Each
Authorised Person, Recognised Body and its Auditors is also required
under Part 16 and section 193 of the FSMR respectively, to disclose to
the Regulator any matter which may indicate a breach or likely breach
of, or a failure or likely failure to comply with, Regulations or Rules.
Each Authorised Person and Recognised Body is also required to establish
and implement systems and procedures to enable its compliance and
compliance by its Auditors with notification requirements.
- "DocumentID: 18 | PassageID: 3.2 | Passage: Financial Services Permissions. VC Managers operating in ADGM require a Financial Services Permission (“FSP”) to undertake any Regulated Activity pertaining to VC Funds and/or co-investments by third parties in VC Funds. The Regulated Activities covered by the FSP will be dependent on the VC Managers’ investment strategy and business model.\n(a)\tManaging a Collective Investment Fund: this includes carrying out fund management activities in respect of a VC Fund.\n(b)\tAdvising on Investments or Credit : for VC Managers these activities will be restricted to activities related to co-investment alongside a VC Fund which the VC Manager manages, such as recommending that a client invest in an investee company alongside the VC Fund and on the strategy and structure required to make the investment.\n(c)\tArranging Deals in Investments: VC Managers may also wish to make arrangements to facilitate co-investments in the investee company.\nAuthorisation fees and supervision fees for a VC Manager are capped at USD 10,000 regardless of whether one or both of the additional Regulated Activities in b) and c) above in relation to co-investments are included in its FSP. The FSP will include restrictions appropriate to the business model of a VC Manager."
- >
DocumentID: 13 | PassageID: APP2.A2.1.1.(4) | Passage: An Authorised
Person must value every position included in its Trading Book and the
Non Trading Book in accordance with the relevant accounting standards
and practices.
- source_sentence: >-
What documentation and information are we required to maintain to
demonstrate compliance with the rules pertaining to the cooperation with
auditors, especially in terms of providing access and not interfering with
their duties?
sentences:
- "DocumentID: 6 | PassageID: PART 5.16.3.5 | Passage: Co-operation with auditors. A Fund Manager must take reasonable steps to ensure that it and its Employees:\n(a)\tprovide any information to its auditor that its auditor reasonably requires, or is entitled to receive as auditor;\n(b)\tgive the auditor right of access at all reasonable times to relevant records and information within its possession;\n(c)\tallow the auditor to make copies of any records or information referred to in (b);\n(d)\tdo not interfere with the auditor's ability to discharge its duties;\n(e)\treport to the auditor any matter which may significantly affect the financial position of the Fund; and\n(f)\tprovide such other assistance as the auditor may reasonably request it to provide."
- "DocumentID: 13 | PassageID: 4.3.1 | Passage: An Authorised Person must implement and maintain comprehensive Credit Risk management systems which:\n(a)\tare appropriate to the firm's type, scope, complexity and scale of operations;\n(b)\tare appropriate to the diversity of its operations, including geographical diversity;\n(c)\tenable the firm to effectively identify, assess, monitor and control Credit Risk and to ensure that adequate Capital Resources are available at all times to cover the risks assumed; and\n(d)\tensure effective implementation of the Credit Risk strategy and policy."
- >-
DocumentID: 3 | PassageID: 3.8.9 | Passage: The Authorised Person acting
as the Investment Manager of an ADGM Green Portfolio must provide a copy
of the attestation obtained for the purposes of Rule 3.8.6 to each
Client with whom it has entered into a Discretionary Portfolio
Management Agreement in respect of such ADGM Green Portfolio at least on
an annual basis and upon request by the Client.
- source_sentence: >-
Could you provide examples of circumstances that, when changed, would
necessitate the reevaluation of a customer's risk assessment and the
application of updated CDD measures?
sentences:
- >-
DocumentID: 13 | PassageID: 9.2.1.Guidance.1. | Passage: The Regulator
expects that an Authorised Person's Liquidity Risk strategy will set out
the approach that the Authorised Person will take to Liquidity Risk
management, including various quantitative and qualitative targets. It
should be communicated to all relevant functions and staff within the
organisation and be set out in the Authorised Person's Liquidity Risk
policy.
- "DocumentID: 1 | PassageID: 8.1.2.(1) | Passage: A Relevant Person must also apply CDD measures to each existing customer under Rules 8.3.1, 8.4.1 or 8.5.1 as applicable:\n(a)\twith a frequency appropriate to the outcome of the risk-based approach taken in relation to each customer; and\n(b)\twhen the Relevant Person becomes aware that any circumstances relevant to its risk assessment for a customer have changed."
- "DocumentID: 1 | PassageID: 8.1.1.Guidance.2. | Passage: The FIU has issued guides that require:\n(a)\ta DNFBP that is a dealer in precious metals or precious stones to obtain relevant identification documents, such as passport, emirates ID, trade licence, as applicable, and register the information via goAML for all cash transactions equal to or exceeding USD15,000 with individuals and all cash or wire transfer transactions equal to or exceeding USD15,000 with entities. The Regulator expects a dealer in any saleable item or a price equal to or greater than USD15,000 to also comply with this requirement;\n(b)\ta DNFBP that is a real estate agent to obtain relevant identification documents, such as passport, emirates ID, trade licence, as applicable, and register the information via goAML for all sales or purchases of Real Property where:\n(i)\tthe payment for the sale/purchase includes a total cash payment of USD15,000 or more whether in a single cash payment or multiple cash payments;\n(ii)\tthe payment for any part or all of the sale/purchase amount includes payment(s) using Virtual Assets;\n(iii)\tthe payment for any part or all of the sale/purchase amount includes funds that were converted from or to a Virtual Asset."
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on dunzhang/stella_en_400M_v5
This is a sentence-transformers model finetuned from dunzhang/stella_en_400M_v5 on the csv dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: dunzhang/stella_en_400M_v5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- csv
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
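The Pooling module above has `pooling_mode_mean_tokens` set to True, so a sentence vector is the average of its token vectors. A minimal sketch of that step in plain Python follows, assuming padded positions are excluded via an attention mask; the function name `mean_pool` and the toy values are illustrative and are not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in summed]


# Toy input: three token vectors of width 2, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

In the real model the vectors are 1024 wide and the pooled output then passes through the Dense layer listed as module (2).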
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jebish7/stella-MNSR-3")
# Run inference
sentences = [
"Could you provide examples of circumstances that, when changed, would necessitate the reevaluation of a customer's risk assessment and the application of updated CDD measures?",
'DocumentID: 1 | PassageID: 8.1.2.(1) | Passage: A Relevant Person must also apply CDD measures to each existing customer under Rules \u200e8.3.1, \u200e8.4.1 or \u200e8.5.1 as applicable:\n(a)\twith a frequency appropriate to the outcome of the risk-based approach taken in relation to each customer; and\n(b)\twhen the Relevant Person becomes aware that any circumstances relevant to its risk assessment for a customer have changed.',
"DocumentID: 13 | PassageID: 9.2.1.Guidance.1. | Passage: The Regulator expects that an Authorised Person's Liquidity Risk strategy will set out the approach that the Authorised Person will take to Liquidity Risk management, including various quantitative and qualitative targets. It should be communicated to all relevant functions and staff within the organisation and be set out in the Authorised Person's Liquidity Risk policy.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
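For retrieval, the first sentence in the example above acts as the query and the remaining two as candidate passages. The sketch below shows how passages could be ranked by cosine similarity, the similarity function this model uses. It runs on small hand-written vectors instead of real embeddings so that it works without downloading the model; `cosine` and `rank_passages` are illustrative helper names, not library functions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar."""
    scored = [(cosine(query_vec, p), i) for i, p in enumerate(passage_vecs)]
    return [i for _, i in sorted(scored, key=lambda t: t[0], reverse=True)]


# Toy stand-ins for embeddings[0] (query) and embeddings[1:] (passages).
query = [1.0, 0.0, 1.0]
passages = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
order = rank_passages(query, passages)
print(order)  # [1, 2, 0]
```

With real embeddings, the same ordering can be read from the first row of the matrix returned by `model.similarity`.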
Training Details
Training Dataset
csv
- Dataset: csv
- Size: 29,545 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
column | type | min | mean | max |
---|---|---|---|---|
anchor | string | 16 tokens | 35.04 tokens | 64 tokens |
positive | string | 27 tokens | 129.43 tokens | 512 tokens |
- Samples:
anchor | positive |
---|---|
Could you outline the expected procedures for a Trade Repository to notify relevant authorities of any significant errors or omissions in previously submitted data? | DocumentID: 7 |
In the context of a non-binding MPO, how are commodities held by an Authorised Person treated for the purpose of determining the Commodities Risk Capital Requirement? | DocumentID: 9 |
Can the FSRA provide case studies or examples of best practices for RIEs operating MTFs or OTFs using spot commodities in line with the Spot Commodities Framework? | DocumentID: 34 |
- Loss: MultipleNegativesSymmetricRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
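To make the loss parameters concrete, here is a rough plain-Python sketch of a symmetric in-batch ranking loss with `scale` 20.0 and cosine similarity. It follows the general idea (cross-entropy over the scaled similarity matrix in both the anchor-to-positive and positive-to-anchor directions, then averaged) and is not the library's PyTorch implementation. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cross_entropy(row, target):
    """Negative log-softmax of row[target], computed stably."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return log_sum - row[target]


def symmetric_ranking_loss(anchors, positives, scale=20.0):
    """Average of anchor->positive and positive->anchor cross-entropy."""
    n = len(anchors)
    scores = [[scale * cosine(a, p) for p in positives] for a in anchors]
    # Forward direction: each anchor should pick its own positive (the diagonal).
    forward = sum(cross_entropy(scores[i], i) for i in range(n)) / n
    # Backward direction: each positive should pick its own anchor.
    columns = [[scores[i][j] for i in range(n)] for j in range(n)]
    backward = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (forward + backward) / 2


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each positive matches its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each positive matches the other anchor
loss_aligned = symmetric_ranking_loss(anchors, aligned)
loss_swapped = symmetric_ranking_loss(anchors, swapped)
print(loss_aligned < loss_swapped)  # True
```

Every other positive in the batch serves as a negative for a given anchor, which is why batch composition affects this loss.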
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- num_train_epochs: 1
- warmup_ratio: 0.1
- batch_sampler: no_duplicates
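The `no_duplicates` batch sampler matters for an in-batch-negatives loss: if the same passage appears twice in a batch, one copy would be treated as a negative for the other's anchor. The following is a simplified illustration of the idea only, assuming a greedy pass without shuffling; it is not the library's sampler, and `no_duplicate_batches` is a hypothetical name.

```python
def no_duplicate_batches(pairs, batch_size):
    """Greedily group (anchor, positive) pairs so no text repeats within a batch."""
    remaining = list(range(len(pairs)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                # Batch is full or a text already occurs in it: try again later.
                deferred.append(idx)
        batches.append(batch)
        remaining = deferred
    return batches


# Toy pairs: the first two anchors share the same positive passage "pA".
pairs = [("q1", "pA"), ("q2", "pA"), ("q3", "pB"), ("q4", "pC")]
batches = no_duplicate_batches(pairs, batch_size=2)
print(batches)  # [[0, 2], [1, 3]]
```

The two pairs that share `pA` end up in different batches, so neither acts as a false negative for the other.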
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
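The combination of `learning_rate` 5e-05, `lr_scheduler_type` linear and `warmup_ratio` 0.1 describes a learning rate that rises linearly during warmup and then decays linearly to zero. The sketch below illustrates that shape. It assumes a single device, no gradient accumulation, and that both the step count and the warmup length are rounded up; `linear_schedule_lr` is an illustrative helper, not a library function.

```python
import math


def linear_schedule_lr(step, total_steps, warmup_steps, base_lr=5e-5):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: 29,545 samples at batch size 16 on one device, one epoch.
total_steps = math.ceil(29545 / 16)
warmup_steps = math.ceil(0.1 * total_steps)
print(total_steps, warmup_steps)  # 1847 185

for step in (0, 100, warmup_steps, 1000, total_steps):
    print(step, linear_schedule_lr(step, total_steps, warmup_steps))
```

Under these assumptions the peak learning rate is reached at step 185.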
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.0541 | 100 | 0.4442 |
0.1083 | 200 | 0.4793 |
0.1624 | 300 | 0.4395 |
0.2166 | 400 | 0.4783 |
0.2707 | 500 | 0.4573 |
0.3249 | 600 | 0.4235 |
0.3790 | 700 | 0.4029 |
0.4331 | 800 | 0.3951 |
0.4873 | 900 | 0.438 |
0.5414 | 1000 | 0.364 |
0.5956 | 1100 | 0.3732 |
0.6497 | 1200 | 0.3932 |
0.7038 | 1300 | 0.3387 |
0.7580 | 1400 | 0.2956 |
0.8121 | 1500 | 0.3612 |
0.8663 | 1600 | 0.3333 |
0.9204 | 1700 | 0.2837 |
0.9746 | 1800 | 0.2785 |
0.0541 | 100 | 0.2263 |
0.1083 | 200 | 0.2085 |
0.1624 | 300 | 0.1638 |
0.2166 | 400 | 0.2085 |
0.2707 | 500 | 0.2442 |
0.3249 | 600 | 0.1965 |
0.3790 | 700 | 0.2548 |
0.4331 | 800 | 0.2504 |
0.4873 | 900 | 0.2358 |
0.5414 | 1000 | 0.2083 |
0.5956 | 1100 | 0.2117 |
0.6497 | 1200 | 0.248 |
0.7038 | 1300 | 0.221 |
0.7580 | 1400 | 0.1886 |
0.8121 | 1500 | 0.2653 |
0.8663 | 1600 | 0.2651 |
0.9204 | 1700 | 0.2349 |
0.9746 | 1800 | 0.2435 |
0.0541 | 100 | 0.143 |
0.1083 | 200 | 0.0701 |
0.1624 | 300 | 0.0675 |
0.2166 | 400 | 0.0977 |
0.2707 | 500 | 0.1157 |
0.3249 | 600 | 0.0823 |
0.3790 | 700 | 0.1022 |
0.4331 | 800 | 0.114 |
0.4873 | 900 | 0.0955 |
0.5414 | 1000 | 0.0905 |
0.5956 | 1100 | 0.0959 |
0.6497 | 1200 | 0.1308 |
0.7038 | 1300 | 0.1285 |
0.7580 | 1400 | 0.1006 |
0.8121 | 1500 | 0.1553 |
0.8663 | 1600 | 0.1769 |
0.9204 | 1700 | 0.1965 |
0.9746 | 1800 | 0.2271 |
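The Epoch column in the table above can be reproduced from the Step column. A short worked check follows, assuming one epoch is the 29,545 training samples divided into batches of 16 with the final partial batch kept; `epoch_fraction` is an illustrative helper.

```python
import math

# Assumption: one optimizer step per batch of 16, last partial batch kept.
steps_per_epoch = math.ceil(29545 / 16)


def epoch_fraction(step):
    """Fraction of an epoch completed after `step` steps, to 4 decimals."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)       # 1847
print(epoch_fraction(100))   # 0.0541
print(epoch_fraction(1800))  # 0.9746
```

Both values agree with the logged rows for steps 100 and 1800.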
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}