---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5000
- loss:MultipleNegativesRankingLoss
base_model: lufercho/my-finetuned-bert-mlm
widget:
- source_sentence: |-
A Comprehensive Approach to Universal Piecewise Nonlinear Regression
Based on Trees
sentences:
- >2
In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of
the form $A X$ where $X$ is sparse, and the goal is to recover $X$. This
is a
central notion in signal processing, statistics and machine learning.
But in
applications such as sparse coding, edge detection, compression and
super
resolution, the dictionary $A$ is unknown and has to be learned from
random
examples of the form $Y = AX$ where $X$ is drawn from an appropriate
distribution --- this is the dictionary learning problem. In most
settings, $A$
is overcomplete: it has more columns than rows. This paper presents a
polynomial-time algorithm for learning overcomplete dictionaries; the
only
previously known algorithm with provable guarantees is the recent work
of
Spielman, Wang and Wright who gave an algorithm for the full-rank case,
which
is rarely the case in applications. Our algorithm applies to incoherent
dictionaries which have been a central object of study since they were
introduced in seminal work of Donoho and Huo. In particular, a
dictionary is
$\mu$-incoherent if each pair of columns has inner product at most $\mu / \sqrt{n}$.
The algorithm makes natural stochastic assumptions about the unknown sparse
vector $X$, which can contain $k \leq c \min(\sqrt{n}/\mu \log n, m^{1/2-\eta})$
non-zero entries (for any $\eta > 0$). This is close to the
best $k$
allowable by the best sparse recovery algorithms even if one knows the
dictionary $A$ exactly. Moreover, both the running time and sample
complexity
depend on $\log 1/\epsilon$, where $\epsilon$ is the target accuracy,
and so
our algorithms converge very quickly to the true dictionary. Our
algorithm can
also tolerate substantial amounts of noise provided it is incoherent
with
respect to the dictionary (e.g., Gaussian). In the noisy setting, our
running
time and sample complexity depend polynomially on $1/\epsilon$, and this
is
necessary.
- >2
In this paper, we investigate adaptive nonlinear regression and introduce
tree based piecewise linear regression algorithms that are highly
efficient and
provide significantly improved performance with guaranteed upper bounds
in an
individual sequence manner. We use a tree notion in order to partition
the
space of regressors in a nested structure. The introduced algorithms
adapt not
only their regression functions but also the complete tree structure
while
achieving the performance of the "best" linear mixture of a doubly
exponential
number of partitions, with a computational complexity only polynomial in
the
number of nodes of the tree. While constructing these algorithms, we
also avoid
using any artificial "weighting" of models (with highly data dependent
parameters) and, instead, directly minimize the final regression error,
which
is the ultimate performance goal. The introduced methods are generic
such that
they can readily incorporate different tree construction methods such as
random
trees in their framework and can use different regressor or partitioning
functions as demonstrated in the paper.
- >2
In this paper we propose a multi-task linear classifier learning problem
called D-SVM (Dictionary SVM). D-SVM uses a dictionary of parameter
covariance
shared by all tasks to do multi-task knowledge transfer among different
tasks.
We formally define the learning problem of D-SVM and show two
interpretations
of this problem, from both the probabilistic and kernel perspectives.
From the
probabilistic perspective, we show that our learning formulation is
actually a
MAP estimation on all optimization variables. We also show its
equivalence to a
multiple kernel learning problem in which one is trying to find a
re-weighting
kernel for features from a dictionary of basis (despite the fact that
only
linear classifiers are learned). Finally, we describe an alternative
optimization scheme to minimize the objective function and present
empirical
studies to validate our algorithm.
- source_sentence: |-
A Game-theoretic Machine Learning Approach for Revenue Maximization in
Sponsored Search
sentences:
- >2
A learning algorithm based on primary school teaching and learning is
presented. The methodology is to continuously evaluate a student and to
give
them training on the examples for which they repeatedly fail, until,
they can
correctly answer all types of questions. This incremental learning
procedure
produces better learning curves by demanding the student to optimally
dedicate
their learning time on the failed examples. When used in machine
learning, the
algorithm is found to train a machine on a data with maximum variance in
the
feature space so that the generalization ability of the network
improves. The
algorithm has interesting applications in data mining, model evaluations
and
rare objects discovery.
- >2
In this paper we extend temporal difference policy evaluation algorithms to
performance criteria that include the variance of the cumulative reward.
Such
criteria are useful for risk management, and are important in domains
such as
finance and process control. We propose both TD(0) and LSTD(lambda)
variants
with linear function approximation, prove their convergence, and
demonstrate
their utility in a 4-dimensional continuous state space problem.
- >2
Sponsored search is an important monetization channel for search engines, in
which an auction mechanism is used to select the ads shown to users and
determine the prices charged from advertisers. There have been several
pieces
of work in the literature that investigate how to design an auction
mechanism
in order to optimize the revenue of the search engine. However, due to
some
unrealistic assumptions used, the practical values of these studies are
not
very clear. In this paper, we propose a novel \emph{game-theoretic
machine
learning} approach, which naturally combines machine learning and game
theory,
and learns the auction mechanism using a bilevel optimization framework.
In
particular, we first learn a Markov model from historical data to
describe how
advertisers change their bids in response to an auction mechanism, and
then for
any given auction mechanism, we use the learnt model to predict its
corresponding future bid sequences. Next we learn the auction mechanism
through
empirical revenue maximization on the predicted bid sequences. We show
that the
empirical revenue will converge when the prediction period approaches
infinity,
and a Genetic Programming algorithm can effectively optimize this
empirical
revenue. Our experiments indicate that the proposed approach is able to
produce
a much more effective auction mechanism than several baselines.
- source_sentence: Normalized Online Learning
sentences:
- >2
The Frank-Wolfe method (a.k.a. conditional gradient algorithm) for smooth
optimization has regained much interest in recent years in the context
of large
scale optimization and machine learning. A key advantage of the method
is that
it avoids projections - the computational bottleneck in many
applications -
replacing it by a linear optimization step. Despite this advantage, the
known
convergence rates of the FW method fall behind standard first order
methods for
most settings of interest. It is an active line of research to derive
faster
linear optimization-based algorithms for various settings of convex
optimization.
In this paper we consider the special case of optimization over strongly
convex sets, for which we prove that the vanilla FW method converges at a
rate
of $\frac{1}{t^2}$. This gives a quadratic improvement in convergence
rate
compared to the general case, in which convergence is of the order
$\frac{1}{t}$, and known to be tight. We show that various balls induced
by
$\ell_p$ norms, Schatten norms and group norms are strongly convex on
one hand
and on the other hand, linear optimization over these sets is
straightforward
and admits a closed-form solution. We further show how several previous
fast-rate results for the FW method follow easily from our analysis.
- >2
We introduce online learning algorithms which are independent of feature
scales, proving regret bounds dependent on the ratio of scales existent
in the
data rather than the absolute scale. This has several useful effects:
there is
no need to pre-normalize data, the test-time and test-space complexity
are
reduced, and the algorithms are more robust.
- >2
In order to achieve high efficiency of classification in intrusion detection,
a compressed model is proposed in this paper which combines horizontal
compression with vertical compression. OneR is utilized as horizontal
compression for attribute reduction, and affinity propagation is
employed as
vertical compression to select small representative exemplars from large
training data. As to be able to computationally compress the larger
volume of
training data with scalability, MapReduce based parallelization approach
is
then implemented and evaluated for each step of the model compression
process
abovementioned, on which common but efficient classification methods can
be
directly used. Experimental application study on two publicly available
datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that
the
classification using the compressed model proposed can effectively speed
up the
detection procedure at up to 184 times, most importantly at the cost of
a
minimal accuracy difference with less than 1% on average.
- source_sentence: Bounds on the Bethe Free Energy for Gaussian Networks
sentences:
- >2
We extend the Bayesian Information Criterion (BIC), an asymptotic
approximation for the marginal likelihood, to Bayesian networks with
hidden
variables. This approximation can be used to select models given large
samples
of data. The standard BIC as well as our extension punishes the
complexity of a
model according to the dimension of its parameters. We argue that the
dimension
of a Bayesian network with hidden variables is the rank of the Jacobian
matrix
of the transformation between the parameters of the network and the
parameters
of the observable variables. We compute the dimensions of several
networks
including the naive Bayes model with a hidden root node.
- >2
Complex networks refer to large-scale graphs with nontrivial connection
patterns. The salient and interesting features that the complex network
study
offer in comparison to graph theory are the emphasis on the dynamical
properties of the networks and the ability of inherently uncovering
pattern
formation of the vertices. In this paper, we present a hybrid data
classification technique combining a low level and a high level
classifier. The
low level term can be equipped with any traditional classification
techniques,
which realize the classification task considering only physical features
(e.g.,
geometrical or statistical features) of the input data. On the other
hand, the
high level term has the ability of detecting data patterns with semantic
meanings. In this way, the classification is realized by means of the
extraction of the underlying network's features constructed from the
input
data. As a result, the high level classification process measures the
compliance of the test instances with the pattern formation of the
training
data. Out of various high level perspectives that can be utilized to
capture
semantic meaning, we utilize the dynamical features that are generated
from a
tourist walker in a networked environment. Specifically, a weighted
combination
of transient and cycle lengths generated by the tourist walk is employed
for
that end. Interestingly, our study shows that the proposed technique is
able to
further improve the already optimized performance of traditional
classification
techniques.
- >2
We address the problem of computing approximate marginals in Gaussian
probabilistic models by using mean field and fractional Bethe
approximations.
As an extension of Welling and Teh (2001), we define the Gaussian
fractional
Bethe free energy in terms of the moment parameters of the approximate
marginals and derive an upper and lower bound for it. We give necessary
conditions for the Gaussian fractional Bethe free energies to be bounded
from
below. It turns out that the bounding condition is the same as the
pairwise
normalizability condition derived by Malioutov et al. (2006) as a
sufficient
condition for the convergence of the message passing algorithm. By
giving a
counterexample, we disprove the conjecture in Welling and Teh (2001):
even when
the Bethe free energy is not bounded from below, it can possess a local
minimum
to which the minimization algorithms can converge.
- source_sentence: Multi-Armed Bandits in Metric Spaces
sentences:
- >2
The paper presents a FrameNet-based information extraction and knowledge
representation framework, called FrameNet-CNL. The framework is used on
natural
language documents and represents the extracted knowledge in a
tailor-made
Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can
be
generated automatically in multiple languages. This approach brings
together
the fields of information extraction and CNL, because a source text can
be
considered belonging to FrameNet-CNL, if information extraction parser
produces
the correct knowledge representation as a result. We describe a
state-of-the-art information extraction parser used by a national news
agency
and speculate that FrameNet-CNL eventually could shape the natural
language
subset used for writing the newswire articles.
- >2
Applications such as face recognition that deal with high-dimensional data
need a mapping technique that introduces representation of
low-dimensional
features with enhanced discriminatory power and a proper classifier,
able to
classify those complex features. Most of traditional Linear Discriminant
Analysis suffer from the disadvantage that their optimality criteria are
not
directly related to the classification ability of the obtained feature
representation. Moreover, their classification accuracy is affected by
the
"small sample size" problem which is often encountered in FR tasks. In
this
short paper, we combine nonlinear kernel based mapping of data called
KDDA with
Support Vector machine classifier to deal with both of the shortcomings
in an
efficient and cost effective manner. The proposed here method is
compared, in
terms of classification accuracy, to other commonly used FR methods on
UMIST
face database. Results indicate that the performance of the proposed
method is
overall superior to those of traditional FR approaches, such as the
Eigenfaces,
Fisherfaces, and D-LDA methods and traditional linear classifiers.
- >2
In a multi-armed bandit problem, an online algorithm chooses from a set of
strategies in a sequence of trials so as to maximize the total payoff of
the
chosen strategies. While the performance of bandit algorithms with a
small
finite strategy set is quite well understood, bandit problems with large
strategy sets are still a topic of very active investigation, motivated
by
practical applications such as online auctions and web advertisement.
The goal
of such research is to identify broad and natural classes of strategy
sets and
payoff functions which enable the design of efficient solutions. In this
work
we study a very general setting for the multi-armed bandit problem in
which the
strategies form a metric space, and the payoff function satisfies a
Lipschitz
condition with respect to the metric. We refer to this problem as the
"Lipschitz MAB problem". We present a complete solution for the
multi-armed
problem in this setting. That is, for every metric space (L,X) we define
an
isometry invariant which bounds from below the performance of Lipschitz
MAB
algorithms for X, and we present an algorithm which comes arbitrarily
close to
meeting this bound. Furthermore, our technique gives even better results
for
benign payoff functions.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on lufercho/my-finetuned-bert-mlm
This is a sentence-transformers model finetuned from lufercho/my-finetuned-bert-mlm. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: lufercho/my-finetuned-bert-mlm
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
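The pooling module above mean-pools the BERT token embeddings (with padding masked out) into a single 768-dimensional sentence vector. Below is a minimal sketch of that step using the plain `transformers` API; loading this repository directly with `AutoModel` is an assumption, and the snippet only illustrates the pooling arithmetic rather than reproducing the full Sentence Transformers pipeline.

```python
# Minimal sketch of mean pooling as configured above (pooling_mode_mean_tokens=True).
# Assumes this repository's weights are loadable with AutoModel; illustration only.
import torch
from transformers import AutoTokenizer, AutoModel

repo = "lufercho/AxvBert-Sentente-Transformer"  # assumption: transformers-compatible weights
tokenizer = AutoTokenizer.from_pretrained(repo)
encoder = AutoModel.from_pretrained(repo)

encoded = tokenizer(["Multi-Armed Bandits in Metric Spaces"],
                    padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state   # [batch, seq_len, 768]

mask = encoded["attention_mask"].unsqueeze(-1).float()         # zero out padding positions
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)                                # torch.Size([1, 768])
```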
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("lufercho/AxvBert-Sentente-Transformer")
# Run inference
sentences = [
'Multi-Armed Bandits in Metric Spaces',
' In a multi-armed bandit problem, an online algorithm chooses from a set of\nstrategies in a sequence of trials so as to maximize the total payoff of the\nchosen strategies. While the performance of bandit algorithms with a small\nfinite strategy set is quite well understood, bandit problems with large\nstrategy sets are still a topic of very active investigation, motivated by\npractical applications such as online auctions and web advertisement. The goal\nof such research is to identify broad and natural classes of strategy sets and\npayoff functions which enable the design of efficient solutions. In this work\nwe study a very general setting for the multi-armed bandit problem in which the\nstrategies form a metric space, and the payoff function satisfies a Lipschitz\ncondition with respect to the metric. We refer to this problem as the\n"Lipschitz MAB problem". We present a complete solution for the multi-armed\nproblem in this setting. That is, for every metric space (L,X) we define an\nisometry invariant which bounds from below the performance of Lipschitz MAB\nalgorithms for X, and we present an algorithm which comes arbitrarily close to\nmeeting this bound. Furthermore, our technique gives even better results for\nbenign payoff functions.\n',
' Applications such as face recognition that deal with high-dimensional data\nneed a mapping technique that introduces representation of low-dimensional\nfeatures with enhanced discriminatory power and a proper classifier, able to\nclassify those complex features. Most of traditional Linear Discriminant\nAnalysis suffer from the disadvantage that their optimality criteria are not\ndirectly related to the classification ability of the obtained feature\nrepresentation. Moreover, their classification accuracy is affected by the\n"small sample size" problem which is often encountered in FR tasks. In this\nshort paper, we combine nonlinear kernel based mapping of data called KDDA with\nSupport Vector machine classifier to deal with both of the shortcomings in an\nefficient and cost effective manner. The proposed here method is compared, in\nterms of classification accuracy, to other commonly used FR methods on UMIST\nface database. Results indicate that the performance of the proposed method is\noverall superior to those of traditional FR approaches, such as the Eigenfaces,\nFisherfaces, and D-LDA methods and traditional linear classifiers.\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
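Because the model was trained on title/abstract pairs, a natural use is retrieving the abstract that best matches a title (or vice versa). The sketch below reuses abstracts quoted elsewhere in this card as a placeholder corpus; it is an illustration, not a benchmark.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lufercho/AxvBert-Sentente-Transformer")

# Placeholder corpus; in practice these would be full paper abstracts.
corpus = [
    "We introduce online learning algorithms which are independent of feature scales ...",
    "In a multi-armed bandit problem, an online algorithm chooses from a set of strategies ...",
    "We address the problem of computing approximate marginals in Gaussian probabilistic models ...",
]
query = "Normalized Online Learning"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Cosine similarity between the query title and every abstract in the corpus
scores = model.similarity(query_embedding, corpus_embeddings)   # shape [1, len(corpus)]
best = int(scores.argmax())
print(best, float(scores[0, best]))
```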
## Training Details

### Training Dataset

#### Unnamed Dataset
- Size: 5,000 training samples
- Columns: `sentence_0` and `sentence_1`
- Approximate statistics based on the first 1000 samples:

  |         | sentence_0                                        | sentence_1                                           |
  |:--------|:--------------------------------------------------|:-----------------------------------------------------|
  | type    | string                                             | string                                               |
  | details | min: 4 tokens, mean: 13.29 tokens, max: 56 tokens  | min: 26 tokens, mean: 202.49 tokens, max: 506 tokens |
- Samples:

  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | Validation of nonlinear PCA | Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. But the benefit of curved components requires a careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and more generally the use of an independent test set, fail when applied to nonlinear PCA because of its inherent unsupervised characteristics. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity. |
  | Learning Attitudes and Attributes from Multi-Aspect Reviews | The majority of online reviews consist of plain-text feedback together with a single numeric score. However, there are multiple dimensions to products and opinions, and understanding the `aspects' that contribute to users' ratings may help us to better understand their individual preferences. For example, a user's impression of an audiobook presumably depends on aspects such as the story and the narrator, and knowing their opinions on these aspects may help us to recommend better products. In this paper, we build models for rating systems in which such dimensions are explicit, in the sense that users leave separate ratings for each aspect of a product. By introducing new corpora consisting of five million reviews, rated with between three and six aspects, we evaluate our models on three prediction tasks: First, we use our model to uncover which parts of a review discuss which of the rated aspects. Second, we use our model to summarize reviews, which for us means finding the sentences... |
  | Bayesian Differential Privacy through Posterior Sampling | Differential privacy formalises privacy-preserving mechanisms that provide access to a database. We pose the question of whether Bayesian inference itself can be used directly to provide private access to data, with no modification. The answer is affirmative: under certain conditions on the prior, sampling from the posterior distribution can be used to achieve a desired level of privacy and utility. To do so, we generalise differential privacy to arbitrary dataset metrics, outcome spaces and distribution families. This allows us to also deal with non-i.i.d or non-tabular datasets. We prove bounds on the sensitivity of the posterior to the data, which gives a measure of robustness. We also show how to use posterior sampling to provide differentially private responses to queries, within a decision-theoretic framework. Finally, we provide bounds on the utility and on the distinguishability of datasets. The latter are complemented by a novel use of Le Cam's method to obtain lower bounds... |
- Loss: `MultipleNegativesRankingLoss` with these parameters: `{"scale": 20.0, "similarity_fct": "cos_sim"}`
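As a rough intuition for this loss: within each batch of (`sentence_0`, `sentence_1`) pairs, a title is scored against every abstract in the batch with scaled cosine similarity, and cross-entropy pushes the matching abstract to the top while the other in-batch abstracts act as negatives. The snippet below is a minimal sketch of that objective, not the library's implementation.

```python
# Minimal sketch of the MultipleNegativesRankingLoss objective with
# scale=20.0 and cosine similarity; illustration only, not the library code.
import torch
import torch.nn.functional as F

def mnr_loss(anchor_emb: torch.Tensor, positive_emb: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # anchor_emb, positive_emb: [batch, dim] embeddings of sentence_0 and sentence_1
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    scores = scale * anchor @ positive.T          # scaled cosine similarities, [batch, batch]
    labels = torch.arange(scores.size(0))         # pair i's positive sits on the diagonal
    return F.cross_entropy(scores, labels)

# Example with random embeddings for a batch of 16 pairs
loss = mnr_loss(torch.randn(16, 768), torch.randn(16, 768))
```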
### Training Hyperparameters

#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 2
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters

<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
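For context, the following is a hedged sketch of a training setup that matches the non-default hyperparameters above. The column names follow the samples shown earlier; the placeholder rows and the output directory name are assumptions for illustration, not the actual 5,000-pair dataset or training script.

```python
# Hedged sketch of a comparable training run; placeholder data, illustration only.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Loading a plain BERT checkpoint creates the Transformer + mean-pooling stack shown above
model = SentenceTransformer("lufercho/my-finetuned-bert-mlm")

train_dataset = Dataset.from_dict({
    "sentence_0": ["Multi-Armed Bandits in Metric Spaces"],                               # paper titles
    "sentence_1": ["In a multi-armed bandit problem, an online algorithm chooses ..."],   # abstracts
})

args = SentenceTransformerTrainingArguments(
    output_dir="AxvBert-Sentente-Transformer",        # assumed output directory
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```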
### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 1.5974 | 500  | 0.3039        |
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.46.2
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```