louisbrulenaudet committed on
Commit 058979f
1 Parent(s): cb9be94

Update README.md

Files changed (1)
  1. README.md +24 -17
README.md CHANGED
@@ -5,14 +5,22 @@ tags:
  - feature-extraction
  - sentence-similarity
  - transformers
-
  ---

- # {MODEL_NAME}

  This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

- <!--- Describe your model here -->

  ## Usage (Sentence-Transformers)
 
@@ -28,7 +36,7 @@ Then you can use the model like this:
  from sentence_transformers import SentenceTransformer
  sentences = ["This is an example sentence", "Each sentence is converted"]

- model = SentenceTransformer('{MODEL_NAME}')
  embeddings = model.encode(sentences)
  print(embeddings)
  ```
@@ -51,8 +59,8 @@ def cls_pooling(model_output, attention_mask):
  sentences = ['This is an example sentence', 'Each sentence is converted']

  # Load model from HuggingFace Hub
- tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
- model = AutoModel.from_pretrained('{MODEL_NAME}')

  # Tokenize sentences
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -68,15 +76,6 @@ print("Sentence embeddings:")
  print(sentence_embeddings)
  ```

-
-
- ## Evaluation Results
-
- <!--- Describe how your model was evaluated -->
-
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-
-
  ## Training
  The model was trained with the parameters:
 
@@ -96,7 +95,6 @@ Parameters of the fit()-Method:
  {
  "epochs": 1,
  "evaluation_steps": 0,
- "evaluator": "NoneType",
  "max_grad_norm": 1,
  "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
  "optimizer_params": {
@@ -120,4 +118,13 @@ SentenceTransformer(
 
  ## Citing & Authors

- <!--- Describe where people can find more information -->
 
  - feature-extraction
  - sentence-similarity
  - transformers
+ - doping
+ - anti-doping
+ pretty_name: Domain-adapted GTE for anti-doping practice
+ license: apache-2.0
+ language:
+ - en
+ library_name: sentence-transformers
  ---

+ # Domain-adapted GTE for anti-doping practice
 
  This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
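As a minimal sketch of the semantic-search use case, assuming the embeddings have already been computed (for example with `model.encode`), documents can be ranked by cosine similarity to a query. The helper name `rank_by_cosine` and the toy 3-dimensional vectors are hypothetical stand-ins for the model's 768-dimensional output.

```python
import numpy as np


def rank_by_cosine(query_emb: np.ndarray, corpus_embs: np.ndarray, top_k: int = 3):
    """Return (index, score) pairs for the top_k corpus vectors closest to the query.

    Hypothetical helper: query_emb has shape (d,), corpus_embs has shape (n, d).
    """
    # L2-normalise both sides so that a dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    # Sort indices by descending similarity and keep the best top_k
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real 768-dimensional sentence embeddings
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.1, 0.0])
print(rank_by_cosine(query, corpus, top_k=2))
```

With the real model, `query_emb` and `corpus_embs` would come from `model.encode(...)`; sentence-transformers also ships `util.semantic_search`, which performs the same kind of ranking on its own.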
 
+ The base GTE model was pretrained on a large-scale corpus of relevance text pairs covering a wide range of domains and scenarios, which allows GTE models to be applied to various downstream text-embedding tasks, including information retrieval, semantic textual similarity, and text reranking. This model was then fitted with a Transformer-based Sequential Denoising Auto-Encoder (TSDAE), an unsupervised sentence-embedding method, with a single objective: adaptation to the anti-doping domain.
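TSDAE trains the encoder to reconstruct a sentence from a corrupted copy of it, and its default corruption deletes about 60% of the tokens. Below is a minimal, dependency-free sketch of that deletion step only, assuming whitespace tokenisation (sentence-transformers' own `DenoisingAutoEncoderDataset` tokenises with NLTK instead). The function name `delete_noise` and the example sentence are hypothetical, and the encoder/decoder training loop is not shown.

```python
import random
from typing import Optional


def delete_noise(text: str, del_ratio: float = 0.6, seed: Optional[int] = None) -> str:
    """Corrupt `text` by independently deleting each token with probability `del_ratio`.

    Hypothetical helper illustrating TSDAE-style deletion noise; at least one
    token is always kept so the encoder never receives an empty input.
    """
    rng = random.Random(seed)
    words = text.split()  # simplification: whitespace tokenisation
    if not words:
        return text
    # Keep a token when the draw lands at or above del_ratio (keep prob = 1 - del_ratio)
    kept = [w for w in words if rng.random() >= del_ratio]
    if not kept:
        # Safeguard: keep one randomly chosen token if everything was deleted
        kept = [rng.choice(words)]
    return " ".join(kept)


original = "Athletes must report their whereabouts to the anti-doping organisation"
noisy = delete_noise(original, seed=0)
# During TSDAE training, the encoder sees `noisy` and the decoder is trained
# to reconstruct `original` from the pooled sentence embedding.
print(noisy)
```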
+
+ In this way, the model learns an internal representation of the anti-doping language found in its training set, which can then be used to extract features for downstream tasks. For instance, given a dataset of labeled sentences, you can train a standard classifier on the embeddings produced by the model.
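A minimal sketch of that feature-extraction pattern, assuming the sentence embeddings are already available as a NumPy array: a nearest-centroid classifier is used here only because it needs no extra dependencies (logistic regression is a common alternative). The `NearestCentroid` class is hypothetical, and the random vectors are synthetic stand-ins for real model outputs, not anti-doping data.

```python
import numpy as np


class NearestCentroid:
    """Minimal nearest-centroid classifier over embedding vectors (cosine similarity)."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NearestCentroid":
        self.classes_ = np.unique(y)
        # One mean vector per class, normalised so prediction uses cosine similarity
        centroids = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        self.centroids_ = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        # Assign each vector to the class whose centroid it is most similar to
        return self.classes_[np.argmax(Xn @ self.centroids_.T, axis=1)]


# Synthetic 768-dimensional "embeddings": two clusters around different directions
rng = np.random.default_rng(0)
dim = 768
centre_a, centre_b = rng.normal(size=dim), rng.normal(size=dim)
X_train = np.vstack([centre_a + 0.1 * rng.normal(size=(20, dim)),
                     centre_b + 0.1 * rng.normal(size=(20, dim))])
y_train = np.array([0] * 20 + [1] * 20)

clf = NearestCentroid().fit(X_train, y_train)
X_test = np.vstack([centre_a + 0.1 * rng.normal(size=(5, dim)),
                    centre_b + 0.1 * rng.normal(size=(5, dim))])
print(clf.predict(X_test))
```

With the real model, `X_train` would be `model.encode(train_sentences)`, which returns a NumPy array by default.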
 
  ## Usage (Sentence-Transformers)


  from sentence_transformers import SentenceTransformer
  sentences = ["This is an example sentence", "Each sentence is converted"]

+ model = SentenceTransformer("timotheeplanes/anti-doping-gte-base")
  embeddings = model.encode(sentences)
  print(embeddings)
  ```
 
  sentences = ['This is an example sentence', 'Each sentence is converted']

  # Load model from HuggingFace Hub
+ tokenizer = AutoTokenizer.from_pretrained("timotheeplanes/anti-doping-gte-base")
+ model = AutoModel.from_pretrained("timotheeplanes/anti-doping-gte-base")

  # Tokenize sentences
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

  print(sentence_embeddings)
  ```
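The hunk header earlier in this diff shows that the Transformers example pools token outputs with a `cls_pooling(model_output, attention_mask)` helper whose body lies outside the displayed hunks. Assuming it follows the common definition of returning the hidden state of the first ([CLS]) token, pooling reduces to a single indexing step. The NumPy sketch below mirrors that assumption with made-up hidden states in place of `model_output[0]`.

```python
import numpy as np


def cls_pooling(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return the first-token ([CLS]) vector of each sequence.

    last_hidden_state: (batch, seq_len, hidden); attention_mask: (batch, seq_len).
    The mask is accepted only to match the helper's signature: with right-padded
    inputs, position 0 is always a real token, so no masking is needed.
    """
    return last_hidden_state[:, 0]


# Made-up hidden states: batch of 2 sequences, 4 tokens each, hidden size 3
hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0]])  # second sequence is right-padded
pooled = cls_pooling(hidden, mask)
print(pooled.shape)  # (2, 3)
print(pooled)
```

With PyTorch tensors, the same indexing, `model_output[0][:, 0]`, applies to the output returned by `AutoModel`.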
 
  ## Training
  The model was trained with the parameters:
 
 
  {
  "epochs": 1,
  "evaluation_steps": 0,
  "max_grad_norm": 1,
  "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
  "optimizer_params": {
 
 
  ## Citing & Authors
 
+ If you use this model in your research, please cite it with the following BibTeX entry:
+
+ ```BibTeX
+ @misc{louisbrulenaudet2023,
+ author = {Brulé Naudet, L. and Planes, T.},
+ title = {Domain-adapted GTE for anti-doping practice},
+ year = {2023},
+ howpublished = {\url{https://huggingface.co/timotheeplanes/anti-doping-gte-base}},
+ }
+ ```