antypasd committed on
Commit
587e1d5
·
1 Parent(s): 0a6fc5a

Update README.md

Files changed (1)
  1. README.md +71 -46
README.md CHANGED
@@ -1,46 +1,71 @@
- ---
- tags:
- - generated_from_keras_callback
- model-index:
- - name: tf version
- results: []
- ---
-
- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->
-
- # tf version
-
- This model is a fine-tuned version of [antypasd/tweet-topic-21-single](https://huggingface.co/antypasd/tweet-topic-21-single) on an unknown dataset.
- It achieves the following results on the evaluation set:
-
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - optimizer: None
- - training_precision: float32
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.19.2
- - TensorFlow 2.8.2
- - Tokenizers 0.12.1
+ # tweet-topic-21-single
+
+ This is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021 (see [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m)), and fine-tuned for single-label topic classification on a corpus of 6,997 tweets.
+ The original RoBERTa-base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m) and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is intended for English text.
+
+ - Reference Paper: [TimeLMs paper](https://arxiv.org/abs/2202.03829).
+ - Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).
+
+ <b>Labels</b>:
+ - 0 -> arts_&_culture;
+ - 1 -> business_&_entrepreneurs;
+ - 2 -> pop_culture;
+ - 3 -> daily_life;
+ - 4 -> sports_&_gaming;
+ - 5 -> science_&_technology
+
+
+ ## Full classification example
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
+ from transformers import AutoTokenizer
+ import numpy as np
+ from scipy.special import softmax
+
+
+ MODEL = "cardiffnlp/tweet-topic-21-single"
+ tokenizer = AutoTokenizer.from_pretrained(MODEL)
+
+ # PT
+ model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+ class_mapping = model.config.id2label
+
+ text = "Tesla stock is on the rise!"
+ encoded_input = tokenizer(text, return_tensors='pt')
+ output = model(**encoded_input)
+
+ # Logits for the single input, converted to probabilities
+ scores = output[0][0].detach().numpy()
+ scores = softmax(scores)
+
+ # TF
+ #model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
+ #class_mapping = model.config.id2label
+ #text = "Tesla stock is on the rise!"
+ #encoded_input = tokenizer(text, return_tensors='tf')
+ #output = model(**encoded_input)
+ #scores = output[0][0]
+ #scores = softmax(scores)
+
+
+ # Print the labels from most to least likely
+ ranking = np.argsort(scores)
+ ranking = ranking[::-1]
+ for i in range(scores.shape[0]):
+     l = class_mapping[ranking[i]]
+     s = scores[ranking[i]]
+     print(f"{i+1}) {l} {np.round(float(s), 4)}")
+
+ ```
+
+ Output:
+
+ ```
+ 1) business_&_entrepreneurs 0.8361
+ 2) science_&_technology 0.0904
+ 3) pop_culture 0.0288
+ 4) daily_life 0.0178
+ 5) arts_&_culture 0.0137
+ 6) sports_&_gaming 0.0133
+ ```
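
Note on the ranking step: the post-processing in the new README (softmax over the logits, then a descending `argsort`) can be tried without downloading the model. The following is a minimal sketch under stated assumptions: the `ID2LABEL` dict is copied from the README's label list, while `rank_labels` and the example logits are hypothetical names and values made up for illustration, not part of the commit or of real model output.

```python
import numpy as np

# Label mapping copied from the README's "Labels" list.
ID2LABEL = {
    0: "arts_&_culture",
    1: "business_&_entrepreneurs",
    2: "pop_culture",
    3: "daily_life",
    4: "sports_&_gaming",
    5: "science_&_technology",
}


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def rank_labels(logits, id2label=ID2LABEL):
    """Return (label, probability) pairs sorted from most to least likely.

    Hypothetical helper mirroring the README's argsort-and-reverse loop.
    """
    scores = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(scores)[::-1]  # indices in descending score order
    return [(id2label[int(i)], float(scores[i])) for i in order]


# Hypothetical logits, invented for illustration (not real model output).
example_logits = [0.1, 3.2, 0.8, 0.4, 0.0, 1.0]
for rank, (label, prob) in enumerate(rank_labels(example_logits), start=1):
    print(f"{rank}) {label} {prob:.4f}")
```

With a real model, the logits would come from `output[0][0]` as in the README example, and `model.config.id2label` would replace the hard-coded dict.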