pipeline_tag: text-classification
library_name: transformers
---

# yizhao-fin-zh-scorer
## Introduction
This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It generates a financial relevance score for each piece of text; by setting appropriate thresholds on this score, financial data of different quality levels can be filtered. For the complete data cleaning process, please refer to [YiZhao](https://github.com/HITsz-TMG/YiZhao).

To collect training samples, we use the **Qwen-72B** model to thoroughly annotate …

## Quickstart
Here is an example code snippet for generating financial relevance scores using this model.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

text = "You are a smart robot"
fin_model_name = "fin-model-zh-v0.1"

# Load the fine-tuned BERT scorer and its tokenizer
fin_tokenizer = AutoTokenizer.from_pretrained(fin_model_name)
fin_model = AutoModelForSequenceClassification.from_pretrained(fin_model_name)

# Tokenize the text and run it through the model;
# the single output logit is the financial relevance score
fin_inputs = fin_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
fin_outputs = fin_model(**fin_inputs)
fin_logits = fin_outputs.logits.squeeze(-1).float().detach().numpy()

fin_score = fin_logits.item()
result = {
    "text": text,
    "fin_score": fin_score,
    # Clamp the score to [0, 5] and round to get an integer quality bucket
    "fin_int_score": int(round(max(0, min(fin_score, 5))))
}

print(result)
# {'text': 'You are a smart robot', 'fin_score': 0.3258197605609894, 'fin_int_score': 0}
```
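As mentioned in the Introduction, the score can be used to filter a corpus by thresholding. The sketch below shows one way this might look for a list of texts, scoring in batches and keeping only texts above a cutoff; the `FIN_THRESHOLD` value, the `score_texts` helper, the batch size, and the sample sentences are illustrative assumptions, not settings prescribed by YiZhao.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative cutoff: tune it to your own quality/recall trade-off
FIN_THRESHOLD = 3.0

fin_model_name = "fin-model-zh-v0.1"
fin_tokenizer = AutoTokenizer.from_pretrained(fin_model_name)
fin_model = AutoModelForSequenceClassification.from_pretrained(fin_model_name)
fin_model.eval()

def score_texts(texts, batch_size=32):
    """Return a financial relevance score for each input text."""
    scores = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = fin_tokenizer(batch, return_tensors="pt", padding="longest", truncation=True)
        with torch.no_grad():
            logits = fin_model(**inputs).logits.squeeze(-1)
        scores.extend(logits.float().tolist())
    return scores

# Example corpus (illustrative): keep only texts scored at or above the threshold
texts = ["央行宣布下调存款准备金率0.5个百分点。", "You are a smart robot"]
kept = [t for t, s in zip(texts, score_texts(texts)) if s >= FIN_THRESHOLD]
print(kept)
```

Alternatively, the integer `fin_int_score` buckets from the Quickstart can be used to stratify a corpus into several quality tiers rather than a single keep/drop cut.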