imryanxu committed on
Commit 9cb2da0 · verified · 1 Parent(s): bc28495

Update README.md

Files changed (1)
  1. README.md +21 -22
README.md CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: text-classification
 library_name: transformers
 ---
 
-# fin-model-zh-v0.1
+# yizhao-fin-zh-scorer
 ## Introduction
 This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It generates a financial relevance score for each piece of text, and based on this score, different quality financial data can be filtered by strategically setting thresholds. For the complete data cleaning process, please refer to [YiZhao](https://github.com/HITsz-TMG/YiZhao).
 
@@ -14,26 +14,25 @@ To collect training samples, we use the **Qwen-72B** model to thoroughly annotat
 ## Quickstart
 Here is an example code snippet for generating financial relevance scores using this model.
 ```python
-import torch
-from datasets import load_dataset
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
-model_name = "fin-model-zh-v0.1"
-dataset_file = "your_dataset.jsonl"
-text_column = "text"
-output_file = "your_output.jsonl"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=torch.bfloat16)
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-model.to(device)
-dataset = load_dataset('json', data_files=dataset_file, cache_dir="cache/", split='train', num_proc=12)
-def compute_scores(batch):
-    inputs = tokenizer(batch[text_column], return_tensors="pt", padding="longest", truncation=True).to(device)
-    with torch.no_grad():
-        outputs = model(**inputs)
-        logits = outputs.logits.squeeze(-1).float().cpu().numpy()
-    batch["fin_score"] = logits.tolist()
-    batch["fin_int_score"] = [int(round(max(0, min(score, 5)))) for score in logits]
-    return batch
-dataset = dataset.map(compute_scores, batched=True, batch_size=512)
-dataset.to_json(output_file)
+
+text = "You are a smart robot"
+fin_model_name = "fin-model-zh-v0.1"
+
+fin_tokenizer = AutoTokenizer.from_pretrained(fin_model_name)
+fin_model = AutoModelForSequenceClassification.from_pretrained(fin_model_name)
+
+fin_inputs = fin_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
+fin_outputs = fin_model(**fin_inputs)
+fin_logits = fin_outputs.logits.squeeze(-1).float().detach().numpy()
+
+fin_score = fin_logits.item()
+result = {
+    "text": text,
+    "fin_score": fin_score,
+    "fin_int_score": int(round(max(0, min(fin_score, 5))))
+}
+
+print(result)
+# {'text': 'You are a smart robot', 'fin_score': 0.3258197605609894, 'fin_int_score': 0}
 ```
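The README's Introduction describes selecting financial data of different quality by thresholding the relevance score, and its Quickstart turns the raw regression output into `fin_int_score` by clamping it to [0, 5] and rounding. Below is a minimal plain-Python sketch of those two steps, assuming records shaped like the Quickstart's `result` dict. The helper names `to_int_score` and `filter_by_threshold`, the made-up scores for the two hypothetical texts, and the threshold of 3.0 are illustrative assumptions. They are not taken from the YiZhao pipeline. Only the first score is the example output shown in the README.

```python
from typing import Dict, List


def to_int_score(score: float) -> int:
    """Clamp a raw score to [0, 5] and round it, mirroring the README's
    `int(round(max(0, min(score, 5))))` expression."""
    return int(round(max(0.0, min(score, 5.0))))


def filter_by_threshold(records: List[Dict], threshold: float) -> List[Dict]:
    """Keep only records whose raw `fin_score` is at or above `threshold`.

    Hypothetical helper: a higher threshold keeps fewer, more
    finance-relevant texts.
    """
    return [r for r in records if r["fin_score"] >= threshold]


# First score is the README's example output; the other two are made up.
records = [
    {"text": "You are a smart robot", "fin_score": 0.3258197605609894},
    {"text": "hypothetical text A", "fin_score": 3.7},
    {"text": "hypothetical text B", "fin_score": 5.4},
]
for r in records:
    r["fin_int_score"] = to_int_score(r["fin_score"])

kept = filter_by_threshold(records, threshold=3.0)  # illustrative threshold
print([(r["text"], r["fin_int_score"]) for r in kept])
# → [('hypothetical text A', 4), ('hypothetical text B', 5)]
```

Python's built-in `round` rounds exact halves to the nearest even integer (a raw score of 2.5 maps to 2, 3.5 maps to 4). If the cut-off has to be precise, thresholding on the raw `fin_score` may be the simpler choice.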