HIT-TMG/yizhao-fin-zh-scorer

imryanxu committed on Dec 13, 2024

Commit 9cb2da0 · verified · 1 Parent(s): bc28495

Update README.md

Files changed (1)

README.md +21 -22

README.md CHANGED

@@ -6,7 +6,7 @@ pipeline_tag: text-classification
 library_name: transformers
 ---
-# fin-model-zh-v0.1
 ## Introduction
 This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It generates a financial relevance score for each piece of text, and based on this score, different quality financial data can be filtered by strategically setting thresholds. For the complete data cleaning process, please refer to [YiZhao](https://github.com/HITsz-TMG/YiZhao).
@@ -14,26 +14,25 @@ To collect training samples, we use the **Qwen-72B** model to thoroughly annotat
 ## Quickstart
 Here is an example code snippet for generating financial relevance scores using this model.
 ```python
-import torch
-from datasets import load_dataset
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
-model_name = "fin-model-zh-v0.1"
-dataset_file = "your_dataset.jsonl"
-text_column = "text"
-output_file = "your_output.jsonl"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=torch.bfloat16)
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-model.to(device)
-dataset = load_dataset('json', data_files=dataset_file, cache_dir="cache/", split='train', num_proc=12)
-def compute_scores(batch):
-    inputs = tokenizer(batch[text_column], return_tensors="pt", padding="longest", truncation=True).to(device)
-    with torch.no_grad():
-        outputs = model(**inputs)
-        logits = outputs.logits.squeeze(-1).float().cpu().numpy()
-    batch["fin_score"] = logits.tolist()
-    batch["fin_int_score"] = [int(round(max(0, min(score, 5)))) for score in logits]
-    return batch
-dataset = dataset.map(compute_scores, batched=True, batch_size=512)
-dataset.to_json(output_file)
 ```

 library_name: transformers
 ---
+# yizhao-fin-zh-scorer
 ## Introduction
 This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It generates a financial relevance score for each piece of text, and based on this score, different quality financial data can be filtered by strategically setting thresholds. For the complete data cleaning process, please refer to [YiZhao](https://github.com/HITsz-TMG/YiZhao).
 ## Quickstart
 Here is an example code snippet for generating financial relevance scores using this model.
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
+text = "You are a smart robot"
+fin_model_name = "fin-model-zh-v0.1"
+fin_tokenizer = AutoTokenizer.from_pretrained(fin_model_name)
+fin_model = AutoModelForSequenceClassification.from_pretrained(fin_model_name)
+fin_inputs = fin_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
+fin_outputs = fin_model(**fin_inputs)
+fin_logits = fin_outputs.logits.squeeze(-1).float().detach().numpy()
+fin_score = fin_logits.item()
+result = {
+    "text": text,
+    "fin_score": fin_score,
+    "fin_int_score": int(round(max(0, min(fin_score, 5))))
+}
+print(result)
+# {'text': 'You are a smart robot', 'fin_score': 0.3258197605609894, 'fin_int_score': 0}
 ```
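
The README says that data of different quality can be selected by setting thresholds on the score, but the Quickstart stops at printing one result. The sketch below shows one way that filtering step might look. It reuses the clamp-and-round expression from the snippet; the helper names (`to_int_score`, `filter_by_score`), the sample records, and the threshold of 3 are illustrative assumptions, not part of the model card.

```python
def to_int_score(raw_score: float) -> int:
    """Clamp a raw score to [0, 5] and round it, as in the Quickstart snippet."""
    return int(round(max(0, min(raw_score, 5))))


def filter_by_score(records, threshold=3):
    """Keep records whose integer score is at least `threshold` (assumed cutoff)."""
    return [r for r in records if to_int_score(r["fin_score"]) >= threshold]


# Hypothetical scored records; in practice fin_score would come from the model.
records = [
    {"text": "sample A", "fin_score": 0.33},
    {"text": "sample B", "fin_score": 3.7},
    {"text": "sample C", "fin_score": 6.2},
]

kept = filter_by_score(records, threshold=3)
print([r["text"] for r in kept])
```

Note that Python's `round` rounds exact halves to the nearest even integer, so a raw score of 2.5 maps to 2 rather than 3. A suitable threshold depends on how much recall one is willing to trade for precision and would need to be chosen empirically.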