imryanxu committed on
Commit bc28495 · verified · 1 Parent(s): 5736a8b

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -8,7 +8,7 @@ library_name: transformers
 
 # fin-model-zh-v0.1
 ## Introduction
-This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It generates a financial relevance score for each piece of text, and based on this score, financial data of different quality can be filtered by setting appropriate thresholds.
+This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It generates a financial relevance score for each piece of text, and based on this score, financial data of different quality can be filtered by setting appropriate thresholds. For the complete data cleaning process, please refer to [YiZhao](https://github.com/HITsz-TMG/YiZhao).
 
 To collect training samples, we use the **Qwen-72B** model to thoroughly annotate small batches of samples extracted from Chinese datasets, scoring them from 0 to 5 based on financial relevance. Given the uneven class distribution in the labeled samples, we apply undersampling to ensure class balance. As a result, the final Chinese training dataset contains nearly **50,000** samples. During training, we freeze the embedding and encoder layers, and save the model parameters that achieve the best **F1 score**.
 ## Quickstart
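The README above describes scoring each text from 0 to 5 for financial relevance and then filtering by threshold. A minimal sketch of that workflow with the `transformers` library follows; note that the repository ID (`imryanxu/fin-model-zh-v0.1`) and the choice to aggregate class probabilities into an expected score are assumptions for illustration, not documented behavior of this model.

```python
def relevance_score(model, tokenizer, text):
    """Score one text as the expected value over the model's class probabilities.

    Aggregating via expectation (rather than argmax) is an assumption, not
    documented behavior of fin-model-zh-v0.1.
    """
    import torch  # imported lazily so filter_by_threshold has no heavy dependencies

    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)           # shape: (num_labels,)
    return float((probs * torch.arange(probs.numel())).sum())  # expected score


def filter_by_threshold(scored, threshold):
    """Keep only texts whose relevance score meets the threshold."""
    return [text for text, score in scored if score >= threshold]


if __name__ == "__main__":
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Repository ID assumed from the commit author and model name; adjust as needed.
    repo = "imryanxu/fin-model-zh-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForSequenceClassification.from_pretrained(repo)

    texts = ["央行宣布下调存款准备金率。", "今天天气不错。"]
    scored = [(t, relevance_score(model, tokenizer, t)) for t in texts]
    print(filter_by_threshold(scored, threshold=3.0))
```

Raising the threshold keeps only the most clearly financial texts at the cost of recall; lowering it admits more borderline data, matching the "strategically setting thresholds" trade-off the README describes.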