HIT-TMG/yizhao-fin-zh-scorer

Text Classification

imryanxu committed on Dec 11, 2024

Commit e02accf · verified · 1 Parent(s): 725f530

Update README.md

Files changed (1)

README.md +2 -0

@@ -9,6 +9,8 @@ library_name: transformers
 # fin-model-en-v0.1
 ## Introduction
 This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It assigns a financial relevance score to each piece of text; by setting thresholds on this score, financial data of different quality levels can be filtered.
+To collect training samples, we use the **Qwen-72B** model to annotate small batches of samples extracted from Chinese datasets, scoring each sample from 0 to 5 based on financial relevance. Because the class distribution of the labeled samples is uneven, we apply undersampling to balance the classes. The final Chinese training dataset contains nearly **50,000** samples. During training, we freeze the embedding layer and the encoder layers, and save the model parameters that achieve the best **F1 score**.
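
The README does not say which undersampling technique was used. As a minimal sketch, assuming plain random undersampling down to the size of the rarest score class, the balancing step might look like the following. The function name `undersample` and the toy data are hypothetical and are not taken from the actual training pipeline.

```python
import random
from collections import defaultdict


def undersample(samples, seed=0):
    """Randomly downsample every class to the size of the rarest class.

    `samples` is a list of (text, score) pairs, where score is an
    integer relevance label (0 to 5 in the README's annotation scheme).
    This is an assumed strategy, not the authors' confirmed method.
    """
    # Group samples by their label.
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    # Every class is reduced to the size of the smallest one.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy, made-up data: 8 samples scored 0, 3 scored 3, 2 scored 5.
toy = (
    [(f"doc-{i}", 0) for i in range(8)]
    + [(f"doc-{i}", 3) for i in range(8, 11)]
    + [(f"doc-{i}", 5) for i in range(11, 13)]
)
balanced = undersample(toy)
```

With this toy input, each of the three classes is cut to two samples, so `balanced` holds six pairs. Fixing the seed keeps the selection reproducible across runs.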
 ## Quickstart
 Here is an example code snippet for generating financial relevance scores using this model.
 ```python