CT-BERT-JPN

CT-BERT-JPN is a Japanese BERT-based model for multilabel classification of abnormal findings in radiology reports, fine-tuned on the CT-RATE-JPN dataset.

Model Overview

The model is based on BERT base Japanese v3 and was fine-tuned on the CT-RATE-JPN dataset, which provides Japanese translations of the radiology reports in the CT-RATE dataset. The training data consists of deduplicated radiology reports with their corresponding abnormality labels.

How to Use

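# Install the tokenizer dependencies (notebook syntax; in a shell, run the command without the leading "!")
!pip install fugashi unidic_lite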
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer from Hugging Face Hub
model_name = "YYama0/CT-BERT-JPN"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=18,
    problem_type="multi_label_classification"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the inference function
def infer(input_texts):
    inputs = tokenizer(input_texts, padding=True, truncation=True, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.sigmoid(logits)
    return probs

# Run inference
input_texts = ["気管および両主気管支の内腔は開存しています。気管および両主気管支の内腔には閉塞病変は認められませんでした。縦隔内の主要血管構造、心臓の輪郭、サイズは正常です。胸部大動脈の直径は正常です。心嚢水、心膜肥厚は確認されませんでした。胸部食道径は正常であり、非造影検査の範囲内で有意な病的壁肥厚は認められませんでした。縦隔内の上下部気管傍、大動脈肺動脈窓の血管前領域、および気管分岐下において、最大短軸が7mmのリンパ節が認められました。肺野条件では、右側の胸膜葉の間に最大で8cmの厚さに達する広範な胸水が認められました。左側では、最も広い部分で26mmです。隣接する肺実質には、特に右側でびまん性の無気肺変化が認められました。加えて、両肺で小葉間隔壁の肥厚を伴うびまん性のすりガラス陰影の増加およびcrazy paving appearancesが認められました。これらの所見は感染症過程と一致している可能性があります。肺水腫も鑑別診断に考慮されるべきです。臨床および検査との対比を考慮すること、および治療後の管理が推奨されます。両肺にミリ単位の非特異的な実質性結節がいくつか認められました。検査範囲の上腹部では、有意な病変は認められませんでした。骨構造においても溶骨性破壊病変は認められませんでした。"]
probs = infer(input_texts)  # shape: (len(input_texts), 18), one sigmoid probability per finding
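
The probabilities returned by `infer` still need to be mapped to finding names. The following is a minimal sketch of that step, not part of the original model card. It assumes that the output order matches the index column of the results table in Model Performance (worth confirming against `model.config.id2label` if the checkpoint defines it) and that a probability equal to the threshold counts as positive. `FINDINGS` and `decode_predictions` are hypothetical names introduced for illustration.

```python
# Assumption: label order follows the index column (0-17) of the results table.
FINDINGS = [
    "Medical material",
    "Arterial wall calcification",
    "Cardiomegaly",
    "Pericardial effusion",
    "Coronary artery wall calcification",
    "Hiatal hernia",
    "Lymphadenopathy",
    "Emphysema",
    "Atelectasis",
    "Lung nodule",
    "Lung opacity",
    "Pulmonary fibrotic sequela",
    "Pleural effusion",
    "Mosaic attenuation pattern",
    "Peribronchial thickening",
    "Consolidation",
    "Bronchiectasis",
    "Interlobular septal thickening",
]


def decode_predictions(probs, labels=FINDINGS, threshold=0.5):
    """Return, for each report, the findings with probability >= threshold.

    `probs` is a list of rows, one per report, each holding one float per label.
    """
    decoded = []
    for row in probs:
        if len(row) != len(labels):
            raise ValueError(f"expected {len(labels)} probabilities, got {len(row)}")
        decoded.append([name for name, p in zip(labels, row) if p >= threshold])
    return decoded


# Illustrative input only: a made-up probability row, not real model output.
example_row = [0.01] * len(FINDINGS)
example_row[8] = 0.91   # Atelectasis
example_row[12] = 0.97  # Pleural effusion
print(decode_predictions([example_row]))
```

With the tensor from the snippet above, the call would be `decode_predictions(probs.tolist())`.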

Model Performance

The following table shows the results for each class on the validation data (n=150) of CT-RATE-JPN. For metrics requiring binarization, a threshold of 0.5 was used.

| Index | Finding | Positive samples | Accuracy | Precision | Recall | F1 | AUC-ROC | AP |
|---|---|---|---|---|---|---|---|---|
| 0 | Medical material | 14 | 0.973 | 0.778 | 1 | 0.875 | 0.999 | 0.99 |
| 1 | Arterial wall calcification | 49 | 0.987 | 0.961 | 1 | 0.98 | 1 | 1 |
| 2 | Cardiomegaly | 25 | 0.987 | 1 | 0.92 | 0.958 | 0.999 | 0.996 |
| 3 | Pericardial effusion | 12 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | Coronary artery wall calcification | 45 | 0.987 | 0.978 | 0.978 | 0.978 | 1 | 1 |
| 5 | Hiatal hernia | 24 | 1 | 1 | 1 | 1 | 1 | 1 |
| 6 | Lymphadenopathy | 37 | 0.987 | 0.973 | 0.973 | 0.973 | 0.994 | 0.987 |
| 7 | Emphysema | 31 | 0.98 | 0.938 | 0.968 | 0.952 | 0.989 | 0.96 |
| 8 | Atelectasis | 49 | 0.993 | 0.98 | 1 | 0.99 | 1 | 1 |
| 9 | Lung nodule | 82 | 0.967 | 0.975 | 0.963 | 0.969 | 0.991 | 0.994 |
| 10 | Lung opacity | 55 | 0.953 | 0.929 | 0.945 | 0.937 | 0.991 | 0.985 |
| 11 | Pulmonary fibrotic sequela | 47 | 0.953 | 0.935 | 0.915 | 0.925 | 0.981 | 0.973 |
| 12 | Pleural effusion | 19 | 0.987 | 0.905 | 1 | 0.95 | 1 | 0.997 |
| 13 | Mosaic attenuation pattern | 25 | 1 | 1 | 1 | 1 | 1 | 1 |
| 14 | Peribronchial thickening | 21 | 0.96 | 1 | 0.714 | 0.833 | 0.985 | 0.948 |
| 15 | Consolidation | 24 | 0.933 | 0.706 | 1 | 0.828 | 0.996 | 0.985 |
| 16 | Bronchiectasis | 20 | 0.98 | 0.87 | 1 | 0.93 | 0.99 | 0.873 |
| 17 | Interlobular septal thickening | 7 | 0.993 | 0.875 | 1 | 0.933 | 1 | 1 |
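
To make the thresholded metrics concrete, here is a small worked sketch of how accuracy, precision, recall, and F1 follow from confusion counts. The counts below are an inference, not figures from the source: with 14 positives among 150 reports and a recall of 1, the Medical material row is consistent with 14 true positives, 4 false positives, 0 false negatives, and 132 true negatives. `binary_metrics` is a hypothetical helper, not the authors' evaluation code.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Assumed counts, inferred from the "Medical material" row (n=150, 14 positives).
metrics = binary_metrics(tp=14, fp=4, fn=0, tn=132)
print({name: round(value, 3) for name, value in metrics.items()})
# → {'accuracy': 0.973, 'precision': 0.778, 'recall': 1.0, 'f1': 0.875}
```

AUC-ROC and AP cannot be recovered this way, since they depend on the ranking of the raw probabilities rather than on a single threshold.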

Citation

Base BERT Model:

Please cite the original BERT Japanese model from cl-tohoku/bert-japanese.

CT-RATE Dataset:

Please visit the original CT-RATE repository for the appropriate citation information.

CT-RATE-JPN (CT-BERT-JPN):

Citation information for CT-RATE-JPN is provided below. The paper details the dataset and the translation methodology.

@misc{yamagishi2024ctrep,
      title={Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model}, 
      author={Yosuke Yamagishi and Yuta Nakamura and Tomohiro Kikuchi and Yuki Sonoda and Hiroshi Hirakawa and Shintaro Kano and Satoshi Nakamura and Shouhei Hanaoka and Takeharu Yoshikawa and Osamu Abe},
      year={2024},
      eprint={2412.15907},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.15907}, 
}

License

This model was trained using the CT-RATE-JPN dataset, which is released under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license. Regarding model usage:

  • The model outputs and weights can be used for non-commercial research purposes only
  • When using the dataset, users must comply with the terms of the original CC BY-NC-SA license

Acknowledgments

  • The original BERT Japanese model developers (cl-tohoku)
  • The CT-RATE dataset creators