---
library_name: transformers
tags:
- biobert
- medical-nlp
- icd-9
- classification
- healthcare
license: apache-2.0
language:
- en
base_model:
- dmis-lab/biobert-v1.1
pipeline_tag: text-classification
---

# Model Card for BioBERT Fine-tuned on MIMIC-III for ICD-9 Code Classification

## Model Details

### Model Description

This is a BioBERT model fine-tuned on the MIMIC-III (Medical Information Mart for Intensive Care III) corpus for ICD-9 code classification. The model predicts medical diagnostic codes from Electronic Health Record (EHR) text and symptom descriptions.

- **Developed by:** [Researcher/Institution Name - to be added]
- **Model type:** Transformer-based medical language model (BioBERT)
- **Language(s):** English (medical domain)
- **License:** Apache 2.0
- **Finetuned from model:** dmis-lab/biobert-v1.1

### Model Sources

- **Repository:** [GitHub/Model Repository Link - to be added]
- **Paper:** [Research Paper Link - to be added]

## Uses

### Direct Use

The primary use of this model is to automatically classify medical conditions by predicting relevant ICD-9 diagnostic codes from clinical text, such as electronic health records, medical notes, or symptom descriptions.

### Downstream Use

This model can be integrated into:
- Clinical decision support systems
- Medical coding automation
- Electronic health record (EHR) analysis tools
- Healthcare informatics research

### Out-of-Scope Use

- The model should not be used for direct medical diagnosis without professional medical oversight
- It is not intended to replace clinical judgment
- Performance may degrade on text outside the medical domain or text that differs significantly from the training corpus

## Bias, Risks, and Limitations

- The model's coverage is limited to the medical conditions and coding patterns present in the MIMIC-III dataset
- Biases present in the original training data may carry over into predictions
- Accuracy can be affected by variation in medical terminology and writing style, and by complex or atypical cases

### Recommendations

- Validate model predictions with medical professionals
- Use as a supportive tool, not a replacement for expert medical assessment
- Regularly evaluate performance on new datasets
- Be aware of potential demographic or contextual biases in the predictions

## How to Get Started with the Model

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
# (replace 'model_path' with the checkpoint path or Hub repo id)
model = AutoModelForSequenceClassification.from_pretrained('model_path')
tokenizer = AutoTokenizer.from_pretrained('model_path')
model.eval()

# Example prediction function
def predict_icd9_codes(input_text, threshold=0.8):
    # Tokenize the clinical text
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True,
                       max_length=512, padding='max_length')

    # Multi-label setup: score each label independently with a sigmoid
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.sigmoid(outputs.logits)

    # Keep the label indices whose probability exceeds the threshold
    label_indices = (predictions > threshold).nonzero()[:, 1]
    predicted_codes = [model.config.id2label[i.item()] for i in label_indices]

    return predicted_codes
```
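
For example (the clinical note below is invented for illustration; the returned codes depend entirely on the fine-tuned label set):

```python
note = "Patient presents with chest pain, shortness of breath, and elevated troponin."
print(predict_icd9_codes(note))
# e.g. ['410.71', '786.05'] - actual output depends on the model's labels
```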

## Training Details

### Training Data

- **Dataset:** MIMIC-III corpus
- **Domain:** Medical/Clinical text
- **Content:** Electronic Health Records (EHR)

### Training Procedure

#### Preprocessing
- Text tokenization
- Maximum sequence length: 512 tokens
- Padding to uniform length
- Potential text normalization techniques (a tokenization sketch follows below)
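
A minimal tokenization sketch, assuming the fine-tuned model reuses the standard `dmis-lab/biobert-v1.1` tokenizer (the notes are made up for illustration):

```python
from transformers import AutoTokenizer

# Assumption: the fine-tuned model reuses the BioBERT base tokenizer
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")

# Made-up clinical notes, for illustration only
notes = [
    "Patient admitted with acute respiratory distress.",
    "History of type 2 diabetes mellitus, managed with metformin.",
]

# Truncate/pad every note to the fixed 512-token window described above
batch = tokenizer(notes, truncation=True, max_length=512,
                  padding="max_length", return_tensors="pt")
print(batch["input_ids"].shape)  # torch.Size([2, 512])
```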

#### Training Hyperparameters
- **Base Model:** BioBERT
- **Training Regime:** Fine-tuning
- **Precision:** [Specify training precision, e.g., mixed precision]
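
The full hyperparameter configuration is not reported here. As a rough illustration only, a multi-label fine-tuning setup with `transformers` typically looks like the sketch below; every value in it (label count, learning rate, batch size, epochs) is an assumption, not the recorded configuration:

```python
from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Assumption: the classification head covers the top-50 ICD-9 codes;
# the real label set used for this model is not documented here
model = AutoModelForSequenceClassification.from_pretrained(
    "dmis-lab/biobert-v1.1",
    num_labels=50,
    problem_type="multi_label_classification",  # trains with BCEWithLogitsLoss
)

# Illustrative hyperparameters, not the recorded training configuration
args = TrainingArguments(
    output_dir="biobert-mimic3-icd9",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    fp16=True,  # mixed precision, if the hardware supports it
)

# `train_ds` stands for a tokenized MIMIC-III split with multi-hot float
# label vectors; it is not defined in this sketch
# trainer = Trainer(model=model, args=args, train_dataset=train_ds)
# trainer.train()
```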

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
- Held-out subset of the MIMIC-III corpus
- Diverse medical cases and documentation styles

#### Metrics
- Precision
- Recall
- F1-Score
- Multi-label classification metrics
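
A minimal sketch of computing these multi-label metrics with scikit-learn; the indicator matrices below are illustrative, not real evaluation data:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Illustrative binary indicator matrices: rows = notes, columns = ICD-9 codes
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])

# Micro-averaging aggregates over all (note, code) decisions, a common
# choice for ICD coding because the label distribution is highly skewed
for avg in ("micro", "macro"):
    p = precision_score(y_true, y_pred, average=avg, zero_division=0)
    r = recall_score(y_true, y_pred, average=avg, zero_division=0)
    f = f1_score(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```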

## Environmental Impact

- Estimated carbon emissions to be calculated
- Compute details to be specified

## Technical Specifications

### Model Architecture
- **Base Model:** BioBERT
- **Task:** Multi-label ICD-9 Code Classification

## Citation

[Citation information to be added when research is published]

## More Information

For more details about the model's development, performance, and usage, please contact the model developers.