Fix usage snippet

README.md CHANGED

@@ -13,7 +13,7 @@ license: mit
 
 RAD-DINO is a vision transformer model trained to encode chest X-rays using the self-supervised learning method [DINOv2](https://openreview.net/forum?id=a68SUt6zFt).
 
-RAD-DINO is described in detail in [RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision (Pérez-García, Sharma, Bond-Taylor et al., 2024)](https://arxiv.org/abs/2401.10815).
+RAD-DINO is described in detail in [RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision (F. Pérez-García, H. Sharma, S. Bond-Taylor, et al., 2024)](https://arxiv.org/abs/2401.10815).
 
 - **Developed by:** Microsoft Health Futures
 - **Model type:** Vision transformer
@@ -46,7 +46,7 @@ Fine-tuning RAD-DINO is typically not necessary to obtain good performance in do
 
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
-##
+## Biases, risks, and limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
@@ -74,15 +74,17 @@ Underlying biases of the training datasets may not be well characterised.
 ...
 >>> # Download the model
 >>> repo = "microsoft/rad-dino"
->>> model = AutoModel.from_pretrained(repo)
+>>> model = AutoModel.from_pretrained(repo)
 
 >>> # The processor takes a PIL image, performs resizing, center-cropping, and
 >>> # intensity normalization using stats from MIMIC-CXR, and returns a
 >>> # dictionary with a PyTorch tensor ready for the encoder
->>> processor =
+>>> processor = AutoImageProcessor.from_pretrained(repo)
 >>>
 >>> # Download and preprocess a chest X-ray
 >>> image = download_sample_image()
+>>> image.size # (width, height)
+(2765, 2505)
 >>> inputs = processor(images=image, return_tensors="pt")
 >>>
 >>> # Encode the image!
@@ -90,8 +92,8 @@ Underlying biases of the training datasets may not be well characterised.
 >>> outputs = model(**inputs)
 >>>
 >>> # Look at the CLS embeddings
->>> cls_embeddings = outputs.pooler_output
->>> cls_embeddings # (batch_size, num_channels)
+>>> cls_embeddings = outputs.pooler_output
+>>> cls_embeddings.shape # (batch_size, num_channels)
 torch.Size([1, 768])
 >>>
 >>> # Look at the patch embeddings (needs `pip install einops`)
@@ -220,4 +222,4 @@ We used [SimpleITK](https://simpleitk.org/) and [Pydicom](https://pydicom.github
 
 ## Model card contact
 
-Fernando Pérez-García ([`[email protected]`](mailto:[email protected])).
+Fernando Pérez-García ([`[email protected]`](mailto:[email protected])).
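The fixed `processor` line is described only by its comment: resizing, center-cropping, and intensity normalization with statistics from MIMIC-CXR. A minimal torch-only stand-in for those three steps is sketched below; the 518-px crop size and the 0.5/0.25 statistics are placeholders chosen for illustration, not the values the real `AutoImageProcessor` ships with.

```python
import torch
import torch.nn.functional as F

CROP = 518             # assumed crop size, not read from the real processor
MEAN, STD = 0.5, 0.25  # placeholder stats, NOT the actual MIMIC-CXR values

def preprocess(image: torch.Tensor) -> torch.Tensor:
    """image: (batch, 1, H, W) grayscale tensor with values in [0, 1]."""
    _, _, h, w = image.shape
    # Resize so the shorter side equals CROP, preserving aspect ratio
    if h < w:
        new_h, new_w = CROP, round(w * CROP / h)
    else:
        new_h, new_w = round(h * CROP / w), CROP
    resized = F.interpolate(
        image, size=(new_h, new_w), mode="bilinear", align_corners=False
    )
    # Center-crop to CROP x CROP
    top, left = (new_h - CROP) // 2, (new_w - CROP) // 2
    cropped = resized[:, :, top : top + CROP, left : left + CROP]
    # Intensity normalization
    return (cropped - MEAN) / STD

# Same (height, width) as the sample X-ray in the snippet above
pixel_values = preprocess(torch.rand(1, 1, 2505, 2765))
print(pixel_values.shape)  # torch.Size([1, 1, 518, 518])
```

In practice the real processor should be used, since matching its exact statistics is what makes the encoder's outputs meaningful.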
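The snippet breaks off right at the patch-embedding step, which its comment says needs `einops`. The sketch below shows the reshape that step performs, using plain torch ops equivalent to `einops.rearrange(x, "b (h w) c -> b c h w")`; the 37x37 grid and the CLS-token-first layout are assumptions of this sketch, not taken from the diff.

```python
import torch

# Simulated encoder output for one image: a CLS token followed by a
# flat sequence of patch tokens, each with 768 channels.
batch_size, grid, channels = 1, 37, 768
flat_output = torch.rand(batch_size, 1 + grid * grid, channels)

# Drop the CLS token; keep the grid*grid patch tokens
flat_patches = flat_output[:, 1:]

# Rearrange the flat token sequence into a (B, C, H, W) feature map,
# i.e. einops.rearrange(flat_patches, "b (h w) c -> b c h w", h=grid)
patch_embeddings = (
    flat_patches.reshape(batch_size, grid, grid, channels)
    .permute(0, 3, 1, 2)
    .contiguous()
)
print(patch_embeddings.shape)  # torch.Size([1, 768, 37, 37])
```

The resulting spatial feature map is the form typically fed to dense downstream heads such as segmentation decoders.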