---
license: other
license_name: hai-def
license_link: https://developers.google.com/health-ai-developer-foundations/terms
language:
  - en
tags:
  - medical
  - x-ray
  - chest-x-ray
  - medical-embeddings
extra_gated_heading: Access CXR Foundation on Hugging Face
extra_gated_prompt: >-
  To access CXR Foundation on Hugging Face, you're required to review and
  agree to [Health AI Developer Foundation's terms of
  use](https://developers.google.com/health-ai-developer-foundations/terms).
  To do this, please ensure you’re logged in to Hugging Face and click below.
  Requests are processed immediately.
extra_gated_button_content: Acknowledge license
library_name: cxr-foundation
---

# CXR Foundation model card

**Model documentation**: [CXR Foundation](https://developers.google.com/health-ai-developer-foundations/cxr-foundation)

**Resources**:

* Model on Google Cloud Model Garden: [CXR Foundation](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation)
* Model on Hugging Face: [google/cxr-foundation](https://huggingface.co/google/cxr-foundation)
* GitHub repository (supporting code, Colab notebooks, discussions, and issues): [cxr-foundation](https://github.com/google-health/cxr-foundation)
* Quick start notebook: [notebooks/quick_start](https://github.com/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb)
* Support: See [Contact](https://developers.google.com/health-ai-developer-foundations/cxr-foundation/get-started.md#contact).

**Terms of use**: [Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms)

**Author**: Google

## Model information

This section describes the CXR Foundation model and how to use it.

### Description

CXR Foundation is a machine learning model designed to accelerate AI development for chest X-ray image analysis.
It is pre-trained on a large number of chest X-rays to produce embeddings that capture dense features relevant for analyzing these images. As a result, the embeddings CXR Foundation produces enable the efficient training of AI models with significantly less data and compute than traditional methods. CXR Foundation offers two types of embeddings:

* ELIXR v2.0: Produces 32x768-dimensional vectors, capturing detailed image features relevant to X-ray analysis.
* ELIXR-contrastive / v2.0 text: Generates 32x128-dimensional vectors and allows for projecting chest X-ray images and textual prompts into a shared embedding space. This enables applications such as semantic image retrieval and zero-shot classification.

You can read more about the research behind CXR Foundation in our manuscript: [ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders](https://arxiv.org/abs/2308.01317).

### How to use

To get started quickly with Hugging Face, refer to the Quick start notebook in the next section.

If you want to use the model at scale, we recommend that you create a production version using [Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation).

### Examples

See the following Colab notebooks for examples of how to use CXR Foundation:

* To give the model a quick try, running it locally with weights from Hugging Face, see [Quick start notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb).
* For an example of how to use the model to train a linear classifier, see [Linear classifier notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/train_data_efficient_classifier.ipynb).
* For an example of how to retrieve images from a database using text-image similarity, see [Text retrieval notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/retrieve_images_by_text.ipynb).
* For an example of how to use the text embeddings to perform zero-shot inference, see [Zero-shot inference notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/classify_images_with_natural_language.ipynb).

### Model architecture overview

The model uses the [EfficientNet-L2 architecture](https://arxiv.org/pdf/1911.04252v4.pdf) and [BERT architecture](https://arxiv.org/abs/1810.04805). It was trained on 821,544 CXRs from India and the US using two sources of supervision: abnormal vs. normal labels (i.e., whether the image contained any kind of abnormality) with the [Supervised Contrastive loss](https://arxiv.org/abs/2004.11362v1), and the accompanying radiology reports with the [CLIP loss](https://arxiv.org/pdf/2103.00020.pdf) and [BLIP-2 losses](https://arxiv.org/abs/2301.12597). The abnormal vs. normal labels were obtained from more granular labels (e.g., pneumothorax, fracture) as well as [regular expressions on radiology reports](https://pubmed.ncbi.nlm.nih.gov/34471144/).
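
For readers unfamiliar with the CLIP objective mentioned above, the following is a generic sketch of a symmetric image-text contrastive loss in plain Python. It is not the training code for CXR Foundation. The function names, the toy inputs, and the temperature of 0.07 are illustrative assumptions, and the sketch presumes each embedding is a single L2-normalized vector.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


def clip_style_loss(image_embs: Sequence[Vector],
                    text_embs: Sequence[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric cross-entropy over a batch of paired embeddings.

    Pair i (image_embs[i], text_embs[i]) is treated as the only positive;
    every other combination in the batch acts as a negative.
    The temperature value is an illustrative assumption.
    """
    n = len(image_embs)
    # Similarity matrix: rows index images, columns index texts.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_embs]
        for img in image_embs
    ]
    # Image-to-text direction: softmax across each row.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: softmax across each column.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


# Toy batch of two pairs with 2-dimensional embeddings (hypothetical data).
images = [[1.0, 0.0], [0.0, 1.0]]
matching_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = clip_style_loss(images, matching_texts)
shuffled_loss = clip_style_loss(images, shuffled_texts)
print(f"aligned: {aligned_loss:.6f}, shuffled: {shuffled_loss:.6f}")
```

The loss is small when each image is most similar to its own report and large when the pairing is shuffled, which is the property that places images and text in a shared embedding space.
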
You can read more about the research behind CXR Foundation in our recent publication: [Simplified Transfer Learning for Chest Radiography Models Using Less Data.](https://pubs.rsna.org/doi/10.1148/radiol.212482)

### Technical specifications

* Model type: Convolutional neural network that produces embeddings
* Key publications:
    * [Simplified Transfer Learning for Chest Radiography Models Using Less Data](https://pubs.rsna.org/doi/10.1148/radiol.212482)
    * [ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders](https://arxiv.org/abs/2308.01317)
* Model created: August 2, 2024
* Model version: 2.0.0

### Performance and validation

CXR Foundation was evaluated across a range of tasks: data-efficient classification, zero-shot classification, semantic image retrieval, visual question answering, and report quality assurance.

### Key performance metrics

* Data-efficient classification: **Mean AUC of 0.898** (across atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) on the CheXpert test set.
* Zero-shot classification: **Mean AUC of 0.846 across 13 findings** on the CheXpert test set. Findings included: atelectasis, cardiomegaly, consolidation, pleural effusion, pulmonary edema, enlarged cardiomediastinum, pleural other, pneumothorax, support devices, airspace opacity, lung lesion, pneumonia, and fracture.
* Semantic image retrieval: **0.76 normalized discounted cumulative gain (NDCG) @5** across 19 queries, including perfect retrieval on 12 of them.
* Reference: [ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders](https://arxiv.org/pdf/2308.01317)

### Inputs and outputs

* **Input**: Serialized `tf.Example` (with the bytes of a `PNG` written in the `image/encoded` feature key).
* **Output**: Embedding (a vector of floating-point values representing a projection of the original image into a compressed feature space).

## Dataset details

### Training dataset

CXR Foundation was trained using the following de-identified datasets:

* MIMIC-CXR, comprising 243,324 images of 60,523 unique patients (cited below);
* A private US dataset from an academic medical center (AMC) in Illinois, comprising 165,182 images of 12,988 unique patients; and
* A private Indian dataset from five hospitals, comprising 485,082 images of 348,335 unique patients.

### Labeling

Supervised learning was used to label abnormal and normal human data from radiology reports.

A medically tuned LLM, Med-PaLM 2, was then applied to ensure that the labels were consistent with the report, and a board-certified thoracic radiologist (CL) adjudicated cases where the LLM results differed from the ground truth in MIMIC-CXR.

*Additional information about data and labels used to evaluate CXR Foundation for downstream tasks can be found in the following references:*

- [Sellergren A, Chen C, et al. Simplified Transfer Learning for Chest Radiography Models Using Less Data. Radiology. 2022.](https://pubs.rsna.org/doi/full/10.1148/radiol.212482)
- [https://pubs.rsna.org/doi/10.1148/radiol.212482](https://pubs.rsna.org/doi/10.1148/radiol.212482) (Tables 1, 2, and 3)
- [https://github.com/google-research/google-research/tree/master/supcon](https://github.com/google-research/google-research/tree/master/supcon)

## License

The use of CXR Foundation is governed by the [Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms).

## Data citation

- [MIMIC-CXR Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2024). MIMIC-CXR Database (version 2.1.0). PhysioNet.](https://doi.org/10.13026/4jqj-jw95)
- [Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.
Sci Data 6, 317 (2019).](https://doi.org/10.1038/s41597-019-0322-0)
- Available on PhysioNet: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). [PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.](https://pubmed.ncbi.nlm.nih.gov/10851218/)

## Implementation information

Details about the model internals.

### Software

Training was done using [JAX](https://github.com/jax-ml/jax).

JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.

## Use and limitations

### Intended use

* CXR Foundation can reduce the training data, compute, and technical expertise necessary to develop AI applications for radiographs. The model has been optimized for chest X-rays, but researchers have reported success using it for other types of X-rays, including X-rays of other body parts and even veterinary X-rays.

Some example applications include:

#### Data-efficient classification

With a small amount of labeled data, you can train a classifier model on top of CXR Foundation embeddings (ELIXR v2.0). Furthermore, each embedding can be used downstream as an input for a variety of different classifiers, with very little additional compute. Below are some example classification tasks:

* Clinical findings like fracture or pneumothorax
* Determining X-ray image quality
* Determining the X-ray view or body part
* Determining the presence of devices
* Discovering misplaced tubes

#### Zero-shot classification

By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can get a classification score through textual prompts, without any additional training data. Zero-shot classification works by measuring the relative distance of the image embeddings from a positive text prompt (e.g., "pleural effusion present") and a negative text prompt (e.g., "normal X-ray").
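
The scoring idea in the preceding sentence can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's API: the function names are hypothetical, the image embedding is assumed to be a list of row vectors (32x128 for the contrastive mode), each text prompt is assumed to embed to a single vector, and max-pooling the per-row cosine similarities is an assumed design choice. Toy 2x3 vectors stand in for real embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def image_text_similarity(image_emb: Sequence[Vector], text_emb: Vector) -> float:
    """Similarity of a multi-row image embedding to one text embedding.

    Taking the maximum over rows is an assumption made for this sketch.
    """
    return max(cosine_similarity(row, text_emb) for row in image_emb)


def zero_shot_score(image_emb: Sequence[Vector],
                    positive_text_emb: Vector,
                    negative_text_emb: Vector) -> float:
    """Positive-prompt similarity minus negative-prompt similarity.

    A higher score means the image lies closer to the positive prompt.
    """
    return (image_text_similarity(image_emb, positive_text_emb)
            - image_text_similarity(image_emb, negative_text_emb))


# Hypothetical toy embeddings: 2 rows x 3 dims instead of 32 x 128.
toy_image = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
toy_positive = [1.0, 0.0, 0.0]   # stands in for "pleural effusion present"
toy_negative = [0.0, 0.0, 1.0]   # stands in for "normal X-ray"

score = zero_shot_score(toy_image, toy_positive, toy_negative)
print(f"zero-shot score: {score:.3f}")
```

Scores from many images can then be ranked or thresholded. The threshold itself has to be chosen on data representative of the intended use.
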
The use cases are the same as for data-efficient classification but don't require data to train. The zero-shot method typically outperforms data-efficient classification at low levels of training data, while data-efficient classification tends to exceed zero-shot performance with larger amounts of data. See the [ELIXR paper](https://arxiv.org/pdf/2308.01317) for more details.

#### Semantic image retrieval

By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can rank a set of X-rays against a search query. As with zero-shot classification, language-based image retrieval relies on the distance between the embeddings of the set of images and the text embeddings from the search query.

### Benefits

* CXR Foundation embeddings enable efficient training of AI models for chest X-ray image analysis with significantly less data and compute than traditional methods.
* Because CXR Foundation is pre-trained on a large set of images, users need less data and can build more generalizable models than they could by training on more limited datasets.

### Limitations

The following are known factors that might limit the generalizability or usefulness of the model output for application in downstream tasks:

* The model was trained using only de-identified data from the US and India and may not generalize well to data from other countries, patient populations, or manufacturers not used in training.
* The model has only been validated for a limited number of the many potential downstream tasks involving chest radiographs.
* Output quality depends on image quality and resolution. A minimum resolution of 1024x1024 is recommended.
* The model is only used to generate embeddings of user-provided data. It does not generate any predictions or diagnoses on its own.
* Task-specific validation remains an important aspect of downstream model development by the end user.
* As with any research, developers should ensure that any downstream application is validated to understand performance using data that is appropriately representative of the intended use setting for the specific application (e.g., age, sex, gender, condition, scanner, etc.).