huggingface compatible changes

Files changed (6)

MODELCARD.md +128 -0
models/phenom_beta_huggingface/config.json → config.json +0 -0
pyproject.toml +35 -0
requirements.in +0 -17
requirements.txt +0 -326
test_huggingface_mae.py +2 -2

MODELCARD.md ADDED

	@@ -0,0 +1,128 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Phenom CA-MAE-S/16
+Channel-agnostic image encoding model designed for microscopy image featurization.
+The model uses a vision transformer backbone with channelwise cross-attention over patch tokens to create contextualized representations separately for each channel.
+## Model Details
+### Model Description
+This model is a [channel-agnostic masked autoencoder](https://openaccess.thecvf.com/content/CVPR2024/html/Kraus_Masked_Autoencoders_for_Microscopy_are_Scalable_Learners_of_Cellular_Biology_CVPR_2024_paper.html) trained to reconstruct microscopy images over three datasets:
+1. RxRx3
+2. JUMP-CP overexpression
+3. JUMP-CP gene-knockouts
+- **Developed, funded, and shared by:** Recursion
+- **Model type:** Vision transformer CA-MAE
+- **Image modality:** Optimized for microscopy images from the CellPainting assay
+- **License:**
+### Model Sources
+- **Repository:** [https://github.com/recursionpharma/maes_microscopy](https://github.com/recursionpharma/maes_microscopy)
+- **Paper:** [Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology](https://openaccess.thecvf.com/content/CVPR2024/html/Kraus_Masked_Autoencoders_for_Microscopy_are_Scalable_Learners_of_Cellular_Biology_CVPR_2024_paper.html)
+## Uses
+NOTE: the model's embeddings tend to yield useful features only after standard batch-correction post-processing. **We recommend**, at a *minimum*, applying the standard `PCA-CenterScale` pattern (or, better yet, Typical Variation Normalization) after running inference over your images:
+1. Fit a PCA kernel on all the *control images* (or all images if no controls) from across all experimental batches (e.g. the plates of wells from your assay),
+2. Transform all the embeddings with that PCA kernel,
+3. For each experimental batch, fit a separate StandardScaler on the transformed embeddings of the controls from step 2, then transform the rest of the embeddings from that batch with that StandardScaler.
+### Direct Use
+- Create biologically useful embeddings of microscopy images
+- Create contextualized embeddings of each channel of a microscopy image (set `return_channelwise_embeddings=True`)
+- Leverage the full MAE encoder + decoder to predict new channels / stains for images without all 6 CellPainting channels
+### Downstream Use
+- A determined ML expert could fine-tune the encoder for downstream tasks such as classification
+### Out-of-Scope Use
+- Unlikely to be especially performant on brightfield microscopy images
+- Out-of-domain medical images, such as H&E (though the model might still serve as a decent baseline)
+## Bias, Risks, and Limitations
+- The primary limitation is that the embeddings tend to be more useful at scale. For example, if you only have 1 plate of microscopy images, the embeddings might underperform a bespoke supervised model.
+## How to Get Started with the Model
+You should be able to run the tests below, which demonstrate how to use the model at inference time.
+```python
+import pytest
+import torch
+from huggingface_mae import MAEModel
+huggingface_phenombeta_model_dir = "."
+# huggingface_modelpath = "recursionpharma/test-pb-model"
+@pytest.fixture
+def huggingface_model():
+    # Make sure you have the model/config downloaded from https://huggingface.co/recursionpharma/test-pb-model to this directory
+    # huggingface-cli download recursionpharma/test-pb-model --local-dir=.
+    huggingface_model = MAEModel.from_pretrained(huggingface_phenombeta_model_dir)
+    huggingface_model.eval()
+    return huggingface_model
+@pytest.mark.parametrize("C", [1, 4, 6, 11])
+@pytest.mark.parametrize("return_channelwise_embeddings", [True, False])
+def test_model_predict(huggingface_model, C, return_channelwise_embeddings):
+    example_input_array = torch.randint(
+        low=0,
+        high=255,
+        size=(2, C, 256, 256),
+        dtype=torch.uint8,
+        device=huggingface_model.device,
+    )
+    huggingface_model.return_channelwise_embeddings = return_channelwise_embeddings
+    embeddings = huggingface_model.predict(example_input_array)
+    expected_output_dim = 384 * C if return_channelwise_embeddings else 384
+    assert embeddings.shape == (2, expected_output_dim)
+```
+## Training, evaluation and testing details
+See the paper linked above for details on model training and evaluation. Primary hyperparameters are included in the repo linked above.
+## Environmental Impact
+- **Hardware Type:** Nvidia H100 Hopper nodes
+- **Hours used:** 400
+- **Cloud Provider:** private cloud
+- **Carbon Emitted:** 138.24 kg CO2 (roughly the equivalent of one car driving from Toronto to Montreal)
+**BibTeX:**
+```TeX
+@inproceedings{kraus2024masked,
+  title={Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology},
+  author={Kraus, Oren and Kenyon-Dean, Kian and Saberian, Saber and Fallah, Maryam and McLean, Peter and Leung, Jess and Sharma, Vasudev and Khan, Ayla and Balakrishnan, Jia and Celik, Safiye and others},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={11757--11768},
+  year={2024}
+}
+```
+## Model Card Contact
+- Kian Kenyon-Dean: [email protected]
+- Oren Kraus: [email protected]
+- Or, email: [email protected]
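
Review note on the `PCA-CenterScale` steps in the "Uses" section of MODELCARD.md: the three steps can be sketched roughly as below. This is a minimal sketch and not the authors' implementation. It assumes plain linear PCA and `StandardScaler` from scikit-learn, and the function name `pca_center_scale`, its arguments, and the fallback for batches without controls are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_center_scale(embeddings, batch_ids, is_control, n_components=None):
    """Hypothetical helper: PCA fit on controls, then per-batch standardization."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    batch_ids = np.asarray(batch_ids)
    is_control = np.asarray(is_control, dtype=bool)

    # Step 1: fit PCA on the control embeddings pooled across all batches
    # (or on every embedding when there are no controls at all).
    fit_mask = is_control if is_control.any() else np.ones(len(embeddings), dtype=bool)
    pca = PCA(n_components=n_components).fit(embeddings[fit_mask])

    # Step 2: project every embedding with that single PCA model.
    transformed = pca.transform(embeddings)

    # Step 3: per batch, fit a StandardScaler on that batch's controls and
    # apply it to all embeddings from the same batch.
    corrected = np.empty_like(transformed)
    for batch in np.unique(batch_ids):
        in_batch = batch_ids == batch
        batch_controls = in_batch & is_control
        # Assumption: a batch without controls is scaled using all of its rows.
        scaler_mask = batch_controls if batch_controls.any() else in_batch
        scaler = StandardScaler().fit(transformed[scaler_mask])
        corrected[in_batch] = scaler.transform(transformed[in_batch])
    return corrected
```

Here `embeddings` would be the `(N, 384)` array (or `(N, 384 * C)` with channelwise embeddings) collected from `predict`, `batch_ids` would label each row with its plate or experiment, and `is_control` would flag the control wells.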

models/phenom_beta_huggingface/config.json → config.json RENAMED

File without changes

pyproject.toml ADDED

	@@ -0,0 +1,35 @@

+[build-system]
+requires = ["setuptools >= 61.0"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "maes_microscopy_project"
+version = "0.1.0"
+authors = [
+    {name = "kian-kd", email = "[email protected]"},
+    {name = "Laksh47", email = "[email protected]"},
+]
+requires-python = ">=3.10.4"
+dependencies = [
+    "huggingface-hub",
+    "timm",
+    "torch>=2.3",
+    "torchmetrics",
+    "torchvision",
+    "tqdm",
+    "transformers",
+    "xformers",
+    "zarr",
+    "hydra-core",
+    "pytorch-lightning>=2.1",
+    "matplotlib",
+    "scikit-image",
+    "ipykernel",
+    "isort",
+    "ruff",
+    "pytest",
+]
+[tool.setuptools]
+py-modules = []

requirements.in DELETED

@@ -1,17 +0,0 @@
-huggingface-hub
-timm
-torch>=2.3
-torchmetrics
-torchvision
-tqdm
-transformers
-xformers
-zarr
-hydra-core
-pytorch-lightning>=2.1
-matplotlib
-scikit-image
-ipykernel
-isort
-ruff
-pytest

requirements.txt DELETED

@@ -1,326 +0,0 @@
-#
-# This file is autogenerated by pip-compile with Python 3.10
-# by the following command:
-#
-#    pip-compile --no-emit-index-url --output-file=requirements.txt requirements.in
-#
---trusted-host pypi.ngc.nvidia.com
-aiohappyeyeballs==2.4.3
-    # via aiohttp
-aiohttp==3.10.10
-    # via fsspec
-aiosignal==1.3.1
-    # via aiohttp
-antlr4-python3-runtime==4.9.3
-    # via
-    #   hydra-core
-    #   omegaconf
-asciitree==0.3.3
-    # via zarr
-asttokens==2.4.1
-    # via stack-data
-async-timeout==4.0.3
-    # via aiohttp
-attrs==24.2.0
-    # via aiohttp
-certifi==2024.8.30
-    # via requests
-charset-normalizer==3.4.0
-    # via requests
-comm==0.2.2
-    # via ipykernel
-contourpy==1.3.0
-    # via matplotlib
-cycler==0.12.1
-    # via matplotlib
-debugpy==1.8.7
-    # via ipykernel
-decorator==5.1.1
-    # via ipython
-exceptiongroup==1.2.2
-    # via
-    #   ipython
-    #   pytest
-executing==2.1.0
-    # via stack-data
-fasteners==0.19
-    # via zarr
-filelock==3.16.1
-    # via
-    #   huggingface-hub
-    #   torch
-    #   transformers
-    #   triton
-fonttools==4.54.1
-    # via matplotlib
-frozenlist==1.5.0
-    # via
-    #   aiohttp
-    #   aiosignal
-fsspec[http]==2024.10.0
-    # via
-    #   huggingface-hub
-    #   pytorch-lightning
-    #   torch
-huggingface-hub==0.26.2
-    # via
-    #   -r requirements.in
-    #   timm
-    #   tokenizers
-    #   transformers
-hydra-core==1.3.2
-    # via -r requirements.in
-idna==3.10
-    # via
-    #   requests
-    #   yarl
-imageio==2.36.0
-    # via scikit-image
-iniconfig==2.0.0
-    # via pytest
-ipykernel==6.29.5
-    # via -r requirements.in
-ipython==8.29.0
-    # via ipykernel
-isort==5.13.2
-    # via -r requirements.in
-jedi==0.19.1
-    # via ipython
-jinja2==3.1.4
-    # via torch
-jupyter-client==8.6.3
-    # via ipykernel
-jupyter-core==5.7.2
-    # via
-    #   ipykernel
-    #   jupyter-client
-kiwisolver==1.4.7
-    # via matplotlib
-lazy-loader==0.4
-    # via scikit-image
-lightning-utilities==0.11.8
-    # via
-    #   pytorch-lightning
-    #   torchmetrics
-markupsafe==3.0.2
-    # via jinja2
-matplotlib==3.9.2
-    # via -r requirements.in
-matplotlib-inline==0.1.7
-    # via
-    #   ipykernel
-    #   ipython
-mpmath==1.3.0
-    # via sympy
-multidict==6.1.0
-    # via
-    #   aiohttp
-    #   yarl
-nest-asyncio==1.6.0
-    # via ipykernel
-networkx==3.2.1
-    # via
-    #   scikit-image
-    #   torch
-numcodecs==0.12.1
-    # via zarr
-numpy==1.26.4
-    # via
-    #   contourpy
-    #   imageio
-    #   matplotlib
-    #   numcodecs
-    #   scikit-image
-    #   scipy
-    #   tifffile
-    #   torchmetrics
-    #   torchvision
-    #   transformers
-    #   xformers
-    #   zarr
-nvidia-cublas-cu12==12.4.5.8
-    # via
-    #   nvidia-cudnn-cu12
-    #   nvidia-cusolver-cu12
-    #   torch
-nvidia-cuda-cupti-cu12==12.4.127
-    # via torch
-nvidia-cuda-nvrtc-cu12==12.4.127
-    # via torch
-nvidia-cuda-runtime-cu12==12.4.127
-    # via torch
-nvidia-cudnn-cu12==9.1.0.70
-    # via torch
-nvidia-cufft-cu12==11.2.1.3
-    # via torch
-nvidia-curand-cu12==10.3.5.147
-    # via torch
-nvidia-cusolver-cu12==11.6.1.9
-    # via torch
-nvidia-cusparse-cu12==12.3.1.170
-    # via
-    #   nvidia-cusolver-cu12
-    #   torch
-nvidia-nccl-cu12==2.21.5
-    # via torch
-nvidia-nvjitlink-cu12==12.4.127
-    # via
-    #   nvidia-cusolver-cu12
-    #   nvidia-cusparse-cu12
-    #   torch
-nvidia-nvtx-cu12==12.4.127
-    # via torch
-omegaconf==2.3.0
-    # via hydra-core
-packaging==24.1
-    # via
-    #   huggingface-hub
-    #   hydra-core
-    #   ipykernel
-    #   lazy-loader
-    #   lightning-utilities
-    #   matplotlib
-    #   pytest
-    #   pytorch-lightning
-    #   scikit-image
-    #   torchmetrics
-    #   transformers
-parso==0.8.4
-    # via jedi
-pexpect==4.9.0
-    # via ipython
-pillow==11.0.0
-    # via
-    #   imageio
-    #   matplotlib
-    #   scikit-image
-    #   torchvision
-platformdirs==4.3.6
-    # via jupyter-core
-pluggy==1.5.0
-    # via pytest
-prompt-toolkit==3.0.48
-    # via ipython
-propcache==0.2.0
-    # via yarl
-psutil==6.1.0
-    # via ipykernel
-ptyprocess==0.7.0
-    # via pexpect
-pure-eval==0.2.3
-    # via stack-data
-pygments==2.18.0
-    # via ipython
-pyparsing==3.2.0
-    # via matplotlib
-pytest==8.3.3
-    # via -r requirements.in
-python-dateutil==2.9.0.post0
-    # via
-    #   jupyter-client
-    #   matplotlib
-pytorch-lightning==2.4.0
-    # via -r requirements.in
-pyyaml==6.0.2
-    # via
-    #   huggingface-hub
-    #   omegaconf
-    #   pytorch-lightning
-    #   timm
-    #   transformers
-pyzmq==26.2.0
-    # via
-    #   ipykernel
-    #   jupyter-client
-regex==2024.9.11
-    # via transformers
-requests==2.32.3
-    # via
-    #   huggingface-hub
-    #   transformers
-ruff==0.7.2
-    # via -r requirements.in
-safetensors==0.4.5
-    # via
-    #   timm
-    #   transformers
-scikit-image==0.24.0
-    # via -r requirements.in
-scipy==1.13.1
-    # via scikit-image
-six==1.16.0
-    # via
-    #   asttokens
-    #   python-dateutil
-stack-data==0.6.3
-    # via ipython
-sympy==1.13.1
-    # via torch
-tifffile==2024.8.30
-    # via scikit-image
-timm==1.0.11
-    # via -r requirements.in
-tokenizers==0.20.2
-    # via transformers
-tomli==2.0.2
-    # via pytest
-torch==2.5.1
-    # via
-    #   -r requirements.in
-    #   pytorch-lightning
-    #   timm
-    #   torchmetrics
-    #   torchvision
-    #   xformers
-torchmetrics==1.5.1
-    # via
-    #   -r requirements.in
-    #   pytorch-lightning
-torchvision==0.20.1
-    # via
-    #   -r requirements.in
-    #   timm
-tornado==6.4.1
-    # via
-    #   ipykernel
-    #   jupyter-client
-tqdm==4.66.6
-    # via
-    #   -r requirements.in
-    #   huggingface-hub
-    #   pytorch-lightning
-    #   transformers
-traitlets==5.14.3
-    # via
-    #   comm
-    #   ipykernel
-    #   ipython
-    #   jupyter-client
-    #   jupyter-core
-    #   matplotlib-inline
-transformers==4.46.1
-    # via -r requirements.in
-triton==3.1.0
-    # via torch
-typing-extensions==4.12.2
-    # via
-    #   huggingface-hub
-    #   ipython
-    #   lightning-utilities
-    #   multidict
-    #   pytorch-lightning
-    #   torch
-urllib3==2.2.3
-    # via requests
-wcwidth==0.2.13
-    # via prompt-toolkit
-xformers==0.0.28.post3
-    # via -r requirements.in
-yarl==1.17.1
-    # via aiohttp
-zarr==2.18.2
-    # via -r requirements.in
-# The following packages are considered to be unsafe in a requirements file:
-# setuptools

test_huggingface_mae.py CHANGED

@@ -3,14 +3,14 @@ import torch
 from huggingface_mae import MAEModel
-huggingface_phenombeta_model_dir = "models/phenom_beta_huggingface"
+huggingface_phenombeta_model_dir = "."
 # huggingface_modelpath = "recursionpharma/test-pb-model"
 @pytest.fixture
 def huggingface_model():
     # Make sure you have the model/config downloaded from https://huggingface.co/recursionpharma/test-pb-model to this directory
-    # huggingface-cli download recursionpharma/test-pb-model --local-dir=models/phenom_beta_huggingface
+    # huggingface-cli download recursionpharma/test-pb-model --local-dir=.
     huggingface_model = MAEModel.from_pretrained(huggingface_phenombeta_model_dir)
     huggingface_model.eval()
     return huggingface_model