EdwardoSunny committed
Commit 85ab89d · Parent(s): 470480a
This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. LICENSE.md +14 -0
  2. LICENSE_Lavis.md +14 -0
  3. README.md +110 -13
  4. app.py +125 -62
  5. assets/distribution.png +0 -0
  6. configs/evaluation.yaml +25 -0
  7. configs/minigpt4.yaml +35 -0
  8. configs/train_instruction_tuning.yaml +54 -0
  9. configs/train_modality_alignment.yaml +60 -0
  10. dataset.json +0 -0
  11. dataset/README.md +15 -0
  12. demo.sh +5 -0
  13. demo_esm.py +101 -0
  14. deprecate/inference.py +129 -0
  15. environment.yml +63 -0
  16. esm/__init__.py +12 -0
  17. esm/__pycache__/__init__.cpython-310.pyc +0 -0
  18. esm/__pycache__/axial_attention.cpython-310.pyc +0 -0
  19. esm/__pycache__/constants.cpython-310.pyc +0 -0
  20. esm/__pycache__/data.cpython-310.pyc +0 -0
  21. esm/__pycache__/modules.cpython-310.pyc +0 -0
  22. esm/__pycache__/multihead_attention.cpython-310.pyc +0 -0
  23. esm/__pycache__/pretrained.cpython-310.pyc +0 -0
  24. esm/__pycache__/rotary_embedding.cpython-310.pyc +0 -0
  25. esm/__pycache__/version.cpython-310.pyc +0 -0
  26. esm/axial_attention.py +239 -0
  27. esm/constants.py +10 -0
  28. esm/data.py +493 -0
  29. esm/esmfold/v1/__init__.py +0 -0
  30. esm/esmfold/v1/categorical_mixture.py +43 -0
  31. esm/esmfold/v1/esmfold.py +364 -0
  32. esm/esmfold/v1/misc.py +309 -0
  33. esm/esmfold/v1/pretrained.py +181 -0
  34. esm/esmfold/v1/tri_self_attn_block.py +160 -0
  35. esm/esmfold/v1/trunk.py +243 -0
  36. esm/inverse_folding/__init__.py +11 -0
  37. esm/inverse_folding/features.py +356 -0
  38. esm/inverse_folding/gvp_encoder.py +56 -0
  39. esm/inverse_folding/gvp_modules.py +475 -0
  40. esm/inverse_folding/gvp_transformer.py +144 -0
  41. esm/inverse_folding/gvp_transformer_encoder.py +189 -0
  42. esm/inverse_folding/gvp_utils.py +68 -0
  43. esm/inverse_folding/multichain_util.py +152 -0
  44. esm/inverse_folding/transformer_decoder.py +228 -0
  45. esm/inverse_folding/transformer_layer.py +304 -0
  46. esm/inverse_folding/util.py +323 -0
  47. esm/model/__init__.py +1 -0
  48. esm/model/__pycache__/__init__.cpython-310.pyc +0 -0
  49. esm/model/__pycache__/esm1.cpython-310.pyc +0 -0
  50. esm/model/__pycache__/esm2.cpython-310.pyc +0 -0
LICENSE.md ADDED
@@ -0,0 +1,14 @@
+ BSD 3-Clause License
+
+ Copyright 2023 Deyao Zhu
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+ 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
LICENSE_Lavis.md ADDED
@@ -0,0 +1,14 @@
+ BSD 3-Clause License
+
+ Copyright (c) 2022 Salesforce, Inc.
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+ 3. Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
README.md CHANGED
@@ -1,13 +1,110 @@
- ---
- title: ProteinGPT Llama3
- emoji: 💬
- colorFrom: yellow
- colorTo: purple
- sdk: gradio
- sdk_version: 4.36.1
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
+ # ProteinChat: Towards Enabling ChatGPT-Like Capabilities on Protein 3D Structures
+
+ This repository holds the code and data of ProteinChat: Towards Enabling ChatGPT-Like Capabilities on Protein 3D Structures.
+
+ ## Technical report is available [here](https://www.techrxiv.org/articles/preprint/ProteinChat_Towards_Achieving_ChatGPT-Like_Functionalities_on_Protein_3D_Structures/23120606)
+
+ ## Examples
+
+ ![Eg1](fig/protein-eg.png)
+
+
+ ## Introduction
+ - In this work, we make an initial attempt towards enabling ChatGPT-like capabilities on protein 3D structures by developing a prototype system, ProteinChat.
+ - ProteinChat works in a similar way to ChatGPT. Users upload a protein 3D structure and ask various questions about this protein. ProteinChat answers these questions in a multi-turn, interactive manner.
+ - The ProteinChat system consists of a protein 3D structure encoder (based on [ESM inverse folding](https://github.com/facebookresearch/esm/tree/main/examples/inverse_folding)), a large language model (LLM), and an adaptor. The protein encoder takes a protein 3D structure as input and learns a representation for this protein. The adaptor transforms the protein representation produced by the protein encoder into another representation that is acceptable to the LLM. The LLM takes the representation transformed by the adaptor and users' questions about this protein as inputs and generates answers. All these components are trained end-to-end.
+ - To train ProteinChat, we collected an instruction tuning dataset that contains 143508 proteins and 143508 instructions.
+
+
+ ![overview](fig/proteinchat_overview.png)
+
+ ## Datasets
+
+ The dataset contains 143508 proteins (represented using 3D structures) with 143508 instructions.
+ The instruction set is available at [this link](https://drive.google.com/file/d/1iMgPyiIzpvXdKiNsXnRKn2YpmP92Xyub/view?usp=share_link).
+ The processed protein files (83G in total) are available at [this link](https://drive.google.com/file/d/1AeJW5BY5C-d8mKJjAULTax6WA4hzWS0N/view?usp=share_link).
+ The data is curated from the [Protein Data Bank](https://www.rcsb.org/). More details can be found [here](dataset/README.md).
+
+ ## Getting Started
+ ### Installation
+ These instructions largely follow those in MiniGPT-4.
+
+ **1. Prepare the code and the environment**
+
+ Clone our repository, create a Python environment, and activate it via the following commands:
+
+ ```bash
+ git clone https://github.com/UCSD-AI4H/proteinchat
+ cd proteinchat
+ conda env create -f environment.yml
+ conda activate proteinchat
+ pip install einops
+ ```
+
+ Verify that the installation of `torch` and `torchvision` succeeded by running `python -c "import torchvision; print(torchvision.__version__)"`. If it outputs the version number without any warnings or errors, you are good to go. __If it outputs any warnings or errors__, uninstall `torch` with `conda uninstall pytorch torchvision torchaudio cudatoolkit` and then reinstall it following [these instructions](https://pytorch.org/get-started/previous-versions/#v1121). You need to find the correct command for the CUDA version your GPU driver supports (check `nvidia-smi`).
+
+ **2. Prepare the pretrained Vicuna weights**
+
+ The current version of ProteinChat is built on the v0 version of Vicuna-13B.
+ Please refer to our instructions [here](PrepareVicuna.md)
+ to prepare the Vicuna weights.
+ The final weights should sit in a single folder with a structure similar to the following:
+
+ ```
+ vicuna_weights
+ ├── config.json
+ ├── generation_config.json
+ ├── pytorch_model.bin.index.json
+ ├── pytorch_model-00001-of-00003.bin
+ ...
+ ```
+
+ Then, set the path to the Vicuna weights in the model config file
+ [here](minigpt4/configs/models/minigpt4.yaml#L16) at Line 16.
+
+ ### Training
+ **You need roughly 45 GB of GPU memory for training.**
+
+ The training configuration file is [configs/train_instruction_tuning.yaml](configs/train_instruction_tuning.yaml). You may want to change the number of epochs and other hyper-parameters there, such as `max_epoch`, `init_lr`, `min_lr`, `warmup_steps`, and `batch_size_train`. Please adjust `iters_per_epoch` so that `iters_per_epoch` * `batch_size_train` equals your training set size (for example, with the full 143508-sample instruction set and `batch_size_train=1`, `iters_per_epoch` would be 143508). Due to GPU memory constraints, we set `batch_size_train=1`.
+
+ Start training the LLaMA model on the protein dataset by running [finetune.sh](finetune.sh): `bash finetune.sh`.
+
+ ### Demo
+ **The demo takes around 24 GB of GPU memory.**
+
+ Find the checkpoint saved during the training process above; by default it is located under the folder `minigpt4/output/minigpt4_stage2_esm/`. Copy it to the folder `ckpt` by running `cp minigpt4/output/minigpt4_stage2_esm/.../checkpoint_xxx.pth ckpt/`, and modify the `ckpt` entry in [configs/evaluation.yaml](configs/evaluation.yaml) to point to your checkpoint.
+
+ Now launch the demo in the same environment: start [demo.sh](demo.sh) on your local machine by running `bash demo.sh`, then open the URL created by the demo and try it out!
+
+
+ ## Acknowledgement
+
+ + [ProteinChat](https://github.com/UCSD-AI4H/proteinchat)
+ + [MiniGPT-4](https://minigpt-4.github.io/)
+ + [Lavis](https://github.com/salesforce/LAVIS)
+ + [Vicuna](https://github.com/lm-sys/FastChat)
+ + [ESM-IF1](https://github.com/facebookresearch/esm/tree/main/examples/inverse_folding)
+
+
+
+ ## License
+ This repository is under the [BSD 3-Clause License](LICENSE.md).
+ Much of the code is based on [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4) (BSD 3-Clause License, [here](LICENSE_MiniGPT4.md)), which in turn is based on [Lavis](https://github.com/salesforce/LAVIS) (BSD 3-Clause License, [here](LICENSE_Lavis.md)).
+
+
+ ## Disclaimer
+
+ This is a prototype system that has not yet been systematically and comprehensively validated by biologists. Please use it with caution.
+
+ Trained models and demo websites will be released after we thoroughly validate the system with biologists.
+
+
+ ## Citation
+
+ If you're using ProteinChat in your research or applications, please cite it using this BibTeX:
+ ```bibtex
+ @article{guo2023proteinchat,
+   title={ProteinChat: Towards Enabling ChatGPT-Like Capabilities on Protein 3D Structures},
+   author={Guo, Han and Huo, Mingjia and Xie, Pengtao},
+   year={2023}
+ }
+ ```
app.py CHANGED
@@ -1,63 +1,126 @@
  import gradio as gr
- from huggingface_hub import InferenceClient
-
- """
- For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
- """
- client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
-
-
- def respond(
-     message,
-     history: list[tuple[str, str]],
-     system_message,
-     max_tokens,
-     temperature,
-     top_p,
- ):
-     messages = [{"role": "system", "content": system_message}]
-
-     for val in history:
-         if val[0]:
-             messages.append({"role": "user", "content": val[0]})
-         if val[1]:
-             messages.append({"role": "assistant", "content": val[1]})
-
-     messages.append({"role": "user", "content": message})
-
-     response = ""
-
-     for message in client.chat_completion(
-         messages,
-         max_tokens=max_tokens,
-         stream=True,
-         temperature=temperature,
-         top_p=top_p,
-     ):
-         token = message.choices[0].delta.content
-
-         response += token
-         yield response
-
- """
- For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
- """
- demo = gr.ChatInterface(
-     respond,
-     additional_inputs=[
-         gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
-         gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-         gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
-         gr.Slider(
-             minimum=0.1,
-             maximum=1.0,
-             value=0.95,
-             step=0.05,
-             label="Top-p (nucleus sampling)",
-         ),
-     ],
- )
-
-
- if __name__ == "__main__":
-     demo.launch(share=True)
  import gradio as gr
+ import argparse
+ import os
+ import random
+ import numpy as np
+ import torch
+ import torch.backends.cudnn as cudnn
+ from minigpt4.common.config import Config
+ from minigpt4.common.dist_utils import get_rank
+ from minigpt4.common.registry import registry
+ from minigpt4.conversation.conversation_esm import Chat, CONV_VISION
+ import esm
+
+ # ProteinGPT initialization
+ def initialize_chat(args):
+     cfg = Config(args)
+     model_config = cfg.model_cfg
+     model_config.device_8bit = 0
+     model_cls = registry.get_model_class(model_config.arch)
+     model = model_cls.from_config(model_config).to('cpu')
+     vis_processor_cfg = cfg.datasets_cfg.cc_sbu_align.vis_processor.train
+     vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
+     chat = Chat(model, vis_processor, device='cpu')
+     return chat
+
+ # Reset the conversation state and uploaded files
+ def gradio_reset(chat_state, img_list):
+     if chat_state is not None:
+         chat_state.messages = []
+     if img_list is not None:
+         img_list = []
+     return None, gr.update(value=None, interactive=True), gr.update(placeholder='Please upload your protein structure and sequence first', interactive=False), gr.update(value="Upload & Start Chat", interactive=True), chat_state, img_list
+
+ # Upload the protein structure and sequence embeddings
+ def upload_protein(structure, sequence, text_input, chat_state):
+     # Check that the structure and sequence files are valid .pt tensors
+     if structure is None or not structure.endswith(".pt"):
+         return (None, None, None, gr.update(placeholder="Invalid structure file, must be a .pt file.", interactive=True), chat_state, None)
+     if sequence is None or not sequence.endswith(".pt"):
+         return (None, None, None, gr.update(placeholder="Invalid sequence file, must be a .pt file.", interactive=True), chat_state, None)
+
+     # Load the protein structure and sequence embeddings
+     pdb_embedding = torch.load(structure, map_location=torch.device('cpu'))
+     sample_pdb = pdb_embedding.to('cpu')
+
+     seq_embedding = torch.load(sequence, map_location=torch.device('cpu'))
+     sample_seq = seq_embedding.to('cpu')
+
+     # Initialize the conversation state
+     chat_state = CONV_VISION.copy()
+     img_list = []
+
+     # Upload the protein data
+     llm_message = chat.upload_protein(sample_pdb, sample_seq, chat_state, img_list)
+
+     # Return the required outputs
+     return (gr.update(interactive=False),  # Disable the structure file input
+             gr.update(interactive=False),  # Disable the sequence file input
+             gr.update(interactive=True, placeholder='Type and press Enter'),  # Enable the text input box
+             gr.update(value="Start Chatting", interactive=False),  # Update the upload button state
+             chat_state,  # Return the conversation state
+             img_list)  # Return the list of protein embeddings
+
+ # Queue the user's question
+ def gradio_ask(user_message, chatbot, chat_state):
+     if len(user_message) == 0:
+         return gr.update(interactive=True, placeholder='Input should not be empty!'), chatbot, chat_state
+     chat.ask(user_message, chat_state)
+     chatbot = chatbot + [[user_message, None]]
+     return '', chatbot, chat_state
+
+ # Generate the model's answer
+ def gradio_answer(chatbot, chat_state, img_list, num_beams, temperature):
+     img_list = [mat.half() for mat in img_list]
+     llm_message = chat.answer(conv=chat_state, img_list=img_list, max_new_tokens=300, num_beams=num_beams, temperature=temperature, max_length=2000)[0]
+     chatbot[-1][1] = llm_message
+     return chatbot, chat_state, img_list
+
+ # Command-line argument parsing
+ def parse_args():
+     parser = argparse.ArgumentParser(description="Demo")
+     parser.add_argument("--cfg-path", help="path to configuration file.", default='configs/evaluation.yaml')
+     parser.add_argument(
+         "--options",
+         nargs="+",
+         help="override some settings in the used config, the key-value pair "
+         "in xxx=yyy format will be merged into config file (deprecate), "
+         "change to --cfg-options instead.",
+     )
+     args = parser.parse_args()
+     return args
+
+ # Gradio demo interface
+ title = """<h1 align="center">Demo of ProteinGPT</h1>"""
+ description = """<h3>Upload your protein sequence and structure and start chatting with your protein!</h3>"""
+ article = """<div style='display:flex; gap: 0.25rem; '><a href='https://huggingface.co/AI-BIO/ProteinGPT-Llama3'><img src='https://img.shields.io/badge/Project-Page-Green'></a><a href='https://github.com'><img src='https://img.shields.io/badge/Github-Code-blue'></a><a href='https://arxiv.org/abs/2408.11363'><img src='https://img.shields.io/badge/Paper-PDF-red'></a></div>"""
+
+ args = parse_args()  # Parse arguments to get the config and model info
+ chat = initialize_chat(args)  # Initialize the ProteinGPT model
+
+ with gr.Blocks() as demo:
+     gr.Markdown(title)
+     gr.Markdown(description)
+     gr.Markdown(article)
+
+     with gr.Row():
+         with gr.Column(scale=0.5):
+             structure = gr.File(type="filepath", label="Upload Protein Structure", show_label=True)
+             sequence = gr.File(type="filepath", label="Upload Protein Sequence", show_label=True)
+             upload_button = gr.Button(value="Upload & Start Chat", interactive=True, variant="primary")
+             clear = gr.Button("Restart")
+             num_beams = gr.Slider(minimum=1, maximum=5, value=1, step=1, interactive=True, label="Beam search numbers")
+             temperature = gr.Slider(minimum=0.1, maximum=2.0, value=1.0, step=0.1, interactive=True, label="Temperature")
+
+         with gr.Column():
+             chat_state = gr.State()
+             img_list = gr.State()
+             chatbot = gr.Chatbot(label='ProteinGPT')
+             text_input = gr.Textbox(label='User', placeholder='Please upload your protein structure and sequence first', interactive=False)
+
+     upload_button.click(upload_protein,
+                         [structure, sequence, text_input, chat_state],
+                         [structure, sequence, text_input, upload_button, chat_state, img_list])
+     text_input.submit(gradio_ask, [text_input, chatbot, chat_state], [text_input, chatbot, chat_state]).then(gradio_answer, [chatbot, chat_state, img_list, num_beams, temperature], [chatbot, chat_state, img_list])
+     clear.click(gradio_reset, [chat_state, img_list], [chatbot, structure, sequence, text_input, upload_button, chat_state, img_list], queue=False)
+
+ demo.launch(share=True)
assets/distribution.png ADDED
configs/evaluation.yaml ADDED
@@ -0,0 +1,25 @@
+ model:
+   arch: mini_gpt4
+   model_type: pretrain_vicuna
+   freeze_vit: True
+   freeze_qformer: True
+   max_txt_len: 256
+   end_sym: "###"
+   low_resource: False
+   prompt_template: '###Human: {} ###Assistant: '
+   # ckpt: '/home/ubuntu/proteinchat/minigpt4/ft/Llama-2-7b-chat-hf/20240610191/checkpoint_5.pth'
+   ckpt: 'minigpt4/ft/Meta-Llama-3-8B-Instruct-hf/20240609203/checkpoint_5.pth'
+
+ datasets:
+   cc_sbu_align:
+     vis_processor:
+       train:
+         name: "blip2_image_eval"
+         image_size: 224
+     text_processor:
+       train:
+         name: "blip_caption"
+
+ run:
+   task: image_text_pretrain
+
configs/minigpt4.yaml ADDED
@@ -0,0 +1,35 @@
+ model:
+   arch: mini_gpt4
+
+   # vit encoder
+   image_size: 224
+   drop_path_rate: 0
+   use_grad_checkpoint: False
+   vit_precision: "fp16"
+   freeze_vit: True
+   freeze_qformer: True
+
+   # Q-Former
+   num_query_token: 32
+
+   # LLM backbone
+   # llama_model: "/home/ubuntu/ckpt/hf/Meta-Llama-3-8B-Instruct-hf/"
+   llama_model: "meta-llama/Meta-Llama-3-8B-Instruct"
+
+
+   # generation configs
+   prompt: ""
+
+ preprocess:
+   vis_processor:
+     train:
+       name: "blip2_image_train"
+       image_size: 224
+     eval:
+       name: "blip2_image_eval"
+       image_size: 224
+   text_processor:
+     train:
+       name: "blip_caption"
+     eval:
+       name: "blip_caption"
configs/train_instruction_tuning.yaml ADDED
@@ -0,0 +1,54 @@
+ model:
+   arch: mini_gpt4
+   model_type: pretrain_vicuna
+   freeze_vit: True
+   freeze_qformer: True
+   # low_resource: True
+   max_txt_len: 256
+   end_sym: "###"
+   prompt_template: '###Human: {} ###Assistant: '
+   # ckpt: '/home/ubuntu/proteinchat/minigpt4/output/Meta-Llama-3-8B-Instruct-hf/20240606190/checkpoint_2.pth'
+   ckpt: '/home/ubuntu/proteinchat/minigpt4/output/Llama-2-7b-chat-hf/20240606005/checkpoint_2.pth'
+
+
+ datasets:
+   cc_sbu_align:
+     vis_processor:
+       train:
+         name: "blip2_image_train"
+         image_size: 224
+     text_processor:
+       train:
+         name: "blip_caption"
+
+ run:
+   task: image_text_pretrain
+   # optimizer
+   lr_sched: "linear_warmup_cosine_lr"
+   init_lr: 1e-5
+   min_lr: 1e-6
+   warmup_lr: 1e-6
+
+   weight_decay: 0.05
+   max_epoch: 10
+   # iters_per_epoch: 762
+   batch_size_train: 1
+   batch_size_eval: 1
+   num_workers: 12
+   warmup_steps: 5000
+
+   seed: 42
+   # output_dir: "ft/Meta-Llama-3-8B-Instruct-hf/"
+   output_dir: "ft/Llama-2-7b-chat-hf/"
+
+   amp: True
+   resume_ckpt_path: null
+
+   evaluate: False
+   train_splits: ["train"]
+
+   device: "cuda"
+   world_size: 1
+   dist_url: "env://"
+   distributed: True
+   stage: 2
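The README asks that `iters_per_epoch` in this file be set so that `iters_per_epoch * batch_size_train` matches the training set size (the commented-out `iters_per_epoch: 762` above was evidently set for a different split). A small sketch of that bookkeeping; the dataset size below is the 143508-protein figure from the README, so substitute the size of your own training split:

```python
import math

# Relation from the README: iters_per_epoch * batch_size_train ≈ training set size.
train_set_size = 143508   # full instruction set from the README; use your own split size
batch_size_train = 1      # as set in configs/train_instruction_tuning.yaml

iters_per_epoch = math.ceil(train_set_size / batch_size_train)
print(iters_per_epoch)    # 143508 -> value to put under `run:` in the YAML
```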
configs/train_modality_alignment.yaml ADDED
@@ -0,0 +1,60 @@
+ model:
+   arch: mini_gpt4
+   model_type: pretrain_vicuna
+   freeze_vit: True
+   freeze_qformer: True
+   # low_resource: True
+   max_txt_len: 384
+
+
+ datasets:
+   laion:
+     vis_processor:
+       train:
+         name: "blip2_image_train"
+         image_size: 224
+     text_processor:
+       train:
+         name: "blip_caption"
+     sample_ratio: 115
+   cc_sbu:
+     vis_processor:
+       train:
+         name: "blip2_image_train"
+         image_size: 224
+     text_processor:
+       train:
+         name: "blip_caption"
+     sample_ratio: 14
+
+ run:
+   task: image_text_pretrain
+   # optimizer
+   lr_sched: "linear_warmup_cosine_lr"
+   init_lr: 1e-4
+   min_lr: 8e-5
+   warmup_lr: 1e-6
+
+   weight_decay: 0.05
+   max_epoch: 3
+   batch_size_train: 1
+   batch_size_eval: 1
+   num_workers: 12
+   warmup_steps: 5000
+
+   seed: 42
+   output_dir: "output/Meta-Llama-3-8B-Instruct-hf/"
+   # output_dir: "output/Llama-2-7b-chat-hf/"
+
+   amp: True
+   resume_ckpt_path: null
+
+   evaluate: False
+   train_splits: ["train"]
+
+   device: "cuda"
+   world_size: 1
+   dist_url: "env://"
+   distributed: True
+
+   stage: 1
dataset.json ADDED
The diff for this file is too large to render.
dataset/README.md ADDED
@@ -0,0 +1,15 @@
+ # Dataset
+
+ ## Modality Alignment
+ - ProteinChat: PDB embeddings and abstract information.
+
+ ## Instruction Tuning
+ - GPT-4o.
+
+
+ ## Citation
+ @article{guo2023proteinchat,
+   title={ProteinChat: Towards Enabling ChatGPT-Like Capabilities on Protein 3D Structures},
+   author={Guo, Han and Huo, Mingjia and Xie, Pengtao},
+   year={2023}
+ }
demo.sh ADDED
@@ -0,0 +1,5 @@
+ # PDB_ID=7rvu
+ # PDB_ID=5x1y
+ PDB_ID=6o7q
+
+ python demo_esm.py --cfg-path configs/evaluation.yaml --gpu-id 0 --pdb /home/ubuntu/pt/$PDB_ID.pt --seq /home/ubuntu/seq/$PDB_ID.pt
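demo.sh expects two precomputed tensors per protein: a structure embedding (`--pdb`) and a sequence embedding (`--seq`), both saved as `.pt` files. This diff does not show how those tensors are produced; the sketch below is one plausible recipe using the bundled ESM code, and the specific checkpoints (`esm_if1_gvp4_t16_142M_UR50`, `esm2_t33_650M_UR50D`), chain choice, and layer index are assumptions rather than the authors' actual preprocessing, so treat it only as a starting point.

```python
import torch
import esm
import esm.inverse_folding

pdb_id, chain_id = "6o7q", "A"   # placeholders matching demo.sh

# Structure embedding via ESM-IF1 (model choice is an assumption).
if1_model, if1_alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
if1_model = if1_model.eval()
coords, native_seq = esm.inverse_folding.util.load_coords(f"{pdb_id}.pdb", chain_id)
with torch.no_grad():
    struct_rep = esm.inverse_folding.util.get_encoder_output(if1_model, if1_alphabet, coords)
torch.save(struct_rep, f"pt/{pdb_id}.pt")        # --pdb input

# Sequence embedding via an ESM-2 language model (model and layer are assumptions).
seq_model, seq_alphabet = esm.pretrained.esm2_t33_650M_UR50D()
seq_model = seq_model.eval()
batch_converter = seq_alphabet.get_batch_converter()
_, _, tokens = batch_converter([(pdb_id, native_seq)])
with torch.no_grad():
    out = seq_model(tokens, repr_layers=[33])
seq_rep = out["representations"][33][0, 1:-1]    # drop BOS/EOS positions
torch.save(seq_rep, f"seq/{pdb_id}.pt")          # --seq input
```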
demo_esm.py ADDED
@@ -0,0 +1,101 @@
+ import argparse
+ import os
+ import random
+
+ import numpy as np
+ import torch
+ import torch.backends.cudnn as cudnn
+
+ from minigpt4.common.config import Config
+ from minigpt4.common.dist_utils import get_rank
+ from minigpt4.common.registry import registry
+ from minigpt4.conversation.conversation_esm import Chat, CONV_VISION
+
+ # imports modules for registration
+ from minigpt4.datasets.builders import *
+ from minigpt4.models import *
+ from minigpt4.processors import *
+ from minigpt4.runners import *
+ from minigpt4.tasks import *
+ import sys
+
+ import esm
+
+
+ def parse_args():
+     parser = argparse.ArgumentParser(description="Demo")
+     parser.add_argument("--cfg-path", required=True, help="path to configuration file.")
+     parser.add_argument("--gpu-id", type=int, default=0, help="specify the gpu to load the model.")
+     parser.add_argument("--pdb", help="specify where the protein structure file is (.pt)")
+     parser.add_argument("--seq", help="specify where the sequence file is (.pt)")
+     parser.add_argument(
+         "--options",
+         nargs="+",
+         help="override some settings in the used config, the key-value pair "
+         "in xxx=yyy format will be merged into config file (deprecate), "
+         "change to --cfg-options instead.",
+     )
+     args = parser.parse_args()
+     return args
+
+
+ def setup_seeds(config):
+     seed = config.run_cfg.seed + get_rank()
+
+     random.seed(seed)
+     np.random.seed(seed)
+     torch.manual_seed(seed)
+
+     cudnn.benchmark = False
+     cudnn.deterministic = True
+
+
+ # ========================================
+ # Model Initialization
+ # ========================================
+
+ print('Initializing Chat')
+ args = parse_args()
+ cfg = Config(args)
+
+ model_config = cfg.model_cfg
+ model_config.device_8bit = args.gpu_id
+ model_cls = registry.get_model_class(model_config.arch)
+ model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
+
+ vis_processor_cfg = cfg.datasets_cfg.cc_sbu_align.vis_processor.train
+ vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
+ chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))
+ print('Initialization Finished')
+
+ chat_state = CONV_VISION.copy()
+ img_list = []
+
+ pdb_path = args.pdb
+ seq_path = args.seq
+ if pdb_path[-3:] == ".pt":
+     pdb_embedding = torch.load(pdb_path, map_location=torch.device('cpu'))
+     sample_pdb = pdb_embedding.to('cuda:{}'.format(args.gpu_id))
+ if seq_path[-3:] == ".pt":
+     seq_embedding = torch.load(seq_path, map_location=torch.device('cpu'))
+     sample_seq = seq_embedding.to('cuda:{}'.format(args.gpu_id))
+
+ llm_message = chat.upload_protein(sample_pdb, sample_seq, chat_state, img_list)
+ print(llm_message)
+
+ img_list = [mat.half() for mat in img_list]
+ while True:
+     user_input = input(">")
+     if len(user_input) == 0:
+         print("USER INPUT CANNOT BE EMPTY!")
+         continue
+     elif user_input.lower() == "exit()":
+         break
+     chat.ask(user_input, chat_state)
+     llm_message = chat.answer(conv=chat_state,
+                               img_list=img_list,
+                               num_beams=1,
+                               temperature=0.7,
+                               max_new_tokens=300,
+                               max_length=2000)[0]
+     print("B: ", llm_message)
deprecate/inference.py ADDED
@@ -0,0 +1,129 @@
+ import argparse
+ import os
+ import random
+ import time
+
+ import numpy as np
+ import torch
+ import torch.backends.cudnn as cudnn
+ import gradio as gr
+
+ import esm
+ from minigpt4.common.config import Config
+ from minigpt4.common.dist_utils import get_rank
+ from minigpt4.common.registry import registry
+ from minigpt4.conversation.conversation_esm import Chat, CONV_VISION
+
+ import json
+
+ # Imports PIL module
+ from PIL import Image
+
+ # imports modules for registration
+ from minigpt4.datasets.builders import *
+ from minigpt4.models import *
+ from minigpt4.processors import *
+ from minigpt4.runners import *
+ from minigpt4.tasks import *
+
+ import esm
+ import esm.inverse_folding
+
+ def parse_args():
+     parser = argparse.ArgumentParser(description="Demo")
+     parser.add_argument("--cfg-path", required=True, help="path to configuration file.")
+     parser.add_argument("--gpu-id", type=int, default=0, help="specify the gpu to load the model.")
+     # parser.add_argument("--json-path", default='/home/h5guo/shared/Mini-GPT4/coco_json/cocoval2014_img_prompt.json', help="path to the classification json file")
+     # parser.add_argument("--caption-save-path", default='/home/h5guo/shared/Mini-GPT4/coco_json_result/results.json', help="path to saved generated captions")
+     parser.add_argument(
+         "--options",
+         nargs="+",
+         help="override some settings in the used config, the key-value pair "
+         "in xxx=yyy format will be merged into config file (deprecate), "
+         "change to --cfg-options instead.",
+     )
+     args = parser.parse_args()
+     return args
+
+
+ def setup_seeds(config):
+     seed = config.run_cfg.seed + get_rank()
+
+     random.seed(seed)
+     np.random.seed(seed)
+     torch.manual_seed(seed)
+
+     cudnn.benchmark = False
+     cudnn.deterministic = True
+
+
+ # ========================================
+ # Model Initialization
+ # ========================================
+
+ print('Initializing Chat')
+ args = parse_args()
+ cfg = Config(args)
+
+ model_config = cfg.model_cfg
+ model_config.device_8bit = args.gpu_id
+ model_cls = registry.get_model_class(model_config.arch)
+ model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
+
+ vis_processor_cfg = cfg.datasets_cfg.cc_sbu_align.vis_processor.train
+ vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
+ chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))
+ print('Initialization Finished')
+
+ # ========================================
+ # Gradio Setting
+ # ========================================
+
+ def gradio_reset(chat_state, img_list):
+     if chat_state is not None:
+         chat_state.messages = []
+     if img_list is not None:
+         img_list = []
+     return chat_state, img_list
+
+ def upload_protein(gr_img):
+     chat_state = CONV_VISION.copy()
+     img_list = []
+     llm_message = chat.upload_protein(gr_img, chat_state, img_list)
+     return chat_state, img_list
+
+ def gradio_ask(user_message, chat_state):
+     chat.ask(user_message, chat_state)
+     return chat_state
+
+
+ def gradio_answer(chat_state, img_list, num_beams=1, temperature=1e-3):
+     llm_message = chat.answer(conv=chat_state,
+                               img_list=img_list,
+                               num_beams=num_beams,
+                               temperature=temperature,
+                               max_new_tokens=300,
+                               max_length=2000)[0]
+     return llm_message, chat_state, img_list
+
+ if __name__ == "__main__":
+     start = time.time()
+     print("******************")
+     protein_embedding_path = "/home/h5guo/data/esm_subset/pt/2wge.pt"
+     protein_embedding = torch.load(protein_embedding_path, map_location=torch.device('cpu'))
+     sample_protein = protein_embedding.to('cuda:{}'.format(args.gpu_id))
+
+     user_message = "Describe this protein in a short paragraph."
+     chat_state, img_list = upload_protein(sample_protein)
+     chat_state = gradio_ask(user_message, chat_state)
+     llm_message, chat_state, img_list = gradio_answer(chat_state, img_list)
+
+     print(f"llm_message: {llm_message}")
+     end = time.time()
+     print(end - start)
+     # i += 1
+     print("******************")
+     # f.close()
+
environment.yml ADDED
@@ -0,0 +1,63 @@
+ name: proteinchat
+ channels:
+   - pytorch
+   - defaults
+   - anaconda
+ dependencies:
+   - python=3.9
+   - cudatoolkit
+   - pip
+   - pytorch=1.12.1
+   - pytorch-mutex=1.0=cuda
+   - torchaudio=0.12.1
+   - torchvision=0.13.1
+   - pip:
+     - accelerate==0.16.0
+     - aiohttp==3.8.4
+     - aiosignal==1.3.1
+     - async-timeout==4.0.2
+     - attrs==22.2.0
+     - bitsandbytes==0.37.0
+     - cchardet==2.1.7
+     - chardet==5.1.0
+     - contourpy==1.0.7
+     - cycler==0.11.0
+     - filelock==3.9.0
+     - fonttools==4.38.0
+     - frozenlist==1.3.3
+     - huggingface-hub==0.13.4
+     - importlib-resources==5.12.0
+     - kiwisolver==1.4.4
+     - matplotlib==3.7.0
+     - multidict==6.0.4
+     - openai==0.27.0
+     - packaging==23.0
+     - psutil==5.9.4
+     - pycocotools==2.0.6
+     - pyparsing==3.0.9
+     - python-dateutil==2.8.2
+     - pyyaml==6.0
+     - regex==2022.10.31
+     - tokenizers==0.13.2
+     - tqdm==4.64.1
+     - transformers==4.28.0
+     - timm==0.6.13
+     - spacy==3.5.1
+     - webdataset==0.2.48
+     - scikit-learn==1.2.2
+     - scipy==1.10.1
+     - yarl==1.8.2
+     - zipp==3.14.0
+     - omegaconf==2.3.0
+     - opencv-python==4.7.0.72
+     - iopath==0.1.10
+     - decord==0.6.0
+     - tenacity==8.2.2
+     - peft
+     - pycocoevalcap
+     - sentence-transformers
+     - umap-learn
+     - notebook
+     - gradio==3.24.1
+     - gradio-client==0.0.8
+     - wandb
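After creating this environment, a quick sanity check (mirroring the torchvision check in the README) can confirm that the pinned PyTorch stack imports and sees the GPU; the expected version strings below simply restate the pins in this file.

```python
import torch
import torchvision
import transformers

print(torch.__version__)          # expected 1.12.1 per the pin above
print(torchvision.__version__)    # expected 0.13.1
print(transformers.__version__)   # expected 4.28.0
print(torch.cuda.is_available())  # should be True on a correctly configured GPU machine
```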
esm/__init__.py ADDED
@@ -0,0 +1,12 @@
+ # Copyright (c) Facebook, Inc. and its affiliates.
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ from .version import version as __version__  # noqa
+
+ from .data import Alphabet, BatchConverter, FastaBatchedDataset  # noqa
+ from .model.esm1 import ProteinBertModel  # noqa
+ from .model.esm2 import ESM2  # noqa
+ from .model.msa_transformer import MSATransformer  # noqa
+ from . import pretrained  # noqa
esm/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (490 Bytes).
esm/__pycache__/axial_attention.cpython-310.pyc ADDED
Binary file (5.43 kB).
esm/__pycache__/constants.cpython-310.pyc ADDED
Binary file (288 Bytes).
esm/__pycache__/data.cpython-310.pyc ADDED
Binary file (15.5 kB).
esm/__pycache__/modules.cpython-310.pyc ADDED
Binary file (13 kB).
esm/__pycache__/multihead_attention.cpython-310.pyc ADDED
Binary file (12 kB).
esm/__pycache__/pretrained.cpython-310.pyc ADDED
Binary file (19.5 kB).
esm/__pycache__/rotary_embedding.cpython-310.pyc ADDED
Binary file (2.73 kB).
esm/__pycache__/version.cpython-310.pyc ADDED
Binary file (186 Bytes).
esm/axial_attention.py ADDED
@@ -0,0 +1,239 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+
6
+ import math
7
+ import torch
8
+ import torch.nn as nn
9
+
10
+
11
+ class RowSelfAttention(nn.Module):
12
+ """Compute self-attention over rows of a 2D input."""
13
+
14
+ def __init__(
15
+ self,
16
+ embed_dim,
17
+ num_heads,
18
+ dropout=0.0,
19
+ max_tokens_per_msa: int = 2 ** 16,
20
+ ):
21
+ super().__init__()
22
+ self.num_heads = num_heads
23
+ self.dropout = dropout
24
+ self.head_dim = embed_dim // num_heads
25
+ self.scaling = self.head_dim ** -0.5
26
+ self.max_tokens_per_msa = max_tokens_per_msa
27
+ self.attn_shape = "hnij"
28
+
29
+ self.k_proj = nn.Linear(embed_dim, embed_dim)
30
+ self.v_proj = nn.Linear(embed_dim, embed_dim)
31
+ self.q_proj = nn.Linear(embed_dim, embed_dim)
32
+
33
+ self.out_proj = nn.Linear(embed_dim, embed_dim)
34
+ self.dropout_module = nn.Dropout(dropout)
35
+
36
+ def align_scaling(self, q):
37
+ num_rows = q.size(0)
38
+ return self.scaling / math.sqrt(num_rows)
39
+
40
+ def _batched_forward(
41
+ self,
42
+ x,
43
+ self_attn_mask=None,
44
+ self_attn_padding_mask=None,
45
+ ):
46
+ num_rows, num_cols, batch_size, embed_dim = x.size()
47
+ max_rows = max(1, self.max_tokens_per_msa // num_cols)
48
+ attns = 0
49
+ scaling = self.align_scaling(x)
50
+ for start in range(0, num_rows, max_rows):
51
+ attn_weights = self.compute_attention_weights(
52
+ x[start : start + max_rows],
53
+ scaling,
54
+ self_attn_mask=self_attn_mask,
55
+ self_attn_padding_mask=self_attn_padding_mask[:, start : start + max_rows]
56
+ if self_attn_padding_mask is not None
57
+ else None,
58
+ )
59
+ attns += attn_weights
60
+ attn_probs = attns.softmax(-1)
61
+ attn_probs = self.dropout_module(attn_probs)
62
+
63
+ outputs = []
64
+ for start in range(0, num_rows, max_rows):
65
+ output = self.compute_attention_update(x[start : start + max_rows], attn_probs)
66
+ outputs.append(output)
67
+
68
+ output = torch.cat(outputs, 0)
69
+ return output, attn_probs
70
+
71
+ def compute_attention_weights(
72
+ self,
73
+ x,
74
+ scaling: float,
75
+ self_attn_mask=None,
76
+ self_attn_padding_mask=None,
77
+ ):
78
+ num_rows, num_cols, batch_size, embed_dim = x.size()
79
+ q = self.q_proj(x).view(num_rows, num_cols, batch_size, self.num_heads, self.head_dim)
80
+ k = self.k_proj(x).view(num_rows, num_cols, batch_size, self.num_heads, self.head_dim)
81
+ q *= scaling
82
+ if self_attn_padding_mask is not None:
83
+ # Zero out any padded aligned positions - this is important since
84
+ # we take a sum across the alignment axis.
85
+ q *= 1 - self_attn_padding_mask.permute(1, 2, 0).unsqueeze(3).unsqueeze(4).to(q)
86
+
87
+ attn_weights = torch.einsum(f"rinhd,rjnhd->{self.attn_shape}", q, k)
88
+
89
+ if self_attn_mask is not None:
90
+ raise NotImplementedError
91
+ # Mask Size: [B x R x C], Weights Size: [H x B x C x C]
92
+
93
+ if self_attn_padding_mask is not None:
94
+ attn_weights = attn_weights.masked_fill(
95
+ self_attn_padding_mask[:, 0].unsqueeze(0).unsqueeze(2),
96
+ -10000,
97
+ )
98
+
99
+ return attn_weights
100
+
101
+ def compute_attention_update(
102
+ self,
103
+ x,
104
+ attn_probs,
105
+ ):
106
+ num_rows, num_cols, batch_size, embed_dim = x.size()
107
+ v = self.v_proj(x).view(num_rows, num_cols, batch_size, self.num_heads, self.head_dim)
108
+ context = torch.einsum(f"{self.attn_shape},rjnhd->rinhd", attn_probs, v)
109
+ context = context.contiguous().view(num_rows, num_cols, batch_size, embed_dim)
110
+ output = self.out_proj(context)
111
+ return output
112
+
113
+ def forward(
114
+ self,
115
+ x,
116
+ self_attn_mask=None,
117
+ self_attn_padding_mask=None,
118
+ ):
119
+ num_rows, num_cols, batch_size, embed_dim = x.size()
120
+ if (num_rows * num_cols > self.max_tokens_per_msa) and not torch.is_grad_enabled():
121
+ return self._batched_forward(x, self_attn_mask, self_attn_padding_mask)
122
+ else:
123
+ scaling = self.align_scaling(x)
124
+ attn_weights = self.compute_attention_weights(
125
+ x, scaling, self_attn_mask, self_attn_padding_mask
126
+ )
127
+ attn_probs = attn_weights.softmax(-1)
128
+ attn_probs = self.dropout_module(attn_probs)
129
+ output = self.compute_attention_update(x, attn_probs)
130
+ return output, attn_probs
131
+
132
+
133
+ class ColumnSelfAttention(nn.Module):
134
+ """Compute self-attention over columns of a 2D input."""
135
+
136
+ def __init__(
137
+ self,
138
+ embed_dim,
139
+ num_heads,
140
+ dropout=0.0,
141
+ max_tokens_per_msa: int = 2 ** 16,
142
+ ):
143
+ super().__init__()
144
+
145
+ self.num_heads = num_heads
146
+ self.dropout = dropout
147
+ self.head_dim = embed_dim // num_heads
148
+ self.scaling = self.head_dim ** -0.5
149
+ self.max_tokens_per_msa = max_tokens_per_msa
150
+
151
+ self.k_proj = nn.Linear(embed_dim, embed_dim)
152
+ self.v_proj = nn.Linear(embed_dim, embed_dim)
153
+ self.q_proj = nn.Linear(embed_dim, embed_dim)
154
+
155
+ self.out_proj = nn.Linear(embed_dim, embed_dim)
156
+ self.dropout_module = nn.Dropout(dropout)
157
+
158
+ def _batched_forward(
159
+ self,
160
+ x,
161
+ self_attn_mask=None,
162
+ self_attn_padding_mask=None,
163
+ ):
164
+ num_rows, num_cols, batch_size, embed_dim = x.size()
165
+ max_cols = max(1, self.max_tokens_per_msa // num_rows)
166
+ outputs = []
167
+ attns = []
168
+ for start in range(0, num_cols, max_cols):
169
+ output, attn = self(
170
+ x[:, start : start + max_cols],
171
+ self_attn_mask=self_attn_mask,
172
+ self_attn_padding_mask=self_attn_padding_mask[:, :, start : start + max_cols]
173
+ if self_attn_padding_mask is not None
174
+ else None,
175
+ )
176
+ outputs.append(output)
177
+ attns.append(attn)
178
+ output = torch.cat(outputs, 1)
179
+ attns = torch.cat(attns, 1)
180
+ return output, attns
181
+
182
+ def compute_attention_update(
183
+ self,
184
+ x,
185
+ self_attn_mask=None,
186
+ self_attn_padding_mask=None,
187
+ ):
188
+ num_rows, num_cols, batch_size, embed_dim = x.size()
189
+ if num_rows == 1:
190
+ # if there is only 1 position, this is equivalent and doesn't break with padding
191
+ attn_probs = torch.ones(
192
+ self.num_heads,
193
+ num_cols,
194
+ batch_size,
195
+ num_rows,
196
+ num_rows,
197
+ device=x.device,
198
+ dtype=x.dtype,
199
+ )
200
+ output = self.out_proj(self.v_proj(x))
201
+ else:
202
+ q = self.q_proj(x).view(num_rows, num_cols, batch_size, self.num_heads, self.head_dim)
203
+ k = self.k_proj(x).view(num_rows, num_cols, batch_size, self.num_heads, self.head_dim)
204
+ v = self.v_proj(x).view(num_rows, num_cols, batch_size, self.num_heads, self.head_dim)
205
+ q *= self.scaling
206
+
207
+ attn_weights = torch.einsum("icnhd,jcnhd->hcnij", q, k)
208
+
209
+ if self_attn_mask is not None:
210
+ raise NotImplementedError
211
+ if self_attn_padding_mask is not None:
212
+ attn_weights = attn_weights.masked_fill(
213
+ self_attn_padding_mask.permute(2, 0, 1).unsqueeze(0).unsqueeze(3),
214
+ -10000,
215
+ )
216
+
217
+ attn_probs = attn_weights.softmax(-1)
218
+ attn_probs = self.dropout_module(attn_probs)
219
+ context = torch.einsum("hcnij,jcnhd->icnhd", attn_probs, v)
220
+ context = context.contiguous().view(num_rows, num_cols, batch_size, embed_dim)
221
+ output = self.out_proj(context)
222
+ return output, attn_probs
223
+
224
+ def forward(
225
+ self,
226
+ x,
227
+ self_attn_mask=None,
228
+ self_attn_padding_mask=None,
229
+ ):
230
+ num_rows, num_cols, batch_size, embed_dim = x.size()
231
+ # if False and num_rows * num_cols > 2 ** 14 and not torch.is_grad_enabled():
232
+ if (num_rows * num_cols) > self.max_tokens_per_msa and not torch.is_grad_enabled():
233
+ return self._batched_forward(
234
+ x,
235
+ self_attn_mask,
236
+ self_attn_padding_mask,
237
+ )
238
+ else:
239
+ return self.compute_attention_update(x, self_attn_mask, self_attn_padding_mask)
esm/constants.py ADDED
@@ -0,0 +1,10 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ # fmt: off
+ proteinseq_toks = {
+     'toks': ['L', 'A', 'G', 'V', 'S', 'E', 'R', 'T', 'I', 'D', 'P', 'K', 'Q', 'N', 'F', 'Y', 'M', 'H', 'W', 'C', 'X', 'B', 'U', 'Z', 'O', '.', '-']
+ }
+ # fmt: on
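This token list is consumed by `Alphabet.from_architecture` in `esm/data.py`, the next file in this diff, which in turn builds the batch converter used to tokenize protein sequences. A minimal usage sketch of that tokenization path (the sequences are made up; `"ESM-1b"` is one of the architecture strings handled by `from_architecture`):

```python
from esm.data import Alphabet

# Build the ESM-1b alphabet and its batch converter, as defined in esm/data.py.
alphabet = Alphabet.from_architecture("ESM-1b")
batch_converter = alphabet.get_batch_converter()

data = [
    ("protein_1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("protein_2", "KALTARQQEVFDLIRD"),
]
labels, strs, tokens = batch_converter(data)

print(tokens.shape)                # [2, longest sequence + BOS/EOS]
print(tokens[1])                   # shorter sequence is padded with alphabet.padding_idx
print(alphabet.get_idx("<mask>"))  # token index used for masked-language-model training
```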
esm/data.py ADDED
@@ -0,0 +1,493 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+
6
+ import itertools
7
+ import os
8
+ from typing import Sequence, Tuple, List, Union
9
+ import pickle
10
+ import re
11
+ import shutil
12
+ import torch
13
+ from pathlib import Path
14
+ from esm.constants import proteinseq_toks
15
+
16
+ RawMSA = Sequence[Tuple[str, str]]
17
+
18
+
19
+ class FastaBatchedDataset(object):
20
+ def __init__(self, sequence_labels, sequence_strs):
21
+ self.sequence_labels = list(sequence_labels)
22
+ self.sequence_strs = list(sequence_strs)
23
+
24
+ @classmethod
25
+ def from_file(cls, fasta_file):
26
+ sequence_labels, sequence_strs = [], []
27
+ cur_seq_label = None
28
+ buf = []
29
+
30
+ def _flush_current_seq():
31
+ nonlocal cur_seq_label, buf
32
+ if cur_seq_label is None:
33
+ return
34
+ sequence_labels.append(cur_seq_label)
35
+ sequence_strs.append("".join(buf))
36
+ cur_seq_label = None
37
+ buf = []
38
+
39
+ with open(fasta_file, "r") as infile:
40
+ for line_idx, line in enumerate(infile):
41
+ if line.startswith(">"): # label line
42
+ _flush_current_seq()
43
+ line = line[1:].strip()
44
+ if len(line) > 0:
45
+ cur_seq_label = line
46
+ else:
47
+ cur_seq_label = f"seqnum{line_idx:09d}"
48
+ else: # sequence line
49
+ buf.append(line.strip())
50
+
51
+ _flush_current_seq()
52
+
53
+ assert len(set(sequence_labels)) == len(
54
+ sequence_labels
55
+ ), "Found duplicate sequence labels"
56
+
57
+ return cls(sequence_labels, sequence_strs)
58
+
59
+ def __len__(self):
60
+ return len(self.sequence_labels)
61
+
62
+ def __getitem__(self, idx):
63
+ return self.sequence_labels[idx], self.sequence_strs[idx]
64
+
65
+ def get_batch_indices(self, toks_per_batch, extra_toks_per_seq=0):
66
+ sizes = [(len(s), i) for i, s in enumerate(self.sequence_strs)]
67
+ sizes.sort()
68
+ batches = []
69
+ buf = []
70
+ max_len = 0
71
+
72
+ def _flush_current_buf():
73
+ nonlocal max_len, buf
74
+ if len(buf) == 0:
75
+ return
76
+ batches.append(buf)
77
+ buf = []
78
+ max_len = 0
79
+
80
+ for sz, i in sizes:
81
+ sz += extra_toks_per_seq
82
+ if max(sz, max_len) * (len(buf) + 1) > toks_per_batch:
83
+ _flush_current_buf()
84
+ max_len = max(max_len, sz)
85
+ buf.append(i)
86
+
87
+ _flush_current_buf()
88
+ return batches
89
+
90
+
91
+ class Alphabet(object):
92
+ def __init__(
93
+ self,
94
+ standard_toks: Sequence[str],
95
+ prepend_toks: Sequence[str] = ("<null_0>", "<pad>", "<eos>", "<unk>"),
96
+ append_toks: Sequence[str] = ("<cls>", "<mask>", "<sep>"),
97
+ prepend_bos: bool = True,
98
+ append_eos: bool = False,
99
+ use_msa: bool = False,
100
+ ):
101
+ self.standard_toks = list(standard_toks)
102
+ self.prepend_toks = list(prepend_toks)
103
+ self.append_toks = list(append_toks)
104
+ self.prepend_bos = prepend_bos
105
+ self.append_eos = append_eos
106
+ self.use_msa = use_msa
107
+
108
+ self.all_toks = list(self.prepend_toks)
109
+ self.all_toks.extend(self.standard_toks)
110
+ for i in range((8 - (len(self.all_toks) % 8)) % 8):
111
+ self.all_toks.append(f"<null_{i + 1}>")
112
+ self.all_toks.extend(self.append_toks)
113
+
114
+ self.tok_to_idx = {tok: i for i, tok in enumerate(self.all_toks)}
115
+
116
+ self.unk_idx = self.tok_to_idx["<unk>"]
117
+ self.padding_idx = self.get_idx("<pad>")
118
+ self.cls_idx = self.get_idx("<cls>")
119
+ self.mask_idx = self.get_idx("<mask>")
120
+ self.eos_idx = self.get_idx("<eos>")
121
+ self.all_special_tokens = ['<eos>', '<unk>', '<pad>', '<cls>', '<mask>']
122
+ self.unique_no_split_tokens = self.all_toks
123
+
124
+ def __len__(self):
125
+ return len(self.all_toks)
126
+
127
+ def get_idx(self, tok):
128
+ return self.tok_to_idx.get(tok, self.unk_idx)
129
+
130
+ def get_tok(self, ind):
131
+ return self.all_toks[ind]
132
+
133
+ def to_dict(self):
134
+ return self.tok_to_idx.copy()
135
+
136
+ def get_batch_converter(self, truncation_seq_length: int = None):
137
+ if self.use_msa:
138
+ return MSABatchConverter(self, truncation_seq_length)
139
+ else:
140
+ return BatchConverter(self, truncation_seq_length)
141
+
142
+ @classmethod
143
+ def from_architecture(cls, name: str) -> "Alphabet":
144
+ if name in ("ESM-1", "protein_bert_base"):
145
+ standard_toks = proteinseq_toks["toks"]
146
+ prepend_toks: Tuple[str, ...] = ("<null_0>", "<pad>", "<eos>", "<unk>")
147
+ append_toks: Tuple[str, ...] = ("<cls>", "<mask>", "<sep>")
148
+ prepend_bos = True
149
+ append_eos = False
150
+ use_msa = False
151
+ elif name in ("ESM-1b", "roberta_large"):
152
+ standard_toks = proteinseq_toks["toks"]
153
+ prepend_toks = ("<cls>", "<pad>", "<eos>", "<unk>")
154
+ append_toks = ("<mask>",)
155
+ prepend_bos = True
156
+ append_eos = True
157
+ use_msa = False
158
+ elif name in ("MSA Transformer", "msa_transformer"):
159
+ standard_toks = proteinseq_toks["toks"]
160
+ prepend_toks = ("<cls>", "<pad>", "<eos>", "<unk>")
161
+ append_toks = ("<mask>",)
162
+ prepend_bos = True
163
+ append_eos = False
164
+ use_msa = True
165
+ elif "invariant_gvp" in name.lower():
166
+ standard_toks = proteinseq_toks["toks"]
167
+ prepend_toks = ("<null_0>", "<pad>", "<eos>", "<unk>")
168
+ append_toks = ("<mask>", "<cath>", "<af2>")
169
+ prepend_bos = True
170
+ append_eos = False
171
+ use_msa = False
172
+ else:
173
+ raise ValueError("Unknown architecture selected")
174
+ return cls(standard_toks, prepend_toks, append_toks, prepend_bos, append_eos, use_msa)
175
+
176
+ def _tokenize(self, text) -> str:
177
+ return text.split()
178
+
179
+ def tokenize(self, text, **kwargs) -> List[str]:
180
+ """
181
+ Inspired by https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_utils.py
182
+ Converts a string in a sequence of tokens, using the tokenizer.
183
+
184
+ Args:
185
+ text (:obj:`str`):
186
+ The sequence to be encoded.
187
+
188
+ Returns:
189
+ :obj:`List[str]`: The list of tokens.
190
+ """
191
+
192
+ def split_on_token(tok, text):
193
+ result = []
194
+ split_text = text.split(tok)
195
+ for i, sub_text in enumerate(split_text):
196
+ # AddedToken can control whitespace stripping around them.
197
+ # We use them for GPT2 and Roberta to have different behavior depending on the special token
198
+ # Cf. https://github.com/huggingface/transformers/pull/2778
199
+ # and https://github.com/huggingface/transformers/issues/3788
200
+ # We strip left and right by default
201
+ if i < len(split_text) - 1:
202
+ sub_text = sub_text.rstrip()
203
+ if i > 0:
204
+ sub_text = sub_text.lstrip()
205
+
206
+ if i == 0 and not sub_text:
207
+ result.append(tok)
208
+ elif i == len(split_text) - 1:
209
+ if sub_text:
210
+ result.append(sub_text)
211
+ else:
212
+ pass
213
+ else:
214
+ if sub_text:
215
+ result.append(sub_text)
216
+ result.append(tok)
217
+ return result
218
+
219
+ def split_on_tokens(tok_list, text):
220
+ if not text.strip():
221
+ return []
222
+
223
+ tokenized_text = []
224
+ text_list = [text]
225
+ for tok in tok_list:
226
+ tokenized_text = []
227
+ for sub_text in text_list:
228
+ if sub_text not in self.unique_no_split_tokens:
229
+ tokenized_text.extend(split_on_token(tok, sub_text))
230
+ else:
231
+ tokenized_text.append(sub_text)
232
+ text_list = tokenized_text
233
+
234
+ return list(
235
+ itertools.chain.from_iterable(
236
+ (
237
+ self._tokenize(token)
238
+ if token not in self.unique_no_split_tokens
239
+ else [token]
240
+ for token in tokenized_text
241
+ )
242
+ )
243
+ )
244
+
245
+ no_split_token = self.unique_no_split_tokens
246
+ tokenized_text = split_on_tokens(no_split_token, text)
247
+ return tokenized_text
248
+
249
+ def encode(self, text):
250
+ return [self.tok_to_idx[tok] for tok in self.tokenize(text)]
251
+
252
+
253
+ class BatchConverter(object):
254
+ """Callable to convert an unprocessed (labels + strings) batch to a
255
+ processed (labels + tensor) batch.
256
+ """
257
+
258
+ def __init__(self, alphabet, truncation_seq_length: int = None):
259
+ self.alphabet = alphabet
260
+ self.truncation_seq_length = truncation_seq_length
261
+
262
+ def __call__(self, raw_batch: Sequence[Tuple[str, str]]):
263
+ # RoBERTa uses an eos token, while ESM-1 does not.
264
+ batch_size = len(raw_batch)
265
+ batch_labels, seq_str_list = zip(*raw_batch)
266
+ seq_encoded_list = [self.alphabet.encode(seq_str) for seq_str in seq_str_list]
267
+ if self.truncation_seq_length:
268
+ seq_encoded_list = [seq_str[:self.truncation_seq_length] for seq_str in seq_encoded_list]
269
+ max_len = max(len(seq_encoded) for seq_encoded in seq_encoded_list)
270
+ tokens = torch.empty(
271
+ (
272
+ batch_size,
273
+ max_len + int(self.alphabet.prepend_bos) + int(self.alphabet.append_eos),
274
+ ),
275
+ dtype=torch.int64,
276
+ )
277
+ tokens.fill_(self.alphabet.padding_idx)
278
+ labels = []
279
+ strs = []
280
+
281
+ for i, (label, seq_str, seq_encoded) in enumerate(
282
+ zip(batch_labels, seq_str_list, seq_encoded_list)
283
+ ):
284
+ labels.append(label)
285
+ strs.append(seq_str)
286
+ if self.alphabet.prepend_bos:
287
+ tokens[i, 0] = self.alphabet.cls_idx
288
+ seq = torch.tensor(seq_encoded, dtype=torch.int64)
289
+ tokens[
290
+ i,
291
+ int(self.alphabet.prepend_bos) : len(seq_encoded)
292
+ + int(self.alphabet.prepend_bos),
293
+ ] = seq
294
+ if self.alphabet.append_eos:
295
+ tokens[i, len(seq_encoded) + int(self.alphabet.prepend_bos)] = self.alphabet.eos_idx
296
+
297
+ return labels, strs, tokens
298
+
299
+
300
+ class MSABatchConverter(BatchConverter):
301
+ def __call__(self, inputs: Union[Sequence[RawMSA], RawMSA]):
302
+ if isinstance(inputs[0][0], str):
303
+ # Input is a single MSA
304
+ raw_batch: Sequence[RawMSA] = [inputs] # type: ignore
305
+ else:
306
+ raw_batch = inputs # type: ignore
307
+
308
+ batch_size = len(raw_batch)
309
+ max_alignments = max(len(msa) for msa in raw_batch)
310
+ max_seqlen = max(len(msa[0][1]) for msa in raw_batch)
311
+
312
+ tokens = torch.empty(
313
+ (
314
+ batch_size,
315
+ max_alignments,
316
+ max_seqlen + int(self.alphabet.prepend_bos) + int(self.alphabet.append_eos),
317
+ ),
318
+ dtype=torch.int64,
319
+ )
320
+ tokens.fill_(self.alphabet.padding_idx)
321
+ labels = []
322
+ strs = []
323
+
324
+ for i, msa in enumerate(raw_batch):
325
+ msa_seqlens = set(len(seq) for _, seq in msa)
326
+ if not len(msa_seqlens) == 1:
327
+ raise RuntimeError(
328
+ "Received unaligned sequences for input to MSA, all sequence "
329
+ "lengths must be equal."
330
+ )
331
+ msa_labels, msa_strs, msa_tokens = super().__call__(msa)
332
+ labels.append(msa_labels)
333
+ strs.append(msa_strs)
334
+ tokens[i, : msa_tokens.size(0), : msa_tokens.size(1)] = msa_tokens
335
+
336
+ return labels, strs, tokens
337
+
338
+
339
+ def read_fasta(
340
+ path,
341
+ keep_gaps=True,
342
+ keep_insertions=True,
343
+ to_upper=False,
344
+ ):
345
+ with open(path, "r") as f:
346
+ for result in read_alignment_lines(
347
+ f, keep_gaps=keep_gaps, keep_insertions=keep_insertions, to_upper=to_upper
348
+ ):
349
+ yield result
350
+
351
+
352
+ def read_alignment_lines(
353
+ lines,
354
+ keep_gaps=True,
355
+ keep_insertions=True,
356
+ to_upper=False,
357
+ ):
358
+ seq = desc = None
359
+
360
+ def parse(s):
361
+ if not keep_gaps:
362
+ s = re.sub("-", "", s)
363
+ if not keep_insertions:
364
+ s = re.sub("[a-z]", "", s)
365
+ return s.upper() if to_upper else s
366
+
367
+ for line in lines:
368
+ # Line may be empty if seq % file_line_width == 0
369
+ if len(line) > 0 and line[0] == ">":
370
+ if seq is not None:
371
+ yield desc, parse(seq)
372
+ desc = line.strip().lstrip(">")
373
+ seq = ""
374
+ else:
375
+ assert isinstance(seq, str)
376
+ seq += line.strip()
377
+ assert isinstance(seq, str) and isinstance(desc, str)
378
+ yield desc, parse(seq)
379
+
380
+
381
+ class ESMStructuralSplitDataset(torch.utils.data.Dataset):
382
+ """
383
+ Structural Split Dataset as described in section A.10 of the supplement of our paper.
384
+ https://doi.org/10.1101/622803
385
+
386
+ We use the full version of SCOPe 2.07, clustered at 90% sequence identity,
387
+ generated on January 23, 2020.
388
+
389
+ For each SCOPe domain:
390
+ - We extract the sequence from the corresponding PDB file
391
+ - We extract the 3D coordinates of the Carbon beta atoms, aligning them
392
+ to the sequence. We put NaN where Cb atoms are missing.
393
+ - From the 3D coordinates, we calculate a pairwise distance map, based
394
+ on L2 distance
395
+ - We use DSSP to generate secondary structure labels for the corresponding
396
+ PDB file. This is also aligned to the sequence. We put - where SSP
397
+ labels are missing.
398
+
399
+ For each SCOPe classification level of family/superfamily/fold (in order of difficulty),
400
+ we have split the data into 5 partitions for cross validation. These are provided
401
+ in a downloaded splits folder, in the format:
402
+ splits/{split_level}/{cv_partition}/{train|valid}.txt
403
+ where train is the partition and valid is the concatenation of the remaining 4.
404
+
405
+ For each SCOPe domain, we provide a pkl dump that contains:
406
+ - seq : The domain sequence, stored as an L-length string
407
+ - ssp : The secondary structure labels, stored as an L-length string
408
+ - dist : The distance map, stored as an LxL numpy array
409
+ - coords : The 3D coordinates, stored as an Lx3 numpy array
410
+
411
+ """
412
+
413
+ base_folder = "structural-data"
414
+ file_list = [
415
+ # url tar filename filename MD5 Hash
416
+ (
417
+ "https://dl.fbaipublicfiles.com/fair-esm/structural-data/splits.tar.gz",
418
+ "splits.tar.gz",
419
+ "splits",
420
+ "456fe1c7f22c9d3d8dfe9735da52411d",
421
+ ),
422
+ (
423
+ "https://dl.fbaipublicfiles.com/fair-esm/structural-data/pkl.tar.gz",
424
+ "pkl.tar.gz",
425
+ "pkl",
426
+ "644ea91e56066c750cd50101d390f5db",
427
+ ),
428
+ ]
429
+
430
+ def __init__(
431
+ self,
432
+ split_level,
433
+ cv_partition,
434
+ split,
435
+ root_path=os.path.expanduser("~/.cache/torch/data/esm"),
436
+ download=False,
437
+ ):
438
+ super().__init__()
439
+ assert split in [
440
+ "train",
441
+ "valid",
442
+ ], "train_valid must be 'train' or 'valid'"
443
+ self.root_path = root_path
444
+ self.base_path = os.path.join(self.root_path, self.base_folder)
445
+
446
+ # check if root path has what you need or else download it
447
+ if download:
448
+ self.download()
449
+
450
+ self.split_file = os.path.join(
451
+ self.base_path, "splits", split_level, cv_partition, f"{split}.txt"
452
+ )
453
+ self.pkl_dir = os.path.join(self.base_path, "pkl")
454
+ self.names = []
455
+ with open(self.split_file) as f:
456
+ self.names = f.read().splitlines()
457
+
458
+ def __len__(self):
459
+ return len(self.names)
460
+
461
+ def _check_exists(self) -> bool:
462
+ for (_, _, filename, _) in self.file_list:
463
+ fpath = os.path.join(self.base_path, filename)
464
+ if not os.path.exists(fpath) or not os.path.isdir(fpath):
465
+ return False
466
+ return True
467
+
468
+ def download(self):
469
+
470
+ if self._check_exists():
471
+ print("Files already downloaded and verified")
472
+ return
473
+
474
+ from torchvision.datasets.utils import download_url
475
+
476
+ for url, tar_filename, filename, md5_hash in self.file_list:
477
+ download_path = os.path.join(self.base_path, tar_filename)
478
+ download_url(url=url, root=self.base_path, filename=tar_filename, md5=md5_hash)
479
+ shutil.unpack_archive(download_path, self.base_path)
480
+
481
+ def __getitem__(self, idx):
482
+ """
483
+ Returns a dict with the following entries
484
+ - seq : Str (domain sequence)
485
+ - ssp : Str (SSP labels)
486
+ - dist : np.array (distance map)
487
+ - coords : np.array (3D coordinates)
488
+ """
489
+ name = self.names[idx]
490
+ pkl_fname = os.path.join(self.pkl_dir, name[1:3], f"{name}.pkl")
491
+ with open(pkl_fname, "rb") as f:
492
+ obj = pickle.load(f)
493
+ return obj
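A minimal usage sketch for the tokenization utilities above (not part of this commit). It assumes the standard fair-esm `Alphabet.from_architecture` constructor and `get_batch_converter` helper defined earlier in `esm/data.py`; the sequences and labels are illustrative placeholders.

```python
from esm.data import Alphabet

# Build an alphabet and a batch converter (the truncation length is optional).
alphabet = Alphabet.from_architecture("ESM-1b")
batch_converter = alphabet.get_batch_converter(truncation_seq_length=1022)

data = [
    ("protein_1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT"),
    ("protein_2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIE"),
]
labels, strs, tokens = batch_converter(data)

# tokens is padded to the longest sequence, plus BOS/EOS where the alphabet uses them.
print(labels, tokens.shape, tokens.dtype)  # int64 tensor of token indices
```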
esm/esmfold/v1/__init__.py ADDED
File without changes
esm/esmfold/v1/categorical_mixture.py ADDED
@@ -0,0 +1,43 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ import torch
+
+
+ class CategoricalMixture:
+     def __init__(self, param, bins=50, start=0, end=1):
+         # All tensors are of shape ..., bins.
+         self.logits = param
+         bins = torch.linspace(
+             start, end, bins + 1, device=self.logits.device, dtype=self.logits.dtype
+         )
+         self.v_bins = (bins[:-1] + bins[1:]) / 2
+
+     def log_prob(self, true):
+         # Shapes are:
+         #     self.probs: ... x bins
+         #     true      : ...
+         true_index = (
+             (
+                 true.unsqueeze(-1)
+                 - self.v_bins[
+                     [
+                         None,
+                     ]
+                     * true.ndim
+                 ]
+             )
+             .abs()
+             .argmin(-1)
+         )
+         nll = self.logits.log_softmax(-1)
+         return torch.take_along_dim(nll, true_index.unsqueeze(-1), dim=-1).squeeze(-1)
+
+     def mean(self):
+         return (self.logits.softmax(-1) @ self.v_bins.unsqueeze(1)).squeeze(-1)
+
+
+ def categorical_lddt(logits, bins=50):
+     # Logits are ..., 37, bins.
+     return CategoricalMixture(logits, bins=bins).mean()
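For orientation, a tiny shape sketch of how `categorical_lddt` is used by the lDDT head in `esmfold.py` below; the tensor sizes are hypothetical.

```python
import torch
from esm.esmfold.v1.categorical_mixture import categorical_lddt

# Per-position, per-atom logits over 50 lDDT bins: (batch, length, 37 atoms, 50 bins).
logits = torch.randn(2, 128, 37, 50)

# Collapses each bin distribution to its expected value, one scalar per atom position.
plddt = categorical_lddt(logits, bins=50)
print(plddt.shape)  # torch.Size([2, 128, 37]); values lie in [0, 1]
```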
esm/esmfold/v1/esmfold.py ADDED
@@ -0,0 +1,364 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+ import typing as T
6
+ from dataclasses import dataclass
7
+ from functools import partial
8
+
9
+ import torch
10
+ import torch.nn as nn
11
+ from torch import nn
12
+ from torch.nn import LayerNorm
13
+
14
+ import esm
15
+ from esm import Alphabet
16
+ from esm.esmfold.v1.categorical_mixture import categorical_lddt
17
+ from esm.esmfold.v1.misc import (
18
+ batch_encode_sequences,
19
+ collate_dense_tensors,
20
+ output_to_pdb,
21
+ )
22
+ from esm.esmfold.v1.trunk import FoldingTrunk, FoldingTrunkConfig
23
+ from openfold.data.data_transforms import make_atom14_masks
24
+ from openfold.np import residue_constants
25
+ from openfold.utils.loss import compute_predicted_aligned_error, compute_tm
26
+
27
+
28
+ @dataclass
29
+ class ESMFoldConfig:
30
+ trunk: T.Any = FoldingTrunkConfig()
31
+ lddt_head_hid_dim: int = 128
32
+
33
+
34
+ load_fn = esm.pretrained.load_model_and_alphabet
35
+ esm_registry = {
36
+ "esm2_8M": partial(load_fn, "esm2_t6_8M_UR50D_500K"),
37
+ "esm2_8M_270K": esm.pretrained.esm2_t6_8M_UR50D,
38
+ "esm2_35M": partial(load_fn, "esm2_t12_35M_UR50D_500K"),
39
+ "esm2_35M_270K": esm.pretrained.esm2_t12_35M_UR50D,
40
+ "esm2_150M": partial(load_fn, "esm2_t30_150M_UR50D_500K"),
41
+ "esm2_150M_270K": partial(load_fn, "esm2_t30_150M_UR50D_270K"),
42
+ "esm2_650M": esm.pretrained.esm2_t33_650M_UR50D,
43
+ "esm2_650M_270K": partial(load_fn, "esm2_t33_650M_270K_UR50D"),
44
+ "esm2_3B": esm.pretrained.esm2_t36_3B_UR50D,
45
+ "esm2_3B_270K": partial(load_fn, "esm2_t36_3B_UR50D_500K"),
46
+ "esm2_15B": esm.pretrained.esm2_t48_15B_UR50D,
47
+ }
48
+
49
+
50
+ class ESMFold(nn.Module):
51
+ def __init__(self, esmfold_config=None, **kwargs):
52
+ super().__init__()
53
+
54
+ self.cfg = esmfold_config if esmfold_config else ESMFoldConfig(**kwargs)
55
+ cfg = self.cfg
56
+
57
+ self.distogram_bins = 64
58
+
59
+ self.esm, self.esm_dict = esm_registry.get(cfg.esm_type)()
60
+
61
+ self.esm.requires_grad_(False)
62
+ self.esm.half()
63
+
64
+ self.esm_feats = self.esm.embed_dim
65
+ self.esm_attns = self.esm.num_layers * self.esm.attention_heads
66
+ self.register_buffer("af2_to_esm", ESMFold._af2_to_esm(self.esm_dict))
67
+ self.esm_s_combine = nn.Parameter(torch.zeros(self.esm.num_layers + 1))
68
+
69
+ c_s = cfg.trunk.sequence_state_dim
70
+ c_z = cfg.trunk.pairwise_state_dim
71
+
72
+ self.esm_s_mlp = nn.Sequential(
73
+ LayerNorm(self.esm_feats),
74
+ nn.Linear(self.esm_feats, c_s),
75
+ nn.ReLU(),
76
+ nn.Linear(c_s, c_s),
77
+ )
78
+ if cfg.use_esm_attn_map:
79
+ self.esm_z_mlp = nn.Sequential(
80
+ LayerNorm(self.esm_attns),
81
+ nn.Linear(self.esm_attns, c_z),
82
+ nn.ReLU(),
83
+ nn.Linear(c_z, c_z),
84
+ )
85
+
86
+ # 0 is padding, N is unknown residues, N + 1 is mask.
87
+ self.n_tokens_embed = residue_constants.restype_num + 3
88
+ self.pad_idx = 0
89
+ self.unk_idx = self.n_tokens_embed - 2
90
+ self.mask_idx = self.n_tokens_embed - 1
91
+ self.embedding = nn.Embedding(self.n_tokens_embed, c_s, padding_idx=0)
92
+
93
+ self.trunk = FoldingTrunk(**cfg.trunk)
94
+
95
+ self.distogram_head = nn.Linear(c_z, self.distogram_bins)
96
+ self.ptm_head = nn.Linear(c_z, self.distogram_bins)
97
+ self.lm_head = nn.Linear(c_s, self.n_tokens_embed)
98
+ self.lddt_bins = 50
99
+ self.lddt_head = nn.Sequential(
100
+ nn.LayerNorm(cfg.trunk.structure_module.c_s),
101
+ nn.Linear(cfg.trunk.structure_module.c_s, cfg.lddt_head_hid_dim),
102
+ nn.Linear(cfg.lddt_head_hid_dim, cfg.lddt_head_hid_dim),
103
+ nn.Linear(cfg.lddt_head_hid_dim, 37 * self.lddt_bins),
104
+ )
105
+
106
+ @staticmethod
107
+ def _af2_to_esm(d: Alphabet):
108
+ # Remember that t is shifted from residue_constants by 1 (0 is padding).
109
+ esm_reorder = [d.padding_idx] + [
110
+ d.get_idx(v) for v in residue_constants.restypes_with_x
111
+ ]
112
+ return torch.tensor(esm_reorder)
113
+
114
+ def _af2_idx_to_esm_idx(self, aa, mask):
115
+ aa = (aa + 1).masked_fill(mask != 1, 0)
116
+ return self.af2_to_esm[aa]
117
+
118
+ def _compute_language_model_representations(
119
+ self, esmaa: torch.Tensor
120
+ ) -> torch.Tensor:
121
+ """Adds bos/eos tokens for the language model, since the structure module doesn't use these."""
122
+ batch_size = esmaa.size(0)
123
+
124
+ bosi, eosi = self.esm_dict.cls_idx, self.esm_dict.eos_idx
125
+ bos = esmaa.new_full((batch_size, 1), bosi)
126
+ eos = esmaa.new_full((batch_size, 1), self.esm_dict.padding_idx)
127
+ esmaa = torch.cat([bos, esmaa, eos], dim=1)
128
+ # Use the first padding index as eos during inference.
129
+ esmaa[range(batch_size), (esmaa != 1).sum(1)] = eosi
130
+
131
+ res = self.esm(
132
+ esmaa,
133
+ repr_layers=range(self.esm.num_layers + 1),
134
+ need_head_weights=self.cfg.use_esm_attn_map,
135
+ )
136
+ esm_s = torch.stack(
137
+ [v for _, v in sorted(res["representations"].items())], dim=2
138
+ )
139
+ esm_s = esm_s[:, 1:-1] # B, L, nLayers, C
140
+ esm_z = (
141
+ res["attentions"].permute(0, 4, 3, 1, 2).flatten(3, 4)[:, 1:-1, 1:-1, :]
142
+ if self.cfg.use_esm_attn_map
143
+ else None
144
+ )
145
+ return esm_s, esm_z
146
+
147
+ def _mask_inputs_to_esm(self, esmaa, pattern):
148
+ new_esmaa = esmaa.clone()
149
+ new_esmaa[pattern == 1] = self.esm_dict.mask_idx
150
+ return new_esmaa
151
+
152
+ def forward(
153
+ self,
154
+ aa: torch.Tensor,
155
+ mask: T.Optional[torch.Tensor] = None,
156
+ residx: T.Optional[torch.Tensor] = None,
157
+ masking_pattern: T.Optional[torch.Tensor] = None,
158
+ num_recycles: T.Optional[int] = None,
159
+ ):
160
+ """Runs a forward pass given input tokens. Use `model.infer` to
161
+ run inference from a sequence.
162
+
163
+ Args:
164
+ aa (torch.Tensor): Tensor containing indices corresponding to amino acids. Indices match
165
+ openfold.np.residue_constants.restype_order_with_x.
166
+ mask (torch.Tensor): Binary tensor with 1 meaning position is unmasked and 0 meaning position is masked.
167
+ residx (torch.Tensor): Residue indices of amino acids. Will assume contiguous if not provided.
168
+ masking_pattern (torch.Tensor): Optional masking to pass to the input. Binary tensor of the same size
169
+ as `aa`. Positions with 1 will be masked. ESMFold sometimes produces different samples when
170
+ different masks are provided.
171
+ num_recycles (int): How many recycle iterations to perform. If None, defaults to training max
172
+ recycles, which is 3.
173
+ """
174
+
175
+ if mask is None:
176
+ mask = torch.ones_like(aa)
177
+
178
+ B = aa.shape[0]
179
+ L = aa.shape[1]
180
+ device = aa.device
181
+
182
+ if residx is None:
183
+ residx = torch.arange(L, device=device).expand_as(aa)
184
+
185
+ # === ESM ===
186
+ esmaa = self._af2_idx_to_esm_idx(aa, mask)
187
+
188
+ if masking_pattern is not None:
189
+ esmaa = self._mask_inputs_to_esm(esmaa, masking_pattern)
190
+
191
+ esm_s, esm_z = self._compute_language_model_representations(esmaa)
192
+
193
+ # Convert esm_s to the precision used by the trunk and
194
+ # the structure module. These tensors may be a lower precision if, for example,
195
+ # we're running the language model in fp16 precision.
196
+ esm_s = esm_s.to(self.esm_s_combine.dtype)
197
+ esm_s = esm_s.detach()
198
+
199
+ # === preprocessing ===
200
+ esm_s = (self.esm_s_combine.softmax(0).unsqueeze(0) @ esm_s).squeeze(2)
201
+
202
+ s_s_0 = self.esm_s_mlp(esm_s)
203
+ if self.cfg.use_esm_attn_map:
204
+ esm_z = esm_z.to(self.esm_s_combine.dtype)
205
+ esm_z = esm_z.detach()
206
+ s_z_0 = self.esm_z_mlp(esm_z)
207
+ else:
208
+ s_z_0 = s_s_0.new_zeros(B, L, L, self.cfg.trunk.pairwise_state_dim)
209
+
210
+ s_s_0 += self.embedding(aa)
211
+
212
+ structure: dict = self.trunk(
213
+ s_s_0, s_z_0, aa, residx, mask, no_recycles=num_recycles
214
+ )
215
+ # Documenting what we expect:
216
+ structure = {
217
+ k: v
218
+ for k, v in structure.items()
219
+ if k
220
+ in [
221
+ "s_z",
222
+ "s_s",
223
+ "frames",
224
+ "sidechain_frames",
225
+ "unnormalized_angles",
226
+ "angles",
227
+ "positions",
228
+ "states",
229
+ ]
230
+ }
231
+
232
+ disto_logits = self.distogram_head(structure["s_z"])
233
+ disto_logits = (disto_logits + disto_logits.transpose(1, 2)) / 2
234
+ structure["distogram_logits"] = disto_logits
235
+
236
+ lm_logits = self.lm_head(structure["s_s"])
237
+ structure["lm_logits"] = lm_logits
238
+
239
+ structure["aatype"] = aa
240
+ make_atom14_masks(structure)
241
+
242
+ for k in [
243
+ "atom14_atom_exists",
244
+ "atom37_atom_exists",
245
+ ]:
246
+ structure[k] *= mask.unsqueeze(-1)
247
+ structure["residue_index"] = residx
248
+
249
+ lddt_head = self.lddt_head(structure["states"]).reshape(
250
+ structure["states"].shape[0], B, L, -1, self.lddt_bins
251
+ )
252
+ structure["lddt_head"] = lddt_head
253
+ plddt = categorical_lddt(lddt_head[-1], bins=self.lddt_bins)
254
+ structure["plddt"] = (
255
+ 100 * plddt
256
+ ) # we predict plDDT between 0 and 1, scale to be between 0 and 100.
257
+
258
+ ptm_logits = self.ptm_head(structure["s_z"])
259
+
260
+ seqlen = mask.type(torch.int64).sum(1)
261
+ structure["ptm_logits"] = ptm_logits
262
+ structure["ptm"] = torch.stack(
263
+ [
264
+ compute_tm(
265
+ batch_ptm_logits[None, :sl, :sl],
266
+ max_bins=31,
267
+ no_bins=self.distogram_bins,
268
+ )
269
+ for batch_ptm_logits, sl in zip(ptm_logits, seqlen)
270
+ ]
271
+ )
272
+ structure.update(
273
+ compute_predicted_aligned_error(
274
+ ptm_logits, max_bin=31, no_bins=self.distogram_bins
275
+ )
276
+ )
277
+
278
+ return structure
279
+
280
+ @torch.no_grad()
281
+ def infer(
282
+ self,
283
+ sequences: T.Union[str, T.List[str]],
284
+ residx=None,
285
+ masking_pattern: T.Optional[torch.Tensor] = None,
286
+ num_recycles: T.Optional[int] = None,
287
+ residue_index_offset: T.Optional[int] = 512,
288
+ chain_linker: T.Optional[str] = "G" * 25,
289
+ ):
290
+ """Runs a forward pass given input sequences.
291
+
292
+ Args:
293
+ sequences (Union[str, List[str]]): A list of sequences to make predictions for. Multimers can also be passed in,
294
+ each chain should be separated by a ':' token (e.g. "<chain1>:<chain2>:<chain3>").
295
+ residx (torch.Tensor): Residue indices of amino acids. Will assume contiguous if not provided.
296
+ masking_pattern (torch.Tensor): Optional masking to pass to the input. Binary tensor of the same size
297
+ as `aa`. Positions with 1 will be masked. ESMFold sometimes produces different samples when
298
+ different masks are provided.
299
+ num_recycles (int): How many recycle iterations to perform. If None, defaults to training max
300
+ recycles (cfg.trunk.max_recycles), which is 4.
301
+ residue_index_offset (int): Residue index separation between chains if predicting a multimer. Has no effect on
302
+ single chain predictions. Default: 512.
303
+ chain_linker (str): Linker to use between chains if predicting a multimer. Has no effect on single chain
304
+ predictions. Default: length-25 poly-G ("G" * 25).
305
+ """
306
+ if isinstance(sequences, str):
307
+ sequences = [sequences]
308
+
309
+ aatype, mask, _residx, linker_mask, chain_index = batch_encode_sequences(
310
+ sequences, residue_index_offset, chain_linker
311
+ )
312
+
313
+ if residx is None:
314
+ residx = _residx
315
+ elif not isinstance(residx, torch.Tensor):
316
+ residx = collate_dense_tensors(residx)
317
+
318
+ aatype, mask, residx, linker_mask = map(
319
+ lambda x: x.to(self.device), (aatype, mask, residx, linker_mask)
320
+ )
321
+
322
+ output = self.forward(
323
+ aatype,
324
+ mask=mask,
325
+ residx=residx,
326
+ masking_pattern=masking_pattern,
327
+ num_recycles=num_recycles,
328
+ )
329
+
330
+ output["atom37_atom_exists"] = output[
331
+ "atom37_atom_exists"
332
+ ] * linker_mask.unsqueeze(2)
333
+
334
+ output["mean_plddt"] = (output["plddt"] * output["atom37_atom_exists"]).sum(
335
+ dim=(1, 2)
336
+ ) / output["atom37_atom_exists"].sum(dim=(1, 2))
337
+ output["chain_index"] = chain_index
338
+
339
+ return output
340
+
341
+ def output_to_pdb(self, output: T.Dict) -> T.List[str]:
342
+ """Returns the pdb (file) string from the model given the model output."""
343
+ return output_to_pdb(output)
344
+
345
+ def infer_pdbs(self, seqs: T.List[str], *args, **kwargs) -> T.List[str]:
346
+ """Returns list of pdb (files) strings from the model given a list of input sequences."""
347
+ output = self.infer(seqs, *args, **kwargs)
348
+ return self.output_to_pdb(output)
349
+
350
+ def infer_pdb(self, sequence: str, *args, **kwargs) -> str:
351
+ """Returns the pdb (file) string from the model given an input sequence."""
352
+ return self.infer_pdbs([sequence], *args, **kwargs)[0]
353
+
354
+ def set_chunk_size(self, chunk_size: T.Optional[int]):
355
+ # This parameter means the axial attention will be computed
356
+ # in a chunked manner. This should make the memory used more or less O(L) instead of O(L^2).
357
+ # It's equivalent to running a for loop over chunks of the dimension we're iterating over,
+ # where chunk_size is the size of each chunk, so 128 would mean processing 128-length chunks.
+ # Setting the value to None restores the default behavior and disables chunking.
360
+ self.trunk.set_chunk_size(chunk_size)
361
+
362
+ @property
363
+ def device(self):
364
+ return self.esm_s_combine.device
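End-to-end usage goes through the constructors in `esm/esmfold/v1/pretrained.py` below. A hedged sketch (assumes a GPU, the openfold dependency, and network access to download the 3B-parameter checkpoint; the sequence is a placeholder):

```python
import torch
from esm.esmfold.v1.pretrained import esmfold_v1

model = esmfold_v1().eval().cuda()   # downloads and builds the ESMFold v1 checkpoint
model.set_chunk_size(128)            # optional: chunk axial attention to reduce memory

sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT"
with torch.no_grad():
    pdb_str = model.infer_pdb(sequence)   # wraps infer() and output_to_pdb()

with open("prediction.pdb", "w") as f:
    f.write(pdb_str)
```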
esm/esmfold/v1/misc.py ADDED
@@ -0,0 +1,309 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+ import typing as T
6
+
7
+ import numpy as np
8
+ import torch
9
+ import torch.nn.functional as F
10
+ from einops import rearrange, repeat
11
+ from torch import nn
12
+ from openfold.np import residue_constants
13
+ from openfold.np.protein import Protein as OFProtein
14
+ from openfold.np.protein import to_pdb
15
+ from openfold.utils.feats import atom14_to_atom37
16
+
17
+
18
+ def encode_sequence(
19
+ seq: str,
20
+ residue_index_offset: T.Optional[int] = 512,
21
+ chain_linker: T.Optional[str] = "G" * 25,
22
+ ) -> T.Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
23
+ if chain_linker is None:
24
+ chain_linker = ""
25
+ if residue_index_offset is None:
26
+ residue_index_offset = 0
27
+
28
+ chains = seq.split(":")
29
+ seq = chain_linker.join(chains)
30
+
31
+ unk_idx = residue_constants.restype_order_with_x["X"]
32
+ encoded = torch.tensor(
33
+ [residue_constants.restype_order_with_x.get(aa, unk_idx) for aa in seq]
34
+ )
35
+ residx = torch.arange(len(encoded))
36
+
37
+ if residue_index_offset > 0:
38
+ start = 0
39
+ for i, chain in enumerate(chains):
40
+ residx[start : start + len(chain) + len(chain_linker)] += (
41
+ i * residue_index_offset
42
+ )
43
+ start += len(chain) + len(chain_linker)
44
+
45
+ linker_mask = torch.ones_like(encoded, dtype=torch.float32)
46
+ chain_index = []
47
+ offset = 0
48
+ for i, chain in enumerate(chains):
49
+ if i > 0:
50
+ chain_index.extend([i - 1] * len(chain_linker))
51
+ chain_index.extend([i] * len(chain))
52
+ offset += len(chain)
53
+ linker_mask[offset : offset + len(chain_linker)] = 0
54
+ offset += len(chain_linker)
55
+
56
+ chain_index = torch.tensor(chain_index, dtype=torch.int64)
57
+
58
+ return encoded, residx, linker_mask, chain_index
59
+
60
+
61
+ def batch_encode_sequences(
62
+ sequences: T.Sequence[str],
63
+ residue_index_offset: T.Optional[int] = 512,
64
+ chain_linker: T.Optional[str] = "G" * 25,
65
+ ) -> T.Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
66
+
67
+ aatype_list = []
68
+ residx_list = []
69
+ linker_mask_list = []
70
+ chain_index_list = []
71
+ for seq in sequences:
72
+ aatype_seq, residx_seq, linker_mask_seq, chain_index_seq = encode_sequence(
73
+ seq,
74
+ residue_index_offset=residue_index_offset,
75
+ chain_linker=chain_linker,
76
+ )
77
+ aatype_list.append(aatype_seq)
78
+ residx_list.append(residx_seq)
79
+ linker_mask_list.append(linker_mask_seq)
80
+ chain_index_list.append(chain_index_seq)
81
+
82
+ aatype = collate_dense_tensors(aatype_list)
83
+ mask = collate_dense_tensors(
84
+ [aatype.new_ones(len(aatype_seq)) for aatype_seq in aatype_list]
85
+ )
86
+ residx = collate_dense_tensors(residx_list)
87
+ linker_mask = collate_dense_tensors(linker_mask_list)
88
+ chain_index_list = collate_dense_tensors(chain_index_list, -1)
89
+
90
+ return aatype, mask, residx, linker_mask, chain_index_list
91
+
92
+
93
+ def output_to_pdb(output: T.Dict) -> T.List[str]:
94
+ """Returns the pdb (file) string from the model given the model output."""
95
+ # atom14_to_atom37 must be called first, as it fails on latest numpy if the
96
+ # input is a numpy array. It will work if the input is a torch tensor.
97
+ final_atom_positions = atom14_to_atom37(output["positions"][-1], output)
98
+ output = {k: v.to("cpu").numpy() for k, v in output.items()}
99
+ final_atom_positions = final_atom_positions.cpu().numpy()
100
+ final_atom_mask = output["atom37_atom_exists"]
101
+ pdbs = []
102
+ for i in range(output["aatype"].shape[0]):
103
+ aa = output["aatype"][i]
104
+ pred_pos = final_atom_positions[i]
105
+ mask = final_atom_mask[i]
106
+ resid = output["residue_index"][i] + 1
107
+ pred = OFProtein(
108
+ aatype=aa,
109
+ atom_positions=pred_pos,
110
+ atom_mask=mask,
111
+ residue_index=resid,
112
+ b_factors=output["plddt"][i],
113
+ chain_index=output["chain_index"][i] if "chain_index" in output else None,
114
+ )
115
+ pdbs.append(to_pdb(pred))
116
+ return pdbs
117
+
118
+
119
+ def collate_dense_tensors(
120
+ samples: T.List[torch.Tensor], pad_v: float = 0
121
+ ) -> torch.Tensor:
122
+ """
123
+ Takes a list of tensors with the following dimensions:
124
+ [(d_11, ..., d_1K),
125
+ (d_21, ..., d_2K),
126
+ ...,
127
+ (d_N1, ..., d_NK)]
128
+ and stack + pads them into a single tensor of:
129
+ (N, max_{i=1..N} d_i1, ..., max_{i=1..N} d_iK)
130
+ """
131
+ if len(samples) == 0:
132
+ return torch.Tensor()
133
+ if len(set(x.dim() for x in samples)) != 1:
134
+ raise RuntimeError(
135
+ f"Samples has varying dimensions: {[x.dim() for x in samples]}"
136
+ )
137
+ (device,) = tuple(set(x.device for x in samples)) # assumes all on same device
138
+ max_shape = [max(lst) for lst in zip(*[x.shape for x in samples])]
139
+ result = torch.empty(
140
+ len(samples), *max_shape, dtype=samples[0].dtype, device=device
141
+ )
142
+ result.fill_(pad_v)
143
+ for i in range(len(samples)):
144
+ result_i = result[i]
145
+ t = samples[i]
146
+ result_i[tuple(slice(0, k) for k in t.shape)] = t
147
+ return result
148
+
149
+
150
+ class Attention(nn.Module):
151
+ def __init__(self, embed_dim, num_heads, head_width, gated=False):
152
+ super().__init__()
153
+ assert embed_dim == num_heads * head_width
154
+
155
+ self.embed_dim = embed_dim
156
+ self.num_heads = num_heads
157
+ self.head_width = head_width
158
+
159
+ self.proj = nn.Linear(embed_dim, embed_dim * 3, bias=False)
160
+ self.o_proj = nn.Linear(embed_dim, embed_dim, bias=True)
161
+ self.gated = gated
162
+ if gated:
163
+ self.g_proj = nn.Linear(embed_dim, embed_dim)
164
+ torch.nn.init.zeros_(self.g_proj.weight)
165
+ torch.nn.init.ones_(self.g_proj.bias)
166
+
167
+ self.rescale_factor = self.head_width**-0.5
168
+
169
+ torch.nn.init.zeros_(self.o_proj.bias)
170
+
171
+ def forward(self, x, mask=None, bias=None, indices=None):
172
+ """
173
+ Basic self attention with optional mask and external pairwise bias.
174
+ To handle sequences of different lengths, use mask.
175
+
176
+ Inputs:
177
+ x: batch of input sequences (.. x L x C)
178
+ mask: batch of boolean masks where 1=valid, 0=padding position (.. x L_k). optional.
179
+ bias: batch of scalar pairwise attention biases (.. x Lq x Lk x num_heads). optional.
180
+
181
+ Outputs:
182
+ sequence projection (B x L x embed_dim), attention maps (B x L x L x num_heads)
183
+ """
184
+
185
+ t = rearrange(self.proj(x), "... l (h c) -> ... h l c", h=self.num_heads)
186
+ q, k, v = t.chunk(3, dim=-1)
187
+
188
+ q = self.rescale_factor * q
189
+ a = torch.einsum("...qc,...kc->...qk", q, k)
190
+
191
+ # Add external attention bias.
192
+ if bias is not None:
193
+ a = a + rearrange(bias, "... lq lk h -> ... h lq lk")
194
+
195
+ # Do not attend to padding tokens.
196
+ if mask is not None:
197
+ mask = repeat(
198
+ mask, "... lk -> ... h lq lk", h=self.num_heads, lq=q.shape[-2]
199
+ )
200
+ a = a.masked_fill(mask == False, -np.inf)
201
+
202
+ a = F.softmax(a, dim=-1)
203
+
204
+ y = torch.einsum("...hqk,...hkc->...qhc", a, v)
205
+ y = rearrange(y, "... h c -> ... (h c)", h=self.num_heads)
206
+
207
+ if self.gated:
208
+ y = self.g_proj(x).sigmoid() * y
209
+ y = self.o_proj(y)
210
+
211
+ return y, rearrange(a, "... lq lk h -> ... h lq lk")
212
+
213
+
214
+ class Dropout(nn.Module):
215
+ """
216
+ Implementation of dropout with the ability to share the dropout mask
217
+ along a particular dimension.
218
+ """
219
+
220
+ def __init__(self, r: float, batch_dim: T.Union[int, T.List[int]]):
221
+ super(Dropout, self).__init__()
222
+
223
+ self.r = r
224
+ if type(batch_dim) == int:
225
+ batch_dim = [batch_dim]
226
+ self.batch_dim = batch_dim
227
+ self.dropout = nn.Dropout(self.r)
228
+
229
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
230
+ shape = list(x.shape)
231
+ if self.batch_dim is not None:
232
+ for bd in self.batch_dim:
233
+ shape[bd] = 1
234
+ return x * self.dropout(x.new_ones(shape))
235
+
236
+
237
+ class SequenceToPair(nn.Module):
238
+ def __init__(self, sequence_state_dim, inner_dim, pairwise_state_dim):
239
+ super().__init__()
240
+
241
+ self.layernorm = nn.LayerNorm(sequence_state_dim)
242
+ self.proj = nn.Linear(sequence_state_dim, inner_dim * 2, bias=True)
243
+ self.o_proj = nn.Linear(2 * inner_dim, pairwise_state_dim, bias=True)
244
+
245
+ torch.nn.init.zeros_(self.proj.bias)
246
+ torch.nn.init.zeros_(self.o_proj.bias)
247
+
248
+ def forward(self, sequence_state):
249
+ """
250
+ Inputs:
251
+ sequence_state: B x L x sequence_state_dim
252
+
253
+ Output:
254
+ pairwise_state: B x L x L x pairwise_state_dim
255
+
256
+ Intermediate state:
257
+ B x L x L x 2*inner_dim
258
+ """
259
+
260
+ assert len(sequence_state.shape) == 3
261
+
262
+ s = self.layernorm(sequence_state)
263
+ s = self.proj(s)
264
+ q, k = s.chunk(2, dim=-1)
265
+
266
+ prod = q[:, None, :, :] * k[:, :, None, :]
267
+ diff = q[:, None, :, :] - k[:, :, None, :]
268
+
269
+ x = torch.cat([prod, diff], dim=-1)
270
+ x = self.o_proj(x)
271
+
272
+ return x
273
+
274
+
275
+ class PairToSequence(nn.Module):
276
+ def __init__(self, pairwise_state_dim, num_heads):
277
+ super().__init__()
278
+
279
+ self.layernorm = nn.LayerNorm(pairwise_state_dim)
280
+ self.linear = nn.Linear(pairwise_state_dim, num_heads, bias=False)
281
+
282
+ def forward(self, pairwise_state):
283
+ """
284
+ Inputs:
285
+ pairwise_state: B x L x L x pairwise_state_dim
286
+
287
+ Output:
288
+ pairwise_bias: B x L x L x num_heads
289
+ """
290
+ assert len(pairwise_state.shape) == 4
291
+ z = self.layernorm(pairwise_state)
292
+ pairwise_bias = self.linear(z)
293
+ return pairwise_bias
294
+
295
+
296
+ class ResidueMLP(nn.Module):
297
+ def __init__(self, embed_dim, inner_dim, norm=nn.LayerNorm, dropout=0):
298
+ super().__init__()
299
+
300
+ self.mlp = nn.Sequential(
301
+ norm(embed_dim),
302
+ nn.Linear(embed_dim, inner_dim),
303
+ nn.ReLU(),
304
+ nn.Linear(inner_dim, embed_dim),
305
+ nn.Dropout(dropout),
306
+ )
307
+
308
+ def forward(self, x):
309
+ return x + self.mlp(x)
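A small illustration of the multimer encoding convention implemented above: chains separated by ':' are joined with a poly-G linker, linker positions are zeroed in `linker_mask`, and residue indices jump by `residue_index_offset` between chains. The two 10-residue chains are toy inputs (importing this module requires the repo's openfold dependency).

```python
from esm.esmfold.v1.misc import encode_sequence

aatype, residx, linker_mask, chain_index = encode_sequence(
    "MKTAYIAKQR:QISFVKSHFS",      # two chains separated by ':'
    residue_index_offset=512,
    chain_linker="G" * 25,
)

print(aatype.shape)                      # torch.Size([45]): 10 + 25 linker + 10 residues
print(int(linker_mask.sum()))            # 20: only the real residues keep mask value 1
print(int(residx[0]), int(residx[-1]))   # 0 and 556: the second chain is offset by 512
print(chain_index.tolist()[:3], chain_index.tolist()[-3:])  # [0, 0, 0] ... [1, 1, 1]
```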
esm/esmfold/v1/pretrained.py ADDED
@@ -0,0 +1,181 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+
6
+ from pathlib import Path
7
+
8
+ import torch
9
+
10
+ from esm.esmfold.v1.esmfold import ESMFold
11
+
12
+
13
+ def _load_model(model_name):
14
+ if model_name.endswith(".pt"): # local, treat as filepath
15
+ model_path = Path(model_name)
16
+ model_data = torch.load(str(model_path), map_location="cpu")
17
+ else: # load from hub
18
+ url = f"https://dl.fbaipublicfiles.com/fair-esm/models/{model_name}.pt"
19
+ model_data = torch.hub.load_state_dict_from_url(url, progress=False, map_location="cpu")
20
+
21
+ cfg = model_data["cfg"]["model"]
22
+ model_state = model_data["model"]
23
+ model = ESMFold(esmfold_config=cfg)
24
+
25
+ expected_keys = set(model.state_dict().keys())
26
+ found_keys = set(model_state.keys())
27
+
28
+ missing_essential_keys = []
29
+ for missing_key in expected_keys - found_keys:
30
+ if not missing_key.startswith("esm."):
31
+ missing_essential_keys.append(missing_key)
32
+
33
+ if missing_essential_keys:
34
+ raise RuntimeError(f"Keys '{', '.join(missing_essential_keys)}' are missing.")
35
+
36
+ model.load_state_dict(model_state, strict=False)
37
+
38
+ return model
39
+
40
+
41
+ def esmfold_v0():
42
+ """
43
+ ESMFold v0 model with 3B ESM-2, 48 folding blocks.
44
+ This version was used for the paper (Lin et al, 2022). It was trained
45
+ on all PDB chains until 2020-05, to ensure temporal holdout with CASP14
46
+ and the CAMEO validation and test set reported there.
47
+ """
48
+ return _load_model("esmfold_3B_v0")
49
+
50
+
51
+ def esmfold_v1():
52
+ """
53
+ ESMFold v1 model using 3B ESM-2, 48 folding blocks.
54
+ ESMFold provides fast high accuracy atomic level structure prediction
55
+ directly from the individual sequence of a protein. ESMFold uses the ESM2
56
+ protein language model to extract meaningful representations from the
57
+ protein sequence.
58
+ """
59
+ return _load_model("esmfold_3B_v1")
60
+
61
+
62
+ def esmfold_structure_module_only_8M():
63
+ """
64
+ ESMFold baseline model using 8M ESM-2, 0 folding blocks.
65
+ ESM-2 here is trained out to 500K updates.
66
+ This is a model designed to test the capabilities of the language model
67
+ when ablated for number of parameters in the language model.
68
+ See table S1 in (Lin et al, 2022).
69
+ """
70
+ return _load_model("esmfold_structure_module_only_8M")
71
+
72
+
73
+ def esmfold_structure_module_only_8M_270K():
74
+ """
75
+ ESMFold baseline model using 8M ESM-2, 0 folding blocks.
76
+ ESM-2 here is trained out to 270K updates.
77
+ This is a model designed to test the capabilities of the language model
78
+ when ablated for number of parameters in the language model.
79
+ See table S1 in (Lin et al, 2022).
80
+ """
81
+ return _load_model("esmfold_structure_module_only_8M_270K")
82
+
83
+
84
+ def esmfold_structure_module_only_35M():
85
+ """
86
+ ESMFold baseline model using 35M ESM-2, 0 folding blocks.
87
+ ESM-2 here is trained out to 500K updates.
88
+ This is a model designed to test the capabilities of the language model
89
+ when ablated for number of parameters in the language model.
90
+ See table S1 in (Lin et al, 2022).
91
+ """
92
+ return _load_model("esmfold_structure_module_only_35M")
93
+
94
+
95
+ def esmfold_structure_module_only_35M_270K():
96
+ """
97
+ ESMFold baseline model using 35M ESM-2, 0 folding blocks.
98
+ ESM-2 here is trained out to 270K updates.
99
+ This is a model designed to test the capabilities of the language model
100
+ when ablated for number of parameters in the language model.
101
+ See table S1 in (Lin et al, 2022).
102
+ """
103
+ return _load_model("esmfold_structure_module_only_35M_270K")
104
+
105
+
106
+ def esmfold_structure_module_only_150M():
107
+ """
108
+ ESMFold baseline model using 150M ESM-2, 0 folding blocks.
109
+ ESM-2 here is trained out to 500K updates.
110
+ This is a model designed to test the capabilities of the language model
111
+ when ablated for number of parameters in the language model.
112
+ See table S1 in (Lin et al, 2022).
113
+ """
114
+ return _load_model("esmfold_structure_module_only_150M")
115
+
116
+
117
+ def esmfold_structure_module_only_150M_270K():
118
+ """
119
+ ESMFold baseline model using 150M ESM-2, 0 folding blocks.
120
+ ESM-2 here is trained out to 270K updates.
121
+ This is a model designed to test the capabilities of the language model
122
+ when ablated for number of parameters in the language model.
123
+ See table S1 in (Lin et al, 2022).
124
+ """
125
+ return _load_model("esmfold_structure_module_only_150M_270K")
126
+
127
+
128
+ def esmfold_structure_module_only_650M():
129
+ """
130
+ ESMFold baseline model using 650M ESM-2, 0 folding blocks.
131
+ ESM-2 here is trained out to 500K updates.
132
+ This is a model designed to test the capabilities of the language model
133
+ when ablated for number of parameters in the language model.
134
+ See table S1 in (Lin et al, 2022).
135
+ """
136
+ return _load_model("esmfold_structure_module_only_650M")
137
+
138
+
139
+ def esmfold_structure_module_only_650M_270K():
140
+ """
141
+ ESMFold baseline model using 650M ESM-2, 0 folding blocks.
142
+ ESM-2 here is trained out to 270K updates.
143
+ This is a model designed to test the capabilities of the language model
144
+ when ablated for number of parameters in the language model.
145
+ See table S1 in (Lin et al, 2022).
146
+ """
147
+ return _load_model("esmfold_structure_module_only_650M_270K")
148
+
149
+
150
+ def esmfold_structure_module_only_3B():
151
+ """
152
+ ESMFold baseline model using 3B ESM-2, 0 folding blocks.
153
+ ESM-2 here is trained out to 500K updates.
154
+ This is a model designed to test the capabilities of the language model
155
+ when ablated for number of parameters in the language model.
156
+ See table S1 in (Lin et al, 2022).
157
+ """
158
+ return _load_model("esmfold_structure_module_only_3B")
159
+
160
+
161
+ def esmfold_structure_module_only_3B_270K():
162
+ """
163
+ ESMFold baseline model using 3B ESM-2, 0 folding blocks.
164
+ ESM-2 here is trained out to 270K updates.
165
+ This is a model designed to test the capabilities of the language model
166
+ when ablated for number of parameters in the language model.
167
+ See table S1 in (Lin et al, 2022).
168
+ """
169
+ return _load_model("esmfold_structure_module_only_3B_270K")
170
+
171
+
172
+ def esmfold_structure_module_only_15B():
173
+ """
174
+ ESMFold baseline model using 15B ESM-2, 0 folding blocks.
175
+ ESM-2 here is trained out to 270K updates.
176
+ The 15B parameter ESM-2 was not trained out to 500K updates.
177
+ This is a model designed to test the capabilities of the language model
178
+ when ablated for number of parameters in the language model.
179
+ See table S1 in (Lin et al, 2022).
180
+ """
181
+ return _load_model("esmfold_structure_module_only_15B")
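All of these constructors funnel through `_load_model`, which treats any model name ending in `.pt` as a local file path instead of a hub download. A hedged sketch (the checkpoint path is a placeholder):

```python
from esm.esmfold.v1.pretrained import _load_model, esmfold_structure_module_only_8M

# By name: fetched from the fair-esm hub URL and cached by torch.hub.
baseline = esmfold_structure_module_only_8M()

# By path: names ending in ".pt" are loaded directly from disk.
local_model = _load_model("/path/to/esmfold_3B_v1.pt")
```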
esm/esmfold/v1/tri_self_attn_block.py ADDED
@@ -0,0 +1,160 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+ import torch
6
+ from openfold.model.triangular_attention import (
7
+ TriangleAttentionEndingNode,
8
+ TriangleAttentionStartingNode,
9
+ )
10
+ from openfold.model.triangular_multiplicative_update import (
11
+ TriangleMultiplicationIncoming,
12
+ TriangleMultiplicationOutgoing,
13
+ )
14
+ from torch import nn
15
+
16
+ from esm.esmfold.v1.misc import (
17
+ Attention,
18
+ Dropout,
19
+ PairToSequence,
20
+ ResidueMLP,
21
+ SequenceToPair,
22
+ )
23
+
24
+
25
+ class TriangularSelfAttentionBlock(nn.Module):
26
+ def __init__(
27
+ self,
28
+ sequence_state_dim,
29
+ pairwise_state_dim,
30
+ sequence_head_width,
31
+ pairwise_head_width,
32
+ dropout=0,
33
+ **__kwargs,
34
+ ):
35
+ super().__init__()
36
+
37
+ assert sequence_state_dim % sequence_head_width == 0
38
+ assert pairwise_state_dim % pairwise_head_width == 0
39
+ sequence_num_heads = sequence_state_dim // sequence_head_width
40
+ pairwise_num_heads = pairwise_state_dim // pairwise_head_width
41
+ assert sequence_state_dim == sequence_num_heads * sequence_head_width
42
+ assert pairwise_state_dim == pairwise_num_heads * pairwise_head_width
43
+ assert pairwise_state_dim % 2 == 0
44
+
45
+ self.sequence_state_dim = sequence_state_dim
46
+ self.pairwise_state_dim = pairwise_state_dim
47
+
48
+ self.layernorm_1 = nn.LayerNorm(sequence_state_dim)
49
+
50
+ self.sequence_to_pair = SequenceToPair(
51
+ sequence_state_dim, pairwise_state_dim // 2, pairwise_state_dim
52
+ )
53
+ self.pair_to_sequence = PairToSequence(pairwise_state_dim, sequence_num_heads)
54
+
55
+ self.seq_attention = Attention(
56
+ sequence_state_dim, sequence_num_heads, sequence_head_width, gated=True
57
+ )
58
+ self.tri_mul_out = TriangleMultiplicationOutgoing(
59
+ pairwise_state_dim,
60
+ pairwise_state_dim,
61
+ )
62
+ self.tri_mul_in = TriangleMultiplicationIncoming(
63
+ pairwise_state_dim,
64
+ pairwise_state_dim,
65
+ )
66
+ self.tri_att_start = TriangleAttentionStartingNode(
67
+ pairwise_state_dim,
68
+ pairwise_head_width,
69
+ pairwise_num_heads,
70
+ inf=1e9,
71
+ ) # type: ignore
72
+ self.tri_att_end = TriangleAttentionEndingNode(
73
+ pairwise_state_dim,
74
+ pairwise_head_width,
75
+ pairwise_num_heads,
76
+ inf=1e9,
77
+ ) # type: ignore
78
+
79
+ self.mlp_seq = ResidueMLP(sequence_state_dim, 4 * sequence_state_dim, dropout=dropout)
80
+ self.mlp_pair = ResidueMLP(pairwise_state_dim, 4 * pairwise_state_dim, dropout=dropout)
81
+
82
+ assert dropout < 0.4
83
+ self.drop = nn.Dropout(dropout)
84
+ self.row_drop = Dropout(dropout * 2, 2)
85
+ self.col_drop = Dropout(dropout * 2, 1)
86
+
87
+ torch.nn.init.zeros_(self.tri_mul_in.linear_z.weight)
88
+ torch.nn.init.zeros_(self.tri_mul_in.linear_z.bias)
89
+ torch.nn.init.zeros_(self.tri_mul_out.linear_z.weight)
90
+ torch.nn.init.zeros_(self.tri_mul_out.linear_z.bias)
91
+ torch.nn.init.zeros_(self.tri_att_start.mha.linear_o.weight)
92
+ torch.nn.init.zeros_(self.tri_att_start.mha.linear_o.bias)
93
+ torch.nn.init.zeros_(self.tri_att_end.mha.linear_o.weight)
94
+ torch.nn.init.zeros_(self.tri_att_end.mha.linear_o.bias)
95
+
96
+ torch.nn.init.zeros_(self.sequence_to_pair.o_proj.weight)
97
+ torch.nn.init.zeros_(self.sequence_to_pair.o_proj.bias)
98
+ torch.nn.init.zeros_(self.pair_to_sequence.linear.weight)
99
+ torch.nn.init.zeros_(self.seq_attention.o_proj.weight)
100
+ torch.nn.init.zeros_(self.seq_attention.o_proj.bias)
101
+ torch.nn.init.zeros_(self.mlp_seq.mlp[-2].weight)
102
+ torch.nn.init.zeros_(self.mlp_seq.mlp[-2].bias)
103
+ torch.nn.init.zeros_(self.mlp_pair.mlp[-2].weight)
104
+ torch.nn.init.zeros_(self.mlp_pair.mlp[-2].bias)
105
+
106
+ def forward(self, sequence_state, pairwise_state, mask=None, chunk_size=None, **__kwargs):
107
+ """
108
+ Inputs:
109
+ sequence_state: B x L x sequence_state_dim
110
+ pairwise_state: B x L x L x pairwise_state_dim
111
+ mask: B x L boolean tensor of valid positions
112
+
113
+ Output:
114
+ sequence_state: B x L x sequence_state_dim
115
+ pairwise_state: B x L x L x pairwise_state_dim
116
+ """
117
+ assert len(sequence_state.shape) == 3
118
+ assert len(pairwise_state.shape) == 4
119
+ if mask is not None:
120
+ assert len(mask.shape) == 2
121
+
122
+ batch_dim, seq_dim, sequence_state_dim = sequence_state.shape
123
+ pairwise_state_dim = pairwise_state.shape[3]
124
+ assert sequence_state_dim == self.sequence_state_dim
125
+ assert pairwise_state_dim == self.pairwise_state_dim
126
+ assert batch_dim == pairwise_state.shape[0]
127
+ assert seq_dim == pairwise_state.shape[1]
128
+ assert seq_dim == pairwise_state.shape[2]
129
+
130
+ # Update sequence state
131
+ bias = self.pair_to_sequence(pairwise_state)
132
+
133
+ # Self attention with bias + mlp.
134
+ y = self.layernorm_1(sequence_state)
135
+ y, _ = self.seq_attention(y, mask=mask, bias=bias)
136
+ sequence_state = sequence_state + self.drop(y)
137
+ sequence_state = self.mlp_seq(sequence_state)
138
+
139
+ # Update pairwise state
140
+ pairwise_state = pairwise_state + self.sequence_to_pair(sequence_state)
141
+
142
+ # Axial attention with triangular bias.
143
+ tri_mask = mask.unsqueeze(2) * mask.unsqueeze(1) if mask is not None else None
144
+ pairwise_state = pairwise_state + self.row_drop(
145
+ self.tri_mul_out(pairwise_state, mask=tri_mask)
146
+ )
147
+ pairwise_state = pairwise_state + self.col_drop(
148
+ self.tri_mul_in(pairwise_state, mask=tri_mask)
149
+ )
150
+ pairwise_state = pairwise_state + self.row_drop(
151
+ self.tri_att_start(pairwise_state, mask=tri_mask, chunk_size=chunk_size)
152
+ )
153
+ pairwise_state = pairwise_state + self.col_drop(
154
+ self.tri_att_end(pairwise_state, mask=tri_mask, chunk_size=chunk_size)
155
+ )
156
+
157
+ # MLP over pairs.
158
+ pairwise_state = self.mlp_pair(pairwise_state)
159
+
160
+ return sequence_state, pairwise_state
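A quick shape check for `TriangularSelfAttentionBlock`, with dimensions chosen to satisfy the divisibility asserts in `__init__`. This is only a sketch, not part of the commit, and it assumes the openfold triangular attention/multiplication modules are installed with the call signatures used above.

```python
import torch
from esm.esmfold.v1.tri_self_attn_block import TriangularSelfAttentionBlock

block = TriangularSelfAttentionBlock(
    sequence_state_dim=1024,   # must be divisible by sequence_head_width
    pairwise_state_dim=128,    # must be divisible by pairwise_head_width and even
    sequence_head_width=32,
    pairwise_head_width=32,
)

s = torch.randn(1, 64, 1024)       # B x L x sequence_state_dim
z = torch.randn(1, 64, 64, 128)    # B x L x L x pairwise_state_dim
mask = torch.ones(1, 64)           # all positions valid

s_out, z_out = block(s, z, mask=mask)
print(s_out.shape, z_out.shape)    # shapes are preserved: (1, 64, 1024) and (1, 64, 64, 128)
```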
esm/esmfold/v1/trunk.py ADDED
@@ -0,0 +1,243 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+ import typing as T
6
+ from contextlib import ExitStack
7
+ from dataclasses import dataclass
8
+
9
+ import torch
10
+ import torch.nn as nn
11
+ from openfold.model.structure_module import StructureModule
12
+
13
+ from esm.esmfold.v1.tri_self_attn_block import TriangularSelfAttentionBlock
14
+
15
+
16
+ @dataclass
17
+ class StructureModuleConfig:
18
+ c_s: int = 384
19
+ c_z: int = 128
20
+ c_ipa: int = 16
21
+ c_resnet: int = 128
22
+ no_heads_ipa: int = 12
23
+ no_qk_points: int = 4
24
+ no_v_points: int = 8
25
+ dropout_rate: float = 0.1
26
+ no_blocks: int = 8
27
+ no_transition_layers: int = 1
28
+ no_resnet_blocks: int = 2
29
+ no_angles: int = 7
30
+ trans_scale_factor: int = 10
31
+ epsilon: float = 1e-8
32
+ inf: float = 1e5
33
+
34
+
35
+ @dataclass
36
+ class FoldingTrunkConfig:
37
+ _name: str = "FoldingTrunkConfig"
38
+ num_blocks: int = 48
39
+ sequence_state_dim: int = 1024
40
+ pairwise_state_dim: int = 128
41
+ sequence_head_width: int = 32
42
+ pairwise_head_width: int = 32
43
+ position_bins: int = 32
44
+ dropout: float = 0
45
+ layer_drop: float = 0
46
+ cpu_grad_checkpoint: bool = False
47
+
48
+ max_recycles: int = 4
49
+ chunk_size: T.Optional[int] = None
50
+
51
+ structure_module: StructureModuleConfig = StructureModuleConfig()
52
+
53
+
54
+ def get_axial_mask(mask):
55
+ """
56
+ Helper to convert B x L mask of valid positions to axial mask used
57
+ in row column attentions.
58
+
59
+ Input:
60
+ mask: B x L tensor of booleans
61
+
62
+ Output:
63
+ mask: B x L x L tensor of booleans
64
+ """
65
+
66
+ if mask is None:
67
+ return None
68
+ assert len(mask.shape) == 2
69
+ batch_dim, seq_dim = mask.shape
70
+ m = mask.unsqueeze(1).expand(batch_dim, seq_dim, seq_dim)
71
+ m = m.reshape(batch_dim * seq_dim, seq_dim)
72
+ return m
73
+
74
+
75
+ class RelativePosition(nn.Module):
76
+ def __init__(self, bins, pairwise_state_dim):
77
+ super().__init__()
78
+ self.bins = bins
79
+
80
+ # Note an additional offset is used so that the 0th position
81
+ # is reserved for masked pairs.
82
+ self.embedding = torch.nn.Embedding(2 * bins + 2, pairwise_state_dim)
83
+
84
+ def forward(self, residue_index, mask=None):
85
+ """
86
+ Input:
87
+ residue_index: B x L tensor of indices (dtype=torch.long)
88
+ mask: B x L tensor of booleans
89
+
90
+ Output:
91
+ pairwise_state: B x L x L x pairwise_state_dim tensor of embeddings
92
+ """
93
+
94
+ assert residue_index.dtype == torch.long
95
+ if mask is not None:
96
+ assert residue_index.shape == mask.shape
97
+
98
+ diff = residue_index[:, None, :] - residue_index[:, :, None]
99
+ diff = diff.clamp(-self.bins, self.bins)
100
+ diff = diff + self.bins + 1 # Add 1 to adjust for padding index.
101
+
102
+ if mask is not None:
103
+ mask = mask[:, None, :] * mask[:, :, None]
104
+ diff[mask == False] = 0
105
+
106
+ output = self.embedding(diff)
107
+ return output
108
+
109
+
110
+ class FoldingTrunk(nn.Module):
111
+ def __init__(self, **kwargs):
112
+ super().__init__()
113
+ self.cfg = FoldingTrunkConfig(**kwargs)
114
+ assert self.cfg.max_recycles > 0
115
+
116
+ c_s = self.cfg.sequence_state_dim
117
+ c_z = self.cfg.pairwise_state_dim
118
+
119
+ assert c_s % self.cfg.sequence_head_width == 0
120
+ assert c_z % self.cfg.pairwise_head_width == 0
121
+ block = TriangularSelfAttentionBlock
122
+
123
+ self.pairwise_positional_embedding = RelativePosition(self.cfg.position_bins, c_z)
124
+
125
+ self.blocks = nn.ModuleList(
126
+ [
127
+ block(
128
+ sequence_state_dim=c_s,
129
+ pairwise_state_dim=c_z,
130
+ sequence_head_width=self.cfg.sequence_head_width,
131
+ pairwise_head_width=self.cfg.pairwise_head_width,
132
+ dropout=self.cfg.dropout,
133
+ )
134
+ for i in range(self.cfg.num_blocks)
135
+ ]
136
+ )
137
+
138
+ self.recycle_bins = 15
139
+ self.recycle_s_norm = nn.LayerNorm(c_s)
140
+ self.recycle_z_norm = nn.LayerNorm(c_z)
141
+ self.recycle_disto = nn.Embedding(self.recycle_bins, c_z)
142
+ self.recycle_disto.weight[0].detach().zero_()
143
+
144
+ self.structure_module = StructureModule(**self.cfg.structure_module) # type: ignore
145
+ self.trunk2sm_s = nn.Linear(c_s, self.structure_module.c_s)
146
+ self.trunk2sm_z = nn.Linear(c_z, self.structure_module.c_z)
147
+
148
+ self.chunk_size = self.cfg.chunk_size
149
+
150
+ def set_chunk_size(self, chunk_size):
151
+ # This parameter means the axial attention will be computed
152
+ # in a chunked manner. This should make the memory used more or less O(L) instead of O(L^2).
153
+ # It's equivalent to running a for loop over chunks of the dimension we're iterative over,
154
+ # where the chunk_size is the size of the chunks, so 128 would mean to parse 128-lengthed chunks.
155
+ self.chunk_size = chunk_size
156
+
157
+ def forward(self, seq_feats, pair_feats, true_aa, residx, mask, no_recycles: T.Optional[int] = None):
158
+ """
159
+ Inputs:
160
+ seq_feats: B x L x C tensor of sequence features
161
+ pair_feats: B x L x L x C tensor of pair features
162
+ residx: B x L long tensor giving the position in the sequence
163
+ mask: B x L boolean tensor indicating valid residues
164
+
165
+ Output:
166
+ predicted_structure: B x L x (num_atoms_per_residue * 3) tensor wrapped in a Coordinates object
167
+ """
168
+
169
+ device = seq_feats.device
170
+ s_s_0 = seq_feats
171
+ s_z_0 = pair_feats
172
+
173
+ if no_recycles is None:
174
+ no_recycles = self.cfg.max_recycles
175
+ else:
176
+ assert no_recycles >= 0, "Number of recycles must not be negative."
177
+ no_recycles += 1 # First 'recycle' is just the standard forward pass through the model.
178
+
179
+ def trunk_iter(s, z, residx, mask):
180
+ z = z + self.pairwise_positional_embedding(residx, mask=mask)
181
+
182
+ for block in self.blocks:
183
+ s, z = block(s, z, mask=mask, residue_index=residx, chunk_size=self.chunk_size)
184
+ return s, z
185
+
186
+ s_s = s_s_0
187
+ s_z = s_z_0
188
+ recycle_s = torch.zeros_like(s_s)
189
+ recycle_z = torch.zeros_like(s_z)
190
+ recycle_bins = torch.zeros(*s_z.shape[:-1], device=device, dtype=torch.int64)
191
+
192
+ assert no_recycles > 0
193
+ for recycle_idx in range(no_recycles):
194
+ with ExitStack() if recycle_idx == no_recycles - 1 else torch.no_grad():
195
+ # === Recycling ===
196
+ recycle_s = self.recycle_s_norm(recycle_s.detach())
197
+ recycle_z = self.recycle_z_norm(recycle_z.detach())
198
+ recycle_z += self.recycle_disto(recycle_bins.detach())
199
+
200
+ s_s, s_z = trunk_iter(s_s_0 + recycle_s, s_z_0 + recycle_z, residx, mask)
201
+
202
+ # === Structure module ===
203
+ structure = self.structure_module(
204
+ {"single": self.trunk2sm_s(s_s), "pair": self.trunk2sm_z(s_z)},
205
+ true_aa,
206
+ mask.float(),
207
+ )
208
+
209
+ recycle_s = s_s
210
+ recycle_z = s_z
211
+ # Distogram needs the N, CA, C coordinates, and bin constants same as alphafold.
212
+ recycle_bins = FoldingTrunk.distogram(
213
+ structure["positions"][-1][:, :, :3],
214
+ 3.375,
215
+ 21.375,
216
+ self.recycle_bins,
217
+ )
218
+
219
+ assert isinstance(structure, dict) # type: ignore
220
+ structure["s_s"] = s_s
221
+ structure["s_z"] = s_z
222
+
223
+ return structure
224
+
225
+ @staticmethod
226
+ def distogram(coords, min_bin, max_bin, num_bins):
227
+ # Coords are [... L x 3 x 3], where it's [N, CA, C] x 3 coordinates.
228
+ boundaries = torch.linspace(
229
+ min_bin,
230
+ max_bin,
231
+ num_bins - 1,
232
+ device=coords.device,
233
+ )
234
+ boundaries = boundaries**2
235
+ N, CA, C = [x.squeeze(-2) for x in coords.chunk(3, dim=-2)]
236
+ # Infer CB coordinates.
237
+ b = CA - N
238
+ c = C - CA
239
+ a = b.cross(c, dim=-1)
240
+ CB = -0.58273431 * a + 0.56802827 * b - 0.54067466 * c + CA
241
+ dists = (CB[..., None, :, :] - CB[..., :, None, :]).pow(2).sum(dim=-1, keepdims=True)
242
+ bins = torch.sum(dists > boundaries, dim=-1) # [..., L, L]
243
+ return bins
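For reference, `FoldingTrunk.distogram` infers a virtual C-beta from the N/CA/C backbone atoms and discretizes pairwise C-beta distances into `num_bins` bins between `min_bin` and `max_bin` Angstroms; the recycling embedding above uses 15 bins from 3.375 to 21.375. A toy sketch with random coordinates (importing this module requires openfold):

```python
import torch
from esm.esmfold.v1.trunk import FoldingTrunk

coords = torch.randn(1, 10, 3, 3) * 5.0   # B x L x [N, CA, C] x xyz, toy values
bins = FoldingTrunk.distogram(coords, 3.375, 21.375, num_bins=15)
print(bins.shape)                            # torch.Size([1, 10, 10])
print(bins.min().item(), bins.max().item())  # bin indices fall in 0..14
```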
esm/inverse_folding/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Facebook, Inc. and its affiliates.
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ print("1")
+ from . import gvp_transformer
+ print("2")
+ from . import util
+ print("3")
+ from . import multichain_util
+ print("4")
@@ -0,0 +1,356 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+ #
6
+ # Portions of this file were adapted from the open source code for the following
7
+ # two papers:
8
+ #
9
+ # Ingraham, J., Garg, V., Barzilay, R., & Jaakkola, T. (2019). Generative
10
+ # models for graph-based protein design. Advances in Neural Information
11
+ # Processing Systems, 32.
12
+ #
13
+ # Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L., & Dror, R. (2020).
14
+ # Learning from Protein Structure with Geometric Vector Perceptrons. In
15
+ # International Conference on Learning Representations.
16
+ #
17
+ # MIT License
18
+ #
19
+ # Copyright (c) 2020 Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael Townshend, Ron Dror
20
+ #
21
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
22
+ # of this software and associated documentation files (the "Software"), to deal
23
+ # in the Software without restriction, including without limitation the rights
24
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
25
+ # copies of the Software, and to permit persons to whom the Software is
26
+ # furnished to do so, subject to the following conditions:
27
+ #
28
+ # The above copyright notice and this permission notice shall be included in all
29
+ # copies or substantial portions of the Software.
30
+ #
31
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
32
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
33
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
34
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
35
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
36
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
37
+ # SOFTWARE.
38
+ #
39
+ # ================================================================
40
+ # The below license applies to the portions of the code (parts of
41
+ # src/datasets.py and src/models.py) adapted from Ingraham, et al.
42
+ # ================================================================
43
+ #
44
+ # MIT License
45
+ #
46
+ # Copyright (c) 2019 John Ingraham, Vikas Garg, Regina Barzilay, Tommi Jaakkola
47
+ #
48
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
49
+ # of this software and associated documentation files (the "Software"), to deal
50
+ # in the Software without restriction, including without limitation the rights
51
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
52
+ # copies of the Software, and to permit persons to whom the Software is
53
+ # furnished to do so, subject to the following conditions:
54
+ #
55
+ # The above copyright notice and this permission notice shall be included in all
56
+ # copies or substantial portions of the Software.
57
+ #
58
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
59
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
60
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
61
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
62
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
63
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
64
+ # SOFTWARE.
65
+
66
+ import math
67
+ import numpy as np
68
+ import torch
69
+ import torch.nn as nn
70
+ import torch.nn.functional as F
71
+
72
+ print("features1")
73
+ from .gvp_utils import flatten_graph
74
+ print("features2")
75
+ from .gvp_modules import GVP, LayerNorm
76
+ print("features3")
77
+ from .util import normalize, norm, nan_to_num, rbf
78
+ print("features4")
79
+
80
+
81
+ class GVPInputFeaturizer(nn.Module):
82
+
83
+ @staticmethod
84
+ def get_node_features(coords, coord_mask, with_coord_mask=True):
85
+ # scalar features
86
+ node_scalar_features = GVPInputFeaturizer._dihedrals(coords)
87
+ if with_coord_mask:
88
+ node_scalar_features = torch.cat([
89
+ node_scalar_features,
90
+ coord_mask.float().unsqueeze(-1)
91
+ ], dim=-1)
92
+ # vector features
93
+ X_ca = coords[:, :, 1]
94
+ orientations = GVPInputFeaturizer._orientations(X_ca)
95
+ sidechains = GVPInputFeaturizer._sidechains(coords)
96
+ node_vector_features = torch.cat([orientations, sidechains.unsqueeze(-2)], dim=-2)
97
+ return node_scalar_features, node_vector_features
98
+
99
+ @staticmethod
100
+ def _orientations(X):
101
+ forward = normalize(X[:, 1:] - X[:, :-1])
102
+ backward = normalize(X[:, :-1] - X[:, 1:])
103
+ forward = F.pad(forward, [0, 0, 0, 1])
104
+ backward = F.pad(backward, [0, 0, 1, 0])
105
+ return torch.cat([forward.unsqueeze(-2), backward.unsqueeze(-2)], -2)
106
+
107
+ @staticmethod
108
+ def _sidechains(X):
109
+ n, origin, c = X[:, :, 0], X[:, :, 1], X[:, :, 2]
110
+ c, n = normalize(c - origin), normalize(n - origin)
111
+ bisector = normalize(c + n)
112
+ perp = normalize(torch.cross(c, n, dim=-1))
113
+ vec = -bisector * math.sqrt(1 / 3) - perp * math.sqrt(2 / 3)
114
+ return vec
115
+
116
+ @staticmethod
117
+ def _dihedrals(X, eps=1e-7):
118
+ X = torch.flatten(X[:, :, :3], 1, 2)
119
+ bsz = X.shape[0]
120
+ dX = X[:, 1:] - X[:, :-1]
121
+ U = normalize(dX, dim=-1)
122
+ u_2 = U[:, :-2]
123
+ u_1 = U[:, 1:-1]
124
+ u_0 = U[:, 2:]
125
+
126
+ # Backbone normals
127
+ n_2 = normalize(torch.cross(u_2, u_1, dim=-1), dim=-1)
128
+ n_1 = normalize(torch.cross(u_1, u_0, dim=-1), dim=-1)
129
+
130
+ # Angle between normals
131
+ cosD = torch.sum(n_2 * n_1, -1)
132
+ cosD = torch.clamp(cosD, -1 + eps, 1 - eps)
133
+ D = torch.sign(torch.sum(u_2 * n_1, -1)) * torch.acos(cosD)
134
+
135
+ # This scheme will remove phi[0], psi[-1], omega[-1]
136
+ D = F.pad(D, [1, 2])
137
+ D = torch.reshape(D, [bsz, -1, 3])
138
+ # Lift angle representations to the circle
139
+ D_features = torch.cat([torch.cos(D), torch.sin(D)], -1)
140
+ return D_features
141
+
142
+ @staticmethod
143
+ def _positional_embeddings(edge_index,
144
+ num_embeddings=None,
145
+ num_positional_embeddings=16,
146
+ period_range=[2, 1000]):
147
+ # From https://github.com/jingraham/neurips19-graph-protein-design
148
+ num_embeddings = num_embeddings or num_positional_embeddings
149
+ d = edge_index[0] - edge_index[1]
150
+
151
+ frequency = torch.exp(
152
+ torch.arange(0, num_embeddings, 2, dtype=torch.float32,
153
+ device=edge_index.device)
154
+ * -(np.log(10000.0) / num_embeddings)
155
+ )
156
+ angles = d.unsqueeze(-1) * frequency
157
+ E = torch.cat((torch.cos(angles), torch.sin(angles)), -1)
158
+ return E
159
+
160
+ @staticmethod
161
+ def _dist(X, coord_mask, padding_mask, top_k_neighbors, eps=1e-8):
162
+ """ Pairwise euclidean distances """
163
+ bsz, maxlen = X.size(0), X.size(1)
164
+ coord_mask_2D = torch.unsqueeze(coord_mask,1) * torch.unsqueeze(coord_mask,2)
165
+ residue_mask = ~padding_mask
166
+ residue_mask_2D = torch.unsqueeze(residue_mask,1) * torch.unsqueeze(residue_mask,2)
167
+ dX = torch.unsqueeze(X,1) - torch.unsqueeze(X,2)
168
+ D = coord_mask_2D * norm(dX, dim=-1)
169
+
170
+ # sorting preference: first those with coords, then among the residues that
171
+ # exist but are masked use distance in sequence as tie breaker, and then the
172
+ # residues that came from padding are last
173
+ seqpos = torch.arange(maxlen, device=X.device)
174
+ Dseq = torch.abs(seqpos.unsqueeze(1) - seqpos.unsqueeze(0)).repeat(bsz, 1, 1)
175
+ D_adjust = nan_to_num(D) + (~coord_mask_2D) * (1e8 + Dseq*1e6) + (
176
+ ~residue_mask_2D) * (1e10)
177
+
178
+ if top_k_neighbors == -1:
179
+ D_neighbors = D_adjust
180
+ E_idx = seqpos.repeat(
181
+ *D_neighbors.shape[:-1], 1)
182
+ else:
183
+ # Identify k nearest neighbors (including self)
184
+ k = min(top_k_neighbors, X.size(1))
185
+ D_neighbors, E_idx = torch.topk(D_adjust, k, dim=-1, largest=False)
186
+
187
+ coord_mask_neighbors = (D_neighbors < 5e7)
188
+ residue_mask_neighbors = (D_neighbors < 5e9)
189
+ return D_neighbors, E_idx, coord_mask_neighbors, residue_mask_neighbors
190
+
191
+
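The `_dist` helper above never drops entries when picking the k nearest neighbours; instead it adds large penalties so that residues with coordinates sort first, coordinate-less residues second (with sequence distance as a tie-breaker), and padding last. A toy illustration of that ordering with made-up distances:

import torch

D = torch.tensor([[0.0, 2.0, 4.0, 6.0]])                     # raw distances
coord_missing = torch.tensor([[False, False, True, False]])   # residue 2 has no coords
padding = torch.tensor([[False, False, False, True]])         # residue 3 is padding
D_adjust = D + coord_missing * 1e8 + padding * 1e10
_, idx = torch.topk(D_adjust, k=3, dim=-1, largest=False)
print(idx)  # tensor([[0, 1, 2]]): observed residues first, then the coord-less one; padding loses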
192
+ class Normalize(nn.Module):
193
+ def __init__(self, features, epsilon=1e-6):
194
+ super(Normalize, self).__init__()
195
+ self.gain = nn.Parameter(torch.ones(features))
196
+ self.bias = nn.Parameter(torch.zeros(features))
197
+ self.epsilon = epsilon
198
+
199
+ def forward(self, x, dim=-1):
200
+ mu = x.mean(dim, keepdim=True)
201
+ sigma = torch.sqrt(x.var(dim, keepdim=True) + self.epsilon)
202
+ gain = self.gain
203
+ bias = self.bias
204
+ # Reshape
205
+ if dim != -1:
206
+ shape = [1] * len(mu.size())
207
+ shape[dim] = self.gain.size()[0]
208
+ gain = gain.view(shape)
209
+ bias = bias.view(shape)
210
+ return gain * (x - mu) / (sigma + self.epsilon) + bias
211
+
212
+
213
+ class DihedralFeatures(nn.Module):
214
+ def __init__(self, node_embed_dim):
215
+ """ Embed dihedral angle features. """
216
+ super(DihedralFeatures, self).__init__()
217
+ # 3 dihedral angles; sin and cos of each angle
218
+ node_in = 6
219
+ # Normalization and embedding
220
+ self.node_embedding = nn.Linear(node_in, node_embed_dim, bias=True)
221
+ self.norm_nodes = Normalize(node_embed_dim)
222
+
223
+ def forward(self, X):
224
+ """ Featurize coordinates as an attributed graph """
225
+ V = self._dihedrals(X)
226
+ V = self.node_embedding(V)
227
+ V = self.norm_nodes(V)
228
+ return V
229
+
230
+ @staticmethod
231
+ def _dihedrals(X, eps=1e-7, return_angles=False):
232
+ # First 3 coordinates are N, CA, C
233
+ X = X[:,:,:3,:].reshape(X.shape[0], 3*X.shape[1], 3)
234
+
235
+ # Shifted slices of unit vectors
236
+ dX = X[:,1:,:] - X[:,:-1,:]
237
+ U = F.normalize(dX, dim=-1)
238
+ u_2 = U[:,:-2,:]
239
+ u_1 = U[:,1:-1,:]
240
+ u_0 = U[:,2:,:]
241
+ # Backbone normals
242
+ n_2 = F.normalize(torch.cross(u_2, u_1, dim=-1), dim=-1)
243
+ n_1 = F.normalize(torch.cross(u_1, u_0, dim=-1), dim=-1)
244
+
245
+ # Angle between normals
246
+ cosD = (n_2 * n_1).sum(-1)
247
+ cosD = torch.clamp(cosD, -1+eps, 1-eps)
248
+ D = torch.sign((u_2 * n_1).sum(-1)) * torch.acos(cosD)
249
+
250
+ # This scheme will remove phi[0], psi[-1], omega[-1]
251
+ D = F.pad(D, (1,2), 'constant', 0)
252
+ D = D.view((D.size(0), int(D.size(1)/3), 3))
253
+ phi, psi, omega = torch.unbind(D,-1)
254
+
255
+ if return_angles:
256
+ return phi, psi, omega
257
+
258
+ # Lift angle representations to the circle
259
+ D_features = torch.cat((torch.cos(D), torch.sin(D)), 2)
260
+ return D_features
261
+
262
+
263
+ class GVPGraphEmbedding(GVPInputFeaturizer):
264
+
265
+ def __init__(self, args):
266
+ super().__init__()
267
+ self.top_k_neighbors = args.top_k_neighbors
268
+ self.num_positional_embeddings = 16
269
+ self.remove_edges_without_coords = True
270
+ node_input_dim = (7, 3)
271
+ edge_input_dim = (34, 1)
272
+ node_hidden_dim = (args.node_hidden_dim_scalar,
273
+ args.node_hidden_dim_vector)
274
+ edge_hidden_dim = (args.edge_hidden_dim_scalar,
275
+ args.edge_hidden_dim_vector)
276
+ self.embed_node = nn.Sequential(
277
+ GVP(node_input_dim, node_hidden_dim, activations=(None, None)),
278
+ LayerNorm(node_hidden_dim, eps=1e-4)
279
+ )
280
+ self.embed_edge = nn.Sequential(
281
+ GVP(edge_input_dim, edge_hidden_dim, activations=(None, None)),
282
+ LayerNorm(edge_hidden_dim, eps=1e-4)
283
+ )
284
+ self.embed_confidence = nn.Linear(16, args.node_hidden_dim_scalar)
285
+
286
+ def forward(self, coords, coord_mask, padding_mask, confidence):
287
+ with torch.no_grad():
288
+ node_features = self.get_node_features(coords, coord_mask)
289
+ edge_features, edge_index = self.get_edge_features(
290
+ coords, coord_mask, padding_mask)
291
+ node_embeddings_scalar, node_embeddings_vector = self.embed_node(node_features)
292
+ edge_embeddings = self.embed_edge(edge_features)
293
+
294
+ rbf_rep = rbf(confidence, 0., 1.)
295
+ node_embeddings = (
296
+ node_embeddings_scalar + self.embed_confidence(rbf_rep),
297
+ node_embeddings_vector
298
+ )
299
+
300
+ node_embeddings, edge_embeddings, edge_index = flatten_graph(
301
+ node_embeddings, edge_embeddings, edge_index)
302
+ return node_embeddings, edge_embeddings, edge_index
303
+
304
+ def get_edge_features(self, coords, coord_mask, padding_mask):
305
+ X_ca = coords[:, :, 1]
306
+ # Get distances to the top k neighbors
307
+ E_dist, E_idx, E_coord_mask, E_residue_mask = GVPInputFeaturizer._dist(
308
+ X_ca, coord_mask, padding_mask, self.top_k_neighbors)
309
+ # Flatten the graph to be batch size 1 for torch_geometric package
310
+ dest = E_idx
311
+ B, L, k = E_idx.shape[:3]
312
+ src = torch.arange(L, device=E_idx.device).view([1, L, 1]).expand(B, L, k)
313
+ # After flattening, [2, B, E]
314
+ edge_index = torch.stack([src, dest], dim=0).flatten(2, 3)
315
+ # After flattening, [B, E]
316
+ E_dist = E_dist.flatten(1, 2)
317
+ E_coord_mask = E_coord_mask.flatten(1, 2).unsqueeze(-1)
318
+ E_residue_mask = E_residue_mask.flatten(1, 2)
319
+ # Calculate relative positional embeddings and distance RBF
320
+ pos_embeddings = GVPInputFeaturizer._positional_embeddings(
321
+ edge_index,
322
+ num_positional_embeddings=self.num_positional_embeddings,
323
+ )
324
+ D_rbf = rbf(E_dist, 0., 20.)
325
+ # Calculate relative orientation
326
+ X_src = X_ca.unsqueeze(2).expand(-1, -1, k, -1).flatten(1, 2)
327
+ X_dest = torch.gather(
328
+ X_ca,
329
+ 1,
330
+ edge_index[1, :, :].unsqueeze(-1).expand([B, L*k, 3])
331
+ )
332
+ coord_mask_src = coord_mask.unsqueeze(2).expand(-1, -1, k).flatten(1, 2)
333
+ coord_mask_dest = torch.gather(
334
+ coord_mask,
335
+ 1,
336
+ edge_index[1, :, :].expand([B, L*k])
337
+ )
338
+ E_vectors = X_src - X_dest
339
+ # For the ones without coordinates, substitute in the average vector
340
+ E_vector_mean = torch.sum(E_vectors * E_coord_mask, dim=1,
341
+ keepdims=True) / torch.sum(E_coord_mask, dim=1, keepdims=True)
342
+ E_vectors = E_vectors * E_coord_mask + E_vector_mean * ~(E_coord_mask)
343
+ # Normalize and remove nans
344
+ edge_s = torch.cat([D_rbf, pos_embeddings], dim=-1)
345
+ edge_v = normalize(E_vectors).unsqueeze(-2)
346
+ edge_s, edge_v = map(nan_to_num, (edge_s, edge_v))
347
+ # Also add indications of whether the coordinates are present
348
+ edge_s = torch.cat([
349
+ edge_s,
350
+ (~coord_mask_src).float().unsqueeze(-1),
351
+ (~coord_mask_dest).float().unsqueeze(-1),
352
+ ], dim=-1)
353
+ edge_index[:, ~E_residue_mask] = -1
354
+ if self.remove_edges_without_coords:
355
+ edge_index[:, ~E_coord_mask.squeeze(-1)] = -1
356
+ return (edge_s, edge_v), edge_index.transpose(0, 1)
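The edge features above combine an RBF expansion of the CA–CA distance with a sinusoidal embedding of the signed sequence offset between the two endpoints (`_positional_embeddings`). A self-contained sketch of the positional part, with an illustrative three-edge index:

import numpy as np
import torch

def relative_position_embedding(edge_index, num_embeddings=16):
    # Signed sequence offset d = src - dst, encoded as sin/cos at
    # log-spaced frequencies, mirroring _positional_embeddings above.
    d = edge_index[0] - edge_index[1]
    freq = torch.exp(
        torch.arange(0, num_embeddings, 2, dtype=torch.float32)
        * -(np.log(10000.0) / num_embeddings)
    )
    angles = d.unsqueeze(-1) * freq
    return torch.cat((torch.cos(angles), torch.sin(angles)), dim=-1)

edges = torch.tensor([[0, 1, 2], [3, 3, 3]])        # three edges into residue 3
print(relative_position_embedding(edges).shape)     # torch.Size([3, 16])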
esm/inverse_folding/gvp_encoder.py ADDED
@@ -0,0 +1,56 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+
6
+ from argparse import Namespace
7
+
8
+ import torch
9
+ import torch.nn as nn
10
+ import torch.nn.functional as F
11
+
12
+ from .features import GVPGraphEmbedding
13
+ from .gvp_modules import GVPConvLayer, LayerNorm
14
+ from .gvp_utils import unflatten_graph
15
+
16
+
17
+
18
+ class GVPEncoder(nn.Module):
19
+
20
+ def __init__(self, args):
21
+ super().__init__()
22
+ self.args = args
23
+ self.embed_graph = GVPGraphEmbedding(args)
24
+
25
+ node_hidden_dim = (args.node_hidden_dim_scalar,
26
+ args.node_hidden_dim_vector)
27
+ edge_hidden_dim = (args.edge_hidden_dim_scalar,
28
+ args.edge_hidden_dim_vector)
29
+
30
+ conv_activations = (F.relu, torch.sigmoid)
31
+ self.encoder_layers = nn.ModuleList(
32
+ GVPConvLayer(
33
+ node_hidden_dim,
34
+ edge_hidden_dim,
35
+ drop_rate=args.dropout,
36
+ vector_gate=True,
37
+ attention_heads=0,
38
+ n_message=3,
39
+ conv_activations=conv_activations,
40
+ n_edge_gvps=0,
41
+ eps=1e-4,
42
+ layernorm=True,
43
+ )
44
+ for i in range(args.num_encoder_layers)
45
+ )
46
+
47
+ def forward(self, coords, coord_mask, padding_mask, confidence):
48
+ node_embeddings, edge_embeddings, edge_index = self.embed_graph(
49
+ coords, coord_mask, padding_mask, confidence)
50
+
51
+ for i, layer in enumerate(self.encoder_layers):
52
+ node_embeddings, edge_embeddings = layer(node_embeddings,
53
+ edge_index, edge_embeddings)
54
+
55
+ node_embeddings = unflatten_graph(node_embeddings, coords.shape[0])
56
+ return node_embeddings
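`GVPEncoder` reads everything off an `args` namespace. The sketch below shows one way to instantiate it on toy inputs; the field names mirror what the class accesses above, the values are illustrative rather than the trained checkpoint's, and running it assumes this repo's `esm` package plus torch_geometric and torch_scatter are installed.

from argparse import Namespace
import torch
from esm.inverse_folding.gvp_encoder import GVPEncoder

args = Namespace(
    top_k_neighbors=8,
    node_hidden_dim_scalar=64, node_hidden_dim_vector=16,
    edge_hidden_dim_scalar=32, edge_hidden_dim_vector=1,
    dropout=0.1, num_encoder_layers=2,
)
encoder = GVPEncoder(args)

B, L = 1, 12
coords = torch.randn(B, L, 3, 3)                    # N, CA, C backbone coordinates
coord_mask = torch.ones(B, L, dtype=torch.bool)     # True where coordinates exist
padding_mask = torch.zeros(B, L, dtype=torch.bool)  # True at padded positions
confidence = torch.ones(B, L)

scalars, vectors = encoder(coords, coord_mask, padding_mask, confidence)
print(scalars.shape, vectors.shape)  # (1, 12, 64) and (1, 12, 16, 3)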
esm/inverse_folding/gvp_modules.py ADDED
@@ -0,0 +1,475 @@
1
+ # Contents of this file are from the open source code for
2
+ #
3
+ # Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L., & Dror, R. (2020).
4
+ # Learning from Protein Structure with Geometric Vector Perceptrons. In
5
+ # International Conference on Learning Representations.
6
+ #
7
+ # MIT License
8
+ #
9
+ # Copyright (c) 2020 Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael Townshend, Ron Dror
10
+ #
11
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
12
+ # of this software and associated documentation files (the "Software"), to deal
13
+ # in the Software without restriction, including without limitation the rights
14
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
15
+ # copies of the Software, and to permit persons to whom the Software is
16
+ # furnished to do so, subject to the following conditions:
17
+ #
18
+ # The above copyright notice and this permission notice shall be included in all
19
+ # copies or substantial portions of the Software.
20
+ #
21
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
22
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
23
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
24
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
25
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
26
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
27
+ # SOFTWARE.
28
+
29
+ import typing as T
30
+ import torch
31
+ from torch import nn
32
+ import torch.nn.functional as F
33
+ print("gvp_module1")
34
+ from torch_geometric.nn import MessagePassing
35
+ print("gvp_module2")
36
+ from torch_scatter import scatter_add, scatter
37
+
38
+ def tuple_size(tp):
39
+ return tuple([0 if a is None else a.size() for a in tp])
40
+
41
+ def tuple_sum(tp1, tp2):
42
+ s1, v1 = tp1
43
+ s2, v2 = tp2
44
+ if v1 is None and v2 is None:
45
+ return (s1 + s2, None)
46
+ return (s1 + s2, v1 + v2)
47
+
48
+ def tuple_cat(*args, dim=-1):
49
+ '''
50
+ Concatenates any number of tuples (s, V) elementwise.
51
+
52
+ :param dim: dimension along which to concatenate when viewed
53
+ as the `dim` index for the scalar-channel tensors.
54
+ This means that `dim=-1` will be applied as
55
+ `dim=-2` for the vector-channel tensors.
56
+ '''
57
+ dim %= len(args[0][0].shape)
58
+ s_args, v_args = list(zip(*args))
59
+ return torch.cat(s_args, dim=dim), torch.cat(v_args, dim=dim)
60
+
61
+ def tuple_index(x, idx):
62
+ '''
63
+ Indexes into a tuple (s, V) along the first dimension.
64
+
65
+ :param idx: any object which can be used to index into a `torch.Tensor`
66
+ '''
67
+ return x[0][idx], x[1][idx]
68
+
69
+ def randn(n, dims, device="cpu"):
70
+ '''
71
+ Returns random tuples (s, V) drawn elementwise from a normal distribution.
72
+
73
+ :param n: number of data points
74
+ :param dims: tuple of dimensions (n_scalar, n_vector)
75
+
76
+ :return: (s, V) with s.shape = (n, n_scalar) and
77
+ V.shape = (n, n_vector, 3)
78
+ '''
79
+ return torch.randn(n, dims[0], device=device), \
80
+ torch.randn(n, dims[1], 3, device=device)
81
+
82
+ def _norm_no_nan(x, axis=-1, keepdims=False, eps=1e-8, sqrt=True):
83
+ '''
84
+ L2 norm of tensor clamped above a minimum value `eps`.
85
+
86
+ :param sqrt: if `False`, returns the square of the L2 norm
87
+ '''
88
+ # clamp is slow
89
+ # out = torch.clamp(torch.sum(torch.square(x), axis, keepdims), min=eps)
90
+ out = torch.sum(torch.square(x), axis, keepdims) + eps
91
+ return torch.sqrt(out) if sqrt else out
92
+
93
+ def _split(x, nv):
94
+ '''
95
+ Splits a merged representation of (s, V) back into a tuple.
96
+ Should be used only with `_merge(s, V)` and only if the tuple
97
+ representation cannot be used.
98
+
99
+ :param x: the `torch.Tensor` returned from `_merge`
100
+ :param nv: the number of vector channels in the input to `_merge`
101
+ '''
102
+ v = torch.reshape(x[..., -3*nv:], x.shape[:-1] + (nv, 3))
103
+ s = x[..., :-3*nv]
104
+ return s, v
105
+
106
+ def _merge(s, v):
107
+ '''
108
+ Merges a tuple (s, V) into a single `torch.Tensor`, where the
109
+ vector channels are flattened and appended to the scalar channels.
110
+ Should be used only if the tuple representation cannot be used.
111
+ Use `_split(x, nv)` to reverse.
112
+ '''
113
+ v = torch.reshape(v, v.shape[:-2] + (3*v.shape[-2],))
114
+ return torch.cat([s, v], -1)
115
+
116
+ class GVP(nn.Module):
117
+ '''
118
+ Geometric Vector Perceptron. See manuscript and README.md
119
+ for more details.
120
+
121
+ :param in_dims: tuple (n_scalar, n_vector)
122
+ :param out_dims: tuple (n_scalar, n_vector)
123
+ :param h_dim: intermediate number of vector channels, optional
124
+ :param activations: tuple of functions (scalar_act, vector_act)
125
+ :param tuple_io: whether to keep accepting tuple inputs and outputs when vi
126
+ or vo = 0
127
+ '''
128
+ def __init__(self, in_dims, out_dims, h_dim=None, vector_gate=False,
129
+ activations=(F.relu, torch.sigmoid), tuple_io=True,
130
+ eps=1e-8):
131
+ super(GVP, self).__init__()
132
+ self.si, self.vi = in_dims
133
+ self.so, self.vo = out_dims
134
+ self.tuple_io = tuple_io
135
+ if self.vi:
136
+ self.h_dim = h_dim or max(self.vi, self.vo)
137
+ self.wh = nn.Linear(self.vi, self.h_dim, bias=False)
138
+ self.ws = nn.Linear(self.h_dim + self.si, self.so)
139
+ if self.vo:
140
+ self.wv = nn.Linear(self.h_dim, self.vo, bias=False)
141
+ if vector_gate:
142
+ self.wg = nn.Linear(self.so, self.vo)
143
+ else:
144
+ self.ws = nn.Linear(self.si, self.so)
145
+
146
+ self.vector_gate = vector_gate
147
+ self.scalar_act, self.vector_act = activations
148
+ self.eps = eps
149
+
150
+ def forward(self, x):
151
+ '''
152
+ :param x: tuple (s, V) of `torch.Tensor`,
153
+ or (if vectors_in is 0), a single `torch.Tensor`
154
+ :return: tuple (s, V) of `torch.Tensor`,
155
+ or (if vectors_out is 0), a single `torch.Tensor`
156
+ '''
157
+ if self.vi:
158
+ s, v = x
159
+ v = torch.transpose(v, -1, -2)
160
+ vh = self.wh(v)
161
+ vn = _norm_no_nan(vh, axis=-2, eps=self.eps)
162
+ s = self.ws(torch.cat([s, vn], -1))
163
+ if self.scalar_act:
164
+ s = self.scalar_act(s)
165
+ if self.vo:
166
+ v = self.wv(vh)
167
+ v = torch.transpose(v, -1, -2)
168
+ if self.vector_gate:
169
+ g = self.wg(s).unsqueeze(-1)
170
+ else:
171
+ g = _norm_no_nan(v, axis=-1, keepdims=True, eps=self.eps)
172
+ if self.vector_act:
173
+ g = self.vector_act(g)
174
+ v = v * g
175
+ else:
176
+ if self.tuple_io:
177
+ assert x[1] is None
178
+ x = x[0]
179
+ s = self.ws(x)
180
+ if self.scalar_act:
181
+ s = self.scalar_act(s)
182
+ if self.vo:
183
+ v = torch.zeros(list(s.shape)[:-1] + [self.vo, 3],
184
+ device=s.device)
185
+
186
+ if self.vo:
187
+ return (s, v)
188
+ elif self.tuple_io:
189
+ return (s, None)
190
+ else:
191
+ return s
192
+
193
+
194
+ class _VDropout(nn.Module):
195
+ '''
196
+ Vector channel dropout where the elements of each
197
+ vector channel are dropped together.
198
+ '''
199
+ def __init__(self, drop_rate):
200
+ super(_VDropout, self).__init__()
201
+ self.drop_rate = drop_rate
202
+
203
+ def forward(self, x):
204
+ '''
205
+ :param x: `torch.Tensor` corresponding to vector channels
206
+ '''
207
+ if x is None:
208
+ return None
209
+ device = x.device
210
+ if not self.training:
211
+ return x
212
+ mask = torch.bernoulli(
213
+ (1 - self.drop_rate) * torch.ones(x.shape[:-1], device=device)
214
+ ).unsqueeze(-1)
215
+ x = mask * x / (1 - self.drop_rate)
216
+ return x
217
+
218
+ class Dropout(nn.Module):
219
+ '''
220
+ Combined dropout for tuples (s, V).
221
+ Takes tuples (s, V) as input and as output.
222
+ '''
223
+ def __init__(self, drop_rate):
224
+ super(Dropout, self).__init__()
225
+ self.sdropout = nn.Dropout(drop_rate)
226
+ self.vdropout = _VDropout(drop_rate)
227
+
228
+ def forward(self, x):
229
+ '''
230
+ :param x: tuple (s, V) of `torch.Tensor`,
231
+ or single `torch.Tensor`
232
+ (will be assumed to be scalar channels)
233
+ '''
234
+ if type(x) is torch.Tensor:
235
+ return self.sdropout(x)
236
+ s, v = x
237
+ return self.sdropout(s), self.vdropout(v)
238
+
239
+ class LayerNorm(nn.Module):
240
+ '''
241
+ Combined LayerNorm for tuples (s, V).
242
+ Takes tuples (s, V) as input and as output.
243
+ '''
244
+ def __init__(self, dims, tuple_io=True, eps=1e-8):
245
+ super(LayerNorm, self).__init__()
246
+ self.tuple_io = tuple_io
247
+ self.s, self.v = dims
248
+ self.scalar_norm = nn.LayerNorm(self.s)
249
+ self.eps = eps
250
+
251
+ def forward(self, x):
252
+ '''
253
+ :param x: tuple (s, V) of `torch.Tensor`,
254
+ or single `torch.Tensor`
255
+ (will be assumed to be scalar channels)
256
+ '''
257
+ if not self.v:
258
+ if self.tuple_io:
259
+ return self.scalar_norm(x[0]), None
260
+ return self.scalar_norm(x)
261
+ s, v = x
262
+ vn = _norm_no_nan(v, axis=-1, keepdims=True, sqrt=False, eps=self.eps)
263
+ nonzero_mask = (vn > 2 * self.eps)
264
+ vn = torch.sum(vn * nonzero_mask, dim=-2, keepdim=True
265
+ ) / (self.eps + torch.sum(nonzero_mask, dim=-2, keepdim=True))
266
+ vn = torch.sqrt(vn + self.eps)
267
+ v = nonzero_mask * (v / vn)
268
+ return self.scalar_norm(s), v
269
+
270
+ class GVPConv(MessagePassing):
271
+ '''
272
+ Graph convolution / message passing with Geometric Vector Perceptrons.
273
+ Takes in a graph with node and edge embeddings,
274
+ and returns new node embeddings.
275
+
276
+ This does NOT do residual updates and pointwise feedforward layers
277
+ ---see `GVPConvLayer`.
278
+
279
+ :param in_dims: input node embedding dimensions (n_scalar, n_vector)
280
+ :param out_dims: output node embedding dimensions (n_scalar, n_vector)
281
+ :param edge_dims: input edge embedding dimensions (n_scalar, n_vector)
282
+ :param n_layers: number of GVPs in the message function
283
+ :param module_list: preconstructed message function, overrides n_layers
284
+ :param aggr: should be "add" if some incoming edges are masked, as in
285
+ a masked autoregressive decoder architecture
286
+ '''
287
+ def __init__(self, in_dims, out_dims, edge_dims, n_layers=3,
288
+ vector_gate=False, module_list=None, aggr="mean", eps=1e-8,
289
+ activations=(F.relu, torch.sigmoid)):
290
+ super(GVPConv, self).__init__(aggr=aggr)
291
+ self.eps = eps
292
+ self.si, self.vi = in_dims
293
+ self.so, self.vo = out_dims
294
+ self.se, self.ve = edge_dims
295
+
296
+ module_list = module_list or []
297
+ if not module_list:
298
+ if n_layers == 1:
299
+ module_list.append(
300
+ GVP((2*self.si + self.se, 2*self.vi + self.ve),
301
+ (self.so, self.vo), activations=(None, None)))
302
+ else:
303
+ module_list.append(
304
+ GVP((2*self.si + self.se, 2*self.vi + self.ve), out_dims,
305
+ vector_gate=vector_gate, activations=activations)
306
+ )
307
+ for i in range(n_layers - 2):
308
+ module_list.append(GVP(out_dims, out_dims,
309
+ vector_gate=vector_gate))
310
+ module_list.append(GVP(out_dims, out_dims,
311
+ activations=(None, None)))
312
+ self.message_func = nn.Sequential(*module_list)
313
+
314
+ def forward(self, x, edge_index, edge_attr):
315
+ '''
316
+ :param x: tuple (s, V) of `torch.Tensor`
317
+ :param edge_index: array of shape [2, n_edges]
318
+ :param edge_attr: tuple (s, V) of `torch.Tensor`
319
+ '''
320
+ x_s, x_v = x
321
+ message = self.propagate(edge_index,
322
+ s=x_s, v=x_v.reshape(x_v.shape[0], 3*x_v.shape[1]),
323
+ edge_attr=edge_attr)
324
+ return _split(message, self.vo)
325
+
326
+ def message(self, s_i, v_i, s_j, v_j, edge_attr):
327
+ v_j = v_j.view(v_j.shape[0], v_j.shape[1]//3, 3)
328
+ v_i = v_i.view(v_i.shape[0], v_i.shape[1]//3, 3)
329
+ message = tuple_cat((s_j, v_j), edge_attr, (s_i, v_i))
330
+ message = self.message_func(message)
331
+ return _merge(*message)
332
+
333
+
334
+ class GVPConvLayer(nn.Module):
335
+ '''
336
+ Full graph convolution / message passing layer with
337
+ Geometric Vector Perceptrons. Residually updates node embeddings with
338
+ aggregated incoming messages, applies a pointwise feedforward
339
+ network to node embeddings, and returns updated node embeddings.
340
+
341
+ To only compute the aggregated messages, see `GVPConv`.
342
+
343
+ :param node_dims: node embedding dimensions (n_scalar, n_vector)
344
+ :param edge_dims: input edge embedding dimensions (n_scalar, n_vector)
345
+ :param n_message: number of GVPs to use in message function
346
+ :param n_feedforward: number of GVPs to use in feedforward function
347
+ :param drop_rate: drop probability in all dropout layers
348
+ :param autoregressive: if `True`, this `GVPConvLayer` will be used
349
+ with a different set of input node embeddings for messages
350
+ where src >= dst
351
+ '''
352
+ def __init__(self, node_dims, edge_dims, vector_gate=False,
353
+ n_message=3, n_feedforward=2, drop_rate=.1,
354
+ autoregressive=False, attention_heads=0,
355
+ conv_activations=(F.relu, torch.sigmoid),
356
+ n_edge_gvps=0, layernorm=True, eps=1e-8):
357
+
358
+ super(GVPConvLayer, self).__init__()
359
+ if attention_heads == 0:
360
+ self.conv = GVPConv(
361
+ node_dims, node_dims, edge_dims, n_layers=n_message,
362
+ vector_gate=vector_gate,
363
+ aggr="add" if autoregressive else "mean",
364
+ activations=conv_activations,
365
+ eps=eps,
366
+ )
367
+ else:
368
+ raise NotImplementedError
369
+ if layernorm:
370
+ self.norm = nn.ModuleList([LayerNorm(node_dims, eps=eps) for _ in range(2)])
371
+ else:
372
+ self.norm = nn.ModuleList([nn.Identity() for _ in range(2)])
373
+ self.dropout = nn.ModuleList([Dropout(drop_rate) for _ in range(2)])
374
+
375
+ ff_func = []
376
+ if n_feedforward == 1:
377
+ ff_func.append(GVP(node_dims, node_dims, activations=(None, None)))
378
+ else:
379
+ hid_dims = 4*node_dims[0], 2*node_dims[1]
380
+ ff_func.append(GVP(node_dims, hid_dims, vector_gate=vector_gate))
381
+ for i in range(n_feedforward-2):
382
+ ff_func.append(GVP(hid_dims, hid_dims, vector_gate=vector_gate))
383
+ ff_func.append(GVP(hid_dims, node_dims, activations=(None, None)))
384
+ self.ff_func = nn.Sequential(*ff_func)
385
+
386
+ self.edge_message_func = None
387
+ if n_edge_gvps > 0:
388
+ si, vi = node_dims
389
+ se, ve = edge_dims
390
+ module_list = [
391
+ GVP((2*si + se, 2*vi + ve), edge_dims, vector_gate=vector_gate)
392
+ ]
393
+ for i in range(n_edge_gvps - 2):
394
+ module_list.append(GVP(edge_dims, edge_dims,
395
+ vector_gate=vector_gate))
396
+ if n_edge_gvps > 1:
397
+ module_list.append(GVP(edge_dims, edge_dims,
398
+ activations=(None, None)))
399
+ self.edge_message_func = nn.Sequential(*module_list)
400
+ if layernorm:
401
+ self.edge_norm = LayerNorm(edge_dims, eps=eps)
402
+ else:
403
+ self.edge_norm = nn.Identity()
404
+ self.edge_dropout = Dropout(drop_rate)
405
+
406
+ def forward(self, x, edge_index, edge_attr,
407
+ autoregressive_x=None, node_mask=None):
408
+ '''
409
+ :param x: tuple (s, V) of `torch.Tensor`
410
+ :param edge_index: array of shape [2, n_edges]
411
+ :param edge_attr: tuple (s, V) of `torch.Tensor`
412
+ :param autoregressive_x: tuple (s, V) of `torch.Tensor`.
413
+ If not `None`, will be used as src node embeddings
415
+ for forming messages where src >= dst. The current node
415
+ embeddings `x` will still be the base of the update and the
416
+ pointwise feedforward.
417
+ :param node_mask: array of type `bool` to index into the first
418
+ dim of node embeddings (s, V). If not `None`, only
419
+ these nodes will be updated.
420
+ '''
421
+ if self.edge_message_func:
422
+ src, dst = edge_index
423
+ if autoregressive_x is None:
424
+ x_src = x[0][src], x[1][src]
425
+ else:
426
+ mask = (src < dst).unsqueeze(-1)
427
+ x_src = (
428
+ torch.where(mask, x[0][src], autoregressive_x[0][src]),
429
+ torch.where(mask.unsqueeze(-1), x[1][src],
430
+ autoregressive_x[1][src])
431
+ )
432
+ x_dst = x[0][dst], x[1][dst]
433
+ x_edge = (
434
+ torch.cat([x_src[0], edge_attr[0], x_dst[0]], dim=-1),
435
+ torch.cat([x_src[1], edge_attr[1], x_dst[1]], dim=-2)
436
+ )
437
+ edge_attr_dh = self.edge_message_func(x_edge)
438
+ edge_attr = self.edge_norm(tuple_sum(edge_attr,
439
+ self.edge_dropout(edge_attr_dh)))
440
+
441
+ if autoregressive_x is not None:
442
+ src, dst = edge_index
443
+ mask = src < dst
444
+ edge_index_forward = edge_index[:, mask]
445
+ edge_index_backward = edge_index[:, ~mask]
446
+ edge_attr_forward = tuple_index(edge_attr, mask)
447
+ edge_attr_backward = tuple_index(edge_attr, ~mask)
448
+
449
+ dh = tuple_sum(
450
+ self.conv(x, edge_index_forward, edge_attr_forward),
451
+ self.conv(autoregressive_x, edge_index_backward, edge_attr_backward)
452
+ )
453
+
454
+ count = scatter_add(torch.ones_like(dst), dst,
455
+ dim_size=dh[0].size(0)).clamp(min=1).unsqueeze(-1)
456
+
457
+ dh = dh[0] / count, dh[1] / count.unsqueeze(-1)
458
+
459
+ else:
460
+ dh = self.conv(x, edge_index, edge_attr)
461
+
462
+ if node_mask is not None:
463
+ x_ = x
464
+ x, dh = tuple_index(x, node_mask), tuple_index(dh, node_mask)
465
+
466
+ x = self.norm[0](tuple_sum(x, self.dropout[0](dh)))
467
+
468
+ dh = self.ff_func(x)
469
+ x = self.norm[1](tuple_sum(x, self.dropout[1](dh)))
470
+
471
+ if node_mask is not None:
472
+ x_[0][node_mask], x_[1][node_mask] = x[0], x[1]
473
+ x = x_
474
+
475
+ return x, edge_attr
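A quick usage sketch for the `GVP` layer itself, using the `randn` helper defined in this file (importing the module assumes torch_geometric and torch_scatter are available):

import torch
from esm.inverse_folding.gvp_modules import GVP, randn

# Map (8 scalar, 4 vector) channels to (16 scalar, 2 vector) channels;
# the vector outputs stay equivariant to rotations of the input vectors.
gvp = GVP((8, 4), (16, 2), vector_gate=True)
s, v = randn(n=5, dims=(8, 4))                      # 5 nodes with random features
out_s, out_v = gvp((s, v))
print(out_s.shape, out_v.shape)                     # torch.Size([5, 16]) torch.Size([5, 2, 3])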
esm/inverse_folding/gvp_transformer.py ADDED
@@ -0,0 +1,144 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+
6
+ # import argparse
7
+ # from typing import Any, Dict, List, Optional, Tuple, NamedTuple
8
+ import torch
9
+ from torch import nn
10
+ # from torch import Tensor
11
+ import torch.nn.functional as F
12
+ # from scipy.spatial import transform
13
+ #
14
+ # from esm.data import Alphabet
15
+
16
+ # from .features import DihedralFeatures
17
+ # from .gvp_encoder import GVPEncoder
18
+ # from .gvp_utils import unflatten_graph
19
+ print("gvp1_transformer")
20
+ from .gvp_transformer_encoder import GVPTransformerEncoder
21
+ print("gvp2_transformer")
22
+ from .transformer_decoder import TransformerDecoder
23
+ print("gvp3_transformer")
24
+ from .util import rotate, CoordBatchConverter
25
+ print("gvp4_transformer")
26
+
27
+
28
+ class GVPTransformerModel(nn.Module):
29
+ """
30
+ GVP-Transformer inverse folding model.
31
+
32
+ Architecture: Geometric GVP-GNN as initial layers, followed by
33
+ sequence-to-sequence Transformer encoder and decoder.
34
+ """
35
+
36
+ def __init__(self, args, alphabet):
37
+ super().__init__()
38
+ encoder_embed_tokens = self.build_embedding(
39
+ args, alphabet, args.encoder_embed_dim,
40
+ )
41
+ decoder_embed_tokens = self.build_embedding(
42
+ args, alphabet, args.decoder_embed_dim,
43
+ )
44
+ encoder = self.build_encoder(args, alphabet, encoder_embed_tokens)
45
+ decoder = self.build_decoder(args, alphabet, decoder_embed_tokens)
46
+ self.args = args
47
+ self.encoder = encoder
48
+ self.decoder = decoder
49
+
50
+ @classmethod
51
+ def build_encoder(cls, args, src_dict, embed_tokens):
52
+ encoder = GVPTransformerEncoder(args, src_dict, embed_tokens)
53
+ return encoder
54
+
55
+ @classmethod
56
+ def build_decoder(cls, args, tgt_dict, embed_tokens):
57
+ decoder = TransformerDecoder(
58
+ args,
59
+ tgt_dict,
60
+ embed_tokens,
61
+ )
62
+ return decoder
63
+
64
+ @classmethod
65
+ def build_embedding(cls, args, dictionary, embed_dim):
66
+ num_embeddings = len(dictionary)
67
+ padding_idx = dictionary.padding_idx
68
+ emb = nn.Embedding(num_embeddings, embed_dim, padding_idx)
69
+ nn.init.normal_(emb.weight, mean=0, std=embed_dim ** -0.5)
70
+ nn.init.constant_(emb.weight[padding_idx], 0)
71
+ return emb
72
+
73
+ def forward(
74
+ self,
75
+ coords,
76
+ padding_mask,
77
+ confidence,
78
+ prev_output_tokens,
79
+ return_all_hiddens: bool = False,
80
+ features_only: bool = False,
81
+ ):
82
+ encoder_out = self.encoder(coords, padding_mask, confidence,
83
+ return_all_hiddens=return_all_hiddens)
84
+ logits, extra = self.decoder(
85
+ prev_output_tokens,
86
+ encoder_out=encoder_out,
87
+ features_only=features_only,
88
+ return_all_hiddens=return_all_hiddens,
89
+ )
90
+ return logits, extra
91
+
92
+ def sample(self, coords, partial_seq=None, temperature=1.0, confidence=None, device=None):
93
+ """
94
+ Samples sequences based on multinomial sampling (no beam search).
95
+
96
+ Args:
97
+ coords: L x 3 x 3 list representing one backbone
98
+ partial_seq: Optional, partial sequence with mask tokens if part of
99
+ the sequence is known
100
+ temperature: sampling temperature, use low temperature for higher
101
+ sequence recovery and high temperature for higher diversity
102
+ confidence: optional length L list of confidence scores for coordinates
103
+ """
104
+ L = len(coords)
105
+ # Convert to batch format
106
+ batch_converter = CoordBatchConverter(self.decoder.dictionary)
107
+ batch_coords, confidence, _, _, padding_mask = (
108
+ batch_converter([(coords, confidence, None)], device=device)
109
+ )
110
+
111
+ # Start with prepend token
112
+ mask_idx = self.decoder.dictionary.get_idx('<mask>')
113
+ sampled_tokens = torch.full((1, 1+L), mask_idx, dtype=int)
114
+ sampled_tokens[0, 0] = self.decoder.dictionary.get_idx('<cath>')
115
+ if partial_seq is not None:
116
+ for i, c in enumerate(partial_seq):
117
+ sampled_tokens[0, i+1] = self.decoder.dictionary.get_idx(c)
118
+
119
+ # Save incremental states for faster sampling
120
+ incremental_state = dict()
121
+
122
+ # Run encoder only once
123
+ encoder_out = self.encoder(batch_coords, padding_mask, confidence)
124
+
125
+ # Make sure all tensors are on the same device if a GPU is present
126
+ if device:
127
+ sampled_tokens = sampled_tokens.to(device)
128
+
129
+ # Decode one token at a time
130
+ for i in range(1, L+1):
131
+ logits, _ = self.decoder(
132
+ sampled_tokens[:, :i],
133
+ encoder_out,
134
+ incremental_state=incremental_state,
135
+ )
136
+ logits = logits[0].transpose(0, 1)
137
+ logits /= temperature
138
+ probs = F.softmax(logits, dim=-1)
139
+ if sampled_tokens[0, i] == mask_idx:
140
+ sampled_tokens[:, i] = torch.multinomial(probs, 1).squeeze(-1)
141
+ sampled_seq = sampled_tokens[0, 1:]
142
+
143
+ # Convert back to string via lookup
144
+ return ''.join([self.decoder.dictionary.get_tok(a) for a in sampled_seq]), encoder_out
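A hedged usage sketch for `sample`. It assumes the standard ESM-IF1 loader `esm.pretrained.esm_if1_gvp4_t16_142M_UR50()` is exposed by this repo's `esm` package (it downloads weights on first use), and it feeds a random toy backbone purely to exercise the API. Note that this version of `sample` returns `(sequence, encoder_out)` rather than just the sequence string.

import torch
import esm

model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
model = model.eval()

L = 20
coords = torch.randn(L, 3, 3).tolist()   # toy L x 3 x 3 backbone (N, CA, C per residue)

with torch.no_grad():
    sequence, encoder_out = model.sample(coords, temperature=1.0)
print(sequence)                           # one sampled amino-acid string of length L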
esm/inverse_folding/gvp_transformer_encoder.py ADDED
@@ -0,0 +1,189 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # Contents of this file were adapted from the open source fairseq repository.
4
+ #
5
+ # This source code is licensed under the MIT license found in the
6
+ # LICENSE file in the root directory of this source tree.
7
+
8
+ import argparse
9
+ import math
10
+ from typing import Dict, List, Optional
11
+
12
+ import torch
13
+ import torch.nn as nn
14
+ from torch import Tensor
15
+ print("gvp1_transformer_encoder")
16
+ from esm.modules import SinusoidalPositionalEmbedding
17
+ print("gvp2_transformer_encoder")
18
+ from .features import GVPInputFeaturizer, DihedralFeatures
19
+ print("gvp3_transformer_encoder")
20
+ from .gvp_encoder import GVPEncoder
21
+ print("gvp4_transformer_encoder")
22
+ from .transformer_layer import TransformerEncoderLayer
23
+ print("gvp5_transformer_encoder")
24
+ from .util import nan_to_num, get_rotation_frames, rotate, rbf
25
+ print("gvp6_transformer_encoder")
26
+
27
+
28
+ class GVPTransformerEncoder(nn.Module):
29
+ """
30
+ Transformer encoder consisting of *args.encoder.layers* layers. Each layer
31
+ is a :class:`TransformerEncoderLayer`.
32
+
33
+ Args:
34
+ args (argparse.Namespace): parsed command-line arguments
35
+ dictionary (~fairseq.data.Dictionary): encoding dictionary
36
+ embed_tokens (torch.nn.Embedding): input embedding
37
+ """
38
+
39
+ def __init__(self, args, dictionary, embed_tokens):
40
+ super().__init__()
41
+ self.args = args
42
+ self.dictionary = dictionary
43
+
44
+ self.dropout_module = nn.Dropout(args.dropout)
45
+
46
+ embed_dim = embed_tokens.embedding_dim
47
+ self.padding_idx = embed_tokens.padding_idx
48
+
49
+ self.embed_tokens = embed_tokens
50
+ self.embed_scale = math.sqrt(embed_dim)
51
+ self.embed_positions = SinusoidalPositionalEmbedding(
52
+ embed_dim,
53
+ self.padding_idx,
54
+ )
55
+ self.embed_gvp_input_features = nn.Linear(15, embed_dim)
56
+ self.embed_confidence = nn.Linear(16, embed_dim)
57
+ self.embed_dihedrals = DihedralFeatures(embed_dim)
58
+
59
+ gvp_args = argparse.Namespace()
60
+ for k, v in vars(args).items():
61
+ if k.startswith("gvp_"):
62
+ setattr(gvp_args, k[4:], v)
63
+ self.gvp_encoder = GVPEncoder(gvp_args)
64
+ gvp_out_dim = gvp_args.node_hidden_dim_scalar + (3 *
65
+ gvp_args.node_hidden_dim_vector)
66
+ self.embed_gvp_output = nn.Linear(gvp_out_dim, embed_dim)
67
+
68
+ self.layers = nn.ModuleList([])
69
+ self.layers.extend(
70
+ [self.build_encoder_layer(args) for i in range(args.encoder_layers)]
71
+ )
72
+ self.num_layers = len(self.layers)
73
+ self.layer_norm = nn.LayerNorm(embed_dim)
74
+
75
+ def build_encoder_layer(self, args):
76
+ return TransformerEncoderLayer(args)
77
+
78
+ def forward_embedding(self, coords, padding_mask, confidence):
79
+ """
80
+ Args:
81
+ coords: N, CA, C backbone coordinates in shape length x 3 (atoms) x 3
82
+ padding_mask: boolean Tensor (true for padding) of shape length
83
+ confidence: confidence scores between 0 and 1 of shape length
84
+ """
85
+ components = dict()
86
+ coord_mask = torch.all(torch.all(torch.isfinite(coords), dim=-1), dim=-1)
87
+ coords = nan_to_num(coords)
88
+ mask_tokens = (
89
+ padding_mask * self.dictionary.padding_idx +
90
+ ~padding_mask * self.dictionary.get_idx("<mask>")
91
+ )
92
+ components["tokens"] = self.embed_tokens(mask_tokens) * self.embed_scale
93
+ components["diherals"] = self.embed_dihedrals(coords)
94
+
95
+ # GVP encoder
96
+ gvp_out_scalars, gvp_out_vectors = self.gvp_encoder(coords,
97
+ coord_mask, padding_mask, confidence)
98
+ R = get_rotation_frames(coords)
99
+ # Rotate to local rotation frame for rotation-invariance
100
+ gvp_out_features = torch.cat([
101
+ gvp_out_scalars,
102
+ rotate(gvp_out_vectors, R.transpose(-2, -1)).flatten(-2, -1),
103
+ ], dim=-1)
104
+ components["gvp_out"] = self.embed_gvp_output(gvp_out_features)
105
+
106
+ components["confidence"] = self.embed_confidence(
107
+ rbf(confidence, 0., 1.))
108
+
109
+ # In addition to GVP encoder outputs, also directly embed GVP input node
110
+ # features to the Transformer
111
+ scalar_features, vector_features = GVPInputFeaturizer.get_node_features(
112
+ coords, coord_mask, with_coord_mask=False)
113
+ features = torch.cat([
114
+ scalar_features,
115
+ rotate(vector_features, R.transpose(-2, -1)).flatten(-2, -1),
116
+ ], dim=-1)
117
+ components["gvp_input_features"] = self.embed_gvp_input_features(features)
118
+
119
+ embed = sum(components.values())
120
+ # for k, v in components.items():
121
+ # print(k, torch.mean(v, dim=(0,1)), torch.std(v, dim=(0,1)))
122
+
123
+ x = embed
124
+ x = x + self.embed_positions(mask_tokens)
125
+ x = self.dropout_module(x)
126
+ return x, components
127
+
128
+ def forward(
129
+ self,
130
+ coords,
131
+ encoder_padding_mask,
132
+ confidence,
133
+ return_all_hiddens: bool = False,
134
+ ):
135
+ """
136
+ Args:
137
+ coords (Tensor): backbone coordinates
138
+ shape batch_size x num_residues x num_atoms (3 for N, CA, C) x 3
139
+ encoder_padding_mask (ByteTensor): the positions of
140
+ padding elements of shape `(batch_size x num_residues)`
141
+ confidence (Tensor): the confidence score of shape (batch_size x
142
+ num_residues). The value is between 0. and 1. for each residue
143
+ coordinate, or -1. if no coordinate is given
144
+ return_all_hiddens (bool, optional): also return all of the
145
+ intermediate hidden states (default: False).
146
+
147
+ Returns:
148
+ dict:
149
+ - **encoder_out** (Tensor): the last encoder layer's output of
150
+ shape `(num_residues, batch_size, embed_dim)`
151
+ - **encoder_padding_mask** (ByteTensor): the positions of
152
+ padding elements of shape `(batch_size, num_residues)`
153
+ - **encoder_embedding** (Tensor): the (scaled) embedding lookup
154
+ of shape `(batch_size, num_residues, embed_dim)`
155
+ - **encoder_states** (List[Tensor]): all intermediate
156
+ hidden states of shape `(num_residues, batch_size, embed_dim)`.
157
+ Only populated if *return_all_hiddens* is True.
158
+ """
159
+ x, encoder_embedding = self.forward_embedding(coords,
160
+ encoder_padding_mask, confidence)
161
+ # account for padding while computing the representation
162
+ x = x * (1 - encoder_padding_mask.unsqueeze(-1).type_as(x))
163
+
164
+ # B x T x C -> T x B x C
165
+ x = x.transpose(0, 1)
166
+
167
+ encoder_states = []
168
+
169
+ if return_all_hiddens:
170
+ encoder_states.append(x)
171
+
172
+ # encoder layers
173
+ for layer in self.layers:
174
+ x = layer(
175
+ x, encoder_padding_mask=encoder_padding_mask
176
+ )
177
+ if return_all_hiddens:
178
+ assert encoder_states is not None
179
+ encoder_states.append(x)
180
+
181
+ if self.layer_norm is not None:
182
+ x = self.layer_norm(x)
183
+
184
+ return {
185
+ "encoder_out": [x], # T x B x C
186
+ "encoder_padding_mask": [encoder_padding_mask], # B x T
187
+ "encoder_embedding": [encoder_embedding], # dictionary
188
+ "encoder_states": encoder_states, # List[T x B x C]
189
+ }
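`forward_embedding` rotates every vector feature into a per-residue local frame (`get_rotation_frames` / `rotate`) before flattening it into the Transformer input, which makes the embedding invariant to global rotations of the structure. A sketch of that property on the raw GVP input features, assuming the two helpers in `esm.inverse_folding.util` behave as they are used above and that torch_geometric/torch_scatter are installed (pulled in by the module imports):

import torch
from esm.inverse_folding.features import GVPInputFeaturizer
from esm.inverse_folding.util import get_rotation_frames, rotate

coords = torch.randn(1, 10, 3, 3)                    # toy N, CA, C backbone
coord_mask = torch.ones(1, 10, dtype=torch.bool)

def local_frame_features(c):
    # Vector node features expressed in each residue's local frame.
    _, vec = GVPInputFeaturizer.get_node_features(c, coord_mask, with_coord_mask=False)
    R = get_rotation_frames(c)
    return rotate(vec, R.transpose(-2, -1)).flatten(-2, -1)

Q, _ = torch.linalg.qr(torch.randn(3, 3))            # random orthogonal matrix
if torch.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]                               # force a proper rotation
rotated = coords @ Q.T                               # rotate the whole structure

print(torch.allclose(local_frame_features(coords),
                     local_frame_features(rotated), atol=1e-4))  # True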
esm/inverse_folding/gvp_utils.py ADDED
@@ -0,0 +1,68 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+
6
+ import torch
7
+
8
+
9
+ def flatten_graph(node_embeddings, edge_embeddings, edge_index):
10
+ """
11
+ Flattens the graph to batch size one (with disconnected subgraphs for
12
+ each example) to be compatible with the pytorch-geometric package.
13
+ Args:
14
+ node_embeddings: node embeddings in tuple form (scalar, vector)
15
+ - scalar: shape batch size x nodes x node_embed_dim
16
+ - vector: shape batch size x nodes x node_embed_dim x 3
17
+ edge_embeddings: edge embeddings in tuple form (scalar, vector)
18
+ - scalar: shape batch size x edges x edge_embed_dim
19
+ - vector: shape batch size x edges x edge_embed_dim x 3
20
+ edge_index: shape batch_size x 2 (source node and target node) x edges
21
+ Returns:
22
+ node_embeddings: node embeddings in tuple form (scalar, vector)
23
+ - scalar: shape batch total_nodes x node_embed_dim
24
+ - vector: shape batch total_nodes x node_embed_dim x 3
25
+ edge_embeddings: edge embeddings in tuple form (scalar, vector)
26
+ - scalar: shape batch total_edges x edge_embed_dim
27
+ - vector: shape batch total_edges x edge_embed_dim x 3
28
+ edge_index: shape 2 x total_edges
29
+ """
30
+ x_s, x_v = node_embeddings
31
+ e_s, e_v = edge_embeddings
32
+ batch_size, N = x_s.shape[0], x_s.shape[1]
33
+ node_embeddings = (torch.flatten(x_s, 0, 1), torch.flatten(x_v, 0, 1))
34
+ edge_embeddings = (torch.flatten(e_s, 0, 1), torch.flatten(e_v, 0, 1))
35
+
36
+ edge_mask = torch.any(edge_index != -1, dim=1)
37
+ # Re-number the nodes by adding batch_idx * N to each batch
38
+ edge_index = edge_index + (torch.arange(batch_size, device=edge_index.device) *
39
+ N).unsqueeze(-1).unsqueeze(-1)
40
+ edge_index = edge_index.permute(1, 0, 2).flatten(1, 2)
41
+ edge_mask = edge_mask.flatten()
42
+ edge_index = edge_index[:, edge_mask]
43
+ edge_embeddings = (
44
+ edge_embeddings[0][edge_mask, :],
45
+ edge_embeddings[1][edge_mask, :]
46
+ )
47
+ return node_embeddings, edge_embeddings, edge_index
48
+
49
+
50
+ def unflatten_graph(node_embeddings, batch_size):
51
+ """
52
+ Unflattens node embeddings.
53
+ Args:
54
+ node_embeddings: node embeddings in tuple form (scalar, vector)
55
+ - scalar: shape batch total_nodes x node_embed_dim
56
+ - vector: shape batch total_nodes x node_embed_dim x 3
57
+ batch_size: int
58
+ Returns:
59
+ node_embeddings: node embeddings in tuple form (scalar, vector)
60
+ - scalar: shape batch size x nodes x node_embed_dim
61
+ - vector: shape batch size x nodes x node_embed_dim x 3
62
+ """
63
+ x_s, x_v = node_embeddings
64
+ x_s = x_s.reshape(batch_size, -1, x_s.shape[1])
65
+ x_v = x_v.reshape(batch_size, -1, x_v.shape[1], x_v.shape[2])
66
+ return (x_s, x_v)
67
+
68
+
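A short round-trip sketch for the two helpers above; shapes are illustrative and no edges are masked with -1 in this toy graph:

import torch
from esm.inverse_folding.gvp_utils import flatten_graph, unflatten_graph

B, N, E = 2, 5, 7
node_emb = (torch.randn(B, N, 8), torch.randn(B, N, 4, 3))
edge_emb = (torch.randn(B, E, 6), torch.randn(B, E, 1, 3))
edge_index = torch.randint(0, N, (B, 2, E))          # -1 entries would drop edges

flat_nodes, flat_edges, flat_index = flatten_graph(node_emb, edge_emb, edge_index)
print(flat_nodes[0].shape)    # torch.Size([10, 8]): all B*N nodes in one graph
print(flat_index.shape)       # torch.Size([2, 14]): node indices offset per example

nodes_back = unflatten_graph(flat_nodes, batch_size=B)
print(nodes_back[0].shape)    # torch.Size([2, 5, 8])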
esm/inverse_folding/multichain_util.py ADDED
@@ -0,0 +1,152 @@
1
+ # # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # #
3
+ # # This source code is licensed under the MIT license found in the
4
+ # # LICENSE file in the root directory of this source tree.
5
+ #
6
+ # import biotite.structure
7
+ # import numpy as np
8
+ # import torch
9
+ # from typing import Sequence, Tuple, List
10
+ #
11
+ # from esm.inverse_folding.util import (
12
+ # load_structure,
13
+ # extract_coords_from_structure,
14
+ # load_coords,
15
+ # get_sequence_loss,
16
+ # get_encoder_output,
17
+ # )
18
+ #
19
+ #
20
+ # def extract_coords_from_complex(structure: biotite.structure.AtomArray):
21
+ # """
22
+ # Args:
23
+ # structure: biotite AtomArray
24
+ # Returns:
25
+ # Tuple (coords_list, seq_list)
26
+ # - coords: Dictionary mapping chain ids to L x 3 x 3 array for N, CA, C
27
+ # coordinates representing the backbone of each chain
28
+ # - seqs: Dictionary mapping chain ids to native sequences of each chain
29
+ # """
30
+ # coords = {}
31
+ # seqs = {}
32
+ # all_chains = biotite.structure.get_chains(structure)
33
+ # for chain_id in all_chains:
34
+ # chain = structure[structure.chain_id == chain_id]
35
+ # coords[chain_id], seqs[chain_id] = extract_coords_from_structure(chain)
36
+ # return coords, seqs
37
+ #
38
+ #
39
+ # def load_complex_coords(fpath, chains):
40
+ # """
41
+ # Args:
42
+ # fpath: filepath to either pdb or cif file
43
+ # chains: the chain ids (the order matters for autoregressive model)
44
+ # Returns:
45
+ # Tuple (coords_list, seq_list)
46
+ # - coords: Dictionary mapping chain ids to L x 3 x 3 array for N, CA, C
47
+ # coordinates representing the backbone of each chain
48
+ # - seqs: Dictionary mapping chain ids to native sequences of each chain
49
+ # """
50
+ # structure = load_structure(fpath, chains)
51
+ # return extract_coords_from_complex(structure)
52
+ #
53
+ #
54
+ # def _concatenate_coords(coords, target_chain_id, padding_length=10):
55
+ # """
56
+ # Args:
57
+ # coords: Dictionary mapping chain ids to L x 3 x 3 array for N, CA, C
58
+ # coordinates representing the backbone of each chain
59
+ # target_chain_id: The chain id to sample sequences for
60
+ # padding_length: Length of padding between concatenated chains
61
+ # Returns:
62
+ # Tuple (coords, seq)
63
+ # - coords is an L x 3 x 3 array for N, CA, C coordinates, a
64
+ # concatenation of the chains with padding in between
65
+ # - seq is the extracted sequence, with padding tokens inserted
66
+ # between the concatenated chains
67
+ # """
68
+ # pad_coords = np.full((padding_length, 3, 3), np.nan, dtype=np.float32)
69
+ # # For best performance, put the target chain first in concatenation.
70
+ # coords_list = [coords[target_chain_id]]
71
+ # for chain_id in coords:
72
+ # if chain_id == target_chain_id:
73
+ # continue
74
+ # coords_list.append(pad_coords)
75
+ # coords_list.append(coords[chain_id])
76
+ # coords_concatenated = np.concatenate(coords_list, axis=0)
77
+ # return coords_concatenated
78
+ #
79
+ #
80
+ # def sample_sequence_in_complex(model, coords, target_chain_id, temperature=1.,
81
+ # padding_length=10):
82
+ # """
83
+ # Samples sequence for one chain in a complex.
84
+ # Args:
85
+ # model: An instance of the GVPTransformer model
86
+ # coords: Dictionary mapping chain ids to L x 3 x 3 array for N, CA, C
87
+ # coordinates representing the backbone of each chain
88
+ # target_chain_id: The chain id to sample sequences for
89
+ # padding_length: padding length in between chains
90
+ # Returns:
91
+ # Sampled sequence for the target chain
92
+ # """
93
+ # target_chain_len = coords[target_chain_id].shape[0]
94
+ # all_coords = _concatenate_coords(coords, target_chain_id)
95
+ # device = next(model.parameters()).device
96
+ #
97
+ # # Supply padding tokens for other chains to avoid unused sampling for speed
98
+ # padding_pattern = ['<pad>'] * all_coords.shape[0]
99
+ # for i in range(target_chain_len):
100
+ # padding_pattern[i] = '<mask>'
101
+ # sampled = model.sample(all_coords, partial_seq=padding_pattern,
102
+ # temperature=temperature, device=device)
103
+ # sampled = sampled[:target_chain_len]
104
+ # return sampled
105
+ #
106
+ #
107
+ # def score_sequence_in_complex(model, alphabet, coords, target_chain_id,
108
+ # target_seq, padding_length=10):
109
+ # """
110
+ # Scores sequence for one chain in a complex.
111
+ # Args:
112
+ # model: An instance of the GVPTransformer model
113
+ # alphabet: Alphabet for the model
114
+ # coords: Dictionary mapping chain ids to L x 3 x 3 array for N, CA, C
115
+ # coordinates representing the backbone of each chain
116
+ # target_chain_id: The chain id to sample sequences for
117
+ # target_seq: Target sequence for the target chain for scoring.
118
+ # padding_length: padding length in between chains
119
+ # Returns:
120
+ # Tuple (ll_fullseq, ll_withcoord)
121
+ # - ll_fullseq: Average log-likelihood over the full target chain
122
+ # - ll_withcoord: Average log-likelihood in target chain excluding those
123
+ # residues without coordinates
124
+ # """
125
+ # all_coords = _concatenate_coords(coords, target_chain_id)
126
+ #
127
+ # loss, target_padding_mask = get_sequence_loss(model, alphabet, all_coords,
128
+ # target_seq)
129
+ # ll_fullseq = -np.sum(loss * ~target_padding_mask) / np.sum(
130
+ # ~target_padding_mask)
131
+ #
132
+ # # Also calculate average when excluding masked portions
133
+ # coord_mask = np.all(np.isfinite(coords[target_chain_id]), axis=(-1, -2))
134
+ # ll_withcoord = -np.sum(loss * coord_mask) / np.sum(coord_mask)
135
+ # return ll_fullseq, ll_withcoord
136
+ #
137
+ #
138
+ # def get_encoder_output_for_complex(model, alphabet, coords, target_chain_id):
139
+ # """
140
+ # Args:
141
+ # model: An instance of the GVPTransformer model
142
+ # alphabet: Alphabet for the model
143
+ # coords: Dictionary mapping chain ids to L x 3 x 3 array for N, CA, C
144
+ # coordinates representing the backbone of each chain
145
+ # target_chain_id: The chain id to sample sequences for
146
+ # Returns:
147
+ # Dictionary mapping chain id to encoder output for each chain
148
+ # """
149
+ # all_coords = _concatenate_coords(coords, target_chain_id)
150
+ # all_rep = get_encoder_output(model, alphabet, all_coords)
151
+ # target_chain_len = coords[target_chain_id].shape[0]
152
+ # return all_rep[:target_chain_len]
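The commented-out helpers above mirror the upstream ESM-IF1 multichain utilities. Purely as a hedged sketch (it assumes these helpers are re-enabled, that an inverse-folding model/alphabet pair is already loaded as `model`/`alphabet`, and that "complex.pdb" and the chain ids are placeholders), their intended call pattern would be roughly:

from esm.inverse_folding.util import load_coords

coords = {}
for chain_id in ["A", "B"]:                                      # placeholder chain ids
    coords[chain_id], _ = load_coords("complex.pdb", chain_id)   # placeholder path

# design a new sequence for chain A in the context of the whole complex ...
designed = sample_sequence_in_complex(model, coords, target_chain_id="A")

# ... or score a candidate sequence for chain A
ll_fullseq, ll_withcoord = score_sequence_in_complex(
    model, alphabet, coords, target_chain_id="A", target_seq=designed)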
esm/inverse_folding/transformer_decoder.py ADDED
@@ -0,0 +1,228 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # Contents of this file were adapted from the open source fairseq repository.
4
+ #
5
+ # This source code is licensed under the MIT license found in the
6
+ # LICENSE file in the root directory of this source tree.
7
+
8
+ import math
9
+ from typing import Any, Dict, List, Optional
10
+
11
+ import torch
12
+ import torch.nn as nn
13
+ from torch import Tensor
14
+
15
+ from esm.modules import SinusoidalPositionalEmbedding
16
+ from .transformer_layer import TransformerDecoderLayer
17
+
18
+
19
+ def fill_with_neg_inf(t):
20
+ """FP16-compatible function that fills a tensor with -inf."""
21
+ return t.float().fill_(float("-inf")).type_as(t)
22
+
23
+
24
+ class TransformerDecoder(nn.Module):
25
+ """
26
+ Transformer decoder consisting of *args.decoder_layers* layers. Each layer
27
+ is a :class:`TransformerDecoderLayer`.
28
+
29
+ Args:
30
+ args (argparse.Namespace): parsed command-line arguments
31
+ dictionary (~fairseq.data.Dictionary): decoding dictionary
32
+ embed_tokens (torch.nn.Embedding): output embedding
33
+ no_encoder_attn (bool, optional): whether to attend to encoder outputs
34
+ (default: False).
35
+ """
36
+
37
+ def __init__(
38
+ self,
39
+ args,
40
+ dictionary,
41
+ embed_tokens,
42
+ ):
43
+ super().__init__()
44
+ self.args = args
45
+ self.dictionary = dictionary
46
+ self._future_mask = torch.empty(0)
47
+
48
+ self.dropout_module = nn.Dropout(args.dropout)
49
+
50
+ input_embed_dim = embed_tokens.embedding_dim
51
+ embed_dim = args.decoder_embed_dim
52
+ self.embed_dim = embed_dim
53
+
54
+ self.padding_idx = embed_tokens.padding_idx
55
+
56
+ self.embed_tokens = embed_tokens
57
+ self.embed_scale = math.sqrt(embed_dim)
58
+
59
+ self.project_in_dim = (
60
+ nn.Linear(input_embed_dim, embed_dim, bias=False)
61
+ if embed_dim != input_embed_dim
62
+ else None
63
+ )
64
+ self.embed_positions = SinusoidalPositionalEmbedding(
65
+ embed_dim,
66
+ self.padding_idx,
67
+ )
68
+
69
+ self.layers = nn.ModuleList([])
70
+ self.layers.extend(
71
+ [
72
+ self.build_decoder_layer(args)
73
+ for _ in range(args.decoder_layers)
74
+ ]
75
+ )
76
+ self.num_layers = len(self.layers)
77
+ self.layer_norm = nn.LayerNorm(embed_dim)
78
+
79
+ self.build_output_projection(args, dictionary)
80
+
81
+ def build_output_projection(self, args, dictionary):
82
+ self.output_projection = nn.Linear(
83
+ args.decoder_embed_dim, len(dictionary), bias=False
84
+ )
85
+ nn.init.normal_(
86
+ self.output_projection.weight, mean=0, std=args.decoder_embed_dim ** -0.5
87
+ )
88
+
89
+ def build_decoder_layer(self, args):
90
+ return TransformerDecoderLayer(args)
91
+
92
+ def forward(
93
+ self,
94
+ prev_output_tokens,
95
+ encoder_out: Optional[Dict[str, List[Tensor]]] = None,
96
+ incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None,
97
+ features_only: bool = False,
98
+ return_all_hiddens: bool = False,
99
+ ):
100
+ """
101
+ Args:
102
+ prev_output_tokens (LongTensor): previous decoder outputs of shape
103
+ `(batch, tgt_len)`, for teacher forcing
104
+ encoder_out (optional): output from the encoder, used for
105
+ encoder-side attention, should be of size T x B x C
106
+ incremental_state (dict): dictionary used for storing state during
107
+ :ref:`Incremental decoding`
108
+ features_only (bool, optional): only return features without
109
+ applying output layer (default: False).
110
+
111
+ Returns:
112
+ tuple:
113
+ - the decoder's output of shape `(batch, vocab, tgt_len)` (logits are transposed for cross-entropy)
114
+ - a dictionary with any model-specific outputs
115
+ """
116
+
117
+ x, extra = self.extract_features(
118
+ prev_output_tokens,
119
+ encoder_out=encoder_out,
120
+ incremental_state=incremental_state,
121
+ )
122
+
123
+ if not features_only:
124
+ x = self.output_layer(x)
125
+ x = x.transpose(1, 2) # B x T x C -> B x C x T
126
+ return x, extra
127
+
128
+ def extract_features(
129
+ self,
130
+ prev_output_tokens,
131
+ encoder_out: Optional[Dict[str, List[Tensor]]],
132
+ incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None,
133
+ ):
134
+ """
135
+ Similar to *forward* but only return features.
136
+
137
+ Includes several features from "Jointly Learning to Align and
138
+ Translate with Transformer Models" (Garg et al., EMNLP 2019).
139
+
140
+ Returns:
141
+ tuple:
142
+ - the decoder's features of shape `(batch, tgt_len, embed_dim)`
143
+ - a dictionary with any model-specific outputs
144
+ """
145
+ bs, slen = prev_output_tokens.size()
146
+
147
+ enc: Optional[Tensor] = None
148
+ padding_mask: Optional[Tensor] = None
149
+ if encoder_out is not None and len(encoder_out["encoder_out"]) > 0:
150
+ enc = encoder_out["encoder_out"][0]
151
+ assert (
152
+ enc.size()[1] == bs
153
+ ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}"
154
+ if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0:
155
+ padding_mask = encoder_out["encoder_padding_mask"][0]
156
+
157
+ # embed positions
158
+ positions = self.embed_positions(
159
+ prev_output_tokens
160
+ )
161
+
162
+ if incremental_state is not None:
163
+ prev_output_tokens = prev_output_tokens[:, -1:]
164
+ positions = positions[:, -1:]
165
+
166
+ # embed tokens and positions
167
+ x = self.embed_scale * self.embed_tokens(prev_output_tokens)
168
+
169
+ if self.project_in_dim is not None:
170
+ x = self.project_in_dim(x)
171
+
172
+ x += positions
173
+
174
+ x = self.dropout_module(x)
175
+
176
+ # B x T x C -> T x B x C
177
+ x = x.transpose(0, 1)
178
+
179
+ self_attn_padding_mask: Optional[Tensor] = None
180
+ if prev_output_tokens.eq(self.padding_idx).any():
181
+ self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx)
182
+
183
+ # decoder layers
184
+ attn: Optional[Tensor] = None
185
+ inner_states: List[Optional[Tensor]] = [x]
186
+ for idx, layer in enumerate(self.layers):
187
+ if incremental_state is None:
188
+ self_attn_mask = self.buffered_future_mask(x)
189
+ else:
190
+ self_attn_mask = None
191
+
192
+ x, layer_attn, _ = layer(
193
+ x,
194
+ enc,
195
+ padding_mask,
196
+ incremental_state,
197
+ self_attn_mask=self_attn_mask,
198
+ self_attn_padding_mask=self_attn_padding_mask,
199
+ need_attn=False,
200
+ need_head_weights=False,
201
+ )
202
+ inner_states.append(x)
203
+
204
+ if self.layer_norm is not None:
205
+ x = self.layer_norm(x)
206
+
207
+ # T x B x C -> B x T x C
208
+ x = x.transpose(0, 1)
209
+
210
+ return x, {"inner_states": inner_states}
211
+
212
+ def output_layer(self, features):
213
+ """Project features to the vocabulary size."""
214
+ return self.output_projection(features)
215
+
216
+ def buffered_future_mask(self, tensor):
217
+ dim = tensor.size(0)
218
+ # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround.
219
+ if (
220
+ self._future_mask.size(0) == 0
221
+ or (not self._future_mask.device == tensor.device)
222
+ or self._future_mask.size(0) < dim
223
+ ):
224
+ self._future_mask = torch.triu(
225
+ fill_with_neg_inf(torch.zeros([dim, dim])), 1
226
+ )
227
+ self._future_mask = self._future_mask.to(tensor)
228
+ return self._future_mask[:dim, :dim]
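`buffered_future_mask` above builds the standard causal mask from `fill_with_neg_inf`. A minimal standalone sketch of the same construction (the 4x4 size is arbitrary):

import torch

def fill_with_neg_inf(t):
    # FP16-compatible -inf fill, same helper as defined at the top of this file
    return t.float().fill_(float("-inf")).type_as(t)

mask = torch.triu(fill_with_neg_inf(torch.zeros([4, 4])), 1)
# mask[i, j] == -inf for j > i and 0 otherwise, so once it is added to the
# attention logits, position i can only attend to positions <= i.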
esm/inverse_folding/transformer_layer.py ADDED
@@ -0,0 +1,304 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # Contents of this file were adapted from the open source fairseq repository.
4
+ #
5
+ # This source code is licensed under the MIT license found in the
6
+ # LICENSE file in the root directory of this source tree.
7
+
8
+ from typing import Dict, List, Optional
9
+
10
+ import torch
11
+ import torch.nn as nn
12
+ import torch.nn.functional as F
13
+ from esm.multihead_attention import MultiheadAttention
14
+ from torch import Tensor
15
+
16
+
17
+ class TransformerEncoderLayer(nn.Module):
18
+ """Encoder layer block.
19
+ `layernorm -> dropout -> add residual`
20
+
21
+ Args:
22
+ args (argparse.Namespace): parsed command-line arguments
23
+ """
24
+
25
+ def __init__(self, args):
26
+ super().__init__()
27
+ self.args = args
28
+ self.embed_dim = args.encoder_embed_dim
29
+ self.self_attn = self.build_self_attention(self.embed_dim, args)
30
+ self.self_attn_layer_norm = torch.nn.LayerNorm(self.embed_dim)
31
+ self.dropout_module = nn.Dropout(args.dropout)
32
+ self.activation_fn = F.relu
33
+ self.fc1 = self.build_fc1(
34
+ self.embed_dim,
35
+ args.encoder_ffn_embed_dim,
36
+ )
37
+ self.fc2 = self.build_fc2(
38
+ args.encoder_ffn_embed_dim,
39
+ self.embed_dim,
40
+ )
41
+
42
+ self.final_layer_norm = nn.LayerNorm(self.embed_dim)
43
+
44
+ def build_fc1(self, input_dim, output_dim):
45
+ return nn.Linear(input_dim, output_dim)
46
+
47
+ def build_fc2(self, input_dim, output_dim):
48
+ return nn.Linear(input_dim, output_dim)
49
+
50
+ def build_self_attention(self, embed_dim, args):
51
+ return MultiheadAttention(
52
+ embed_dim,
53
+ args.encoder_attention_heads,
54
+ dropout=args.attention_dropout,
55
+ self_attention=True,
56
+ )
57
+
58
+ def residual_connection(self, x, residual):
59
+ return residual + x
60
+
61
+ def forward(
62
+ self,
63
+ x,
64
+ encoder_padding_mask: Optional[Tensor],
65
+ attn_mask: Optional[Tensor] = None,
66
+ ):
67
+ """
68
+ Args:
69
+ x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)`
70
+ encoder_padding_mask (ByteTensor): binary ByteTensor of shape
71
+ `(batch, seq_len)` where padding elements are indicated by ``1``.
72
+ attn_mask (ByteTensor): binary tensor of shape `(tgt_len, src_len)`,
73
+ where `tgt_len` is the length of output and `src_len` is the
74
+ length of input, though here both are equal to `seq_len`.
75
+ `attn_mask[tgt_i, src_j] = 1` means that when calculating the
76
+ embedding for `tgt_i`, we exclude (mask out) `src_j`. This is
77
+ useful for strided self-attention.
78
+
79
+ Returns:
80
+ encoded output of shape `(seq_len, batch, embed_dim)`
81
+ """
82
+ # anything in original attn_mask = 1, becomes -1e8
83
+ # anything in original attn_mask = 0, becomes 0
84
+ # Note that we cannot use -inf here, because at some edge cases,
85
+ # the attention weight (before softmax) for some padded element in query
86
+ # will become -inf, which results in NaN in model parameters
87
+ if attn_mask is not None:
88
+ attn_mask = attn_mask.masked_fill(
89
+ attn_mask.to(torch.bool), -1e8 if x.dtype == torch.float32 else -1e4
90
+ )
91
+
92
+ residual = x
93
+ x = self.self_attn_layer_norm(x)
94
+ x, _ = self.self_attn(
95
+ query=x,
96
+ key=x,
97
+ value=x,
98
+ key_padding_mask=encoder_padding_mask,
99
+ need_weights=False,
100
+ attn_mask=attn_mask,
101
+ )
102
+ x = self.dropout_module(x)
103
+ x = self.residual_connection(x, residual)
104
+
105
+ residual = x
106
+ x = self.final_layer_norm(x)
107
+ x = self.activation_fn(self.fc1(x))
108
+ x = self.fc2(x)
109
+ x = self.dropout_module(x)
110
+ x = self.residual_connection(x, residual)
111
+ return x
112
+
113
+
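# Aside (not part of the diff): the masked_fill conversion above turns boolean
# "exclude" entries of attn_mask into large negative additive biases. A tiny
# standalone illustration of the same trick:
import torch
m = torch.tensor([[False, True], [False, False]])   # True marks positions to exclude
bias = torch.zeros(2, 2).masked_fill(m, -1e8)
# adding `bias` to the attention logits drives the masked entry to ~0 softmax weight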
114
+ class TransformerDecoderLayer(nn.Module):
115
+ """Decoder layer block.
116
+ `layernorm -> dropout -> add residual`
117
+
118
+ Args:
119
+ args (argparse.Namespace): parsed command-line arguments
120
+ no_encoder_attn (bool, optional): whether to attend to encoder outputs
121
+ (default: False).
122
+ """
123
+
124
+ def __init__(
125
+ self, args, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False
126
+ ):
127
+ super().__init__()
128
+ self.embed_dim = args.decoder_embed_dim
129
+ self.dropout_module = nn.Dropout(args.dropout)
130
+
131
+ self.self_attn = self.build_self_attention(
132
+ self.embed_dim,
133
+ args,
134
+ add_bias_kv=add_bias_kv,
135
+ add_zero_attn=add_zero_attn,
136
+ )
137
+ self.nh = self.self_attn.num_heads
138
+ self.head_dim = self.self_attn.head_dim
139
+
140
+ self.activation_fn = F.relu
141
+
142
+ self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim)
143
+
144
+ if no_encoder_attn:
145
+ self.encoder_attn = None
146
+ self.encoder_attn_layer_norm = None
147
+ else:
148
+ self.encoder_attn = self.build_encoder_attention(self.embed_dim, args)
149
+ self.encoder_attn_layer_norm = nn.LayerNorm(self.embed_dim)
150
+
151
+ self.ffn_layernorm = (
152
+ nn.LayerNorm(args.decoder_ffn_embed_dim)
153
+ if getattr(args, "scale_fc", False)
154
+ else None
155
+ )
156
+ self.w_resid = (
157
+ nn.Parameter(
158
+ torch.ones(
159
+ self.embed_dim,
160
+ ),
161
+ requires_grad=True,
162
+ )
163
+ if getattr(args, "scale_resids", False)
164
+ else None
165
+ )
166
+
167
+ self.fc1 = self.build_fc1(
168
+ self.embed_dim,
169
+ args.decoder_ffn_embed_dim,
170
+ )
171
+ self.fc2 = self.build_fc2(
172
+ args.decoder_ffn_embed_dim,
173
+ self.embed_dim,
174
+ )
175
+
176
+ self.final_layer_norm = nn.LayerNorm(self.embed_dim)
177
+ self.need_attn = True
178
+
179
+ def build_fc1(self, input_dim, output_dim):
180
+ return nn.Linear(input_dim, output_dim)
181
+
182
+ def build_fc2(self, input_dim, output_dim):
183
+ return nn.Linear(input_dim, output_dim)
184
+
185
+ def build_self_attention(
186
+ self, embed_dim, args, add_bias_kv=False, add_zero_attn=False
187
+ ):
188
+ return MultiheadAttention(
189
+ embed_dim,
190
+ args.decoder_attention_heads,
191
+ dropout=args.attention_dropout,
192
+ add_bias_kv=add_bias_kv,
193
+ add_zero_attn=add_zero_attn,
194
+ self_attention=True,
195
+ )
196
+
197
+ def build_encoder_attention(self, embed_dim, args):
198
+ return MultiheadAttention(
199
+ embed_dim,
200
+ args.decoder_attention_heads,
201
+ kdim=args.encoder_embed_dim,
202
+ vdim=args.encoder_embed_dim,
203
+ dropout=args.attention_dropout,
204
+ encoder_decoder_attention=True,
205
+ )
206
+
207
+ def residual_connection(self, x, residual):
208
+ return residual + x
209
+
210
+ def forward(
211
+ self,
212
+ x,
213
+ encoder_out: Optional[torch.Tensor] = None,
214
+ encoder_padding_mask: Optional[torch.Tensor] = None,
215
+ incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None,
216
+ prev_self_attn_state: Optional[List[torch.Tensor]] = None,
217
+ prev_attn_state: Optional[List[torch.Tensor]] = None,
218
+ self_attn_mask: Optional[torch.Tensor] = None,
219
+ self_attn_padding_mask: Optional[torch.Tensor] = None,
220
+ need_attn: bool = False,
221
+ need_head_weights: bool = False,
222
+ ):
223
+ """
224
+ Args:
225
+ x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)`
226
+ encoder_padding_mask (ByteTensor, optional): binary
227
+ ByteTensor of shape `(batch, src_len)` where padding
228
+ elements are indicated by ``1``.
229
+ need_attn (bool, optional): return attention weights
230
+ need_head_weights (bool, optional): return attention weights
231
+ for each head (default: return average over heads).
232
+
233
+ Returns:
234
+ encoded output of shape `(seq_len, batch, embed_dim)`
235
+ """
236
+ if need_head_weights:
237
+ need_attn = True
238
+
239
+ residual = x
240
+ x = self.self_attn_layer_norm(x)
241
+ if prev_self_attn_state is not None:
242
+ prev_key, prev_value = prev_self_attn_state[:2]
243
+ saved_state: Dict[str, Optional[Tensor]] = {
244
+ "prev_key": prev_key,
245
+ "prev_value": prev_value,
246
+ }
247
+ if len(prev_self_attn_state) >= 3:
248
+ saved_state["prev_key_padding_mask"] = prev_self_attn_state[2]
249
+ assert incremental_state is not None
250
+ self.self_attn._set_input_buffer(incremental_state, saved_state)
251
+ _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state)
252
+ y = x
253
+
254
+ x, attn = self.self_attn(
255
+ query=x,
256
+ key=y,
257
+ value=y,
258
+ key_padding_mask=self_attn_padding_mask,
259
+ incremental_state=incremental_state,
260
+ need_weights=False,
261
+ attn_mask=self_attn_mask,
262
+ )
263
+ x = self.dropout_module(x)
264
+ x = self.residual_connection(x, residual)
265
+
266
+ if self.encoder_attn is not None and encoder_out is not None:
267
+ residual = x
268
+ x = self.encoder_attn_layer_norm(x)
269
+ if prev_attn_state is not None:
270
+ prev_key, prev_value = prev_attn_state[:2]
271
+ saved_state: Dict[str, Optional[Tensor]] = {
272
+ "prev_key": prev_key,
273
+ "prev_value": prev_value,
274
+ }
275
+ if len(prev_attn_state) >= 3:
276
+ saved_state["prev_key_padding_mask"] = prev_attn_state[2]
277
+ assert incremental_state is not None
278
+ self.encoder_attn._set_input_buffer(incremental_state, saved_state)
279
+
280
+ x, attn = self.encoder_attn(
281
+ query=x,
282
+ key=encoder_out,
283
+ value=encoder_out,
284
+ key_padding_mask=encoder_padding_mask,
285
+ incremental_state=incremental_state,
286
+ static_kv=True,
287
+ need_weights=need_attn or (not self.training and self.need_attn),
288
+ need_head_weights=need_head_weights,
289
+ )
290
+ x = self.dropout_module(x)
291
+ x = self.residual_connection(x, residual)
292
+
293
+ residual = x
294
+ x = self.final_layer_norm(x)
295
+
296
+ x = self.activation_fn(self.fc1(x))
297
+ if self.ffn_layernorm is not None:
298
+ x = self.ffn_layernorm(x)
299
+ x = self.fc2(x)
300
+ x = self.dropout_module(x)
301
+ if self.w_resid is not None:
302
+ residual = torch.mul(self.w_resid, residual)
303
+ x = self.residual_connection(x, residual)
304
+ return x, attn, None
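As a rough, untested usage sketch (the hyper-parameter values are assumptions, and the MultiheadAttention signature is assumed to follow the fairseq-style module in esm/multihead_attention.py), a decoder layer can be exercised on its own:

from types import SimpleNamespace
import torch
from esm.inverse_folding.transformer_layer import TransformerDecoderLayer

# only the fields the layer actually reads are supplied
args = SimpleNamespace(
    decoder_embed_dim=64, decoder_ffn_embed_dim=128, decoder_attention_heads=4,
    encoder_embed_dim=64, dropout=0.1, attention_dropout=0.0,
)
layer = TransformerDecoderLayer(args)
x = torch.randn(7, 2, 64)        # (seq_len, batch, embed_dim)
out, attn, _ = layer(x)          # no encoder context, no masks
print(out.shape)                 # torch.Size([7, 2, 64])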
esm/inverse_folding/util.py ADDED
@@ -0,0 +1,323 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ #
3
+ # This source code is licensed under the MIT license found in the
4
+ # LICENSE file in the root directory of this source tree.
5
+
6
+ import json
7
+ import math
8
+
9
+ import biotite.structure
10
+ from biotite.structure.io import pdbx, pdb
11
+ from biotite.structure.residues import get_residues
12
+ from biotite.structure import filter_backbone
13
+ from biotite.structure import get_chains
14
+ from biotite.sequence import ProteinSequence
15
+ import numpy as np
16
+ # from scipy.spatial import transform
17
+ # from scipy.stats import special_ortho_group
18
+ import torch
19
+ # import torch.nn as nn
20
+ import torch.nn.functional as F
21
+ # import torch.utils.data as data
22
+ from typing import Sequence, Tuple, List
23
+
24
+ from esm.data import BatchConverter
25
+
26
+
27
+ def load_structure(fpath, chain=None):
28
+ """
29
+ Args:
30
+ fpath: filepath to either pdb or cif file
31
+ chain: the chain id or list of chain ids to load
32
+ Returns:
33
+ biotite.structure.AtomArray
34
+ """
35
+ if fpath.endswith('cif'):
36
+ with open(fpath) as fin:
37
+ pdbxf = pdbx.PDBxFile.read(fin)
38
+ structure = pdbx.get_structure(pdbxf, model=1)
39
+ elif fpath.endswith('pdb'):
40
+ with open(fpath) as fin:
41
+ pdbf = pdb.PDBFile.read(fin)
42
+ structure = pdb.get_structure(pdbf, model=1)
43
+ bbmask = filter_backbone(structure)
44
+ structure = structure[bbmask]
45
+ all_chains = get_chains(structure)
46
+ if len(all_chains) == 0:
47
+ raise ValueError('No chains found in the input file.')
48
+ if chain is None:
49
+ chain_ids = all_chains
50
+ elif isinstance(chain, list):
51
+ chain_ids = chain
52
+ else:
53
+ chain_ids = [chain]
54
+ for chain in chain_ids:
55
+ if chain not in all_chains:
56
+ raise ValueError(f'Chain {chain} not found in input file')
57
+ chain_filter = [a.chain_id in chain_ids for a in structure]
58
+ structure = structure[chain_filter]
59
+ return structure
60
+
61
+
62
+ def extract_coords_from_structure(structure: biotite.structure.AtomArray):
63
+ """
64
+ Args:
65
+ structure: An instance of biotite AtomArray
66
+ Returns:
67
+ Tuple (coords, seq)
68
+ - coords is an L x 3 x 3 array for N, CA, C coordinates
69
+ - seq is the extracted sequence
70
+ """
71
+ coords = get_atom_coords_residuewise(["N", "CA", "C"], structure)
72
+ residue_identities = get_residues(structure)[1]
73
+ seq = ''.join([ProteinSequence.convert_letter_3to1(r) for r in residue_identities])
74
+ return coords, seq
75
+
76
+
77
+ def load_coords(fpath, chain):
78
+ """
79
+ Args:
80
+ fpath: filepath to either pdb or cif file
81
+ chain: the chain id
82
+ Returns:
83
+ Tuple (coords, seq)
84
+ - coords is an L x 3 x 3 array for N, CA, C coordinates
85
+ - seq is the extracted sequence
86
+ """
87
+ structure = load_structure(fpath, chain)
88
+ return extract_coords_from_structure(structure)
89
+
90
+
91
+ def get_atom_coords_residuewise(atoms: List[str], struct: biotite.structure.AtomArray):
92
+ """
93
+ Example for atoms argument: ["N", "CA", "C"]
94
+ """
95
+ def filterfn(s, axis=None):
96
+ filters = np.stack([s.atom_name == name for name in atoms], axis=1)
97
+ sum = filters.sum(0)
98
+ if not np.all(sum <= np.ones(filters.shape[1])):
99
+ raise RuntimeError("structure has multiple atoms with same name")
100
+ index = filters.argmax(0)
101
+ coords = s[index].coord
102
+ coords[sum == 0] = float("nan")
103
+ return coords
104
+
105
+ return biotite.structure.apply_residue_wise(struct, struct, filterfn)
106
+
107
+
108
+ def get_sequence_loss(model, alphabet, coords, seq):
109
+ device = next(model.parameters()).device
110
+ batch_converter = CoordBatchConverter(alphabet)
111
+ batch = [(coords, None, seq)]
112
+ coords, confidence, strs, tokens, padding_mask = batch_converter(
113
+ batch, device=device)
114
+
115
+ prev_output_tokens = tokens[:, :-1].to(device)
116
+ target = tokens[:, 1:]
117
+ target_padding_mask = (target == alphabet.padding_idx)
118
+ logits, _ = model.forward(coords, padding_mask, confidence, prev_output_tokens)
119
+ loss = F.cross_entropy(logits, target, reduction='none')
120
+ loss = loss[0].cpu().detach().numpy()
121
+ target_padding_mask = target_padding_mask[0].cpu().numpy()
122
+ return loss, target_padding_mask
123
+
124
+
125
+ def score_sequence(model, alphabet, coords, seq):
126
+ loss, target_padding_mask = get_sequence_loss(model, alphabet, coords, seq)
127
+ ll_fullseq = -np.sum(loss * ~target_padding_mask) / np.sum(~target_padding_mask)
128
+ # Also calculate average when excluding masked portions
129
+ coord_mask = np.all(np.isfinite(coords), axis=(-1, -2))
130
+ ll_withcoord = -np.sum(loss * coord_mask) / np.sum(coord_mask)
131
+ return ll_fullseq, ll_withcoord
132
+
133
+
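# Usage sketch (not part of the diff): score a candidate sequence against a
# backbone. The pretrained-model loader and the PDB path are assumptions --
# they come from the upstream fair-esm package, not from this file.
import esm
model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
model = model.eval()
coords, native_seq = load_coords("example.pdb", chain="A")
ll_fullseq, ll_withcoord = score_sequence(model, alphabet, coords, native_seq)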
134
+ def get_encoder_output(model, alphabet, coords):
135
+ device = next(model.parameters()).device
136
+ batch_converter = CoordBatchConverter(alphabet)
137
+ batch = [(coords, None, None)]  # no target sequence here; the batch converter fills placeholder residues
138
+ coords, confidence, strs, tokens, padding_mask = batch_converter(
139
+ batch, device=device)
140
+ encoder_out = model.encoder.forward(coords, padding_mask, confidence,
141
+ return_all_hiddens=False)
142
+ # remove beginning and end (bos and eos tokens)
143
+ return encoder_out['encoder_out'][0][1:-1, 0]
144
+
145
+
146
+ def rotate(v, R):
147
+ """
148
+ Rotates a vector by a rotation matrix.
149
+
150
+ Args:
151
+ v: 3D vector, tensor of shape (length x batch_size x channels x 3)
152
+ R: rotation matrix, tensor of shape (length x batch_size x 3 x 3)
153
+
154
+ Returns:
155
+ Rotated version of v by rotation matrix R.
156
+ """
157
+ R = R.unsqueeze(-3)
158
+ v = v.unsqueeze(-1)
159
+ return torch.sum(v * R, dim=-2)
160
+
161
+
162
+ def get_rotation_frames(coords):
163
+ """
164
+ Returns a local rotation frame defined by N, CA, C positions.
165
+
166
+ Args:
167
+ coords: coordinates, tensor of shape (batch_size x length x 3 x 3)
168
+ where the third dimension is in order of N, CA, C
169
+
170
+ Returns:
171
+ Local relative rotation frames in shape (batch_size x length x 3 x 3)
172
+ """
173
+ v1 = coords[:, :, 2] - coords[:, :, 1]
174
+ v2 = coords[:, :, 0] - coords[:, :, 1]
175
+ e1 = normalize(v1, dim=-1)
176
+ u2 = v2 - e1 * torch.sum(e1 * v2, dim=-1, keepdim=True)
177
+ e2 = normalize(u2, dim=-1)
178
+ e3 = torch.cross(e1, e2, dim=-1)
179
+ R = torch.stack([e1, e2, e3], dim=-2)
180
+ return R
181
+
182
+
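# Quick shape check (illustration only, assumes this module is importable):
# one residue with idealized N, CA, C positions yields one orthonormal frame.
import torch
toy = torch.tensor([[[[1.5, 0.0, 0.0],     # N
                      [0.0, 0.0, 0.0],     # CA
                      [0.0, 1.5, 0.0]]]])  # C  -> shape (batch=1, length=1, 3, 3)
print(get_rotation_frames(toy).shape)      # torch.Size([1, 1, 3, 3])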
183
+ def nan_to_num(ts, val=0.0):
184
+ """
185
+ Replaces nans in tensor with a fixed value.
186
+ """
187
+ val = torch.tensor(val, dtype=ts.dtype, device=ts.device)
188
+ return torch.where(~torch.isfinite(ts), val, ts)
189
+
190
+
191
+ def rbf(values, v_min, v_max, n_bins=16):
192
+ """
193
+ Returns RBF encodings in a new dimension at the end.
194
+ """
195
+ rbf_centers = torch.linspace(v_min, v_max, n_bins, device=values.device)
196
+ rbf_centers = rbf_centers.view([1] * len(values.shape) + [-1])
197
+ rbf_std = (v_max - v_min) / n_bins
198
+ v_expand = torch.unsqueeze(values, -1)
199
+ z = (v_expand - rbf_centers) / rbf_std
200
+ return torch.exp(-z ** 2)
201
+
202
+
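# Illustration (assumes this module is importable): rbf() spreads each scalar
# over n_bins Gaussian bumps between v_min and v_max, adding a feature dimension.
import torch
d = torch.tensor([2.0, 5.0, 10.0])               # e.g. CA-CA distances in Angstroms
feats = rbf(d, v_min=0.0, v_max=20.0, n_bins=16)
print(feats.shape)                               # torch.Size([3, 16])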
203
+ def norm(tensor, dim, eps=1e-8, keepdim=False):
204
+ """
205
+ Returns L2 norm along a dimension.
206
+ """
207
+ return torch.sqrt(
208
+ torch.sum(torch.square(tensor), dim=dim, keepdim=keepdim) + eps)
209
+
210
+
211
+ def normalize(tensor, dim=-1):
212
+ """
213
+ Normalizes a tensor along a dimension after removing nans.
214
+ """
215
+ return nan_to_num(
216
+ torch.div(tensor, norm(tensor, dim=dim, keepdim=True))
217
+ )
218
+
219
+
220
+ class CoordBatchConverter(BatchConverter):
221
+ def __call__(self, raw_batch: Sequence[Tuple[Sequence, str]], device=None):
222
+ """
223
+ Args:
224
+ raw_batch: List of tuples (coords, confidence, seq)
225
+ In each tuple,
226
+ coords: list of floats, shape L x 3 x 3
227
+ confidence: list of floats, shape L; or scalar float; or None
228
+ seq: string of length L
229
+ Returns:
230
+ coords: Tensor of shape batch_size x L x 3 x 3
231
+ confidence: Tensor of shape batch_size x L
232
+ strs: list of strings
233
+ tokens: LongTensor of shape batch_size x L
234
+ padding_mask: ByteTensor of shape batch_size x L
235
+ """
236
+ self.alphabet.cls_idx = self.alphabet.get_idx("<cath>")
237
+ batch = []
238
+ for coords, confidence, seq in raw_batch:
239
+ if confidence is None:
240
+ confidence = 1.
241
+ if isinstance(confidence, float) or isinstance(confidence, int):
242
+ confidence = [float(confidence)] * len(coords)
243
+ if seq is None:
244
+ seq = 'X' * len(coords)
245
+ batch.append(((coords, confidence), seq))
246
+
247
+ coords_and_confidence, strs, tokens = super().__call__(batch)
248
+
249
+ # pad beginning and end of each protein due to legacy reasons
250
+ coords = [
251
+ F.pad(torch.tensor(cd), (0, 0, 0, 0, 1, 1), value=np.inf)
252
+ for cd, _ in coords_and_confidence
253
+ ]
254
+ confidence = [
255
+ F.pad(torch.tensor(cf), (1, 1), value=-1.)
256
+ for _, cf in coords_and_confidence
257
+ ]
258
+ coords = self.collate_dense_tensors(coords, pad_v=np.nan)
259
+ confidence = self.collate_dense_tensors(confidence, pad_v=-1.)
260
+ if device is not None:
261
+ coords = coords.to(device)
262
+ confidence = confidence.to(device)
263
+ tokens = tokens.to(device)
264
+ padding_mask = torch.isnan(coords[:,:,0,0])
265
+ coord_mask = torch.isfinite(coords.sum(-2).sum(-1))
266
+ confidence = confidence * coord_mask + (-1.) * padding_mask
267
+ return coords, confidence, strs, tokens, padding_mask
268
+
269
+ def from_lists(self, coords_list, confidence_list=None, seq_list=None, device=None):
270
+ """
271
+ Args:
272
+ coords_list: list of length batch_size, each item is a list of
273
+ floats in shape L x 3 x 3 to describe a backbone
274
+ confidence_list: one of
275
+ - None, default to highest confidence
276
+ - list of length batch_size, each item is a scalar
277
+ - list of length batch_size, each item is a list of floats of
278
+ length L to describe the confidence scores for the backbone
279
+ with values between 0. and 1.
280
+ seq_list: either None or a list of strings
281
+ Returns:
282
+ coords: Tensor of shape batch_size x L x 3 x 3
283
+ confidence: Tensor of shape batch_size x L
284
+ strs: list of strings
285
+ tokens: LongTensor of shape batch_size x L
286
+ padding_mask: ByteTensor of shape batch_size x L
287
+ """
288
+ batch_size = len(coords_list)
289
+ if confidence_list is None:
290
+ confidence_list = [None] * batch_size
291
+ if seq_list is None:
292
+ seq_list = [None] * batch_size
293
+ raw_batch = zip(coords_list, confidence_list, seq_list)
294
+ return self.__call__(raw_batch, device)
295
+
296
+ @staticmethod
297
+ def collate_dense_tensors(samples, pad_v):
298
+ """
299
+ Takes a list of tensors with the following dimensions:
300
+ [(d_11, ..., d_1K),
301
+ (d_21, ..., d_2K),
302
+ ...,
303
+ (d_N1, ..., d_NK)]
304
+ and stack + pads them into a single tensor of:
305
+ (N, max_i=1,N { d_i1 }, ..., max_i=1,N {diK})
306
+ """
307
+ if len(samples) == 0:
308
+ return torch.Tensor()
309
+ if len(set(x.dim() for x in samples)) != 1:
310
+ raise RuntimeError(
311
+ f"Samples has varying dimensions: {[x.dim() for x in samples]}"
312
+ )
313
+ (device,) = tuple(set(x.device for x in samples)) # assumes all on same device
314
+ max_shape = [max(lst) for lst in zip(*[x.shape for x in samples])]
315
+ result = torch.empty(
316
+ len(samples), *max_shape, dtype=samples[0].dtype, device=device
317
+ )
318
+ result.fill_(pad_v)
319
+ for i in range(len(samples)):
320
+ result_i = result[i]
321
+ t = samples[i]
322
+ result_i[tuple(slice(0, k) for k in t.shape)] = t
323
+ return result
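`collate_dense_tensors` is the helper that pads variable-length backbones into one batch tensor; a small worked example of the padding behaviour (toy tensors, not from the repo):

import torch
from esm.inverse_folding.util import CoordBatchConverter

a = torch.ones(2, 3)
b = torch.ones(3, 2)
out = CoordBatchConverter.collate_dense_tensors([a, b], pad_v=0.0)
print(out.shape)   # torch.Size([2, 3, 3]); each sample padded to the per-dimension maxima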
esm/model/__init__.py ADDED
@@ -0,0 +1 @@
1
+
esm/model/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (176 Bytes).
 
esm/model/__pycache__/esm1.cpython-310.pyc ADDED
Binary file (5.16 kB).
 
esm/model/__pycache__/esm2.cpython-310.pyc ADDED
Binary file (3.5 kB).