---
library_name: diffusers
---

# MGIE

This repository contains the UNet and LLaVA model checkpoints from [Guiding Instruction-based Image Editing via Multimodal Large Language Models](https://arxiv.org/abs/2309.17102).

For a detailed example of usage, refer to [this notebook](https://github.com/apple/ml-mgie/blob/main/demo.ipynb) and the [official repository](https://github.com/apple/ml-mgie).

Additionally, this notebook is a memory-optimized version of the original one. It decouples the MGIE inference pipeline into the following steps:

1. Calculate all the embeddings in a batched manner with the LLaVA model and the edit head.
2. Remove the LLaVA model and the edit head from memory to free up VRAM.
3. Load the InstructPix2Pix pipeline and perform the editing.

💡 MGIE needs additional setup steps that must be completed before running inference. Most importantly, you need to merge the LLaVA weight deltas with the original LLaMA parameters. Please refer to the repository for those instructions.

## Processing ultra high-resolution images

Since the [InstructPix2Pix pipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix) doesn't resize input images internally, you might run out of memory (OOM) when processing ultra high-resolution images like [this one](https://i.imgur.com/CiAbKbS.jpg). It's therefore recommended to resize them first while preserving their aspect ratio.
Here's a utility function you can use for that:

```python
from diffusers.utils import load_image
from PIL import Image


def resize_image_aspect_ratio(img_url, base_width=None, base_height=None):
    # Load the image
    img = load_image(img_url).convert("RGB")

    # Get the current width and height of the image
    width, height = img.size

    # Calculate the new dimensions based on the aspect ratio
    if base_width is not None:
        # Calculate new height based on the base_width to maintain aspect ratio
        w_percent = base_width / float(width)
        h_size = int(float(height) * w_percent)
        new_size = (base_width, h_size)
    elif base_height is not None:
        # Calculate new width based on the base_height to maintain aspect ratio
        h_percent = base_height / float(height)
        w_size = int(float(width) * h_percent)
        new_size = (w_size, base_height)
    else:
        raise ValueError("Either base_width or base_height must be provided")

    # Resize the image. Image.ANTIALIAS was removed in Pillow 10;
    # Image.LANCZOS is the same filter under its current name.
    resized_img = img.resize(new_size, Image.LANCZOS)
    return resized_img
```

## Citation

```
@inproceedings{fu2024mgie,
  author = {Tsu-Jui Fu and Wenze Hu and Xianzhi Du and William Yang Wang and Yinfei Yang and Zhe Gan},
  title = {{Guiding Instruction-based Image Editing via Multimodal Large Language Models}},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2024}
}
```