Image-Text-to-Text
PEFT
Safetensors
English
IDEFICS3_ROCO / README.md
eltorio's picture
Update README.md
909b8aa verified
metadata
license: apache-2.0
datasets:
  - eltorio/ROCO-radiology
language:
  - en
base_model:
  - HuggingFaceM4/Idefics3-8B-Llama3
pipeline_tag: image-text-to-text
library_name: peft

IDEFICS3_ROCO

StageLicenseContributors WelcomeOpen In Colab

Star the project

If you appreciate my work, please consider giving it a like! ๐Ÿคฉ
I'm also looking for donations of free GPU time to complete the fine-tuning process.
Please contact me if you can help! ๐Ÿ™

A Fine-tuned Radiology-focused Model based on Hugging Face's Idefics3 Model

This repository contains a fine-tuned version of the Hugging Face Idefics3-8B-Llama3 model, built on top of the Meta Llama 3.1 8B architecture. Our model, IDEFICS3_ROCO, has been fine-tuned on the Radiology Objects in Context (ROCO) dataset, a large-scale medical and multimodal imaging collection.

TL;DR

For immediate use, you can load the model directly from Hugging Face:

from transformers import AutoProcessor, Idefics3ForConditionalGeneration, image_utils
import torch
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') # on CPU it requires โ‰ˆ 3h/query ๐Ÿ™ˆ
processor = AutoProcessor.from_pretrained(v)
model = Idefics3ForConditionalGeneration.from_pretrained(
        v, torch_dtype=torch.bfloat16
    ).to(device)

model.load_adapter("eltorio/IDEFICS3_ROCO")

Model Information

  • Base Model: Idefics3-8B-Llama3
  • Fine-tuning Dataset: Radiology Objects in Context (ROCO)
  • License: Apache-2.0
  • Current Status: Fine-tuning process is finished. Contributions to complete the fine-tuning / vallidation / test processes are welcome!

Training Progress Status

  • Current checkpoint: 12267 (100% completed)
  • Estimated remaining GPU time: 0 hours
  • Hardware requirements: T4 GPU with >16GB VRAM
  • Last update: november, 12th 2024

Fine-tuning Code

The fine-tuning code is available as a Jupyter Notebook in the ROCO-radiology dataset repository on Hugging Face:

The Junyper Notebook Open In Colab contains the code to fine-tune the Idefics3-8B-Llama3 model on the ROCO dataset. The fine-tuning process is currently halted at checkpoint 640 (out of 24,000) due to limitations with Colab Free T4 GPU unit. Contributions to complete the fine-tuning process are welcome!

Contributions Welcome

If you have the resources to complete the fine-tuning process, we would appreciate your contribution. Please fork this repository, finish the fine-tuning process, and submit a pull request with your updates.

Citation

If you use this model in your work, please cite the original Idefics3 model and our fine-tuned model:

Contribution Guide

  1. Technical Requirements

    • Access to powerful GPU (T4, V100, A100 or equivalent)
    • Python environment with PyTorch
    • Disk space: ~100GB
  2. Getting Started

  3. Contact

Docker Image

A AI training docker image is available for this model. The image and includes all necessary dependencies to run the fine-tuning process.
You need to set the HF_TOKEN environment variable to your Hugging Face API token.
You also need to have NVidia Docker container runtime installed. Finnaly, you need to run the container with GPU support with --gpus all option. The image is available on Docker Hub:

export HF_TOKEN=hf_some_token
docker run --gpus all --user=42420:42420 -e HF_TOKEN=$HF_TOKEN -it sctg/roco-idefics3:latest bash -i  /start.sh $HF_TOKEN

The Dockerfile is available in the IDEFICS_ROCO repository.

Use this model

According to the Apache license you should cite this model with:

@misc {ronan_l.m._2024,
    author       = { {Ronan L.M.} },
    title        = { IDEFICS3_ROCO (Revision b02598a) },
    year         = 2024,
    url          = { https://huggingface.co/eltorio/IDEFICS3_ROCO },
    doi          = { 10.57967/hf/3504 },
    publisher    = { Hugging Face }
}

Acknowledgments

This work was made possible by the Hugging Face Transformers library and the ROCO-radiology dataset.