license: mit
datasets:
  - keivalya/MedQuad-MedicalQnADataset
language:
  - en
library_name: peft
tags:
  - medical
pipeline_tag: question-answering

Model Card for GaiaMiniMed

This is a medical fine-tuned model based on Falcon-7b-Instruct, trained using 500 steps and 6 epochs on the MedAware dataset from keivalya (keivalya/MedQuad-MedicalQnADataset).

Check out a cool demo with chat memory here: pseudolab/GaiaFalconChat

Model Details

Model Description

  • Developed by: Tonic
  • Shared by: Tonic
  • Model type: Medical Fine-Tuned Conversational Falcon 7b (Instruct)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: tiiuae/falcon-7b-instruct

Model Sources

Uses

Use this model as you would use the Falcon Instruct models.

Direct Use

This model is intended for educational purposes only. Always consult a doctor for the best advice.

Compared with the base Falcon-7b-Instruct model, this model should perform better at conversational medical Q&A tasks.

It is our hope that it will help improve patient outcomes and public health.

Downstream Use

Use this model alongside others in group conversations to produce diagnoses, public health advisories, and personal hygiene improvements.

Out-of-Scope Use

This model is not meant to be used as a decision support system in the wild; it is for educational use only.

Bias, Risks, and Limitations

[More Information Needed]

How to Get Started with the Model


# Gaia MiniMed ⚕️ Quick Start

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import gradio as gr
import textwrap

def wrap_text(text, width=90):
    # Wrap each line separately so existing line breaks are preserved
    lines = text.split('\n')
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    return '\n'.join(wrapped_lines)

def multimodal_prompt(user_input, system_prompt):
    formatted_input = f"{{{{ {system_prompt} }}}}\nUser: {user_input}\nFalcon:"
    encodeds = tokenizer(formatted_input, return_tensors="pt", add_special_tokens=False)
    model_inputs = encodeds.to(device)
    output = peft_model.generate(
        **model_inputs,
        max_length=500,
        use_cache=True,
        early_stopping=False,
        bos_token_id=peft_model.config.bos_token_id,
        eos_token_id=peft_model.config.eos_token_id,
        pad_token_id=peft_model.config.eos_token_id,
        temperature=0.4,
        do_sample=True
    )
    response_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return response_text

device = "cuda" if torch.cuda.is_available() else "cpu"
base_model_id = "tiiuae/falcon-7b-instruct"
model_directory = "Tonic/GaiaMiniMed"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True, padding_side="left")
model_config = AutoConfig.from_pretrained(base_model_id)
# Load the base model first, then attach the GaiaMiniMed PEFT adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, config=model_config)
peft_model = PeftModel.from_pretrained(base_model, model_directory)
# Keep the model on the same device that the inputs are moved to
peft_model.to(device)
peft_model.eval()

class ChatBot:
    def __init__(self, system_prompt="You are an expert medical analyst:"):
        self.system_prompt = system_prompt
        self.history = []

    def predict(self, user_input, system_prompt):
        # Fall back to the default system prompt when the second text box is left empty
        system_prompt = system_prompt or self.system_prompt
        formatted_input = f"{{{{ {system_prompt} }}}}\nUser: {user_input}\nFalcon:"
        input_ids = tokenizer.encode(formatted_input, return_tensors="pt", add_special_tokens=False).to(device)
        response = peft_model.generate(input_ids=input_ids, max_length=900, use_cache=False, early_stopping=False, bos_token_id=peft_model.config.bos_token_id, eos_token_id=peft_model.config.eos_token_id, pad_token_id=peft_model.config.eos_token_id, temperature=0.4, do_sample=True)
        response_text = tokenizer.decode(response[0], skip_special_tokens=True)
        # Keep a simple record of prompts and responses
        self.history.append(formatted_input)
        self.history.append(response_text)
        return response_text

bot = ChatBot()

title = "👋🏻Welcome to Tonic's GaiaMiniMed Chat🚀"
description = "You can use this Space to test out the current model [(Tonic/GaiaMiniMed)](https://huggingface.co/Tonic/GaiaMiniMed) or duplicate this Space and use it locally or on 🤗HuggingFace. [Join me on Discord to build together](https://discord.gg/VqTxc76K3u)."
examples = [["What is the proper treatment for buccal herpes?", "You are a medicine and public health expert, you will receive a question, answer the question, and provide a complete answer"]]

iface = gr.Interface(
    fn=bot.predict,
    title=title,
    description=description,
    examples=examples,
    inputs=["text", "text"], 
    outputs="text",
    theme="ParityError/Anime"
)

iface.launch()
  • See the code below for a more advanced deployment, including a naive memory store and user-controllable parameters:

# Gaia MiniMed ⚕️ Falcon Chat

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import gradio as gr

# Define the device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Define default generation variables
temperature = 0.4
max_new_tokens = 240
top_p = 0.92
repetition_penalty = 1.7
max_length = 2048

# Use model IDs as variables
base_model_id = "tiiuae/falcon-7b-instruct"
model_directory = "Tonic/GaiaMiniMed"

# Instantiate the Tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'


# Load the base Falcon model, then attach the GaiaMiniMed PEFT adapter on top of it
model_config = AutoConfig.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, config=model_config)
peft_model = PeftModel.from_pretrained(base_model, model_directory)
peft_model.to(device)
peft_model.eval()


# Class to encapsulate the Falcon chatbot
class FalconChatBot:
    def __init__(self, system_prompt="You are an expert medical analyst:"):
        self.system_prompt = system_prompt
        # Naive memory store: raw prompts and responses, in order
        self.history = []

    def process_history(self, history):
        # Anything that is not a list (including None) is treated as empty history
        if not isinstance(history, list):
            return []

        # Keep well-formed turns and filter out special commands
        filtered_history = []
        for message in history:
            if isinstance(message, dict):
                user_message = message.get("user", "")
                assistant_message = message.get("assistant", "")
                # Check that the user_message is not a special command
                if not user_message.startswith("Falcon:"):
                    filtered_history.append({"user": user_message, "assistant": assistant_message})
        return filtered_history

    def predict(self, user_message, assistant_message, temperature=0.4, max_new_tokens=700, top_p=0.99, repetition_penalty=1.9, history=None):
        # The six Gradio inputs map onto the first six parameters; history is optional
        processed_history = self.process_history(history)
        past_turns = "".join(f"User: {turn['user']}\nFalcon: {turn['assistant']}\n" for turn in processed_history)
        # Combine the system prompt, past turns, and current messages into a conversation
        conversation = f"{self.system_prompt}\n{past_turns}Falcon: {assistant_message if assistant_message else ''} User: {user_message}\nFalcon:\n"
        # Encode the conversation and move it to the same device as the model
        input_ids = tokenizer.encode(conversation, return_tensors="pt", add_special_tokens=False).to(device)
        # Sampling needs a strictly positive temperature; otherwise decode greedily
        do_sample = temperature > 0
        generation_kwargs = dict(
            max_new_tokens=max(1, int(max_new_tokens)),
            use_cache=False,
            early_stopping=False,
            bos_token_id=peft_model.config.bos_token_id,
            eos_token_id=peft_model.config.eos_token_id,
            pad_token_id=peft_model.config.eos_token_id,
            repetition_penalty=repetition_penalty,
            do_sample=do_sample,
        )
        if do_sample:
            generation_kwargs.update(temperature=temperature, top_p=top_p)
        response = peft_model.generate(input_ids=input_ids, **generation_kwargs)
        # Decode the generated response to text
        response_text = tokenizer.decode(response[0], skip_special_tokens=True)
        # Append the conversation and the response to the memory store
        self.history.append(conversation)
        self.history.append(response_text)

        return response_text


# Create the Falcon chatbot instance
falcon_bot = FalconChatBot()

# Define the Gradio interface
title = "👋🏻Welcome to Tonic's Falcon's Medical👨🏻‍⚕️Expert Chat🚀"
description = "You can use this Space to test out the GaiaMiniMed model [(Tonic/GaiaMiniMed)](https://huggingface.co/Tonic/GaiaMiniMed) or duplicate this Space and use it locally or on 🤗HuggingFace. [Join me on Discord to build together](https://discord.gg/VqTxc76K3u). Please be patient as we "

history = [
    {"user": "hi there how can you help me?", "assistant": "Hello, my name is Gaia, i'm created by Tonic, i can answer questions about medicine and public health!"},
    # Add more user and assistant messages as needed
]
# One row per example, with values in the same order as the interface inputs
examples = [
    [
        "What is the proper treatment for buccal herpes?",
        "My name is Gaia, I'm a health and sanitation expert ready to answer your medical questions.",
        0.4,   # temperature
        700,   # max new tokens
        0.90,  # top-p
        1.9,   # repetition penalty
    ]
]





additional_inputs=[
    gr.Textbox("", label="Optional system prompt"),
    gr.Slider(
        label="Temperature",
        value=0.9,
        minimum=0.0,
        maximum=1.0,
        step=0.05,
        interactive=True,
        info="Higher values produce more diverse outputs",
    ),
    gr.Slider(
        label="Max new tokens",
        value=256,
        minimum=0,
        maximum=3000,
        step=64,
        interactive=True,
        info="The maximum number of new tokens",
    ),
    gr.Slider(
        label="Top-p (nucleus sampling)",
        value=0.90,
        minimum=0.01,
        maximum=0.99,
        step=0.05,
        interactive=True,
        info="Higher values sample more low-probability tokens",
    ),
    gr.Slider(
        label="Repetition penalty",
        value=1.2,
        minimum=1.0,
        maximum=2.0,
        step=0.05,
        interactive=True,
        info="Penalize repeated tokens",
    )
]

iface = gr.Interface(
    fn=falcon_bot.predict,
    title=title,
    description=description,
    examples=examples,
    inputs=[
        # gr.inputs.* was removed in newer Gradio releases; use the top-level component
        gr.Textbox(label="Input Parameters", type="text", lines=5),
    ] + additional_inputs,
    outputs="text",
    theme="ParityError/Anime"
)

# Launch the Gradio interface for the Falcon model
iface.launch()
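
The "naive memory store" mentioned above can also be kept separate from the model code. The following is a minimal sketch, not the demo's actual implementation: `ConversationMemory`, `max_turns`, and `build_prompt` are hypothetical names, and the `User:` / `Falcon:` layout is assumed from the prompts used in the scripts above.

```python
from collections import deque


class ConversationMemory:
    """Hypothetical helper: remembers only the last `max_turns` exchanges."""

    def __init__(self, max_turns=3):
        # A bounded deque silently drops the oldest turn once it is full
        self.turns = deque(maxlen=max_turns)

    def add(self, user_message, assistant_message):
        self.turns.append((user_message, assistant_message))

    def build_prompt(self, system_prompt, user_message):
        # Assumed layout: system prompt, past turns, then the new question
        lines = [system_prompt]
        for past_user, past_assistant in self.turns:
            lines.append(f"User: {past_user}")
            lines.append(f"Falcon: {past_assistant}")
        lines.append(f"User: {user_message}")
        lines.append("Falcon:")
        return "\n".join(lines)


memory = ConversationMemory(max_turns=2)
memory.add("First question", "First answer")
memory.add("Second question", "Second answer")
memory.add("Third question", "Third answer")  # evicts the first turn
prompt = memory.build_prompt("You are an expert medical analyst:", "Fourth question")
print(prompt)
```

Bounding the number of stored turns keeps the prompt from growing past the model's context window; the right limit depends on how long the answers tend to be.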

Training Details

Results



TrainOutput(global_step=6150, training_loss=1.0597990553941183,
{'epoch': 6.0})

Training Data


DatasetDict({
    train: Dataset({
        features: ['qtype', 'Question', 'Answer'],
        num_rows: 16407
    })
})
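
The three columns above (`qtype`, `Question`, `Answer`) have to be flattened into a single string per record before fine-tuning. The card does not state which template was used, so the following is only a sketch, assuming the same `User:` / `Falcon:` layout as the Quick Start prompt. `format_example` and the sample record are hypothetical; the record contains placeholder text, not real dataset content.

```python
def format_example(record, system_prompt="You are an expert medical analyst:"):
    """Flatten one record into a training string (assumed layout)."""
    question = record["Question"].strip()
    answer = record["Answer"].strip()
    return f"{system_prompt}\nUser: {question}\nFalcon: {answer}"


# Placeholder record that only mirrors the dataset's column names
sample = {
    "qtype": "information",
    "Question": "  Placeholder question text?  ",
    "Answer": "Placeholder answer text. ",
}
text = format_example(sample)
print(text)
```

With the `datasets` library, a function like this would typically be applied to every row through `Dataset.map`.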

Training Procedure

Preprocessing [optional]


trainable params: 4718592 || all params: 3613463424 || trainables%: 0.13058363808693696

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]


metrics={'train_runtime': 30766.4612, 'train_samples_per_second': 3.2, 'train_steps_per_second': 0.2,
'total_flos': 1.1252790565109983e+18, 'train_loss': 1.0597990553941183}
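
These metrics can be cross-checked against the dataset size and step count reported earlier in this card. The snippet below is a back-of-the-envelope sketch using only numbers that appear in the card. The implied batch size is an inference from the two throughput figures, not a documented hyperparameter.

```python
# Values copied from the training output and dataset summary in this card
train_runtime_s = 30766.4612
samples_per_second = 3.2
steps_per_second = 0.2
num_rows = 16407
epochs = 6
global_step = 6150

hours = train_runtime_s / 3600                         # wall-clock training time
samples_seen = num_rows * epochs                       # rows processed over all epochs
approx_steps = train_runtime_s * steps_per_second      # should be close to global_step
implied_batch = samples_per_second / steps_per_second  # samples per optimizer step (inferred)

print(f"training time: {hours:.2f} h")
print(f"samples seen: {samples_seen}")
print(f"steps from throughput: {approx_steps:.0f} (reported: {global_step})")
print(f"implied samples per step: {implied_batch:.0f}")
```

The small gap between the throughput-based step estimate and the reported `global_step` is expected, since the logged rates are rounded.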

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: A100
  • Hours used: [More Information Needed]
  • Cloud Provider: Google Colaboratory
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications

Model Architecture and Objective


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): FalconForCausalLM(
      (transformer): FalconModel(
        (word_embeddings): Embedding(65024, 4544)
        (h): ModuleList(
          (0-31): 32 x FalconDecoderLayer(
            (self_attention): FalconAttention(
              (maybe_rotary): FalconRotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=4544, out_features=4672, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4544, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4672, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(in_features=4544, out_features=4544, bias=False)
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): FalconMLP(
              (dense_h_to_4h): Linear4bit(in_features=4544, out_features=18176, bias=False)
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(in_features=18176, out_features=4544, bias=False)
            )
            (input_layernorm): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
          )
        )
        (ln_f): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=4544, out_features=65024, bias=False)
    )
  )
)
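
The trainable parameter count reported under Preprocessing can be reproduced from this printout. Each adapted `query_key_value` layer gets a rank-16 pair of matrices, `lora_A` (4544 x 16) and `lora_B` (16 x 4672), and there are 32 decoder layers. The sketch below is only an arithmetic check; `lora_param_count` is a hypothetical helper, and the total parameter count is taken from the card as reported.

```python
def lora_param_count(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is (in x rank), B is (rank x out)."""
    return rank * (in_features + out_features)


num_layers = 32       # FalconDecoderLayer count from the printout
rank = 16             # out_features of lora_A
per_layer = lora_param_count(in_features=4544, out_features=4672, rank=rank)
trainable = num_layers * per_layer

all_params = 3613463424  # "all params" as reported in this card
trainable_pct = 100 * trainable / all_params

print(f"trainable params: {trainable}")
print(f"trainable %: {trainable_pct:.4f}")
```

Only the attention projection carries adapters here; the `dense` and MLP layers stay frozen, which is why the trainable share is so small.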

Compute Infrastructure

Google Colaboratory

Hardware

A100

Model Card Authors

Tonic

Model Card Contact

Tonic