How to achieve 4-bit quantization?

#6
by HUG-NAN - opened

Can you share the implementation of 4-bit quantization code?

For the transformer, just use his class with load_in_4bit=True. It will run any Flux transformer. No need to do anything else.

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True)
pipe.enable_model_cpu_offload()  # Save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power.

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")

Do you mean this? Is it correct?
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True)

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Keyword arguments {'load_in_4bit': True} are not expected by FluxPipeline and will be ignored.
Loading pipeline components...: 100% 7/7 [00:43<00:00, 3.20s/it]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.
Loading checkpoint shards: 100% 2/2 [00:39<00:00, 19.41s/it]
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers

ValueError Traceback (most recent call last)
in <cell line: 5>()
3
4 pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")
----> 5 pipe.enable_model_cpu_offload()
6 reset_device_map()
7 enable_model_cpu_offload()

/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py in enable_model_cpu_offload(self, gpu_id, device)
1005 is_pipeline_device_mapped = self.hf_device_map is not None and len(self.hf_device_map) > 1
1006 if is_pipeline_device_mapped:
-> 1007 raise ValueError(
1008 "It seems like you have activated a device mapping strategy on the pipeline so calling enable_model_cpu_offload() isn't allowed. You can call reset_device_map() first and then call enable_model_cpu_offload()."
1009 )

ValueError: It seems like you have activated a device mapping strategy on the pipeline so calling enable_model_cpu_offload() isn't allowed. You can call reset_device_map() first and then call enable_model_cpu_offload().
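
The ValueError above comes from combining device_map="balanced" with enable_model_cpu_offload(). A minimal sketch of the call order the message asks for follows. enable_offload_safely and FakePipeline are hypothetical names made up for illustration, and the fake object only records calls so the snippet runs without a GPU. With a real pipeline, reset_device_map() and enable_model_cpu_offload() are the methods named in the error message.

```python
def enable_offload_safely(pipe):
    """Enable model CPU offload, clearing an active device map first."""
    device_map = getattr(pipe, "hf_device_map", None)
    # Same condition the traceback shows just before the ValueError is raised.
    if device_map is not None and len(device_map) > 1:
        pipe.reset_device_map()
    pipe.enable_model_cpu_offload()
    return pipe


class FakePipeline:
    """Hypothetical stand-in for a pipeline; it only records the call order."""

    def __init__(self):
        self.hf_device_map = {"transformer": 0, "vae": "cpu"}
        self.calls = []

    def reset_device_map(self):
        self.hf_device_map = None
        self.calls.append("reset_device_map")

    def enable_model_cpu_offload(self):
        self.calls.append("enable_model_cpu_offload")


fake = enable_offload_safely(FakePipeline())
print(fake.calls)
```

Dropping device_map="balanced" from from_pretrained avoids the conflict in the first place. Either way, the earlier warning still applies: FluxPipeline ignores load_in_4bit, so this step alone does not quantize anything.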

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
model = AutoModelForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")

ValueError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import AutoTokenizer, AutoModelForCausalLM
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
4 model = AutoModelForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
1047 return CONFIG_MAPPING[pattern].from_dict(config_dict, **unused_kwargs)
1048
-> 1049 raise ValueError(
1050 f"Unrecognized model in {pretrained_model_name_or_path}. "
1051 f"Should have a model_type key in its {CONFIG_NAME}, or contain one of the following strings "

ValueError: Unrecognized model in black-forest-labs/FLUX.1-dev. Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, m...
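
AutoTokenizer and AutoModelForCausalLM look for a config.json with a model_type key at the top of the repository, which is what the error says is missing. A diffusers pipeline repository is laid out differently: a model_index.json at the top names the pipeline class and, for each component subfolder, the library and class that load it. The sketch below parses an abbreviated sample in that format. The sample is illustrative and written for this example, so the real file in the repository is the authority, and component_classes is a hypothetical helper.

```python
import json

# Illustrative, abbreviated sample in the model_index.json format (an assumption,
# not a copy of the real file).
SAMPLE_MODEL_INDEX = json.loads("""
{
  "_class_name": "FluxPipeline",
  "text_encoder_2": ["transformers", "T5EncoderModel"],
  "tokenizer_2": ["transformers", "T5TokenizerFast"],
  "transformer": ["diffusers", "FluxTransformer2DModel"]
}
""")


def component_classes(model_index):
    """Map each component subfolder to the (library, class) pair that loads it."""
    return {
        name: tuple(spec)
        for name, spec in model_index.items()
        if not name.startswith("_") and isinstance(spec, list) and len(spec) == 2
    }


components = component_classes(SAMPLE_MODEL_INDEX)
for name, (library, cls) in sorted(components.items()):
    print(f"{name}: load with {library}.{cls} and subfolder={name!r}")
```

Read this way, each component is loaded by its own class from its own subfolder, which is why a single Auto class pointed at the repository root fails.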

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
model = GPTNeoForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")


OSError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import GPTNeoForCausalLM, GPT2Tokenizer
2
----> 3 tokenizer = GPT2Tokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
4 model = GPTNeoForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")

/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
2012 # loaded directly from the GGUF file.
2013 if all(full_file_name is None for full_file_name in resolved_vocab_files.values()) and not gguf_file:
-> 2014 raise EnvironmentError(
2015 f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
2016 "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "

OSError: Can't load tokenizer for 'black-forest-labs/FLUX.1-dev'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'black-forest-labs/FLUX.1-dev' is the correct path to a directory containing all relevant files for a GPT2Tokenizer tokenizer.

Read the model card. Import the class from model.py in his GitHub repo, not from Hugging Face.

What do you mean? Can you write working code? I tried many changes on a Colab T4 and none of them worked.

I don't use Colab. Here's the GitHub link from the model card, though: https://github.com/HighCWu/flux-4bit

Your first problem is this:

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

This isn't GPT, it's Flux. READ THE MODEL CARD.
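
For readers who want a starting point that stays inside diffusers, here is a hedged sketch of a different route from the model card's model.py: quantizing the transformer with bitsandbytes through diffusers' BitsAndBytesConfig. It assumes a recent diffusers release with bitsandbytes quantization support, the bitsandbytes and accelerate packages, and a CUDA GPU. It has not been tested on a Colab T4, and load_flux_4bit is a hypothetical helper name. The key difference from the attempts above is that the quantization config is passed when loading the transformer component, not as a keyword to FluxPipeline.

```python
def load_flux_4bit(repo_id="black-forest-labs/FLUX.1-dev"):
    """Build a FluxPipeline whose transformer is loaded in 4-bit NF4 (sketch)."""
    # Imports are deferred so that defining this function needs no GPU stack.
    import torch
    from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # Quantization is requested on the transformer component, not on the pipeline.
    transformer = FluxTransformer2DModel.from_pretrained(
        repo_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )
    # The remaining components are loaded as usual, with the quantized
    # transformer passed in as an override.
    pipe = FluxPipeline.from_pretrained(
        repo_id,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
    pipe.enable_model_cpu_offload()
    return pipe
```

Calling pipe = load_flux_4bit() should give a pipeline that accepts the same pipe(prompt, ...) call as in the first post. In this sketch only the transformer is quantized; the text encoders and VAE stay in bfloat16.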
