Anyone succeeded in finetuning?
#9 opened by echooooooooo
I'm trying to finetune this model for my task, but Quanto quantization doesn't support finetuning. I switched the quantization method to BitsAndBytes and added llm_int8_skip_modules entries corresponding to the model's modules_to_not_convert:
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

base_model_name = "MiniMaxAI/MiniMax-Text-01"
hf_config = AutoConfig.from_pretrained(base_model_name, trust_remote_code=True)
device_map = "auto"  # or an explicit device placement dict

# Skip quantization for the lm_head, the embeddings, and each layer's
# coefficient and MoE gate modules (mirroring modules_to_not_convert).
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["lm_head", "embed_tokens"]
    + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)],
)

# Load the bfloat16 checkpoint, dispatch it across devices, and quantize on load.
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=bnb_config,
    trust_remote_code=True,
    offload_buffers=True,
)
But I get TypeError: cannot unpack non-iterable NoneType object when computing self-attention during training:
File "/root/.cache/huggingface/modules/transformers_modules/MiniMaxAI/MiniMax-Text-01/372fb1d2051619593bfc3b7ef553745615bbbd5d/modeling_minimax_text_01.py", line 1028, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
TypeError: cannot unpack non-iterable NoneType object
Some sample code for finetuning would be nice.
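In the meantime, here is roughly the training setup I'm attempting on top of the quantized model. This is only a minimal sketch: the LoRA target_modules, output_dir, train_dataset, and data_collator are placeholders I would expect to adapt, not names taken from the MiniMax code.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoTokenizer, Trainer, TrainingArguments

# Make the 8-bit model trainable (casts norms, enables input grads for
# gradient checkpointing).
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

# Attach LoRA adapters; the target_modules below are guesses and should be
# checked against the names printed by model.named_modules().
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

training_args = TrainingArguments(
    output_dir="minimax-text-01-lora",  # placeholder output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your tokenized dataset
    data_collator=data_collator,  # placeholder: e.g. a causal-LM collator
)
trainer.train()  # training step; the TypeError above is raised here during self-attention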
Thank you for your feedback. Currently, we have only released the code for inference, not for training.