Can't save the TensorFlow model of nvidia/mit-b5

When I try to save the TensorFlow model of nvidia/mit-b5 I get a very cryptic error, related to a failed transpose (a 4-dimension permutation applied to what is apparently a scalar).

The model trains just fine, and inference also works after training. But it cannot be saved as a TensorFlow SavedModel. Note that it does save to H5 successfully (i.e. saved_model=False).
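For reference, the H5 save that works is the same call as in the snippet below, just with the SavedModel export turned off:

model.save_pretrained("/workspace/saved_model", saved_model=False)  # writes tf_model.h5 instead of a SavedModel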

The reproduction is very easy: simply creating the model and trying to save it triggers the error (my original code fine-tunes to specific classes, but the same error appears with any creation of the model). Here's a snippet that fails:

from transformers import TFSegformerForSemanticSegmentation

model_checkpoint = "nvidia/mit-b5"

model = TFSegformerForSemanticSegmentation.from_pretrained(
    model_checkpoint,
)

model.save_pretrained("/workspace/saved_model", saved_model=True)

The error is the following:
File /usr/local/lib/python3.11/dist-packages/transformers/models/segformer/modeling_tf_segformer.py:878, in TFSegformerDecodeHead.call(self, encoder_hidden_states, training)
    875 encoder_hidden_state = tf.reshape(encoder_hidden_state, (-1, height, width, channel_dim))
    877 # unify channel dimension
--> 878 encoder_hidden_state = tf.transpose(encoder_hidden_state, perm=[0, 2, 3, 1])
    879 height, width = shape_list(encoder_hidden_state)[1:3]
    880 encoder_hidden_state = mlp(encoder_hidden_state)

ValueError: Dimension must be 0 but is 4 for '{{node transpose}} = Transpose[T=DT_STRING, Tperm=DT_INT32](transpose/a, transpose/perm)' with input shapes: , [4].
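The T=DT_STRING in the node suggests the transpose is being traced with a scalar string instead of a 4-D feature map, presumably because the serving signature hands the decode head the wrong input structure during the export trace. A minimal illustration that reproduces the same shape complaint (my reconstruction, not the actual model code):

import tensorflow as tf

@tf.function
def f(x):
    # A 4-element permutation applied to a scalar fails graph-mode shape
    # inference exactly like the error above.
    return tf.transpose(x, perm=[0, 2, 3, 1])

f(tf.constant("not a feature map"))
# ValueError: Dimension must be 0 but is 4 for '{{node transpose}} ...'
# with input shapes: , [4].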

Can anyone help me find a workaround, or help me uncover the root cause?
I am using TensorFlow 2.16.1.

Thanks in advance!


Same problem here. Did you find out how to correct it?


Actually, I did!

I fixed it locally, because I don't understand the code well enough to create a universal fix. The fix depends on whether you use b5 or b0, for instance, because it involves an input signature with hard-coded sizes that differ between variants (while the same class is used for both).

I fixed it a while ago, so hopefully I can recap all the gory details:

  1. I have a locally edited version of modeling_tf_segformer.py.
    I think I made two changes:
    1.1) Under the TFSegformerPreTrainedModel class, I commented out the definition of the input_signature property.
    1.2) Under the TFSegformerDecodeHead class, I created such a property with this precise code:

    @property
    def input_signature(self):
        return (
            tf.TensorSpec(shape=(None, 64, 128, 128), dtype=tf.float32),
            tf.TensorSpec(shape=(None, 128, 64, 64), dtype=tf.float32),
            tf.TensorSpec(shape=(None, 320, 32, 32), dtype=tf.float32),
            tf.TensorSpec(shape=(None, 512, 16, 16), dtype=tf.float32),
        )

Just FYI, for B0 it's:

@property
def input_signature(self):
    return (
        tf.TensorSpec(shape=(None, 32, 128, 128), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 64, 64, 64), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 160, 32, 32), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 256, 16, 16), dtype=tf.float32),
    )
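For what it's worth, the shapes in both signatures follow directly from the Segformer config: the four encoder stages downsample a 512x512 input by 4, 8, 16, and 32, and the channel counts are the config's hidden_sizes ([64, 128, 320, 512] for b5, [32, 64, 160, 256] for b0). So in principle you could build the signature from the config instead of hard-coding a variant, and patch it in at runtime rather than editing site-packages. This is untested beyond matching the shapes above, and it assumes the runtime patch is equivalent to the file edit (you may still need to comment out the parent class property as in step 1.1):

import tensorflow as tf
from transformers.models.segformer import modeling_tf_segformer as mts

def decode_head_signature(config, input_size=512):
    # Each stage's spatial size is the input size divided by the cumulative
    # stride (4, 8, 16, 32 with the default per-stage strides of [4, 2, 2, 2]).
    specs = []
    downsample = 1
    for stride, channels in zip(config.strides, config.hidden_sizes):
        downsample *= stride
        size = input_size // downsample
        specs.append(tf.TensorSpec(shape=(None, channels, size, size), dtype=tf.float32))
    return tuple(specs)

# Override the property at runtime instead of editing modeling_tf_segformer.py.
mts.TFSegformerDecodeHead.input_signature = property(
    lambda self: decode_head_signature(self.config, input_size=512)
)

With this in place, the save_pretrained(..., saved_model=True) call from the first post should go through, since per step 1 the signature override was all it took to make saving work.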

  2. While this is enough to make it save, it wasn't enough to make it serve under the TensorFlow Serving server (!!). For this, I had to do everything with an older version of TensorFlow: tensorflow/tensorflow:2.8.4-gpu (I'm using Docker, and this is the base image I chose).

This shouldn't be a problem, except the Hugging Face transformers code depends on a small function that was only introduced in 2.16.1. For this, I locally edited modeling_tf_utils.py to add the function, copying it from tensorflow/python/keras/engine/data_adapter.py in the TensorFlow GitHub repo (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/data_adapter.py).

This is the added function:

def unpack_x_y_sample_weight(data):
    if not isinstance(data, tuple):
        return (data, None, None)
    elif len(data) == 1:
        return (data[0], None, None)
    elif len(data) == 2:
        return (data[0], data[1], None)
    elif len(data) == 3:
        return (data[0], data[1], data[2])
    else:
        error_msg = ("Data is expected to be in format x, (x,), (x, y), "
                     "or (x, y, sample_weight), found: {}").format(data)
        raise ValueError(error_msg)
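If you want to sanity-check the backported helper before wiring it into modeling_tf_utils.py, its behavior is easy to exercise on its own:

# The helper just normalizes x, (x,), (x, y), (x, y, sample_weight) inputs.
print(unpack_x_y_sample_weight(("features", "labels")))
# -> ('features', 'labels', None)
print(unpack_x_y_sample_weight("features"))
# -> ('features', None, None)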

  3. Steps 1 and 2 are enough, but if you also have the fun of using tensorflow-datasets for training, the older TensorFlow version forces you to install an older version of protobuf, or you get other obscure errors. I added this line to my Dockerfile after installing tensorflow-datasets:

    RUN pip install --upgrade "protobuf<=3.20.1"

I hope I recapped all steps correctly, and hope it works for you. It’s such a messy ecosystem :confused:


If you are encountering difficulties when attempting to save the nvidia/mit-b5 model in TensorFlow, it’s crucial to first verify whether the model is compatible with TensorFlow. The nvidia/mit-b5 model is likely designed for PyTorch, in which case it may require conversion to TensorFlow format. To achieve this, you can utilize the Hugging Face transformers library, which allows for the seamless conversion of models from PyTorch to TensorFlow.

You can load the model in PyTorch and save it in TensorFlow format with the following code:

from transformers import TFAutoModel

# Load the model (initially in PyTorch)
model = TFAutoModel.from_pretrained("nvidia/mit-b5")

# Save the model in TensorFlow format
model.save_pretrained("path_to_save_directory")
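If the save succeeds, a quick way to verify the export is to load it back from the same directory:

# Reload to confirm the saved weights round-trip (placeholder path from above):
reloaded = TFAutoModel.from_pretrained("path_to_save_directory")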

This approach will save the model in a TensorFlow-compatible format. Additionally, ensure that you have sufficient disk space, as large models can require substantial storage. If the issue persists, I recommend checking TensorFlow Hub for an official TensorFlow version of the model, or considering an alternative model compatible with TensorFlow. Should the problem continue, kindly share the specific error message you’re encountering, as it will aid in diagnosing the issue more effectively.
