Hi. I fine-tuned the Wav2Vec2ForCTC
model using Common Voice’s Greek data, using the code below:
training_args_bl3 = TrainingArguments(
    output_dir = 'bl2-cv-HFTrainer-small-lr',
    group_by_length = True,
    per_device_train_batch_size = 16,  # batch size
    gradient_accumulation_steps = 2,
    evaluation_strategy = 'steps',
    num_train_epochs = 60,
    fp16 = True,
    save_steps = 298,
    eval_steps = 298,
    logging_steps = 298,
    learning_rate = 3e-4,
    warmup_steps = 180,
    save_total_limit = 1,
    load_best_model_at_end = True,
    metric_for_best_model = 'wer',
    greater_is_better = False)

trainer_bl3 = Trainer(
    model = bl2,
    data_collator = data_collator,
    args = training_args_bl3,
    compute_metrics = compute_metrics,
    train_dataset = cv_train,
    eval_dataset = cv_val,
    tokenizer = processor.feature_extractor,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 10)])

trainer_bl3.train()
where the model bl2 is loaded like this:
bl2 = Wav2Vec2ForCTC.from_pretrained(
    'facebook/wav2vec2-large-xlsr-53',
    attention_dropout = 0.1,
    hidden_dropout = 0.1,
    feat_proj_dropout = 0.0,
    mask_time_prob = 0.05,
    layerdrop = 0.1,
    gradient_checkpointing = True,
    ctc_loss_reduction = 'mean',
    pad_token_id = processor.tokenizer.pad_token_id,
    vocab_size = len(processor.tokenizer),
    cache_dir = '/mnt/twohdd/.cache')
After the training is completed, I proceed to save the model like this:
trainer_bl3.save_model('FILE_NAME')
and then load it as follows:
test = Wav2Vec2ForCTC.from_pretrained('FILE_NAME').to('cuda')
After this, I noticed that the two models (bl2, the fine-tuned one, and test, the reloaded one) yielded different error rates on my test set. When I compared them, I saw that all of their parameters differ. I checked using the function posted by aasharma90 in this thread: Check if models have same weights - PyTorch Forums.
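For reference, here is a minimal sketch of that kind of parameter comparison (my own rewrite, not aasharma90's exact code; it assumes both state dicts fit in memory and moves tensors to CPU so the models can live on different devices):

```python
import torch

def models_match(model_a, model_b):
    """Return True if both models have identical state dicts:
    the same parameter/buffer names and exactly equal tensors."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    if sd_a.keys() != sd_b.keys():
        return False
    return all(torch.equal(sd_a[k].cpu(), sd_b[k].cpu()) for k in sd_a)
```

With this, models_match(bl2, test) returns False in my case, since the parameters differ.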
Am I doing something wrong? Can someone please help me, as it is very important for me to be able to load the exact same fine-tuned model in the future.
Thanks in advance.
EDIT: Just wanted to mention that their scores are very close, but not identical. Also, I'd like to note that when making the test-set predictions, I set both models to evaluation mode.
EDIT2: I just tried to save my model with bl2.save_pretrained('FILENAME') instead of using Trainer's save_model(). After loading with the same code shown above (from_pretrained()) and comparing the models with the function I linked above, they match.
This is good in the sense that the models now match, but I don't understand why this happens. As far as I know, Trainer's save_model() calls save_pretrained() under the hood. What's going wrong here?