BART max_new_tokens in generate function

I’m using a fine-tuned BartForConditionalGeneration model and trying to generate tokens.

  outputs = model.generate(input_ids, attention_mask=attention_mask, num_beams=3, 
                           min_new_tokens=1500,
                           max_new_tokens=2500,
                           early_stopping=True)

However, I get a warning: "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024). Depending on the model, you may observe exceptions, performance degradation, or nothing at all."

CUDA then also throws a RuntimeError.

Why? Shouldn't BART be able to generate arbitrarily long sequences autoregressively?

Hey!

No, BART cannot handle arbitrarily long sequences, because it was trained with learned positional embeddings, unlike recent LLMs that use RoPE. In other words, BART has a fixed-size embedding matrix that maps each position to an embedding, and for the standard checkpoints that matrix covers 1024 positions. Your min_new_tokens=1500 forces the decoder past that limit, so it indexes outside the embedding table, which is what produces the CUDA RuntimeError.
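
Here's a minimal sketch of how you can read that limit from the config and cap generation so you stay inside it (I'm using facebook/bart-large-cnn as a stand-in; swap in your fine-tuned checkpoint):

  from transformers import BartForConditionalGeneration, BartTokenizer

  # Assumption: any BART checkpoint; substitute your fine-tuned model path.
  model_name = "facebook/bart-large-cnn"
  tokenizer = BartTokenizer.from_pretrained(model_name)
  model = BartForConditionalGeneration.from_pretrained(model_name)

  # BART's learned position-embedding table has a fixed number of rows,
  # reported in the config (1024 for the standard checkpoints).
  max_positions = model.config.max_position_embeddings

  inputs = tokenizer("Long input text ...", return_tensors="pt",
                     truncation=True, max_length=max_positions)

  # Cap the decoder output so its positions never index past the table.
  outputs = model.generate(
      inputs["input_ids"],
      attention_mask=inputs["attention_mask"],
      num_beams=3,
      max_new_tokens=max_positions - 1,  # stay within the 1024-position limit
      early_stopping=True,
  )
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If you truly need outputs longer than 1024 tokens, you'd have to move to a model trained for longer sequences rather than raise max_new_tokens on BART.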

