I’m using a fine-tuned BartForConditionalGeneration model and trying to generate long output sequences.
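For context, the model and inputs are set up roughly like this (a minimal sketch; the checkpoint path and source text are placeholders for my actual fine-tuned model and data), and then I call generate as shown below:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Placeholder path; my real model is a fine-tuned BART checkpoint saved locally.
model = BartForConditionalGeneration.from_pretrained("path/to/my-finetuned-bart").to("cuda")
tokenizer = BartTokenizer.from_pretrained("path/to/my-finetuned-bart")

source_text = "..."  # placeholder for the long input document

# Tokenize the source document and move the tensors to the GPU.
inputs = tokenizer(source_text, return_tensors="pt", truncation=True, max_length=1024).to("cuda")
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
```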
```python
outputs = model.generate(
    input_ids,
    attention_mask=attention_mask,
    num_beams=3,
    min_new_tokens=1500,
    max_new_tokens=2500,
    early_stopping=True,
)
```
However, when I run this I get a warning: “This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.” Shortly afterwards, CUDA raises a RuntimeError.
Why? Shouldn’t BART be able to generate arbitrarily long sequences autoregressively?
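For what it’s worth, I assume the 1024 in the warning comes from BART’s learned position embeddings rather than from my generation arguments; a quick way to check the configured limit (assuming the model and tokenizer are loaded as above) would be:

```python
# BART uses learned position embeddings, so the decoder cannot index positions
# beyond max_position_embeddings; for the standard BART checkpoints this is 1024.
print(model.config.max_position_embeddings)  # expected to print 1024
print(tokenizer.model_max_length)            # the tokenizer's matching limit
```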