For example, I currently use batch size = 64, max steps = 1000, and logging/evaluation/save steps = 100.
Question 1: I want to confirm that if I switch to gradient accumulation (batch size = 32, gradient accumulation steps = 2), I don’t need to change the step arguments (1000 → 2000, 100 → 200). Does transformers handle the gradient_accumulation_steps * xxx_steps scaling entirely on its own? I found the gradient_accumulation_steps documentation and this discussion, but I couldn’t find a concrete example.
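For concreteness, here is a minimal sketch of the two configurations I have in mind (output_dir and the warmup_steps value are just placeholders, not my exact settings):

```python
from transformers import TrainingArguments

# Current setup: effective batch size 64, no gradient accumulation.
args_plain = TrainingArguments(
    output_dir="out",                  # placeholder
    per_device_train_batch_size=64,
    max_steps=1000,
    logging_steps=100,
    eval_steps=100,
    save_steps=100,
    warmup_steps=100,                  # placeholder value, see Question 2
    evaluation_strategy="steps",
    save_strategy="steps",
)

# Proposed setup: same effective batch size (32 * 2 = 64) via accumulation.
# Question 1 is whether max_steps / logging_steps / eval_steps / save_steps
# can stay as-is, or whether they must be doubled to 2000 / 200.
args_accum = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    max_steps=1000,      # or 2000?
    logging_steps=100,   # or 200?
    eval_steps=100,
    save_steps=100,
    warmup_steps=100,    # placeholder value, see Question 2
    evaluation_strategy="steps",
    save_strategy="steps",
)
```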
Question 2: What about warm-up? Do I just keep the original warm-up arguments unchanged?
Question 3: Will the data be fed into the model in the same order? (I don’t set the seed with --seed, so the default seed 42 is used.)
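To be clear about what I mean by "not setting the seed", a small sketch of my assumption (set_seed is only shown as the explicit equivalent of the default):

```python
from transformers import TrainingArguments, set_seed

# I do not pass --seed, so TrainingArguments falls back to its default seed.
args = TrainingArguments(output_dir="out")
print(args.seed)  # 42

# Explicitly seeding would be the equivalent of that default behaviour.
set_seed(42)
```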