Epsilon in RMS Norm
#2 opened by nebiusserver
Hi,
Could it be that rms_norm_eps was mistakenly set to 1e-5 when it should be 1e-6? The value 1e-5 looks suspicious because Qwen2.5-7B was trained with an epsilon of 1e-6, and Qwen2.5-72B-Instruct (which should be a fine-tuned version of the base model) also uses 1e-6.
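To show why the mismatch could matter, here is a minimal sketch of where epsilon enters RMSNorm. It assumes the usual LLaMA/Qwen2-style formulation, y = x / sqrt(mean(x^2) + eps) * weight, and is not the transformers implementation itself. The input vectors are hypothetical values chosen only for illustration, and the functions rms_norm and max_abs_diff are hypothetical helpers.

```python
import math


def rms_norm(x, weight, eps):
    """RMSNorm (assumed LLaMA/Qwen2-style): x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


# Hypothetical hidden states (illustration only, not taken from any model).
weight = [1.0] * 4
large = [1.0, -2.0, 3.0, -4.0]      # mean(x^2) = 7.5, much larger than eps
small = [1e-3, -2e-3, 3e-3, -4e-3]  # mean(x^2) = 7.5e-6, same order as eps

# Compare the two candidate epsilons on each input.
diffs = {}
for name, x in [("large", large), ("small", small)]:
    diffs[name] = max_abs_diff(rms_norm(x, weight, 1e-5), rms_norm(x, weight, 1e-6))
    print(name, diffs[name])
```

For activations with a large mean square, the two epsilons give practically identical outputs. When mean(x^2) is of the same order as epsilon, however, 1e-5 noticeably shrinks the normalized values compared with 1e-6, so a model trained with one value and run with the other may behave slightly differently.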