Number of elements allgather at a time. Limits the memory required for the allgather for large model sizes. Smaller values use less GPU memory, but slow down training and validating. |
Number of elements allgather at a time. Limits the memory required for the allgather for large model sizes. Smaller values use less GPU memory, but slow down training and validating. |