Update
- index.html +2 -2
- llm_conf.qmd +2 -2
index.html
CHANGED
@@ -512,11 +512,11 @@
 <ul>
 <li>No distributed techniques at play</li>
 </ul></li>
-<li>DDP:
+<li>Distributed Data Parallelism (DDP):
 <ul>
 <li>A full copy of the model exists on each device, but data is chunked between each GPU</li>
 </ul></li>
-<li>FSDP & DeepSpeed:
+<li>Fully Sharded Data Parallelism (FSDP) & DeepSpeed (DS):
 <ul>
 <li>Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs</li>
 </ul></li>
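The hunk above spells out what DDP does: every GPU holds a full replica of the model while the data loader hands each rank a different shard of the data. A minimal PyTorch sketch of that setup, assuming a multi-GPU node and a `torchrun` launch (the Linear model and random dataset are placeholders, not taken from the slides):

```python
# Minimal DDP sketch: full model copy per GPU, data chunked between ranks.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # Every rank holds a full copy of the model...
    model = torch.nn.Linear(512, 512).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients all-reduced during backward()

    # ...but the data is chunked: each rank gets a different shard of the dataset.
    dataset = TensorDataset(torch.randn(1024, 512))
    loader = DataLoader(dataset, batch_size=8, sampler=DistributedSampler(dataset))

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for (batch,) in loader:
        loss = model(batch.cuda(local_rank)).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=2 ddp_sketch.py` (filename hypothetical), each process trains on its own data shard while DDP keeps the model replicas in sync.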
llm_conf.qmd
CHANGED
@@ -61,9 +61,9 @@ What can we do?
 
 * Single GPU:
   * No distributed techniques at play
-* DDP:
+* Distributed Data Parallelism (DDP):
   * A full copy of the model exists on each device, but data is chunked between each GPU
-* FSDP & DeepSpeed:
+* Fully Sharded Data Parallelism (FSDP) & DeepSpeed (DS):
   * Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs
 
 
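For the FSDP & DeepSpeed bullet, the key difference from DDP is that parameters, gradients, and optimizer state are sharded across ranks rather than replicated. A minimal PyTorch FSDP sketch under the same assumptions (multi-GPU node, `torchrun` launch; the toy MLP is a placeholder, not from the slides):

```python
# Minimal FSDP sketch: model and optimizer state sharded across GPUs.
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.ReLU(),
        torch.nn.Linear(2048, 512),
    ).cuda(local_rank)

    # Parameters are sharded across ranks and only all-gathered for compute;
    # the optimizer is built after wrapping, so its state is sharded as well.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 512, device=f"cuda:{local_rank}")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

DeepSpeed's ZeRO stage 3 follows the same idea, partitioning optimizer state, gradients, and parameters across devices so the per-GPU memory footprint shrinks as more GPUs are added.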