Update
- index.html +2 -2
- llm_conf.qmd +2 -2
index.html
CHANGED
@@ -512,11 +512,11 @@
 <ul>
 <li>No distributed techniques at play</li>
 </ul></li>
-<li>DDP:
+<li>Distributed Data Parallelism (DDP):
 <ul>
 <li>A full copy of the model exists on each device, but data is chunked between each GPU</li>
 </ul></li>
-<li>FSDP & DeepSpeed:
+<li>Fully Sharded Data Parallelism (FSDP) & DeepSpeed (DS):
 <ul>
 <li>Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs</li>
 </ul></li>
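The hunk above spells out what DDP does: every GPU holds a full replica of the model while the data loader hands each rank a different shard of the data. A minimal PyTorch sketch of that setup, assuming a multi-GPU node and a `torchrun` launch (the Linear model and random dataset are placeholders, not taken from the slides):

```python
# Minimal DDP sketch: full model copy per GPU, data chunked between ranks.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # Every rank holds a full copy of the model...
    model = torch.nn.Linear(512, 512).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients all-reduced during backward()

    # ...but the data is chunked: each rank gets a different shard of the dataset.
    dataset = TensorDataset(torch.randn(1024, 512))
    loader = DataLoader(dataset, batch_size=8, sampler=DistributedSampler(dataset))

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for (batch,) in loader:
        loss = model(batch.cuda(local_rank)).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=2 ddp_sketch.py` (filename hypothetical), each process trains on its own data shard while DDP keeps the model replicas in sync.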
llm_conf.qmd
CHANGED
@@ -61,9 +61,9 @@ What can we do?
 
 * Single GPU:
   * No distributed techniques at play
-* DDP:
+* Distributed Data Parallelism (DDP):
   * A full copy of the model exists on each device, but data is chunked between each GPU
-* FSDP & DeepSpeed:
+* Fully Sharded Data Parallelism (FSDP) & DeepSpeed (DS):
   * Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs
 
 
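For the FSDP & DeepSpeed bullet, the key difference from DDP is that parameters, gradients, and optimizer state are sharded across ranks rather than replicated. A minimal PyTorch FSDP sketch under the same assumptions (multi-GPU node, `torchrun` launch; the toy MLP is a placeholder, not from the slides):

```python
# Minimal FSDP sketch: model and optimizer state sharded across GPUs.
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.ReLU(),
        torch.nn.Linear(2048, 512),
    ).cuda(local_rank)

    # Parameters are sharded across ranks and only all-gathered for compute;
    # the optimizer is built after wrapping, so its state is sharded as well.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 512, device=f"cuda:{local_rank}")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

DeepSpeed's ZeRO stage 3 follows the same idea, partitioning optimizer state, gradients, and parameters across devices so the per-GPU memory footprint shrinks as more GPUs are added.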