Color

Files changed:
- index.html (+5 -5)
- llm_conf.qmd (+5 -5)

index.html
@@ -426,7 +426,7 @@
 <li>Backward ~= 2x the model size</li>
 <li>The optimizer step ~= 4x the model size (1x model, 1x gradients, 2x optimizer):</li>
 </ul>
-<div style="font-size: 50%;background-color: rgba(0,0,0,.1);">
+<div style="font-size: 50%;background-color: rgba(0,0,0,.1);color: #93a1a1;">
 <table>
 <thead>
 <tr class="header">
@@ -465,7 +465,7 @@
 <p>This works fine for small models, we have cards with anywhere from 12-24GB of GPU memory (on the GPU-poor side).</p>
 <p>But what happens as we scale?</p>
 <p>Here’s <code>llama-3-8B</code> (8.03B parameters)</p>
-<div style="font-size: 50%;background-color: rgba(0,0,0,.1);">
+<div style="font-size: 50%;background-color: rgba(0,0,0,.1);color: #93a1a1;">
 <table>
 <thead>
 <tr class="header">
@@ -698,7 +698,7 @@
 <li>Rely on <code>config.yaml</code> files</li>
 <li>Choose to either running <code>accelerate config</code> or write your own:</li>
 </ul>
-<div class="columns" style="font-size: 50%;padding-left:10%;background-color: rgba(0,0,0,.1);">
+<div class="columns" style="font-size: 50%;padding-left:10%;background-color: rgba(0,0,0,.1);color: #93a1a1;">
 <div class="column" style="width:40%;">
 <div class="code-with-filename">
 <div class="code-with-filename-file">
@@ -804,7 +804,7 @@
 <ul>
 <li>Let’s tie that back up to the model estimator with neat tools like NVIDIA’s TransformerEngine</li>
 </ul>
-<div style="font-size: 60%;background-color: rgba(0,0,0,.1);">
+<div style="font-size: 60%;background-color: rgba(0,0,0,.1);color: #93a1a1;">
 <table style="width:100%;">
 <colgroup>
 <col style="width: 14%">
@@ -894,7 +894,7 @@
 <ul>
 <li>Extremely similar, however mostly used different naming conventions for items and slight tweaks in the implementation</li>
 </ul>
-<div style="font-size: 50%;background-color: rgba(0,0,0,.1);">
+<div style="font-size: 50%;background-color: rgba(0,0,0,.1);color: #93a1a1;">
 <table style="width:100%;">
 <colgroup>
 <col style="width: 16%">
llm_conf.qmd
@@ -28,7 +28,7 @@ General estimate (`bert-base-cased`, 108M params):
 - Backward ~= 2x the model size
 - The optimizer step ~= 4x the model size (1x model, 1x gradients, 2x optimizer):
 
-::: {style="font-size: 50%;background-color: rgba(0,0,0,.1);"}
+::: {style="font-size: 50%;background-color: rgba(0,0,0,.1);color: #93a1a1;"}
 | dtype | Model | Gradients | Backward pass | Optimizer step | Highest |
 |---------|:-----|:------:|:------:|:------:|:------:|
 | float32 | 413.18 MB | 413.18 MB | 826.36 MB | 1.61 GB | 1.61 GB |
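As a back-of-the-envelope check on the table in this hunk, the rule can be written out directly. This is a hypothetical sketch, not the actual `accelerate` estimator (which inspects the real checkpoint); it assumes an Adam-style optimizer keeping two extra states per parameter:

```python
def training_footprint(n_params: int, bytes_per_param: int = 4) -> dict[str, int]:
    """Rule-of-thumb bytes per training stage: gradients ~= 1x the model,
    backward ~= 2x, optimizer step ~= 4x (1x model, 1x gradients, 2x optimizer)."""
    model = n_params * bytes_per_param
    return {
        "model": model,
        "gradients": model,
        "backward pass": 2 * model,
        "optimizer step": 4 * model,
    }

# bert-base-cased, ~108M parameters, float32 (4 bytes/param):
for stage, n in training_footprint(108_310_000).items():
    print(f"{stage:>14}: {n / 2**20:,.2f} MB")
# model/gradients ~413 MB, backward ~826 MB, optimizer step ~1.61 GB,
# matching the float32 row above
```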
@@ -45,7 +45,7 @@ But what happens as we scale?
 
 Here's `llama-3-8B` (8.03B parameters)
 
-::: {style="font-size: 50%;background-color: rgba(0,0,0,.1);"}
+::: {style="font-size: 50%;background-color: rgba(0,0,0,.1);color: #93a1a1;"}
 | dtype | Model | Gradients | Backward pass | Optimizer step | Highest |
 |---------|:-----|:------:|:------:|:------:|:------:|
 | float32 | 28.21 GB | 28.21 GB | 56.43 GB | 112.84 GB | 112.84 GB |
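The same sketch, scaled to 8.03B parameters, lands in the same ballpark (the exact figures in the table come from `accelerate`'s model memory estimator, so they differ a bit):

```python
# llama-3-8B, 8.03B parameters, float32:
for stage, n in training_footprint(8_030_000_000).items():
    print(f"{stage:>14}: {n / 2**30:,.2f} GB")
# roughly 30 / 30 / 60 / 120 GB -- nothing in the float32 column fits a 24GB card
```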
@@ -202,7 +202,7 @@ accelerate launch script.py
 * Rely on `config.yaml` files
 * Choose to either running `accelerate config` or write your own:
 
-:::: {.columns style="font-size: 50%;padding-left:10%;background-color: rgba(0,0,0,.1);"}
+:::: {.columns style="font-size: 50%;padding-left:10%;background-color: rgba(0,0,0,.1);color: #93a1a1;"}
 ::: {.column width="40%"}
 ```{.yaml filename=ddp_config.yaml}
 compute_environment: LOCAL_MACHINE
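For completeness: instead of hand-writing the YAML, a default config can also be generated programmatically. A minimal sketch, assuming a recent `accelerate` where `write_basic_config` is available:

```python
from accelerate.utils import write_basic_config

# Writes a default config file (the same thing `accelerate config` produces
# when accepting the defaults) to accelerate's default config location.
write_basic_config(mixed_precision="bf16")
```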
@@ -302,7 +302,7 @@ for batch in dataloader:
 
 * Let's tie that back up to the model estimator with neat tools like NVIDIA's TransformerEngine
 
-::: {style="font-size: 60%;background-color: rgba(0,0,0,.1);"}
+::: {style="font-size: 60%;background-color: rgba(0,0,0,.1);color: #93a1a1;"}
 | Optimization Level | Computation (GEMM) | Comm | Weight | Master Weight | Weight Gradient | Optimizer States |
 | -- | -- | -- | -- | -- | -- | -- |
 | FP16 AMP | FP16 | FP32 | FP32 | N/A | FP32 | FP32+FP32 |
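As a concrete anchor for this table, the BF16 AMP row is what `accelerate` sets up from a single flag. A minimal runnable sketch with a toy model (names are illustrative):

```python
import torch
from accelerate import Accelerator

# BF16 AMP: parameters and optimizer states stay fp32, while matmuls in the
# forward pass run in bf16 via autocast.
accelerator = Accelerator(mixed_precision="bf16")

model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(8, 512, device=accelerator.device)
loss = model(x).float().pow(2).mean()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```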
@@ -326,7 +326,7 @@ What is actually happening:
 
 * Extremely similar, however mostly used different naming conventions for items and slight tweaks in the implementation
 
-::: {style="font-size: 50%;background-color: rgba(0,0,0,.1);"}
+::: {style="font-size: 50%;background-color: rgba(0,0,0,.1);color: #93a1a1;"}
 Framework | Model Loading (`torch_dtype`) | Mixed Precision | Preparation (Local) | Training | Optimizer (Local)
 --|--|--|--|--|--
 FSDP | bf16 | default (none) | bf16 | bf16 | bf16
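Both rows of this table start from the same place, the `torch_dtype` used at load time. A minimal sketch (the checkpoint name is just an example):

```python
import torch
from transformers import AutoModelForCausalLM

# "Model Loading (torch_dtype)" column: the weights come off disk in bf16,
# rather than being loaded in fp32 and cast down afterwards.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)
```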