The Objective Dad committed
ergonomic update to optimizer config doc (#548)
README.md CHANGED
@@ -560,6 +560,30 @@ log_sweep_min_lr:
 log_sweep_max_lr:

 # specify optimizer
+# Valid values are driven by the Transformers OptimizerNames class, see:
+# https://github.com/huggingface/transformers/blob/95b374952dc27d8511541d6f5a4e22c9ec11fb24/src/transformers/training_args.py#L134
+#
+# Note that not all optimizers may be available in your environment, e.g. 'adamw_anyprecision' requires
+# torchdistx and 'adamw_bnb_8bit' is provided by bitsandbytes (bnb.optim.Adam8bit). When in doubt, start with the
+# optimizer used in the examples/ for your model and fine-tuning use case.
+#
+# Valid values for 'optimizer' include:
+# - adamw_hf
+# - adamw_torch
+# - adamw_torch_fused
+# - adamw_torch_xla
+# - adamw_apex_fused
+# - adafactor
+# - adamw_anyprecision
+# - sgd
+# - adagrad
+# - adamw_bnb_8bit
+# - lion_8bit
+# - lion_32bit
+# - paged_adamw_32bit
+# - paged_adamw_8bit
+# - paged_lion_32bit
+# - paged_lion_8bit
 optimizer:
 # specify weight decay
 weight_decay:
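For illustration, a minimal sketch of how these keys might be filled in, in the same YAML format as the rest of the config. The optimizer and weight decay values below are arbitrary placeholders rather than recommendations, and as the comment above notes, anything beyond the plain PyTorch AdamW may require extra packages (bitsandbytes, torchdistx, Apex, etc.) in your environment.

# hypothetical excerpt from a config file; adjust values for your model and use case
optimizer: adamw_torch
weight_decay: 0.01

If you switch to an 8-bit or paged variant later, only the optimizer string needs to change.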