|
Defines the learning rate to apply to specific layers of a model. H2O LLM Studio applies the regular learning rate to every layer that has no separate learning rate specified. |
|
|
|
- **Backbone** |
|
- H2O LLM Studio applies a different learning rate to the body (backbone) of the neural network architecture. |
|
- **Value Head** |
|
- H2O LLM Studio applies a different learning rate to the value head of the neural network architecture. |
|
|
|
A common strategy is to apply a lower learning rate to the backbone of a model for better convergence and training stability. |
|
|
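The idea of giving selected layers their own learning rate can be sketched in plain Python. This is a minimal illustration, not H2O LLM Studio's actual implementation: the helper `build_param_groups`, its arguments, and the parameter names are all hypothetical. The output mirrors the "parameter group" layout that many optimizers accept, where each group carries its own `lr`.

```python
# Hypothetical helper for illustration only; not H2O LLM Studio's code.
def build_param_groups(param_names, base_lr, differential_lr, differential_layers):
    """Split parameter names into a regular group and a differential group.

    A parameter goes into the differential group when its name contains
    any of the substrings listed in `differential_layers`.
    """
    regular, differential = [], []
    for name in param_names:
        if any(layer in name for layer in differential_layers):
            differential.append(name)
        else:
            regular.append(name)
    return [
        {"params": regular, "lr": base_lr},               # regular learning rate
        {"params": differential, "lr": differential_lr},  # differential learning rate
    ]


# Assumed parameter names, invented for this example.
names = [
    "backbone.layer0.weight",
    "backbone.layer1.weight",
    "value_head.weight",
    "lm_head.weight",
]

# Apply a lower learning rate to the backbone only.
groups = build_param_groups(
    names,
    base_lr=1e-4,
    differential_lr=1e-5,
    differential_layers=["backbone"],
)

for group in groups:
    print(group["lr"], group["params"])
```

Here the backbone parameters land in the group with the lower rate, while every other parameter keeps the regular rate, matching the common strategy described above.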