---
base_model:
- kuotient/Meta-Llama-3-8B-Instruct
library_name: transformers
tags:
- mergekit
- merge
license: other
license_name: llama3
---
# Llama-3-11.5B-Instruct-attenuated

The core idea comes from @jukofyork; see this [issue](https://github.com/arcee-ai/mergekit/issues/198).

As I understand it, the idea is to make the model "think twice" over the duplicated layers while still covering the same overall distance as the original. But why 0.7071067812?

> The scale factor to use, eg: solve x^2 = 1/2 --> x = 1/sqrt(2) ≈ 0.7071067812
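A quick numeric sketch of that choice (my own illustration, not from the linked issue): scaling both `q_proj` and `k_proj` by x multiplies the attention logits (dot products of queries and keys) by x², so for each duplicated layer to contribute half of the original logit magnitude we need x² = 1/2:

```python
import math

# Scaling both q_proj and k_proj weights by x scales the attention
# logits (q . k) by x^2. With each layer duplicated, we want each copy
# to contribute half the original logit magnitude, so solve x^2 = 1/2.
scale = 1 / math.sqrt(2)
print(round(scale, 10))          # 0.7071067812, the constant in the config
print(round(scale * scale, 10))  # 0.5
```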

## Merge Details

### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:

* [kuotient/Meta-Llama-3-8B-Instruct](https://huggingface.co/kuotient/Meta-Llama-3-8B-Instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
###########################
# llama-3-attenuated.yaml #
###########################

# Use: mergekit-yaml --clone-tensors ./llama-3-attenuated.yaml ./llama-3-attenuated
# See: https://github.com/arcee-ai/mergekit/issues/198 for discussion/reasoning behind this idea.

# ---

# The scale factor to use, eg: solve x^2 = 1/2 --> x = 1/sqrt(2) ≈ 0.7071067812
const_tag: &scale_factor 0.7071067812 # 1/sqrt(2)

# The filter parameters of a scaled block.
attenuate-env: &attenuated_env
  parameters:
    scale:
      - filter: q_proj
        value: *scale_factor
      - filter: k_proj
        value: *scale_factor
      - value: 1.0

# ---

slices:

  #############################################
  # Block 1: Meta-Llama-3-8B-Instruct [0, 16] #
  #############################################
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [0, 8]   # The first 8 layers of Block 1 are not duplicated
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [8, 16]  # The last 8 layers of Block 1 are duplicated
    <<: *attenuated_env

  #############################################
  # Block 2: Meta-Llama-3-8B-Instruct [8, 24] #
  #############################################
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [8, 24]  # All the layers of Block 2 are duplicated
    <<: *attenuated_env

  ##############################################
  # Block 3: Meta-Llama-3-8B-Instruct [16, 32] #
  ##############################################
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [16, 24] # The first 8 layers of Block 3 are duplicated
    <<: *attenuated_env
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [24, 32] # The last 8 layers of Block 3 are not duplicated

merge_method: passthrough
dtype: bfloat16
```
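
As a rough sanity check on the resulting size (my own arithmetic, not part of the original config), the slice ranges above can be tallied in a few lines: they stack to 48 layers against the base model's 32, with base layers 8-23 each appearing twice, which is what takes the 8B model to roughly 11.5B parameters:

```python
from collections import Counter

# Slice ranges from the config above; each tuple is (start, end), end exclusive.
slices = [(0, 8), (8, 16), (8, 24), (16, 24), (24, 32)]

total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 48 layers in the merged model (the base model has 32)

# Base-model layers that appear twice in the merged stack.
counts = Counter(layer for start, end in slices for layer in range(start, end))
duplicated = sorted(l for l, c in counts.items() if c == 2)
print(duplicated[0], duplicated[-1], len(duplicated))  # 8 23 16
```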