---
datasets:
- sade-adrien/redpajama_v2_sample_10M
---

The exact structure of the MappingAdapter is available in representation_mapping.py.

This adapter maps "sentence-transformers/stsb-roberta-large"'s hidden representations to "mistralai/Mistral-7B-Instruct-v0.1"'s.

Training:

* Steps: 114k
* Gradient accumulation: 2
* Batch size: 64
* Warm-up steps: 100
* Learning rate: 3e-5 with linear scheduling
* Eval steps: every 8,000 training steps
* Training time: ~98h
* Eval time: ~10h
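
For concreteness, a minimal sketch of the schedule these numbers describe. The optimizer (AdamW) and the use of transformers' linear warm-up helper are assumptions; the card only states the learning rate, warm-up steps, and linear scheduling, and the stand-in parameter below is a placeholder for the MappingAdapter's weights.

```python
import torch
from transformers import get_linear_schedule_with_warmup

# Placeholder parameter; in practice, the MappingAdapter's weights.
params = [torch.nn.Parameter(torch.zeros(1))]

optimizer = torch.optim.AdamW(params, lr=3e-5)  # AdamW is an assumption
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,       # warm-up steps listed above
    num_training_steps=57_000,  # one scheduler step per gradient update:
)                               # 114k steps / gradient accumulation of 2
```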

* Gradient updates: 57k
* Train examples: 7.3M
* Eval examples: 106k
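
These derived counts follow from the hyperparameters above; reading "steps" as micro-batches of 64 reproduces both figures:

```python
steps, grad_accum, batch_size = 114_000, 2, 64

print(steps // grad_accum)  # 57,000    -> the 57k gradient updates
print(steps * batch_size)   # 7,296,000 -> the ~7.3M train examples
```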

* Adapter: Decoder_dim (4096) → 4096 → LeakyReLU(0.1) → Encoder_dim (1024); see the sketch below.
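
A minimal PyTorch sketch of that stack; the class and argument names here are illustrative, and the authoritative definition is the MappingAdapter in representation_mapping.py:

```python
import torch.nn as nn

class MappingAdapter(nn.Module):
    """Maps a 4096-d decoder hidden state to a 1024-d encoder hidden state."""

    def __init__(self, decoder_dim=4096, hidden_dim=4096, encoder_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(decoder_dim, hidden_dim),  # Decoder_dim (4096) -> 4096
            nn.LeakyReLU(negative_slope=0.1),    # LeakyReLU(0.1)
            nn.Linear(hidden_dim, encoder_dim),  # 4096 -> Encoder_dim (1024)
        )

    def forward(self, hidden_states):
        # (..., 4096) -> (..., 1024)
        return self.net(hidden_states)
```

Applied to a batch of decoder states, e.g. a `(batch, seq_len, 4096)` tensor, the module returns a `(batch, seq_len, 1024)` tensor in the encoder's representation space.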