|
--- |
|
license: cc-by-nc-nd-3.0 |
|
--- |
|
|
|
# VyLinh-Lite: Vietnamese 3B Reasoning Language Model |
|
|
|
## Model Details |
|
|
|
- **Language(s)**: Vietnamese |
|
- **Base Model**: Qwen2.5-3B |
|
- **Model Size**: 3 billion parameters |
|
|
|
## Intended Use |
|
|
|
- **Primary intended uses**: Vietnamese language understanding, reasoning, and generation |
|
- **Primary intended users**: Researchers, developers, and practitioners working with Vietnamese language AI |
|
- **Out-of-scope use cases**: Production deployments without additional safety measures |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model underwent a sophisticated training process involving multiple stages of distillation and adaptation: |
|
|
|
1. Initial knowledge distillation from Llama 3.1 405B |
|
2. Architecture adaptation using mergekit-tokensurgeon |
|
3. Secondary distillation to Qwen architecture |
|
4. Parallel distillation from Qwen2-72B |
|
5. Final fusion and fine-tuning using EvolKit dataset |
|
|
|
### Training Procedure |
|
|
|
#### Distillation Process |
|
1. **Logit Distillation** |
|
- Source: Llama 3.1 405B |
|
- Method: Offline distillation |
|
- Storage: Top-K logits preservation |
|
|
|
2. **Cross-Architecture Adaptation** |
|
- Tool: mergekit-tokensurgeon |
|
- Process: Vocabulary alignment with Llama 3.1 405B |
|
|
|
3. **Architecture Transformation** |
|
- Target: 3B parameter configuration |
|
- Method: Progressive knowledge transfer |
|
|
|
#### Fine-tuning |
|
- **Final Stage**: EvolKit dataset utilization |
|
- **Optimization**: Focus on coherence and reasoning capabilities |
|
- **Vocabulary**: Qwen-native vocabulary restoration |
|
|
|
## Performance and Limitations |
|
|
|
### Benchmarks |
|
Will be updated throughout the day |
|
|
|
### Limitations |
|
- Model size constraints may impact certain complex reasoning tasks |
|
- Performance may vary on domain-specific Vietnamese content |
|
- Limited context window compared to larger models |
|
|
|
## Ethical Considerations |
|
|
|
- **Data Bias**: May reflect biases present in training data |
|
- **Environmental Impact**: Reduced compared to larger models due to efficient distillation |
|
- **Societal Impact**: Potential influence on Vietnamese language technology landscape |
|
|
|
## Technical Specifications |
|
|
|
- **Parameter Count**: 3 billion |
|
- **Context Window**: 32K |