vilm
/

VyLinh-Lite-preview

Model card Files Files and versions Community

VyLinh-Lite-preview / README.md

qnguyen3's picture

Update README.md

f1aaf57 verified 3 months ago

|

history blame contribute delete

2.16 kB

	---
	license: cc-by-nc-nd-3.0
	---

	# VyLinh-Lite: Vietnamese 3B Reasoning Language Model

	## Model Details

	- Language(s): Vietnamese
	- Base Model: Qwen2.5-3B
	- Model Size: 3 billion parameters

	## Intended Use

	- Primary intended uses: Vietnamese language understanding, reasoning, and generation
	- Primary intended users: Researchers, developers, and practitioners working with Vietnamese language AI
	- Out-of-scope use cases: Production deployments without additional safety measures

	## Training Details

	### Training Data

	The model underwent a sophisticated training process involving multiple stages of distillation and adaptation:

	1. Initial knowledge distillation from Llama 3.1 405B
	2. Architecture adaptation using mergekit-tokensurgeon
	3. Secondary distillation to Qwen architecture
	4. Parallel distillation from Qwen2-72B
	5. Final fusion and fine-tuning using EvolKit dataset

	### Training Procedure

	#### Distillation Process
	1. Logit Distillation
	- Source: Llama 3.1 405B
	- Method: Offline distillation
	- Storage: Top-K logits preservation

	2. Cross-Architecture Adaptation
	- Tool: mergekit-tokensurgeon
	- Process: Vocabulary alignment with Llama 3.1 405B

	3. Architecture Transformation
	- Target: 3B parameter configuration
	- Method: Progressive knowledge transfer

	#### Fine-tuning
	- Final Stage: EvolKit dataset utilization
	- Optimization: Focus on coherence and reasoning capabilities
	- Vocabulary: Qwen-native vocabulary restoration

	## Performance and Limitations

	### Benchmarks
	Will be updated throughout the day

	### Limitations
	- Model size constraints may impact certain complex reasoning tasks
	- Performance may vary on domain-specific Vietnamese content
	- Limited context window compared to larger models

	## Ethical Considerations

	- Data Bias: May reflect biases present in training data
	- Environmental Impact: Reduced compared to larger models due to efficient distillation
	- Societal Impact: Potential influence on Vietnamese language technology landscape

	## Technical Specifications

	- Parameter Count: 3 billion
	- Context Window: 32K