Training Task Experts through Retrieval Based Distillation Paper • 2407.05463 • Published Jul 7, 2024 • 8
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 86
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19, 2024 • 22
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models Paper • 2410.13841 • Published Oct 17, 2024 • 15