Multi-Layer Transformer Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23, 2024 • 24
iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat Text Generation • Updated Sep 16, 2024 • 1
iqbalamo93/Meta-Llama-3.1-8B-Instruct-GPTQ-Q_8 Text Generation • Updated Sep 14, 2024 • 23.6k • 3
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 87 • 25