tuyenTS's Collections
YAYI 2: Multilingual Open-Source Large Language Models (arXiv:2312.14862)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
TrustLLM: Trustworthiness in Large Language Models (arXiv:2401.05561)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066)
LLaMA Pro: Progressive LLaMA with Block Expansion (arXiv:2401.02415)
Composable Function-preserving Expansions for Transformer Architectures (arXiv:2308.06103)
Thinking Like Transformers (arXiv:2106.06981)
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment (arXiv:2401.12474)
BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
Specialized Language Models with Cheap Inference from Limited Domain Data (arXiv:2402.01093)
BlackMamba: Mixture of Experts for State-Space Models (arXiv:2402.01771)
Code Representation Learning At Scale (arXiv:2402.01935)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (arXiv:2402.03300)
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (arXiv:2402.01739)
Rethinking Optimization and Architecture for Tiny Language Models (arXiv:2402.02791)
Scaling Laws for Fine-Grained Mixture of Experts (arXiv:2402.07871)
A Tale of Tails: Model Collapse as a Change of Scaling Laws (arXiv:2402.07043)
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model (arXiv:2402.07827)
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models (arXiv:2402.10986)
TEQ: Trainable Equivalent Transformation for Quantization of LLMs (arXiv:2310.10944)
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models (arXiv:2403.00818)
Rho-1: Not All Tokens Are What You Need (arXiv:2404.07965)
Pre-training Small Base LMs with Fewer Tokens (arXiv:2404.08634)
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization (arXiv:2405.15071)
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models (arXiv:2407.12327)
BitNet a4.8: 4-bit Activations for 1-bit LLMs (arXiv:2411.04965)