matlok's Collections
Models - MoE • updated
Mixtral of Experts
Paper • 2401.04088 • Published • 158 upvotes
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 49 upvotes
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 70 upvotes
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper • 2308.14352 • Published
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2 upvotes
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2 upvotes
ConstitutionalExperts: Training a Mixture of Principle-based Prompts
Paper • 2403.04894 • Published • 2 upvotes
Video Relationship Detection Using Mixture of Experts
Paper • 2403.03994 • Published • 2 upvotes
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Paper • 2402.02526 • Published • 3 upvotes
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Paper • 2006.16668 • Published • 3 upvotes
Scaling Vision with Sparse Mixture of Experts
Paper • 2106.05974 • Published • 3 upvotes
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 26 upvotes
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Paper • 2202.08906 • Published • 2 upvotes
LocMoE: A Low-overhead MoE for Large Language Model Training
Paper • 2401.13920 • Published • 2 upvotes
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Paper • 2201.05596 • Published • 2 upvotes
Scaling Laws for Fine-Grained Mixture of Experts
Paper • 2402.07871 • Published • 11 upvotes
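Several of the papers above (GShard, Mixtral of Experts, ST-MoE, CompeteSMoE) center on the same mechanism: a learned router sends each token to a small top-k subset of expert MLPs. Below is a minimal PyTorch sketch of such a layer; the class name, dimensions, and the per-expert dispatch loop are illustrative assumptions for clarity, not code from any paper or repo in this collection.

```python
# Minimal sparse MoE layer with top-2 gating, in the spirit of
# GShard (2006.16668) and Mixtral (2401.04088). All names and sizes
# here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)                  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(1) * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```

Only top_k of the n_experts MLPs run for any given token, so parameter count grows with the number of experts while per-token compute stays roughly constant; that decoupling is the scaling lever these papers study.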
xai-org/grok-1
Text Generation • Updated • 409 downloads • 2.21k likes
AetherResearch/Cerebrum-1.0-8x7b
Text Generation • Updated • 23 downloads • 79 likes
databricks/dbrx-base
Text Generation • Updated • 26 downloads • 556 likes
ai21labs/Jamba-v0.1
Text Generation • Updated • 7.09k downloads • 1.17k likes
TechxGenus/Mini-Jamba
Text Generation • Updated • 34 downloads • 29 likes
LoneStriker/Mixtral_7Bx5_MoE_30B-8.0bpw-h8-exl2
Text Generation • Updated • 13 downloads • 1 like
LoneStriker/laser-dolphin-mixtral-2x7b-dpo-8.0bpw-h8-exl2
Text Generation • Updated • 18 downloads • 4 likes
LoneStriker/Mixtral_7Bx5_MoE_30B-6.0bpw-h6-exl2
Text Generation • Updated • 15 downloads • 1 like
jetmoe/jetmoe-8b
Text Generation • Updated • 6.55k downloads • 245 likes
mistralai/Mixtral-8x22B-Instruct-v0.1
Text Generation • Updated • 3.67M downloads • 698 likes
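Most of these checkpoints load through the standard Hugging Face transformers API, as in the sketch below (the *-exl2 quantizations from LoneStriker are ExLlamaV2 format and need that runtime instead). The prompt string and generation settings are arbitrary examples, and the 8x22B weights are far too large for a single consumer GPU, so treat this as a sketch rather than a recipe.

```python
# Illustrative loading of one collection checkpoint via transformers.
# device_map="auto" shards the (very large) weights across available devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

inputs = tokenizer("Explain mixture-of-experts routing in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```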