Collections including paper arxiv:2312.07987
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 50
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 45
- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 41
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Paper • 2101.03961 • Published • 14

- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
  Paper • 2401.09417 • Published • 60
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 70
- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 41
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 45

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 21
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 66

- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 5
- Sparse Networks from Scratch: Faster Training without Losing Performance
  Paper • 1907.04840 • Published • 3
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 4
- A Mixture of h-1 Heads is Better than h Heads
  Paper • 2005.06537 • Published • 2

- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 5
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 31

- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
  Paper • 2312.12456 • Published • 40
- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 12
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning
  Paper • 2312.12682 • Published • 8

- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 41
- SubGen: Token Generation in Sublinear Time and Memory
  Paper • 2402.06082 • Published • 11
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 48
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 78