Collections

Collections including paper arxiv:2406.07522

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 60
VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18, 2024 • 38
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Paper • 2401.13560 • Published Jan 24, 2024 • 1
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

Paper • 2402.00789 • Published Feb 1, 2024 • 2

Trellis Networks for Sequence Modeling

Paper • 1810.06682 • Published Oct 15, 2018 • 1
Pruning Very Deep Neural Network Channels for Efficient Inference

Paper • 2211.08339 • Published Nov 14, 2022 • 1
LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch

Paper • 2309.14157 • Published Sep 25, 2023 • 1
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 139

Scaling MLPs: A Tale of Inductive Bias

Paper • 2306.13575 • Published Jun 23, 2023 • 14
Trap of Feature Diversity in the Learning of MLPs

Paper • 2112.00980 • Published Dec 2, 2021 • 1
Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics

Paper • 2301.05816 • Published Jan 14, 2023 • 1
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

Paper • 2108.04384 • Published Aug 9, 2021 • 1

Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 25
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

Paper • 2308.16137 • Published Aug 30, 2023 • 39
Scaling Transformer to 1M tokens and beyond with RMT

Paper • 2304.11062 • Published Apr 19, 2023 • 2
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 17

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Paper • 2309.04662 • Published Sep 9, 2023 • 22
Neurons in Large Language Models: Dead, N-gram, Positional

Paper • 2309.04827 • Published Sep 9, 2023 • 16
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 9
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs

Paper • 2309.03907 • Published May 18, 2023 • 10
