Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.04248

State Space Models

Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 23
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 31
Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16, 2024 • 80
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Paper • 2402.10790 • Published Feb 16, 2024 • 42

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024 • 17
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation

Paper • 2401.15688 • Published Jan 28, 2024 • 11
SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26, 2024 • 70
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Paper • 2401.15071 • Published Jan 26, 2024 • 35

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

Paper • 2311.14495 • Published Nov 24, 2023 • 1
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 60
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Paper • 2401.13560 • Published Jan 24, 2024 • 1
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

Paper • 2402.00789 • Published Feb 1, 2024 • 2

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024 • 17
Transforming and Combining Rewards for Aligning Large Language Models

Paper • 2402.00742 • Published Feb 1, 2024 • 12
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 78
Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2, 2024 • 46

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

Paper • 2401.17053 • Published Jan 30, 2024 • 32
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 31
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 78
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8, 2024 • 39

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 53
Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28, 2024 • 19
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23, 2024 • 19
Linear Transformers are Versatile In-Context Learners

Paper • 2402.14180 • Published Feb 21, 2024 • 6

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 146
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 30
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 21
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 66

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 60
VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18, 2024 • 38
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Paper • 2401.13560 • Published Jan 24, 2024 • 1
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

Paper • 2402.00789 • Published Feb 1, 2024 • 2

new architecture

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 49
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 53
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 23
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 24

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Paper • 2310.16656 • Published Oct 25, 2023 • 40
Unsupervised Universal Image Segmentation

Paper • 2312.17243 • Published Dec 28, 2023 • 19
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 115
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 31

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs