zzfive's Collections: Infrastructure
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 46
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
  Paper • 2405.14477 • Published • 17
Thermodynamic Natural Gradient Descent
  Paper • 2405.13817 • Published • 13
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras
  Paper • 2405.14866 • Published • 6
Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 52
2BP: 2-Stage Backpropagation
  Paper • 2405.18047 • Published • 23
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
  Paper • 2405.17991 • Published • 12
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 34
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
  Paper • 2406.00392 • Published • 12
Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 37
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
  Paper • 2406.02886 • Published • 8
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
  Paper • 2406.02900 • Published • 11
GenAI Arena: An Open Evaluation Platform for Generative Models
  Paper • 2406.04485 • Published • 20
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
  Paper • 2406.04770 • Published • 27
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
  Paper • 2406.04391 • Published • 7
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach
  Paper • 2406.04594 • Published • 5
The Prompt Report: A Systematic Survey of Prompting Techniques
  Paper • 2406.06608 • Published • 56
DiTFastAttn: Attention Compression for Diffusion Transformer Models
  Paper • 2406.08552 • Published • 23
Interpreting the Weight Space of Customized Diffusion Models
  Paper • 2406.09413 • Published • 18
Cognitively Inspired Energy-Based World Models
  Paper • 2406.08862 • Published • 9
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
  Paper • 2406.09297 • Published • 4
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression
  Paper • 2406.11430 • Published • 22
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
  Paper • 2406.14515 • Published • 32
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
  Paper • 2406.14563 • Published • 29
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
  Paper • 2406.12624 • Published • 36
Efficient World Models with Context-Aware Tokenization
  Paper • 2406.19320 • Published • 7
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
  Paper • 2407.01284 • Published • 75
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
  Paper • 2407.00468 • Published • 34
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
  Paper • 2407.01791 • Published • 5
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
  Paper • 2407.02687 • Published • 22
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
  Paper • 2407.04620 • Published • 27
HEMM: Holistic Evaluation of Multimodal Foundation Models
  Paper • 2407.03418 • Published • 8
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
  Paper • 2407.04842 • Published • 53
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
  Paper • 2407.06135 • Published • 21
An accurate detection is not all you need to combat label noise in web-noisy datasets
  Paper • 2407.05528 • Published • 3
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
  Paper • 2407.07315 • Published • 6
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
  Paper • 2407.07523 • Published • 4
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
  Paper • 2407.11963 • Published • 43
Efficient Training with Denoised Neural Weights
  Paper • 2407.11966 • Published • 8
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
  Paper • 2407.11239 • Published • 7
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8
NNsight and NDIF: Democratizing Access to Foundation Model Internals
  Paper • 2407.14561 • Published • 34
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
  Paper • 2407.15754 • Published • 20
KAN or MLP: A Fairer Comparison
  Paper • 2407.16674 • Published • 42
SIGMA: Sinkhorn-Guided Masked Video Modeling
  Paper • 2407.15447 • Published • 8
PERSONA: A Reproducible Testbed for Pluralistic Alignment
  Paper • 2407.17387 • Published • 18
Longhorn: State Space Models are Amortized Online Learners
  Paper • 2407.14207 • Published • 17
VSSD: Vision Mamba with Non-Causal State Space Duality
  Paper • 2407.18559 • Published • 19
Diffusion Feedback Helps CLIP See Better
  Paper • 2407.20171 • Published • 36
Finch: Prompt-guided Key-Value Cache Compression
  Paper • 2408.00167 • Published • 13
POA: Pre-training Once for Models of All Sizes
  Paper • 2408.01031 • Published • 26
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
  Paper • 2408.01050 • Published • 8
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
  Paper • 2408.04810 • Published • 22
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
  Paper • 2408.06327 • Published • 16
Heavy Labels Out! Dataset Distillation with Label Space Lightening
  Paper • 2408.08201 • Published • 18
Towards flexible perception with visual memory
  Paper • 2408.08172 • Published • 20
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
  Paper • 2408.13257 • Published • 25
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
  Paper • 2408.14468 • Published • 35
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
  Paper • 2408.15518 • Published • 42
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
  Paper • 2409.06633 • Published • 14
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
  Paper • 2409.10516 • Published • 40
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
  Paper • 2409.12568 • Published • 48
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
  Paper • 2409.12961 • Published • 25
Prithvi WxC: Foundation Model for Weather and Climate
  Paper • 2409.13598 • Published • 40
Making Text Embedders Few-Shot Learners
  Paper • 2409.15700 • Published • 29
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
  Paper • 2409.16040 • Published • 12
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
  Paper • 2409.17066 • Published • 28
Paper • 2410.05258 • Published • 168
Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 144
Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 23
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond
  Paper • 2410.02362 • Published • 17
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
  Paper • 2410.03051 • Published • 4
MLP-KAN: Unifying Deep Representation and Function Learning
  Paper • 2410.03027 • Published • 29
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
  Paper • 2410.05363 • Published • 44
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
  Paper • 2410.06373 • Published • 35
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
  Paper • 2410.07170 • Published • 15
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2410.02367 • Published • 47
Contrastive Localized Language-Image Pre-Training
  Paper • 2410.02746 • Published • 33
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
  Paper • 2409.19291 • Published • 19
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
  Paper • 2410.05265 • Published • 29
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
  Paper • 2410.09732 • Published • 54
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
  Paper • 2410.10814 • Published • 49
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
  Paper • 2410.13754 • Published • 75
Harnessing Webpage UIs for Text-Rich Visual Understanding
  Paper • 2410.13824 • Published • 30
FlatQuant: Flatness Matters for LLM Quantization
  Paper • 2410.09426 • Published • 12
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning
  Paper • 2410.13618 • Published • 6
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 26
AutoTrain: No-code training for state-of-the-art models
  Paper • 2410.15735 • Published • 59
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
  Paper • 2410.17247 • Published • 45
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
  Paper • 2410.17243 • Published • 89
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
  Paper • 2410.19168 • Published • 19
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
  Paper • 2410.19313 • Published • 19
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
  Paper • 2410.21264 • Published • 9
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction
  Paper • 2410.18481 • Published • 5
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 6
CLEAR: Character Unlearning in Textual and Visual Modalities
  Paper • 2410.18057 • Published • 200
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
  Paper • 2410.21465 • Published • 11
Accelerating Direct Preference Optimization with Prefix Sharing
  Paper • 2410.20305 • Published • 6
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
  Paper • 2410.23287 • Published • 19
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
  Paper • 2410.22476 • Published • 25
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
  Paper • 2410.20650 • Published • 16
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
  Paper • 2410.21157 • Published • 6
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
  Paper • 2411.02355 • Published • 46
How Far is Video Generation from World Model: A Physical Law Perspective
  Paper • 2411.02385 • Published • 33
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
  Paper • 2411.02335 • Published • 11
Controlling Language and Diffusion Models by Transporting Activations
  Paper • 2410.23054 • Published • 16
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
  Paper • 2411.04997 • Published • 37
Can sparse autoencoders be used to decompose and interpret steering vectors?
  Paper • 2411.08790 • Published • 8
Cut Your Losses in Large-Vocabulary Language Models
  Paper • 2411.09009 • Published • 43
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
  Paper • 2411.11504 • Published • 19
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
  Paper • 2411.10510 • Published • 8
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2411.10958 • Published • 51
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
  Paper • 2411.13476 • Published • 15
Multimodal Autoregressive Pre-training of Large Vision Encoders
  Paper • 2411.14402 • Published • 43
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
  Paper • 2411.14405 • Published • 58
Natural Language Reinforcement Learning
  Paper • 2411.14251 • Published • 27
Ultra-Sparse Memory Network
  Paper • 2411.12364 • Published • 19
Factorized Visual Tokenization and Generation
  Paper • 2411.16681 • Published • 17
Cautious Optimizers: Improving Training with One Line of Code
  Paper • 2411.16085 • Published • 15
Star Attention: Efficient LLM Inference over Long Sequences
  Paper • 2411.17116 • Published • 47
Training Noise Token Pruning
  Paper • 2411.18092 • Published • 1
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
  Paper • 2411.18499 • Published • 18
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
  Paper • 2412.00568 • Published • 14
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
  Paper • 2411.17459 • Published • 10
KV Shifting Attention Enhances Language Modeling
  Paper • 2411.19574 • Published • 8
APOLLO: SGD-like Memory, AdamW-level Performance
  Paper • 2412.05270 • Published • 38
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
  Paper • 2412.07626 • Published • 21
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
  Paper • 2412.07720 • Published • 30
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
  Paper • 2412.06071 • Published • 7
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
  Paper • 2412.10208 • Published • 19
Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 80
No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 41
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
  Paper • 2412.17153 • Published • 33
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
  Paper • 2412.17739 • Published • 37