innovation64's Collections
Paper Selecting
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (arXiv:2402.14083, 47 upvotes)
- Linear Transformers are Versatile In-Context Learners (arXiv:2402.14180, 6 upvotes)
- Training-Free Long-Context Scaling of Large Language Models (arXiv:2402.17463, 19 upvotes)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764, 605 upvotes)
- Evaluating Very Long-Term Conversational Memory of LLM Agents (arXiv:2402.17753, 18 upvotes)
- Resonance RoPE: Improving Context Length Generalization of Large Language Models (arXiv:2403.00071, 22 upvotes)
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853, 61 upvotes)
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507, 183 upvotes)
- Design2Code: How Far Are We From Automating Front-End Engineering? (arXiv:2403.03163, 93 upvotes)
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) (arXiv:2309.08968, 22 upvotes)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (arXiv:2403.09611, 125 upvotes)
- Evaluating Frontier Models for Dangerous Capabilities (arXiv:2403.13793, 7 upvotes)
- The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887, 78 upvotes)
- Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge (arXiv:2405.00263, 14 upvotes)
- Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 (arXiv:2405.00664, 18 upvotes)
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models (arXiv:2405.01535, 119 upvotes)
- Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model (arXiv:2405.09215, 18 upvotes)
- ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models (arXiv:2405.09220, 24 upvotes)
- LoRA Learns Less and Forgets Less (arXiv:2405.09673, 87 upvotes)
- Layer-Condensed KV Cache for Efficient Inference of Large Language Models (arXiv:2405.10637, 19 upvotes)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130, 46 upvotes)
- 2BP: 2-Stage Backpropagation (arXiv:2405.18047, 23 upvotes)
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models (arXiv:2405.20541, 22 upvotes)
- 4-bit Shampoo for Memory-Efficient Network Training (arXiv:2405.18144, 10 upvotes)
- Transformers meet Neural Algorithmic Reasoners (arXiv:2406.09308, 43 upvotes)
- Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning (arXiv:2406.09170, 24 upvotes)