leonardlin's Collections
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (arXiv 2309.12307)
NEFTune: Noisy Embeddings Improve Instruction Finetuning (arXiv 2310.05914)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv 2312.15166)
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon (arXiv 2401.03462)
YaRN: Efficient Context Window Extension of Large Language Models (arXiv 2309.00071)
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM (arXiv 2401.02994)
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity (arXiv 2401.01967)
Zephyr: Direct Distillation of LM Alignment (arXiv 2310.16944)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv 2305.18290)
S-LoRA: Serving Thousands of Concurrent LoRA Adapters (arXiv 2311.03285)
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning (arXiv 2312.15685)
Self-Rewarding Language Models (arXiv 2401.10020)
TOFU: A Task of Fictitious Unlearning for LLMs (arXiv 2401.06121)
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages (arXiv 2401.05811)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv 2401.01335)
WARM: On the Benefits of Weight Averaged Reward Models (arXiv 2401.12187)
Learning Universal Predictors (arXiv 2401.14953)
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (arXiv 2401.16380)
Language Models can be Logical Solvers (arXiv 2311.06158)
ReFT: Reasoning with Reinforced Fine-Tuning (arXiv 2401.08967)
Continual Learning for Large Language Models: A Survey (arXiv 2402.01364)
Direct Language Model Alignment from Online AI Feedback (arXiv 2402.04792)
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models (arXiv 2402.03749)
Suppressing Pink Elephants with Direct Principle Feedback (arXiv 2402.07896)
How to Train Data-Efficient LLMs (arXiv 2402.09668)
QuRating: Selecting High-Quality Data for Training Language Models (arXiv 2402.09739)
DoRA: Weight-Decomposed Low-Rank Adaptation (arXiv 2402.09353)
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive (arXiv 2402.13228)
FuseChat: Knowledge Fusion of Chat Models (arXiv 2402.16107)
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (arXiv 2403.13372)
Evolutionary Optimization of Model Merging Recipes (arXiv 2403.13187)
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning (arXiv 2403.17919)
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv 2404.03715)
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks (arXiv 2404.14723)
Instruction Tuning with Human Curriculum (arXiv 2310.09518)
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv 2405.12130)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (arXiv 2405.11143)
Self-Play Preference Optimization for Language Model Alignment (arXiv 2405.00675)
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment (arXiv 2405.01481)
FLAME: Factuality-Aware Alignment for Large Language Models (arXiv 2405.01525)
WildChat: 1M ChatGPT Interaction Logs in the Wild (arXiv 2405.01470)
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (arXiv 2405.00732)
RLHF Workflow: From Reward Modeling to Online RLHF (arXiv 2405.07863)
Understanding the performance gap between online and offline alignment algorithms (arXiv 2405.08448)
SimPO: Simple Preference Optimization with a Reference-Free Reward (arXiv 2405.14734)
Self-Improving Robust Preference Optimization (arXiv 2406.01660)
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning (arXiv 2312.01552)
Creativity Has Left the Chat: The Price of Debiasing Language Models (arXiv 2406.05587)
Sailor: Open Language Models for South-East Asia (arXiv 2404.03608)
Continued Pretraining for Better Zero- and Few-Shot Promptability (arXiv 2210.10258)
Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets (arXiv 2405.18952)