Stronger Models are NOT Stronger Teachers for Instruction Tuning. arXiv:2411.07133, published Nov 11, 2024.
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples. arXiv:2410.14669, published Oct 18, 2024.
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates. arXiv:2406.12935, published Jun 17, 2024.
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding. arXiv:2402.08983, published Feb 14, 2024.
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. arXiv:2402.11753, published Feb 19, 2024.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. arXiv:2406.08464, published Jun 12, 2024.
SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning. arXiv:2403.09110, published Mar 14, 2024.
Introducing v0.5 of the AI Safety Benchmark from MLCommons. arXiv:2404.12241, published Apr 18, 2024.
Instruction-tuned Language Models are Better Knowledge Learners. arXiv:2402.12847, published Feb 20, 2024.
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research. arXiv:2402.00159, published Jan 31, 2024.
Paloma: A Benchmark for Evaluating Language Model Fit. arXiv:2312.10523, published Dec 16, 2023.
Detecting Pretraining Data from Large Language Models. arXiv:2310.16789, published Oct 25, 2023.
In-Context Pretraining: Language Modeling Beyond Document Boundaries. arXiv:2310.10638, published Oct 16, 2023.
Lemur: Harmonizing Natural Language and Code for Language Agents. arXiv:2310.06830, published Oct 10, 2023.
Bridging the Domain Gap for Stance Detection for the Zulu language. arXiv:2205.03153, published May 6, 2022.