Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20, 2024 • 48
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows Paper • 2402.10379 • Published Feb 16, 2024 • 31
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published May 24, 2024 • 15
Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets Paper • 2405.18952 • Published May 29, 2024 • 10
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 67
West-of-N: Synthetic Preference Generation for Improved Reward Modeling Paper • 2401.12086 • Published Jan 22, 2024 • 1