MegaWika: Millions of reports and their sources across 50 diverse languages Paper • 2307.07049 • Published Jul 13, 2023
Defending Against Poisoning Attacks in Open-Domain Question Answering Paper • 2212.10002 • Published Dec 20, 2022
Learning to Reason via Program Generation, Emulation, and Search Paper • 2405.16337 • Published May 25, 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation Paper • 2406.17186 • Published Jun 24, 2024 • 1
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published Sep 17, 2024 • 21
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 116
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection Paper • 2004.07667 • Published Apr 16, 2020
Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation Paper • 2305.16938 • Published May 26, 2023
Lexical Generalization Improves with Larger Models and Longer Training Paper • 2210.12673 • Published Oct 23, 2022
Data Contamination Report from the 2024 CONDA Shared Task Paper • 2407.21530 • Published Jul 31, 2024 • 10
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions Paper • 2403.15246 • Published Mar 22, 2024 • 9
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31, 2024 • 61
Paloma: A Benchmark for Evaluating Language Model Fit Paper • 2312.10523 • Published Dec 16, 2023 • 12