Improving Alignment and Robustness with Short Circuiting Paper • 2406.04313 • Published Jun 6, 2024
Efficient Detection of Toxic Prompts in Large Language Models Paper • 2408.11727 • Published Aug 21, 2024