LLM-Safety - a harik68 Collection

updated Aug 28, 2024

Efficient Detection of Toxic Prompts in Large Language Models

Paper • 2408.11727 • Published Aug 21, 2024 • 12

Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique

Paper • 2408.10701 • Published Aug 20, 2024 • 11