Collections
Collections including paper arxiv:2404.12195
-
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper • 2404.12195 • Published • 11
SurgeGlobal/OpenBezoar-SFT
Text Generation • Updated • 33 • 3
SurgeGlobal/OpenBezoar-HH-RLHF-SFT
Text Generation • Updated • 33
SurgeGlobal/OpenBezoar-HH-RLHF-DPO
Text Generation • Updated • 39
-
InternLM2 Technical Report
Paper • 2403.17297 • Published • 30
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 40
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 82
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper • 2404.12195 • Published • 11
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 50
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 40
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2
-
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 84
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 20
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 29
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper • 2404.12195 • Published • 11
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 16
MedAlpaca: An Open-Source Collection of Medical Conversational AI Models and Training Data
Paper • 2304.08247 • Published • 2
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 28
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 10
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 16
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 46
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 44