- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 607
- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 191
- Iterative Reasoning Preference Optimization
  Paper • 2404.19733 • Published • 48

Collections including paper arxiv:2404.19733

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 46
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 58
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 17

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 83
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 11
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 49
- WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
  Paper • 2312.14187 • Published • 49
- On the Effectiveness of Large Language Models in Domain-Specific Code Generation
  Paper • 2312.01639 • Published • 1

- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 71
- Learning From Mistakes Makes LLM Better Reasoner
  Paper • 2310.20689 • Published • 28
- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
  Paper • 2308.00436 • Published • 22

- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
  Paper • 2311.13231 • Published • 26
- Nash Learning from Human Feedback
  Paper • 2312.00886 • Published • 14
- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 26
- MusicRL: Aligning Music Generation to Human Preferences
  Paper • 2402.04229 • Published • 17

- Trusted Source Alignment in Large Language Models
  Paper • 2311.06697 • Published • 10
- Diffusion Model Alignment Using Direct Preference Optimization
  Paper • 2311.12908 • Published • 47
- SuperHF: Supervised Iterative Learning from Human Feedback
  Paper • 2310.16763 • Published • 1
- Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
  Paper • 2311.15657 • Published • 2