cleanrl

non-profit

https://github.com/vwxyzjn/cleanrl

vwxyzjn


AI & ML interests

None defined yet.

cleanrl's activity

ArashAhmadian

authored a paper 17 days ago

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

Paper • 2412.04144 • Published 30 days ago • 4

ArashAhmadian

authored 2 papers 6 months ago

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

Paper • 2407.02552 • Published Jul 2, 2024 • 4

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

Paper • 2309.05444 • Published Sep 11, 2023 • 1

ArashAhmadian

authored a paper 7 months ago

Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3, 2024 • 18

vwxyzjn

updated 3 models 7 months ago

ArashAhmadian

authored a paper 7 months ago

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Paper • 2402.14740 • Published Feb 22, 2024 • 12

vwxyzjn

updated 7 models 8 months ago

cleanrl/EleutherAI_pythia-2.8b-deduped__reward__tldr

Text Classification • Updated May 15, 2024 • 15

cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr

Text Classification • Updated May 15, 2024 • 1.75k

cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 2.48k

cleanrl/EleutherAI_pythia-2.8b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 281

cleanrl/EleutherAI_pythia-6.9b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 273

cleanrl/EleutherAI_pythia-6.9b-deduped__reward__tldr

Text Classification • Updated May 7, 2024 • 16

cleanrl/ppo_zephyr310

Text Generation • Updated May 1, 2024 • 16

qgallouedec

updated 5 models 9 months ago

cleanrl/BeamRiderNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/PongNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/BreakoutNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/QbertNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/SpaceInvadersNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

Team members 6
