A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO.
Organization Card
This organization groups all the models and datasets used in the TRL library.
Collections (2)
Models (81)
trl-lib/Qwen2-0.5B-Reward-Math-Sheperd • Token Classification • 35
trl-lib/Qwen2-0.5B-XPO • Text Generation • 13
trl-lib/Qwen2-0.5B-OnlineDPO • Text Generation • 35
trl-lib/Qwen2-0.5B-KTO • Text Generation • 24
trl-lib/Qwen2-0.5B-ORPO • Text Generation • 26 • 1
trl-lib/Qwen2-0.5B-DPO • Text Generation • 62 • 4
trl-lib/Qwen2-0.5B-Reward • Text Classification • 230
trl-lib/pythia-1b-deduped-tldr-rm • 878
trl-lib/pythia-2.8b-deduped-tldr-online-dpo • Text Generation • 14
trl-lib/pythia-6.9b-deduped-tldr-offline-dpo • Text Generation • 14
Datasets (19)
trl-lib/documentation-images • 1 • 5.47k
trl-lib/ultrafeedback-prompt • 39.8k • 871 • 3
trl-lib/math_shepherd • 445k • 372 • 1
trl-lib/alpaca-cleaned • 51.8k • 51
trl-lib/hh-rlhf-helpful-base • 46.2k • 111
trl-lib/prm800k • 41.2k • 64 • 1
trl-lib/rlaif-v • 83.1k • 148 • 3
trl-lib/Capybara-Preferences • 15.4k • 78
trl-lib/Capybara • 16k • 1.08k • 1
trl-lib/tldr • 130k • 348