Aleksey Korshuk committed on
Commit dc051b8 · unverified · 1 parent: 59a31fe

Update rlhf.md (#1178) [skip ci]

Files changed (1)
  1. docs/rlhf.md +3 -3
docs/rlhf.md CHANGED
@@ -19,14 +19,14 @@ The various RL training methods are implemented in trl and wrapped via axolotl.
 
 #### DPO
 ```yaml
-rl: true
+rl: dpo
 datasets:
   - path: Intel/orca_dpo_pairs
     split: train
-    type: intel_apply_chatml
+    type: chatml.intel
   - path: argilla/ultrafeedback-binarized-preferences
     split: train
-    type: argilla_apply_chatml
+    type: chatml.argilla
 ```
 
 #### IPO
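The renames in this diff amount to a small old-to-new mapping, which can help when updating configs written against the earlier docs. The sketch below is hypothetical: `migrate_dpo_config` and `DATASET_TYPE_RENAMES` are illustrative names, not part of axolotl. It assumes, as the DPO example implies, that the old `rl: true` meant DPO, and it works on an already-parsed config dict, so it does not depend on any particular YAML loader.

```python
from typing import Any, Dict

# Hypothetical helper (not part of axolotl): old -> new dataset `type`
# values, taken from the renames in this commit.
DATASET_TYPE_RENAMES = {
    "intel_apply_chatml": "chatml.intel",
    "argilla_apply_chatml": "chatml.argilla",
}


def _rename_dataset_type(ds: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one dataset entry with a renamed `type`, if it has an old one."""
    out = dict(ds)
    if out.get("type") in DATASET_TYPE_RENAMES:
        out["type"] = DATASET_TYPE_RENAMES[out["type"]]
    return out


def migrate_dpo_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `cfg` rewritten from the old DPO example to the new one.

    Assumption: `rl: true` in older configs meant DPO.
    """
    new_cfg = dict(cfg)
    # The old docs used a boolean; the new docs name the RL method explicitly.
    if new_cfg.get("rl") is True:
        new_cfg["rl"] = "dpo"
    # Only touch `datasets` if the config actually has it.
    if "datasets" in cfg:
        new_cfg["datasets"] = [_rename_dataset_type(ds) for ds in cfg["datasets"]]
    return new_cfg


if __name__ == "__main__":
    old = {
        "rl": True,
        "datasets": [
            {"path": "Intel/orca_dpo_pairs", "split": "train", "type": "intel_apply_chatml"},
            {"path": "argilla/ultrafeedback-binarized-preferences", "split": "train", "type": "argilla_apply_chatml"},
        ],
    }
    print(migrate_dpo_config(old))
```

Unknown `type` values and non-boolean `rl` values pass through unchanged, so the helper can be run on configs that already use the new names, and the input dict is never mutated.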