Abstract: "Language models can generate harmful and biased outputs and exhibit undesirable behavior according to a given cultural context. We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative metrics with human evaluations that score output adherence to a target value, toxicity scoring on outputs; and qualitative metrics analyzing the most common word associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset."
Applicable Models: .nan
Authors: Irene Solaiman, Christy Dennison
Considerations: Requires predefining what adherence to a culture means for human evals
Datasets: .nan
Group: CulturalEvals
Hashtags: .nan
Link: 'Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets'
Modality: Text
Screenshots: .nan
Suggested Evaluation: Human and Toxicity Evals of Cultural Value Categories
Level: Output
URL: http://arxiv.org/abs/2106.10328
What it is evaluating: Adherence to defined norms for a set of cultural categories
Metrics: .nan
Affiliations: .nan
Methodology: .nan