---
title: Regard
emoji: 🤗
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- measurement
description: >-
Regard aims to measure language polarity towards and social perceptions of a demographic (e.g. gender, race, sexual orientation).
---
# Measurement Card for Regard
## Measurement Description
The `regard` measurement returns the estimated language polarity towards and social perceptions of a demographic (e.g. gender, race, sexual orientation).
It uses a model trained on labelled data from the paper ["The Woman Worked as a Babysitter: On Biases in Language Generation" (EMNLP 2019)](https://arxiv.org/abs/1909.01326).
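Concretely, each input sentence is scored by a transformer-based classifier over four labels (negative, positive, neutral, other). The sketch below shows roughly what happens per sentence; the checkpoint name `sasha/regardv3` is an assumption about the module's current default, not something this card guarantees:
```python
# Rough sketch of the per-sentence step, not the module's exact implementation.
from transformers import pipeline

# Assumed checkpoint: the `regard` module loads a classifier fine-tuned on the
# labelled data from Sheng et al. (2019); the exact model ID may differ.
classifier = pipeline(
    "text-classification",
    model="sasha/regardv3",
    top_k=None,        # return scores for all four regard labels
    truncation=True,
)

print(classifier("xyz are described as mean"))
# e.g. [{'label': 'negative', 'score': 0.97}, {'label': 'other', ...}, ...]
```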
## How to Use
This measurement takes a list of strings as input (`data`) and, optionally, a second list (`references`), enabling a comparison of the estimated polarity between two groups.
```python
>>> import evaluate
>>> regard = evaluate.load("regard", "compare", module_type="measurement")
>>> group1 = ['xyz are described as mean', 'xyz are thought of as being too ambitious']
>>> group2 = ['xyz are known for making too much noise', 'xyz are described as often violent']
>>> regard.compute(data=group1, references=group2)
```
### Inputs
- **data** (list of `str`): prediction/candidate sentences, e.g. sentences describing a given demographic group.
- **references** (list of `str`) (optional): reference/comparison sentences, e.g. sentences describing a different demographic group to compare against.
- **aggregation** (`str`) (optional): determines the type of aggregation performed.
If set to `None` (the default), the difference between the average regard scores of the two groups is returned (see the sketch after this list).
Otherwise:
    - `average`: returns the average regard for each category (negative, positive, neutral, other), for each group
    - `maximum`: returns the maximum regard score for each group
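Taken together, `references` and `aggregation` determine which keys appear in the result dictionary. A minimal sketch, using the key names shown in the examples below and assuming the `compare` configuration:
```python
import evaluate

regard = evaluate.load("regard", "compare")
group1 = ['xyz are described as mean', 'xyz are thought of as being too ambitious']
group2 = ['xyz are known for making too much noise', 'xyz are described as often violent']

# No aggregation: per-category difference in average regard between the groups
diff = regard.compute(data=group1, references=group2)
print(diff.keys())  # dict_keys(['regard_difference'])

# 'maximum': highest regard score per category, reported separately per group
mx = regard.compute(data=group1, references=group2, aggregation="maximum")
print(mx.keys())    # dict_keys(['max_data_regard', 'max_references_regard'])

# 'average': mean regard score per category, reported separately per group
avg = regard.compute(data=group1, references=group2, aggregation="average")
print(avg.keys())   # dict_keys(['average_data_regard', 'average_references_regard'])
```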
### Output Values
**With a single list of inputs**:
`regard`: the regard scores of each string in the input list (if no aggregation is specified)
```python
{'neutral': 0.95, 'positive': 0.02, 'negative': 0.02, 'other': 0.01}
{'negative': 0.97, 'other': 0.02, 'neutral': 0.01, 'positive': 0.0}
```
`average_regard`: the average regard for each category (negative, positive, neutral, other) (if `aggregation` = `average`)
```python
{'neutral': 0.48, 'positive': 0.01, 'negative': 0.5, 'other': 0.01}
```
`max_regard`: the maximum regard per category across all input strings (if `aggregation` = `maximum`)
```python
{'neutral': 0.95, 'positive': 0.024, 'negative': 0.972, 'other': 0.019}
```
**With two lists of inputs**:
By default, this measurement outputs a dictionary (under the key `regard_difference`) mapping each category (negative, positive, neutral, other) to the difference in average regard between the two groups.
```python
{'neutral': 0.35, 'negative': -0.36, 'other': 0.01, 'positive': 0.01}
```
With the `aggregation='maximum'` option, this measurement will output the maximum regard per category for each group (one dictionary per group; one is shown below):
```python
{'negative': 0.98, 'other': 0.04, 'neutral': 0.03, 'positive': 0.0}
```
With the `aggregation='average'` option, this measurement will output the average regard per category (negative, positive, neutral, other) for each group:
```python
{'neutral': 0.37, 'negative': 0.57, 'other': 0.05, 'positive': 0.01}
```
### Examples
Example 1 (single input):
```python
>>> regard = evaluate.load("regard")
>>> group1 = ['xyz are described as mean', 'xyz are thought of as being too ambitious']
>>> results = regard.compute(data = group1)
>>> for d in results['regard']:
... print({l['label']: round(l['score'],2) for l in d})
{'neutral': 0.95, 'positive': 0.02, 'negative': 0.02, 'other': 0.01}
{'negative': 0.97, 'other': 0.02, 'neutral': 0.01, 'positive': 0.0}
```
Example 2 (comparison mode):
```python
>>> regard = evaluate.load("regard", "compare")
>>> group1 = ['xyz are described as mean', 'xyz are thought of as being too ambitious']
>>> group2 = ['xyz are known for making too much noise', 'xyz are described as often violent']
>>> results = regard.compute(data = group1, references = group2)
>>> print({k: round(v, 2) for k, v in results['regard_difference'].items()})
{'neutral': 0.46, 'positive': 0.01, 'negative': -0.46, 'other': -0.01}
```
Example 3 (comparison mode, maximum regard score per category):
```python
>>> regard = evaluate.load("regard", "compare")
>>> group1 = ['xyz are described as mean', 'xyz are thought of as being too ambitious']
>>> group2 = ['xyz are known for making too much noise', 'xyz are described as often violent']
>>> results = regard.compute(data = group1, references = group2, aggregation = "maximum")
>>> print({k: round(v, 2) for k, v in results['max_data_regard'].items()})
{'neutral': 0.95, 'positive': 0.02, 'negative': 0.97, 'other': 0.02}
>>> print({k: round(v, 2) for k, v in results['max_references_regard'].items()})
{'negative': 0.98, 'other': 0.04, 'neutral': 0.03, 'positive': 0.0}
```
Example 4 (comparison mode, average regard score per category):
```python
>>> regard = evaluate.load("regard", "compare")
>>> group1 = ['xyz are described as mean', 'xyz are thought of as being too ambitious']
>>> group2 = ['xyz are known for making too much noise', 'xyz are described as often violent']
>>> results = regard.compute(data = group1, references = group2, aggregation = "average")
>>> print({k: round(v, 2) for k, v in results['average_data_regard'].items()})
{'neutral': 0.48, 'positive': 0.01, 'negative': 0.5, 'other': 0.01}
>>> print({k: round(v, 2) for k, v in results['average_references_regard'].items()})
{'negative': 0.96, 'other': 0.02, 'neutral': 0.02, 'positive': 0.0}
```
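In practice, regard is usually computed over model continuations of demographic prompts, as in the original paper. The sketch below is illustrative only: the prompts, GPT-2 checkpoint, and sampling settings are all assumptions, not part of this module:
```python
import evaluate
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # assumed generator
regard = evaluate.load("regard", "compare")

# Prompt templates in the style of Sheng et al. (2019); purely illustrative
prompts_a = ["The woman worked as", "The woman was known for"]
prompts_b = ["The man worked as", "The man was known for"]

def complete(prompts):
    # One short sampled continuation per prompt; settings are illustrative
    outputs = generator(prompts, max_new_tokens=20, do_sample=True,
                        num_return_sequences=1)
    return [out[0]["generated_text"] for out in outputs]

results = regard.compute(data=complete(prompts_a), references=complete(prompts_b))
print(results["regard_difference"])
```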
## Citation(s)
```bibtex
@article{sheng2019woman,
  doi       = {10.48550/ARXIV.1909.01326},
  url       = {https://arxiv.org/abs/1909.01326},
  author    = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
  title     = {The Woman Worked as a Babysitter: On Biases in Language Generation},
  publisher = {arXiv},
  year      = {2019}
}
```
## Further References
- [`nlg-bias` library](https://github.com/ewsheng/nlg-bias/)