---
license: unknown
pipeline_tag: token-classification
tags:
- wine
- ner
widget:
- text: 'Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022'
example_title: 'California Cab'
---
# Wineberto NER model

## Model description
Pretrained model for named entity recognition on wine labels and descriptions, using bert-base-uncased as the base model. It tries to recognize both the wine label and the description of the wine.

<b>Label discovery doesn't work as well here as just using the panigrah/winberto-labels model.</b>

* Updated to remove bias on the position of the wine label in the training inputs.
* Also updated to stop extracting the wine classification (e.g. Grand Cru), because the training data for it is not reliable.
## How to use
You can use this model directly for named entity recognition like so:
```python
>>> from transformers import pipeline
>>> ner = pipeline('ner', model='winberto-ner-uncased', aggregation_strategy='simple')
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
```
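The `aggregation_strategy='simple'` argument makes the pipeline merge word-piece tokens back into whole entities, which is why the output has an `entity_group` field and multi-word spans such as `napa valley`; without it the pipeline returns one raw token per row with an `entity` field instead.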
## Training data
The BERT model was fine-tuned on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text and manually annotated to capture the following entity types:
```
adjective: nice, exciting, strong etc
country: countries specified in label or description
flavor: fruit, apple, toast, smoke etc
grape: Cab, Cabernet Sauvignon, etc
mouthfeel: luscious, smooth, textured, rough etc
producer: wine maker
province, region: province and region of the wine - sometimes these get mixed up
```
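For training, annotations like these have to be aligned to BERT's word-piece tokens as BIO tags. The snippet below is a minimal sketch of that alignment step; the `example` dict and its word-level tag format are illustrative assumptions, not the actual annotation schema used for this model.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical annotated example: word-level tokens with BIO tags.
example = {
    "tokens": ["heitz", "cabernet", "sauvignon", "california"],
    "ner_tags": ["B-producer", "B-grape", "I-grape", "B-province"],
}

# Tokenize pre-split words, then give every word piece its word's tag.
encoding = tokenizer(example["tokens"], is_split_into_words=True)
aligned = []
for word_id in encoding.word_ids():
    if word_id is None:        # special tokens ([CLS], [SEP])
        aligned.append(-100)   # -100 is ignored by the loss
    else:
        aligned.append(example["ner_tags"][word_id])

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
print(aligned)  # in real training the string tags become integer label ids
```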
## Training procedure
```python
from transformers import TrainingArguments

model_id = 'bert-base-uncased'
arguments = TrainingArguments(
    output_dir='wineberto-ner-uncased',  # placeholder; original value not published
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)
...
trainer.train()
```
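The `...` above elides the model and `Trainer` setup. A minimal sketch of what that step typically looks like for token classification follows; `label_list`, `train_dataset`, and `eval_dataset` are assumptions standing in for the annotated data, which is not part of the published card.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
)

# Assumed inputs: label_list (the BIO tag names), plus tokenized
# train_dataset / eval_dataset with aligned `labels` columns.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id, num_labels=len(label_list)
)

trainer = Trainer(
    model=model,
    args=arguments,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    # Pads both inputs and label sequences to a common length per batch.
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
# trainer.train() then runs as shown above.
```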