metadata
license: unknown
pipeline_tag: token-classification
tags:
- wine
- ner
widget:
- text: >-
    Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous
    100% varietal wine hails from oakville and was aged over three years in
    oak. juicy red-cherry fruit and a compelling hint of caramel greet the
    palate, framed by elegant, fine tannins and a subtle minty tone in the
    background. balanced and rewarding from start to finish, it has years
    ahead of it to develop further nuance. enjoy 2022
  example_title: California Cab
Wineberto NER model
A model trained on wine labels and descriptions for named entity recognition, using bert-base-uncased as the base model. It tries to recognize entities in both the wine label and the description of the wine. Label discovery doesn't work as well as it does with the dedicated panigrah/winberto-labels model.
- Updated to remove bias on the position of the wine label in the training inputs.
- Also updated to stop predicting the wine classification (e.g. Grand Cru), because the training data for it is not reliable.
Model description
How to use
You can use this model directly for named entity recognition like so:
>>> from transformers import pipeline
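>>> # aggregation_strategy groups word pieces and adds the 'entity_group' key used below
>>> ner = pipeline('ner', model='winberto-ner-uncased', aggregation_strategy='simple')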
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
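The output above contains word pieces reported as separate groups, such as `cab` followed by `##ernet sauvignon`, and `cara` followed by `##mel`. A minimal post-processing sketch, assuming each result is a dict with `word`, `entity_group` and `score` keys as printed above: it joins a group that starts with `##` onto the previous group when both carry the same label. The helper name `merge_wordpieces` and the choice to keep the lower of the two scores are illustrative assumptions, not part of the model or the `transformers` API.

```python
def merge_wordpieces(groups):
    """Join '##' continuation groups onto the preceding group with the same label.

    `groups` is a list of dicts with 'word', 'entity_group' and 'score' keys.
    If 'start'/'end' character offsets are present, pieces must also touch.
    """
    merged = []
    for g in groups:
        word = g["word"]
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and word.startswith("##")
            and prev["entity_group"] == g["entity_group"]
            # Without offsets both sides are None, so this check passes.
            and prev.get("end") == g.get("start")
        )
        if is_continuation:
            prev["word"] += word[2:]  # drop the '##' marker
            prev["score"] = min(prev["score"], g["score"])  # keep the weaker score
            if "end" in g:
                prev["end"] = g["end"]
        else:
            merged.append(dict(g))  # copy so the input is not mutated
    return merged


# Values copied from the example output above.
sample = [
    {"word": "cab", "entity_group": "wine", "score": 0.9999},
    {"word": "##ernet sauvignon", "entity_group": "wine", "score": 0.95893},
    {"word": "cara", "entity_group": "flavor", "score": 0.99993},
    {"word": "##mel", "entity_group": "flavor", "score": 0.99731},
]
for g in merge_wordpieces(sample):
    print(f"{g['word']}: {g['entity_group']}: {g['score']:.5}")
```

On the sample, the four groups collapse into `cabernet sauvignon` and `caramel`. A `##` piece that follows a group with a different label is left untouched.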
Training data
The BERT model was trained on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text and manually annotated to capture the following entity types:
adjective: nice, exciting, strong, etc.
country: countries specified in the label or description
flavor: fruit, apple, toast, smoke, etc.
grape: Cab, Cabernet Sauvignon, etc.
mouthfeel: luscious, smooth, textured, rough, etc.
producer: winemaker
province, region: province and region of the wine; these sometimes get mixed up
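Predictions in these categories can be collected into one record per wine. The following is a small sketch, assuming the pipeline results are dicts with `word`, `entity_group` and `score` keys as in the usage example; the helper name `entities_by_label` and the `min_score` threshold are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def entities_by_label(groups, min_score=0.5):
    """Collect predicted words under their entity label, skipping weak predictions."""
    record = defaultdict(list)
    for g in groups:
        if g["score"] >= min_score:
            record[g["entity_group"]].append(g["word"])
    return dict(record)


# A few predictions copied from the usage example.
predictions = [
    {"word": "heitz", "entity_group": "producer", "score": 0.99988},
    {"word": "california", "entity_group": "province", "score": 0.99992},
    {"word": "cherry", "entity_group": "flavor", "score": 0.99994},
    {"word": "mint", "entity_group": "flavor", "score": 0.99994},
    {"word": "juicy", "entity_group": "mouthfeel", "score": 0.99992},
]
record = entities_by_label(predictions)
print(record["flavor"])
```

Grouping by label makes it easy to compare, for example, the flavor terms of two wines without re-reading the raw token list.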
Training procedure
from transformers import TrainingArguments

model_id = 'bert-base-uncased'
arguments = TrainingArguments(
    evaluation_strategy="epoch",  # evaluate at the end of every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)
...
trainer.train()
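The data preparation elided above is not shown in this card. One step that fine-tuning BERT for token classification commonly needs is aligning word-level labels with word pieces, since a word like "caramel" is split into `cara` and `##mel`. The sketch below shows one common convention, which may differ from what was done for this model: label only the first piece of each word and mark special tokens and continuation pieces with -100, the default `ignore_index` of PyTorch's cross-entropy loss. The function name `align_labels` and the label ids are hypothetical; `word_ids` is assumed to have the shape returned by a fast tokenizer's `word_ids()` method.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def align_labels(word_ids, word_labels):
    """Map one label per word onto one label per word piece.

    `word_ids` gives, for each token, the index of the word it came from,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)      # special token
        elif wid != previous:
            aligned.append(word_labels[wid])  # first piece of a word
        else:
            aligned.append(IGNORE_INDEX)      # continuation piece
        previous = wid
    return aligned


# "of caramel" tokenized as [CLS] of cara ##mel [SEP]
word_ids = [None, 0, 1, 1, None]
word_labels = [0, 3]  # hypothetical ids: 0 = "O", 3 = "flavor"
print(align_labels(word_ids, word_labels))  # [-100, 0, 3, -100, -100]
```

An alternative convention copies the word's label onto every piece instead of masking the continuations; which one was used here is not stated.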