metadata
license: unknown
pipeline_tag: token-classification
tags:
  - wine
  - ner
widget:
  - text: >-
      Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous
      100% varietal wine hails from oakville and was aged over three years in
      oak. juicy red-cherry fruit and a compelling hint of caramel greet the
      palate, framed by elegant, fine tannins and a subtle minty tone in the
      background. balanced and rewarding from start to finish, it has years
      ahead of it to develop further nuance. enjoy 2022
    example_title: California Cab

Wineberto NER model

A model pretrained on wine labels and descriptions for named entity recognition, using bert-base-uncased as the base model. It tries to recognize entities in both the wine label and the description of the wine. Label recognition does not work as well as with the dedicated panigrah/winberto-labels model.

  • Updated to remove bias on the position of the wine label in the training inputs.
  • Also updated to stop trying to extract the wine classification (e.g. Grand Cru), because the training data for it is not reliable.

Model description

How to use

You can use this model directly for named entity recognition as follows:

>>> from transformers import pipeline
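>>> # aggregation_strategy='simple' is an assumption: the 'entity_group' key printed below is only returned when entities are grouped
>>> ner = pipeline('ner', model='winberto-ner-uncased', aggregation_strategy='simple')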
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")

heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
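
The output above still contains WordPiece fragments such as `cara` / `##mel`. A minimal post-processing sketch, assuming the pipeline returns a list of dicts with `word`, `entity_group` and `score` keys as printed above, could join each `##` fragment onto the preceding entity when both carry the same label. The helper name `merge_wordpieces` and the choice to keep the lower of the two scores are illustrative assumptions, not part of the model or of the transformers API.

```python
def merge_wordpieces(entities):
    """Join '##' continuation fragments onto the preceding entity with the same label.

    `entities` is assumed to be a list of dicts shaped like the pipeline
    output shown above ('word', 'entity_group', 'score').
    """
    merged = []
    for ent in entities:
        word = ent["word"]
        same_label = bool(merged) and merged[-1]["entity_group"] == ent["entity_group"]
        if word.startswith("##") and same_label:
            prev = merged[-1]
            prev["word"] += word[2:]  # drop the '##' marker and glue the text on
            # Assumption: keep the lower score as a conservative confidence.
            prev["score"] = min(prev["score"], ent["score"])
        else:
            # Fragments without a matching predecessor are kept unchanged.
            merged.append(dict(ent))  # copy so the input list is not mutated
    return merged


# Sample rows copied from the output above.
sample = [
    {"word": "heitz", "entity_group": "producer", "score": 0.99988},
    {"word": "cab", "entity_group": "wine", "score": 0.9999},
    {"word": "##ernet sauvignon", "entity_group": "wine", "score": 0.95893},
    {"word": "cara", "entity_group": "flavor", "score": 0.99993},
    {"word": "##mel", "entity_group": "flavor", "score": 0.99731},
]

for ent in merge_wordpieces(sample):
    print(f"{ent['word']}: {ent['entity_group']}: {ent['score']:.5}")
```

With the sample rows, `cab` and `##ernet sauvignon` collapse into a single `cabernet sauvignon` entry, and `cara` and `##mel` into `caramel`.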

Training data

The BERT model was trained on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text, manually annotated to capture the following entity types:

adjective: nice, exciting, strong, etc.
country: countries specified in the label or description
flavor: fruit, apple, toast, smoke, etc.
grape: Cab, Cabernet Sauvignon, etc.
mouthfeel: luscious, smooth, textured, rough, etc.
producer: winemaker
province, region: province and region of the wine; these sometimes get mixed up

Training procedure

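from transformers import TrainingArguments

# Base checkpoint that is fine-tuned for token classification
model_id = 'bert-base-uncased'
arguments = TrainingArguments(
    evaluation_strategy="epoch",  # evaluate at the end of every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)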
...
trainer.train()
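
The preprocessing behind the elided step is not documented here. As an illustration only, a common convention when fine-tuning BERT for token classification is to expand word-level labels to sub-word tokens, labelling the first piece of each word and masking special tokens and continuation pieces with -100 (the index PyTorch's cross-entropy loss ignores by default). The sketch below assumes that convention; the function name `align_labels`, the example word indices and the label ids are hypothetical and are not taken from the author's training script.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Expand word-level label ids to one label per sub-word token.

    word_ids: for each token, the index of the word it came from, or None
              for special tokens such as [CLS] and [SEP].
    word_labels: one label id per word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(ignore_index)      # special token: excluded from the loss
        elif wid != previous:
            aligned.append(word_labels[wid])  # first piece of a word carries the label
        else:
            aligned.append(ignore_index)      # continuation piece: excluded from the loss
        previous = wid
    return aligned


# Hypothetical example: [CLS] heitz cara ##mel [SEP]
word_ids = [None, 0, 1, 1, None]
word_labels = [5, 2]  # hypothetical ids for 'producer' and 'flavor'
print(align_labels(word_ids, word_labels))
```

In practice the `word_ids` list would come from the tokenizer, and the aligned labels would be stored alongside the token ids in the dataset passed to the trainer.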