Classification of patent titles - "green" or "no green"

This model classifies patents as "green patents" or "no green patents" based on their titles.

Examples of "green patents" titles:

  • "A method for recycling waste" - score: 0.714
  • "A method of reducing pollution" - score: 0.786
  • "An apparatus to improve environmental aspects" - score: 0.570
  • "A method to improve waste management" - score: 0.813
  • "A device to use renewable energy sources" - score: 0.98
  • "A technology for efficient electrical power generation" - score: 0.975
  • "A method for the production of fuel of non-fossil origin" - score: 0.975
  • "Biofuels from waste" - score: 0.88
  • "A combustion technology with mitigation potential" - score: 0.947
  • "A device to capture greenhouse gases" - score: 0.871
  • "A method to reduce the greenhouse effect" - score: 0.887
  • "A device to improve the climate" - score: 0.650
  • "A device to stop climate change" - score: 0.55

Examples of "no green patents" titles:

  • "A device to destroy the nature" - score: 0.19
  • "A method to produce smoke" - score: 0.386

Examples of the model's limitations:

  • "A method to avoid trash" - score: 0.165
  • "A method to reduce trash" - score: 0.333
  • "A method to burn the Amazonas" - score: 0.501
  • "A method to burn wood" - score: 0.408
  • "Green plastics" - score: 0.126
  • "Greta Thunberg" - score: 0.313 (How dare you, model?), but "A method of using Greta Thunberg to stop climate change" - score: 0.715

Examples were inspired by https://www.epo.org/news-events/in-focus/classification/classification.html
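Scores like the ones listed above could be obtained along the following lines. This is a minimal sketch, not the author's script. The model ID is taken from this page. The rest is assumed: the label name "LABEL_1" for the "green" class, the 0.5 decision threshold, and the helper names are all illustrative, and the listed scores are read here as the probability of the "green" class.

```python
MODEL_ID = "cwinkler/distilbert-base-uncased-finetuned-greenpatent"  # from this page
GREEN_LABEL = "LABEL_1"  # ASSUMPTION: which label means "green" is not stated in the card
THRESHOLD = 0.5          # ASSUMPTION: cut-off between "green" and "no green"


def green_probability(label_scores, green_label=GREEN_LABEL):
    """Pick the score of the assumed 'green' label from one pipeline result."""
    for entry in label_scores:
        if entry["label"] == green_label:
            return entry["score"]
    raise KeyError(f"label {green_label!r} not found in {label_scores!r}")


def to_label(score, threshold=THRESHOLD):
    """Turn a 'green' probability into one of the two class names."""
    return "green" if score >= threshold else "no green"


def classify_titles(titles, classifier=None):
    """Return (title, green probability) pairs for a list of patent titles."""
    titles = list(titles)
    if classifier is None:
        # Imported lazily so the helpers above work without transformers installed.
        from transformers import pipeline

        # top_k=None asks the pipeline for the scores of all labels.
        classifier = pipeline("text-classification", model=MODEL_ID, top_k=None)
    results = classifier(titles)
    return [(title, green_probability(res)) for title, res in zip(titles, results)]


if __name__ == "__main__":
    # Downloads the model on first use.
    for title, score in classify_titles(["A method for recycling waste"]):
        print(f"{title!r}: {score:.3f} -> {to_label(score)}")
```

With a 0.5 threshold, a title such as "A method to burn the Amazonas" (score 0.501) falls just on the "green" side, which is why it appears among the limitations.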

distilbert-base-uncased-finetuned-greenpatent

This model is a fine-tuned version of distilbert-base-uncased on the green patent dataset. The dataset was split into 70% training data and 30% test data using ".train_test_split(test_size=0.3)". The model achieves the following results on the evaluation set:

  • Loss: 0.3148
  • Accuracy: 0.8776
  • F1: 0.8770
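The split and the two reported metrics could be computed roughly as sketched below. Only `test_size=0.3` comes from the card. The split seed and the "weighted" F1 averaging are assumptions, since the card states neither, and the function names are illustrative. The metrics are written in plain Python so the arithmetic is visible.

```python
def split_green_patents(dataset, seed=42):
    """Split a datasets.Dataset into 70 % train and 30 % test.

    ASSUMPTION: the seed is illustrative; the card only gives test_size=0.3.
    """
    return dataset.train_test_split(test_size=0.3, seed=seed)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1 of a single class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged by class frequency in y_true.

    ASSUMPTION: the averaging mode behind the reported F1 is not documented.
    """
    n = len(y_true)
    total = 0.0
    for cls in set(y_true):
        support = sum(t == cls for t in y_true)
        total += support / n * f1_for_class(y_true, y_pred, cls)
    return total


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]  # toy labels, not real evaluation data
    y_pred = [1, 0, 0, 0]
    print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```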

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
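The hyperparameters above map onto `transformers.TrainingArguments` roughly as in the sketch below. It assumes training on a single device, so that `train_batch_size` equals `per_device_train_batch_size`. The output directory name is hypothetical. The small `linear_lr` helper illustrates what a linear schedule does to the learning rate, assuming no warmup and the 202 total steps listed in the training results.

```python
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 64,  # ASSUMPTION: single device
    "per_device_eval_batch_size": 64,   # ASSUMPTION: single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 2,
}


def build_training_arguments(output_dir="greenpatent-out"):
    """Create TrainingArguments from the listed values.

    The output_dir default is a hypothetical placeholder.
    """
    # Imported lazily so the rest of the sketch runs without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)


def linear_lr(step, total_steps=202, base_lr=2e-5, warmup_steps=0):
    """Learning rate at a given step under linear decay to zero.

    ASSUMPTION: warmup_steps=0; the card does not mention warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 101, 202):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 2e-05, is halved by the end of the first epoch (step 101), and reaches zero at step 202.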

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy  F1
0.4342         1.0    101   0.3256           0.8721    0.8712
0.3229         2.0    202   0.3148           0.8776    0.8770
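The step counts give a rough idea of the dataset size, which the card does not state. The sketch below works backwards from 101 steps per epoch and a batch size of 64. It assumes a single device, no gradient accumulation, and that the last partial batch is kept, so the result is an estimate rather than a reported figure.

```python
import math

BATCH_SIZE = 64        # train_batch_size from the hyperparameter list
STEPS_PER_EPOCH = 101  # step count at epoch 1.0 in the results table
TRAIN_SHARE = 0.7      # 70 % training split stated in the card


def train_size_bounds(steps_per_epoch, batch_size):
    """Smallest and largest training-set size that yields this many steps."""
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


low, high = train_size_bounds(STEPS_PER_EPOCH, BATCH_SIZE)

# Both bounds must round up to exactly the observed number of steps.
assert math.ceil(low / BATCH_SIZE) == STEPS_PER_EPOCH
assert math.ceil(high / BATCH_SIZE) == STEPS_PER_EPOCH

# Approximate size of the full dataset implied by the 70 % training share.
total_low = round(low / TRAIN_SHARE)
total_high = round(high / TRAIN_SHARE)

if __name__ == "__main__":
    print(f"training titles: {low} to {high}")
    print(f"all titles (approx.): {total_low} to {total_high}")
```

Under these assumptions the training split holds between 6401 and 6464 titles, which puts the full dataset at a little over 9,100 titles.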

Framework versions

  • Transformers 4.25.1
  • PyTorch 1.13.1+cpu
  • Datasets 2.8.0
  • Tokenizers 0.13.2

Dataset used to train cwinkler/distilbert-base-uncased-finetuned-greenpatent