Classification of patent title - "green" or "no green"
This model classifies patents as "green patents" or "no green patents" based on their titles alone.
Examples of "green patents" titles:
- "A method for recycling waste" - score: 0.714
- "A method of reducing pollution" - score: 0.786
- "An apparatus to improve environmental aspects" - score: 0.570
- "A method to improve waste management" - score: 0.813
- "A device to use renewable energy sources" - score: 0.98
- "A technology for efficient electrical power generation" - score: 0.975
- "A method for the production of fuel of non-fossil origin" - score: 0.975
- "Biofuels from waste" - score: 0.88
- "A combustion technology with mitigation potential" - score: 0.947
- "A device to capture greenhouse gases" - score: 0.871
- "A method to reduce the greenhouse effect" - score: 0.887
- "A device to improve the climate" - score: 0.650
- "A device to stop climate change" - score: 0.55
Examples of "no green patents" titles:
- "A device to destroy the nature" - score: 0.19
- "A method to produce smoke" - score: 0.386
Examples of the model's limitations:
- "A method to avoid trash" - score: 0.165
- "A method to reduce trash" - score: 0.333
- "A method to burn the Amazonas" - score: 0.501
- "A method to burn wood" - score: 0.408
- "Green plastics" - score: 0.126
- "Greta Thunberg" - score: 0.313 (How dare you, model?); BUT: "A method of using Greta Thunberg to stop climate change" - score: 0.715
Examples were inspired by https://www.epo.org/news-events/in-focus/classification/classification.html
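The scores above can be turned into labels with a simple cutoff. The following is a minimal sketch, assuming the score is the model's confidence for the "green" class and that 0.5 is the decision threshold. Neither assumption is stated in this card, and the helper name label_from_score is hypothetical.

```python
# Scores copied from the examples above.
# The 0.5 cutoff is an assumption, not a documented setting of the model.
THRESHOLD = 0.5

examples = {
    "A device to use renewable energy sources": 0.98,
    "A device to stop climate change": 0.55,
    "A method to produce smoke": 0.386,
    "A method to avoid trash": 0.165,
    "A method to burn the Amazonas": 0.501,
    "Green plastics": 0.126,
}


def label_from_score(score: float, threshold: float = THRESHOLD) -> str:
    """Map a 'green' score to a label using a fixed cutoff (hypothetical helper)."""
    return "green" if score >= threshold else "no green"


labels = {title: label_from_score(score) for title, score in examples.items()}

for title, label in labels.items():
    print(f"{examples[title]:.3f}  {label:8s}  {title}")
```

Under this cutoff, "A method to burn the Amazonas" lands just on the "green" side while "A method to avoid trash" and "Green plastics" do not, which is why these titles are listed as limitations.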
distilbert-base-uncased-finetuned-greenpatent
This model is a fine-tuned version of distilbert-base-uncased on the green patent dataset. The dataset was split into 70% training data and 30% test data (using ".train_test_split(test_size=0.3)"). The model achieves the following results on the evaluation set:
- Loss: 0.3148
- Accuracy: 0.8776
- F1: 0.8770
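For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and F1 are computed from labels and predictions. The toy labels are made up for illustration, and the card does not say which F1 averaging (binary, macro, or weighted) produced the figure above, so the binary variant shown here is an assumption.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only (1 = "green", 0 = "no green").
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"f1:       {binary_f1(y_true, y_pred):.4f}")
```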
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
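To illustrate what the linear scheduler does with these settings, here is a minimal sketch of the learning rate over training. It assumes zero warmup steps (warmup is not mentioned in this card) and 202 optimizer steps in total, the final step count reported in the training results. The function name linear_lr is hypothetical.

```python
BASE_LR = 2e-05     # learning_rate from the list above
TOTAL_STEPS = 202   # assumption: final step count from the training results
WARMUP_STEPS = 0    # assumption: no warmup is mentioned in the card


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup (if any), then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 101, 202):
    print(f"step {step:3d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 2e-05, is halved by the end of the first epoch, and reaches zero at the last step.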
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|---|
| 0.4342 | 1.0 | 101 | 0.3256 | 0.8721 | 0.8712 |
| 0.3229 | 2.0 | 202 | 0.3148 | 0.8776 | 0.8770 |
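The step counts give a rough idea of the dataset size, which the card does not state directly. The arithmetic below is a back-of-the-envelope sketch, assuming a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. The resulting numbers are estimates, not reported figures.

```python
import math

STEPS_PER_EPOCH = 101   # from the table: 101 steps after epoch 1
BATCH_SIZE = 64         # train_batch_size from the hyperparameters
TRAIN_FRACTION = 0.7    # 70% training split

# With ceil(n / 64) == 101, the training set holds between 6401 and 6464 titles.
min_train = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
max_train = STEPS_PER_EPOCH * BATCH_SIZE

# Scale up by the training fraction to estimate the full dataset size.
approx_total_low = round(min_train / TRAIN_FRACTION)
approx_total_high = round(max_train / TRAIN_FRACTION)

print(f"training titles: {min_train} to {max_train}")
print(f"estimated total: about {approx_total_low} to {approx_total_high}")
```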
Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cpu
- Datasets 2.8.0
- Tokenizers 0.13.2