File size: 2,693 Bytes
fb96c2c
ac71aac
fb96c2c
375b851
fb96c2c
 
ef359f0
 
375b851
 
 
 
fb96c2c
 
ac71aac
fb96c2c
ac71aac
fb96c2c
 
ac71aac
fb96c2c
 
 
 
 
ef359f0
fb96c2c
 
 
05094bd
b2b5660
fb96c2c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ac71aac
fb96c2c
05094bd
 
 
375b851
05094bd
 
 
 
 
 
 
 
 
 
 
 
ef359f0
 
 
 
 
 
 
05094bd
 
39adc1e
05094bd
fb96c2c
 
 
 
 
 
05094bd
 
 
6e0cc2a
ef359f0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
language: es
tags:
- "spanish"
metrics:
- accuracy
widget:
 - text: "Eres mas pequeño que un pitufo!"
 - text: "Eres muy feo!"
 - text: "Odio tu forma de hablar!"
 - text: "Eres tan fea que cuando eras pequeña te echaban de comer por debajo de la puerta."

---

# roberta-base-bne-finetuned-ciberbullying-spanish

This model is a fine-tuned version of [BSC-TeMU/roberta-base-bne](https://huggingface.co/BSC-TeMU/roberta-base-bne) on the dataset generated scrapping all social networks (Twitter, Youtube ...) to detect ciberbullying on Spanish.

It achieves the following results on the evaluation set:

- Loss: 0.1657
- Accuracy: 0.9607

## Training and evaluation data

I use the concatenation from multiple datasets generated scrapping social networks (Twitter,Youtube,Discord...)  to fine-tune this model. The total number of sentence pairs is above 360k sentences.

## Training procedure

<details>

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4

### Training results

| Training Loss | Epoch | Step  | Accuracy | Validation Loss |
|:-------------:|:-----:|:-----:|:--------:|:---------------:|
| 0.1512        | 1.0   | 22227 | 0.9501   | 0.1418          |
| 0.1253        | 2.0   | 44454 | 0.9567   | 0.1499          |
| 0.0973        | 3.0   | 66681 | 0.9594   | 0.1397          |
| 0.0658        | 4.0   | 88908 | 0.9607   | 0.1657          |

</details>

### Model in action 🚀

Fast usage with **pipelines**:

```python
from transformers import pipeline

model_path = "JonatanGk/roberta-base-bne-finetuned-ciberbullying-spanish"
bullying_analysis = pipeline("text-classification", model=model_path, tokenizer=model_path)

bullying_analysis(
    "Desde que te vi me enamoré de ti."
    )

# Output:
[{'label': 'Not_bullying', 'score': 0.9995710253715515}]

bullying_analysis(
    "Eres tan fea que cuando eras pequeña te echaban de comer por debajo de la puerta."
    )
# Output:
[{'label': 'Bullying', 'score': 0.9918262958526611}] 
    
```

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JonatanGk/Shared-Colab/blob/master/Cyberbullying_detection_(SPANISH).ipynb)

### Framework versions

- Transformers 4.10.3
- Pytorch 1.9.0+cu102
- Datasets 1.12.1
- Tokenizers 0.10.3


> Special thx to [Manuel Romero/@mrm8488](https://huggingface.co/mrm8488) as my mentor & R.C.

> Created by [Jonatan Luna](https://JonatanGk.github.io) | [LinkedIn](https://www.linkedin.com/in/JonatanGk/)