---
license: apache-2.0
language:
- en
datasets:
- AiresPucrs/toxic-comments
library_name: transformers
---
# Toxicity Classifier (Teeny-Tiny Castle)

This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.

## How to Use

```python
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model (this will be the target of our attack)
hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
                filename="toxicity-classifier/toxicity-model.keras",
                local_dir="./",
                repo_type="model")

# Download the tokenizer file
hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
                filename="toxic-vocabulary.txt",
                local_dir="./",
                repo_type="model")

toxicity_model = tf.keras.models.load_model('./toxicity-classifier/toxicity-model.keras')

# If you cloned the model repo, the path is toxicity-classifier/toxic-vocabulary.txt
with open('toxic-vocabulary.txt', encoding='utf-8') as fp:
    vocabulary = [line.strip() for line in fp]

vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
                                                        output_mode="int",
                                                        output_sequence_length=100,
                                                        vocabulary=vocabulary)

strings = [
    'I think you should shut up your big mouth',
    'I do not agree with you'
]

preds = toxicity_model.predict(vectorization_layer(strings), verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    print(f'Toxic 🤬 {(1 - preds[i][0]) * 100:.2f}% | Not toxic 😊 {preds[i][0] * 100:.2f}%\n')
    print("_" * 50)
```
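
If you want to reuse the classifier beyond this snippet, the vectorization and prediction steps can be bundled into a small helper. The sketch below is a minimal, illustrative example (not part of the original tutorial): it assumes `toxicity_model` and `vectorization_layer` from the code above are already in scope, and the `classify` function name and its 0.5 decision threshold are choices made here for demonstration.

```python
def classify(texts, threshold=0.5):
    """Return a (label, toxic_probability) pair for each input string.

    Illustrative helper; assumes `toxicity_model` and `vectorization_layer`
    are defined as in the snippet above. The model outputs the probability
    of the non-toxic class, so the toxic probability is its complement.
    """
    preds = toxicity_model.predict(vectorization_layer(texts), verbose=0)
    results = []
    for p in preds:
        toxic_prob = float(1 - p[0])
        label = "toxic 🤬" if toxic_prob >= threshold else "not toxic 😊"
        results.append((label, toxic_prob))
    return results

# Example usage: one (label, toxic_probability) pair per input string
print(classify(['I think you should shut up your big mouth']))
```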