---
license: apache-2.0
pipeline_tag: image-classification
tags:
- pytorch
- vision
library_name: transformers
---

This model is the product of curiosity: imagine a tool that lets you label anime images!

**Disclaimer**: The model has been trained on an entirely new dataset. Predictions made by the model *prior to 2023 might be off*. It's advisable to fine-tune the model according to your specific use case.

# Quick setup guide:

```python
from transformers.modeling_outputs import ImageClassifierOutput
from transformers import ViTImageProcessor, ViTForImageClassification
import torch
from PIL import Image

model_name_or_path = "Ojimi/vit-anime-caption"
processor = ViTImageProcessor.from_pretrained(model_name_or_path)
model = ViTForImageClassification.from_pretrained(model_name_or_path)

threshold = 0.3  # Minimum probability for a tag to be reported
# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

YOUR_IMAGE_PATH = "path/to/your/image.png"  # Placeholder: replace with your own image path
image = Image.open(YOUR_IMAGE_PATH).convert('RGB')  # The processor expects 3-channel input
inputs = processor(image, return_tensors='pt')

model.to(device=device)
model.eval()

with torch.no_grad():
    pixel_values = inputs['pixel_values'].to(device=device)
    outputs: ImageClassifierOutput = model(pixel_values=pixel_values)
    logits = outputs.logits  # The raw scores before applying any activation
    sigmoid = torch.nn.Sigmoid()  # Sigmoid function to convert logits to probabilities
    probs: torch.Tensor = sigmoid(logits)  # Applying sigmoid activation

predictions = []  # List to store predictions
for idx, p in enumerate(probs[0]):
    if p > threshold:  # Keep only the tags whose probability exceeds the threshold
        predictions.append((model.config.id2label[idx], p.item()))  # Storing class label and probability

for tag in predictions:
    print(tag)
```

Why the `Sigmoid`?

- Sigmoid turns boring raw scores (logits) into fun probabilities between 0 and 1, one per tag and independent of the others, so you can apply a threshold and keep every cool tag that clears it rather than just the single top one.
- It's like a wizard turning regular stuff into magic potions!

[Training guide](https://huggingface.co/Ojimi/vit-anime-caption/blob/main/training_guide.md)
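
To make the `Sigmoid` point a bit more concrete, here is a minimal pure-Python sketch of the thresholding step, with a softmax shown alongside for contrast. The logits and the tag names in `id2label` are made up for illustration and are not taken from the model's actual label set; `select_tags` is a hypothetical helper, not part of `transformers`.

```python
import math

def sigmoid(x: float) -> float:
    """Map one raw score to a probability in (0, 1), independently of other scores."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Map all scores to probabilities that compete with each other and sum to 1."""
    m = max(xs)  # Subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_tags(logits, id2label, threshold=0.3):
    """Hypothetical helper: keep every tag whose sigmoid probability exceeds the threshold."""
    probs = [sigmoid(x) for x in logits]
    return [(id2label[i], p) for i, p in enumerate(probs) if p > threshold]

# Illustrative values only (assumed, not real model output)
id2label = {0: "1girl", 1: "solo", 2: "outdoors", 3: "hat"}
logits = [2.0, 1.5, -3.0, 0.0]

tags = select_tags(logits, id2label)
for label, p in tags:
    print(f"{label}: {p:.3f}")

# For contrast: the same threshold applied to softmax probabilities
softmax_probs = softmax(logits)
softmax_tags = [id2label[i] for i, p in enumerate(softmax_probs) if p > 0.3]
print(softmax_tags)
```

With these assumed numbers, the sigmoid version keeps three tags, while the softmax version drops `hat`: softmax forces the tags to share a total probability of 1, so a moderately likely tag gets crowded out by stronger ones. That is why a per-tag sigmoid is the usual fit when an image can carry several tags at once.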