+---
+language: es
+tags:
+- sagemaker
+- vit
+- ImageClassification
+- generated_from_trainer
+license: apache-2.0
+datasets:
+- cifar100
+metrics:
+- accuracy
+model-index:
+- name: vit_base-224-in21k-ft-cifar100
+  results:
+  - task:
+        name: Image Classification
+        type: image-classification
+    dataset:
+        name: "Cifar100"
+        type: cifar100
+    metrics:
+       - name: Accuracy
+         type: accuracy
+         value: 0.9148
+---
+# Model vit_base-224-in21k-ft-cifar100
+## **A fine-tuned model for image classification on CIFAR-100**
+This model was trained using Amazon SageMaker and the Hugging Face Deep Learning Container.
+The base model is **Vision Transformer (base-sized model)**, a transformer encoder model (BERT-like) pretrained in a supervised fashion on a large collection of images, namely ImageNet-21k, at a resolution of 224x224 pixels. [Link to base model](https://huggingface.co/google/vit-base-patch16-224-in21k)
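As a rough illustration of what the 224x224 input resolution means for the encoder, the sketch below counts the tokens processed per image. The patch size of 16 is read off the base checkpoint name (`vit-base-patch16-224-in21k`), and the extra `[CLS]` token is an assumption based on the standard ViT design; neither is stated in this card. The upscaling factor likewise assumes the 32x32 CIFAR images are resized to the full input resolution.

```python
# Token arithmetic for a ViT encoder at the resolution quoted above.
image_size = 224   # input resolution stated in this card
patch_size = 16    # assumption: taken from the base checkpoint name "patch16"
channels = 3       # RGB input
cifar_size = 32    # native CIFAR image side length

patches_per_side = image_size // patch_size            # patches along one side
num_patches = patches_per_side ** 2                    # patches per image
sequence_length = num_patches + 1                      # assumption: one [CLS] token is prepended
values_per_patch = patch_size * patch_size * channels  # raw pixel values flattened per patch
upscale_factor = image_size // cifar_size              # resize factor for a CIFAR image

print(f"patches per image: {num_patches}")
print(f"sequence length:   {sequence_length}")
print(f"values per patch:  {values_per_patch}")
print(f"CIFAR upscaling:   {upscale_factor}x per side")
```

Under these assumptions each image becomes 196 patch tokens plus one classification token, and every CIFAR image is enlarged 7x per side before it reaches the model.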
+## Base model citation
+### BibTeX entry and citation info
+```bibtex
+@misc{wu2020visual,
+      title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
+      author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
+      year={2020},
+      eprint={2006.03677},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
+## Dataset
+[Link to dataset description](http://www.cs.toronto.edu/~kriz/cifar.html)
+CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
+The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
+The dataset used here, CIFAR-100, is just like CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 test images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
+Sizes of datasets:
+- Train dataset: 50,000
+- Test dataset: 10,000
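The split sizes follow directly from the per-class counts quoted above. A quick check in plain Python; the variable names are illustrative, and the number of fine classes per superclass is derived here by assuming the superclasses are equally sized.

```python
# Per-class counts as stated in the dataset description.
num_classes = 100
train_per_class = 500
test_per_class = 100
num_superclasses = 20

train_size = num_classes * train_per_class   # total training images
test_size = num_classes * test_per_class     # total test images
images_per_class = train_per_class + test_per_class
classes_per_superclass = num_classes // num_superclasses  # assumes equally sized superclasses

print(f"train: {train_size}, test: {test_size}")
print(f"images per class: {images_per_class}")
print(f"fine classes per superclass: {classes_per_superclass}")
```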
+## Intended uses & limitations
+This model is intended for image classification.
+## Hyperparameters
+    {
+        "epochs": "5",
+        "train_batch_size": "32",
+        "eval_batch_size": "8",
+        "fp16": "true",
+        "learning_rate": "1e-05"
+    }
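The hyperparameters are listed as strings, which is how SageMaker hands them to a training script: as command-line arguments. The training script for this model is not part of the card, so the following is only a minimal sketch of how such a script might parse them into typed values. The parser layout and the `str_to_bool` helper are hypothetical, and the step counts assume a single device with no gradient accumulation.

```python
import argparse
import math

# Hyperparameters exactly as listed in this card (all values are strings).
hyperparameters = {
    "epochs": "5",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "1e-05",
}


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: interpret a string flag such as "true" as a boolean."""
    return value.lower() in ("true", "1", "yes")


# Hypothetical parser, mirroring what a training script could define.
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int)
parser.add_argument("--train_batch_size", type=int)
parser.add_argument("--eval_batch_size", type=int)
parser.add_argument("--fp16", type=str_to_bool)
parser.add_argument("--learning_rate", type=float)

# Build the argument list the script would receive: --name value pairs.
argv = [token for name, value in hyperparameters.items() for token in (f"--{name}", value)]
args = parser.parse_args(argv)

# Optimizer steps implied by the 50,000 training images
# (assumption: one device, no gradient accumulation).
steps_per_epoch = math.ceil(50_000 / args.train_batch_size)
total_steps = steps_per_epoch * args.epochs

print(args)
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

A dedicated converter is used for `fp16` because `type=bool` in `argparse` would treat any non-empty string, including "false", as true.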
+## Test results
+- Accuracy = 0.9148
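To put the accuracy figure in context, the sketch below converts it to a count of correctly classified images and a rough uncertainty estimate. It assumes the accuracy was measured on the full 10,000-image test split and uses a normal approximation for the interval; the card states neither.

```python
import math

accuracy = 0.9148    # test accuracy reported in this card
test_size = 10_000   # assumption: evaluated on the full test split

correct = round(accuracy * test_size)   # images classified correctly
errors = test_size - correct            # images misclassified

# Standard error of a proportion and an approximate 95% interval
# (normal approximation; a sketch, not a rigorous analysis).
std_error = math.sqrt(accuracy * (1 - accuracy) / test_size)
margin = 1.96 * std_error

print(f"correct: {correct}, errors: {errors}")
print(f"approx. 95% interval: {accuracy - margin:.4f} to {accuracy + margin:.4f}")
```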
+## Model in action
+### Usage for Image Classification
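+```python
+from transformers import ViTFeatureExtractor, ViTForImageClassification
+from PIL import Image
+import requests
+
+# Download a sample image
+url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+image = Image.open(requests.get(url, stream=True).raw)
+
+# Preprocessing comes from the base model; the weights come from this fine-tuned checkpoint
+feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
+model = ViTForImageClassification.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')
+
+# Resize and normalize the image, then run a forward pass
+inputs = feature_extractor(images=image, return_tensors="pt")
+outputs = model(**inputs)
+
+# The highest logit gives the predicted CIFAR-100 class
+predicted_class_idx = outputs.logits.argmax(-1).item()
+print("Predicted class:", model.config.id2label[predicted_class_idx])
+```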
+Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)