STEM-AI-mtl committed on
Commit ca46bec · verified · 1 Parent(s): 4a32bed

Update README.md

Files changed (1)
  1. README.md +11 -15
README.md CHANGED
@@ -1,15 +1,17 @@
  ---
  license: other
  tags:
  - vision
  - image-classification
  - STEM-AI-mtl/City_map
  - Google
  - ViT
  datasets:
  - STEM-AI-mtl/City_map
- license_name: stem.ai.mtl
- license_link: LICENSE
  widget:
  - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
  example_title: Tiger
@@ -19,11 +21,9 @@ widget:
  example_title: Palace
  ---

- # Vision Transformer (base-sized model)
-
- Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in [this repository](https://github.com/google-research/vision_transformer). However, the weights were converted from the [timm repository](https://github.com/rwightman/pytorch-image-models) by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him.

- Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team.

  ## Model description
@@ -33,10 +33,6 @@ Images are presented to the model as a sequence of fixed-size patches (resolutio
  By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image.

- ## Intended uses & limitations
-
- You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=google/vit) to look for
- fine-tuned versions on a task that interests you.

  ### How to use
@@ -47,16 +43,16 @@ from transformers import ViTImageProcessor, ViTForImageClassification
  from PIL import Image
  import requests

- url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
  image = Image.open(requests.get(url, stream=True).raw)

- processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
- model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')

  inputs = processor(images=image, return_tensors="pt")
  outputs = model(**inputs)
  logits = outputs.logits
- # model predicts one of the 1000 ImageNet classes
  predicted_class_idx = logits.argmax(-1).item()
  print("Predicted class:", model.config.id2label[predicted_class_idx])
  ```
@@ -65,7 +61,7 @@ For more code examples, we refer to the [documentation](https://huggingface.co/t
  ## Training data

- This ViT model was fine-tuned on the [STEM-AI-mtl/City_map dataset](https://huggingface.co/datasets/STEM-AI-mtl/City_map), containing over 600 images of 45 maps of cities around the world.

  ## Training procedure
 
@@ -1,15 +1,17 @@
  ---
  license: other
+ license_name: stem.ai.mtl
+ license_link: LICENSE
  tags:
  - vision
  - image-classification
  - STEM-AI-mtl/City_map
  - Google
  - ViT
+ - STEM-AI-mtl
  datasets:
  - STEM-AI-mtl/City_map
+
  widget:
  - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
  example_title: Tiger
@@ -19,11 +21,9 @@ widget:
  example_title: Palace
  ---

+ # The fine-tuned ViT model that beats [Google's base model](https://huggingface.co/google/vit-base-patch16-224)

+ Image-classification model that identifies which city map is shown in an input image.

  ## Model description
@@ -33,10 +33,6 @@ Images are presented to the model as a sequence of fixed-size patches (resolutio
  By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image.
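
As a rough illustration of the linear-head idea described above, the sketch below applies a linear layer to the [CLS] position of an encoder's last hidden state. It uses plain Python lists rather than a real ViT encoder. The function name `classify_from_cls`, the toy hidden states, and the weights are all hypothetical and chosen only to make the arithmetic easy to follow; the [CLS] token is assumed to sit at position 0 of the sequence.

```python
def classify_from_cls(last_hidden_state, weights, bias):
    """Apply a linear layer to the [CLS] token and return (class index, logits).

    last_hidden_state: one row per token; row 0 is assumed to be [CLS].
    weights: one row per class, each the same length as a hidden vector.
    bias: one value per class.
    """
    cls_vector = last_hidden_state[0]  # assumption: [CLS] is the first token
    logits = [
        sum(w * x for w, x in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    # The predicted class is the index of the largest logit.
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return predicted, logits


# Toy example: 3 tokens with hidden size 2, and 2 classes (made-up numbers).
hidden = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.5]
predicted, logits = classify_from_cls(hidden, W, b)
print(predicted, logits)
```

Only the first row of `hidden` influences the result, which mirrors the point that the [CLS] hidden state stands in for the whole image.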

  ### How to use
@@ -47,16 +43,16 @@ from transformers import ViTImageProcessor, ViTForImageClassification
  from PIL import Image
  import requests

+ url = 'https://assets.wfcdn.com/im/16661612/compr-r85/4172/41722749/new-york-city-map-on-paper-print.jpg'
  image = Image.open(requests.get(url, stream=True).raw)

+ processor = ViTImageProcessor.from_pretrained('STEM-AI-mtl/City_map-vit-base-patch16-224')
+ model = ViTForImageClassification.from_pretrained('STEM-AI-mtl/City_map-vit-base-patch16-224')

  inputs = processor(images=image, return_tensors="pt")
  outputs = model(**inputs)
  logits = outputs.logits
+
  predicted_class_idx = logits.argmax(-1).item()
  print("Predicted class:", model.config.id2label[predicted_class_idx])
  ```
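
The snippet above reports only the single most likely class. When inspecting predictions it can help to see the top few candidates with their probabilities. The sketch below shows one way to do that with a softmax over the logits. `top_k_predictions` is a hypothetical helper, and the logits and city labels are made-up stand-ins for `outputs.logits[0].tolist()` and `model.config.id2label`; they are not taken from the actual model.

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up values standing in for real model output.
example_logits = [0.2, 3.1, 1.4, -0.7]
example_labels = {0: "Paris", 1: "New York", 2: "Tokyo", 3: "Montreal"}
top = top_k_predictions(example_logits, example_labels, k=2)
for label, prob in top:
    print(f"{label}: {prob:.3f}")
```

With a real model, the same helper could presumably be called on the logits of a single image after converting them to a Python list.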
 
@@ -65,7 +61,7 @@ For more code examples, we refer to the [documentation](https://huggingface.co/t
  ## Training data

+ This model is [Google's ViT-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) fine-tuned on the [STEM-AI-mtl/City_map dataset](https://huggingface.co/datasets/STEM-AI-mtl/City_map), which contains over 600 images of 45 different maps of cities around the world.

  ## Training procedure