SwiftFormer (swiftformer-l1)

Model description

The SwiftFormer model was proposed in "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications" by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, and Fahad Shahbaz Khan.

The SwiftFormer paper introduces an efficient additive attention mechanism that replaces the quadratic matrix multiplications in self-attention with linear element-wise multiplications. A family of models called 'SwiftFormer' is built on this mechanism and achieves state-of-the-art performance in both accuracy and mobile inference speed. Even the small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, making it more accurate and 2× faster than MobileViT-v2.
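
To make the idea of replacing the token-by-token attention matrix with element-wise operations more concrete, here is a minimal NumPy sketch of a single-head additive attention step. It is an illustration only, not the authors' implementation: the function name `additive_attention`, the weight names `w_q`, `w_k`, `w_a`, `w_t`, the L2 normalisation of the scores, and the residual on the queries are all simplifying assumptions.

```python
import numpy as np


def additive_attention(x, w_q, w_k, w_a, w_t):
    """Simplified single-head additive attention (hypothetical helper).

    x:   (n, d) token features
    w_q: (d, d) query projection      w_k: (d, d) key projection
    w_a: (d,)   scoring vector        w_t: (d, d) output projection
    """
    n, d = x.shape
    q = x @ w_q  # (n, d) queries
    k = x @ w_k  # (n, d) keys

    # One scalar score per token costs O(n * d); the usual Q @ K.T
    # attention matrix would cost O(n^2 * d).
    scores = (q @ w_a) / np.sqrt(d)  # (n,)
    # Assumption: scores are L2-normalised across tokens.
    alpha = scores / (np.linalg.norm(scores) + 1e-12)

    # Pool all queries into a single global query vector.
    global_q = (alpha[:, None] * q).sum(axis=0)  # (d,)

    # Element-wise interaction between the global query and every key.
    context = k * global_q  # (n, d), broadcast over tokens

    # Linear projection of the context plus a residual on the queries.
    return context @ w_t + q


# Usage with random weights, purely to show the shapes involved.
rng = np.random.default_rng(0)
n, d = 6, 4
x = rng.normal(size=(n, d))
w_q, w_k, w_t = (rng.normal(size=(d, d)) for _ in range(3))
w_a = rng.normal(size=d)
out = additive_attention(x, w_q, w_k, w_a, w_t)
print(out.shape)
```

No `n × n` matrix is ever formed in this sketch, so the cost grows linearly with the number of tokens.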

Intended uses & limitations

How to use

import requests
import torch
from PIL import Image
from transformers import SwiftFormerForImageClassification, ViTImageProcessor

# Download a sample image from the COCO validation set.
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image into a batch of pixel values.
processor = ViTImageProcessor.from_pretrained('shehan97/swiftformer-l1')
inputs = processor(images=image, return_tensors="pt")

# Load the pretrained classifier and run inference without tracking gradients.
model = SwiftFormerForImageClassification.from_pretrained('shehan97/swiftformer-l1')
with torch.no_grad():
    outputs = model(**inputs)

# The highest logit gives the predicted ImageNet-1K class.
logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
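
The snippet above prints only the single most likely class. To inspect several candidates with their probabilities, a small post-processing helper can be sketched as follows. The name `top_k_labels` is hypothetical and not part of `transformers`; the helper assumes a 1-D sequence of logits and an integer-keyed `id2label` mapping.

```python
import numpy as np


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs.

    logits:   1-D sequence of raw class scores for one image
    id2label: mapping from integer class index to label string
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = np.exp(logits - logits.max())
    probs = shifted / shifted.sum()  # softmax over classes
    # Indices of the k largest probabilities, highest first.
    top = np.argsort(probs)[::-1][:k]
    return [(id2label[int(i)], float(probs[i])) for i in top]
```

With the variables from the usage snippet, this could be called with `logits[0].detach().tolist()` and the model's `config.id2label` mapping.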

Limitations and bias

Training data

The classification model is trained on the ImageNet-1K dataset.

Training procedure

Evaluation results

