metadata
license: mit
tags:
- image-classification
- pytorch
- ViT
- transformers
- real-fake-detection
- deep-fake
- ai-detect
- ai-image-detection
metrics:
- accuracy
model-index:
- name: AI Image Detect Distilled
  results:
  - task:
      type: image-classification
      name: Image Classification
    metrics:
    - type: accuracy
      value: 0.94
pipeline_tag: image-classification
library_name: transformers
AI Detection Model
Model Architecture and Training
Three separate models were initially trained:
- Midjourney vs. Real Images
- Stable Diffusion vs. Real Images
- Stable Diffusion Fine-tunings vs. Real Images
Data preparation process:
- Used Google's Open Images Dataset as the source of real images
- Captioned each real image with BLIP (Bootstrapping Language-Image Pre-training)
- Generated Stable Diffusion images from the BLIP captions
- Retrieved similar Midjourney images by matching against the BLIP captions
This pairing kept real and AI-generated images as similar in content as possible, so that they differ mainly in their origin.
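The pairing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `caption_fn` and `generate_fn` are hypothetical stand-ins for a BLIP captioner and a Stable Diffusion generator, and the word-overlap similarity used to pick Midjourney images is a placeholder, since the exact matching method is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Pair:
    real_image: str       # path or ID of the real image
    caption: str          # description of the real image (e.g. from BLIP)
    generated_image: str  # path or ID of the matched AI-generated image


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two texts (assumed metric, illustrative only)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_generated_pairs(
    real_images: Iterable[str],
    caption_fn: Callable[[str], str],   # hypothetical: image -> caption
    generate_fn: Callable[[str], str],  # hypothetical: caption -> generated image
) -> List[Pair]:
    """Caption each real image, then generate a matching image from the caption."""
    pairs = []
    for image in real_images:
        caption = caption_fn(image)
        pairs.append(Pair(image, caption, generate_fn(caption)))
    return pairs


def match_existing_pairs(
    real_images: Iterable[str],
    caption_fn: Callable[[str], str],
    candidates: Sequence[Tuple[str, str]],  # (image ID, prompt) of existing AI images
) -> List[Pair]:
    """For each real image, pick the existing AI image whose prompt best matches the caption."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    pairs = []
    for image in real_images:
        caption = caption_fn(image)
        best_id, _ = max(candidates, key=lambda c: jaccard(caption, c[1]))
        pairs.append(Pair(image, caption, best_id))
    return pairs
```

In practice, an embedding-based similarity would likely match captions and prompts better than word overlap; the structure of the loop stays the same.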
The three models were then distilled into a single small ViT model with 11.8 million parameters, combining what the three specialist models learned into one more efficient detector.
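The distillation recipe is not specified here, so the following is only one plausible sketch of how three binary teachers could supervise a single student. It assumes each teacher outputs a probability that the image is AI-generated, merges them by taking the maximum (an image counts as generated if any specialist flags it), and mixes that soft target with the ground-truth label. The function names, the max rule, and the `alpha` weight are all assumptions for illustration.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_teacher_target(teacher_fake_probs: Sequence[float]) -> float:
    """Assumed merge rule: the highest 'AI-generated' probability among the teachers."""
    return max(teacher_fake_probs)


def distillation_loss(
    student_logit: float,
    teacher_fake_probs: Sequence[float],
    hard_label: int,        # 1 = AI-generated, 0 = real
    alpha: float = 0.5,     # weight of the soft (teacher) term; assumed value
    eps: float = 1e-7,
) -> float:
    """Binary cross-entropy against the teachers' soft target, mixed with the hard label."""
    # Clamp the student probability so the logarithms stay finite.
    p = min(max(sigmoid(student_logit), eps), 1.0 - eps)
    soft = combined_teacher_target(teacher_fake_probs)

    def bce(target: float) -> float:
        return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

    return alpha * bce(soft) + (1.0 - alpha) * bce(float(hard_label))
```

Averaging the teacher probabilities instead of taking the maximum is an equally reasonable choice; it would weaken the signal from a single specialist that recognizes its own generator.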
Data Sources
- Google's Open Images Dataset: link
- Ivan Sivkov's Midjourney Dataset: link
- TANREI(NAMA)'s Stable Diffusion Prompts Dataset: link
Performance
Validation Set: 94% accuracy
- Held out from training data to assess generalization
Custom Real-World Set: 84% accuracy
- Composed of self-captured images and online-sourced images
- Designed to be more representative of internet-based images
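A similar check on your own images could look like the sketch below. It assumes the model is loaded through the `transformers` image-classification pipeline (as the `library_name` and `pipeline_tag` metadata suggest); the model ID is a placeholder you must supply, and the label strings depend on the model's configuration, so treat the ones in the usage note as hypothetical.

```python
from typing import Callable, Iterable, Tuple


def make_pipeline_classifier(model_id: str) -> Callable[[str], str]:
    """Wrap a transformers image-classification pipeline as: image path -> top label."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import pipeline

    clf = pipeline("image-classification", model=model_id)

    def classify(image_path: str) -> str:
        results = clf(image_path)  # list of {"label": ..., "score": ...} dicts
        return max(results, key=lambda r: r["score"])["label"]

    return classify


def accuracy(
    samples: Iterable[Tuple[str, str]],  # (image path, expected label)
    classify: Callable[[str], str],
) -> float:
    """Fraction of samples whose predicted label equals the expected label."""
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    correct = sum(1 for path, label in samples if classify(path) == label)
    return correct / len(samples)
```

For example, `accuracy(my_samples, make_pipeline_classifier("<model-repo-id>"))` would score a list of `(path, label)` tuples, where the expected labels must match the names in the model's `id2label` mapping.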
Comparative Analysis:
- Outperformed other popular AI detection models by 5 percentage points on both sets
- The other models reached 89% on the validation set and 79% on the real-world set
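To put the comparison in perspective, the reported accuracies can be turned into error rates. The short sketch below uses only the four figures stated above; the relative error reduction is derived arithmetic, not an additional measurement.

```python
# Reported accuracies in percent, taken from the figures above.
reported = {
    "validation": {"this_model": 94, "other_models": 89},
    "real_world": {"this_model": 84, "other_models": 79},
}


def compare(this_acc: int, other_acc: int):
    """Return (percentage-point gap, relative reduction in error rate)."""
    gap = this_acc - other_acc
    this_err, other_err = 100 - this_acc, 100 - other_acc
    relative_error_reduction = (other_err - this_err) / other_err
    return gap, relative_error_reduction


for name, accs in reported.items():
    gap, rel = compare(accs["this_model"], accs["other_models"])
    print(f"{name}: +{gap} points, {rel:.0%} fewer errors")
# validation: +5 points, 45% fewer errors
# real_world: +5 points, 24% fewer errors
```

The same 5-point gap removes a larger share of the errors on the validation set (11% down to 6%) than on the real-world set (21% down to 16%).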
Key Insights
- Strong performance on held-out validation data (94% accuracy)
- Good adaptability to diverse, real-world images (84% accuracy)
- Outperforms other popular models on both evaluation sets
- The 10-percentage-point accuracy drop from the validation set to the real-world set indicates room for improvement
- Training against Midjourney, Stable Diffusion, and Stable Diffusion fine-tunings contributes to model versatility
- Because real and generated images were matched on content, the model focuses on subtle generation differences rather than content disparities
Future Directions
- Expand dataset with more diverse, real-world examples to bridge the performance gap
- Improve generalization to internet-sourced images
- Conduct error analysis on misclassified samples to identify patterns
- Integrate new AI image generation techniques as they emerge
- Consider fine-tuning for specific domains where detection accuracy is critical
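The error analysis mentioned above could start from something as simple as per-group error rates. The sketch below assumes each evaluated sample is recorded as a `(group, true_label, predicted_label)` tuple; the group names (for example the image source or generator) are hypothetical and would come from your own bookkeeping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def error_rate_by_group(
    records: Iterable[Tuple[str, str, str]],  # (group, true label, predicted label)
) -> Dict[str, float]:
    """Share of misclassified samples within each group."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for group, true_label, predicted_label in records:
        totals[group] += 1
        if true_label != predicted_label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical records, for illustration only.
records = [
    ("midjourney", "fake", "fake"),
    ("midjourney", "fake", "real"),
    ("stable_diffusion", "fake", "fake"),
    ("phone_photo", "real", "real"),
    ("phone_photo", "real", "fake"),
    ("phone_photo", "real", "real"),
    ("phone_photo", "real", "real"),
]
rates = error_rate_by_group(records)
```

Groups with unusually high error rates are natural candidates for targeted data collection or domain-specific fine-tuning.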