---
license: mit
tags:
- image-classification
- pytorch
- ViT
- transformers
- real-fake-detection
metrics:
- accuracy
model-index:
- name: AI Image Detect Distilled
  results:
  - task:
      type: image-classification
      name: Image Classification
    metrics:
    - type: accuracy
      value: 0.94
pipeline_tag: image-classification
library_name: transformers
---

# AI Detection Model

## Model Architecture and Training

Three separate models were initially trained:

1. Midjourney vs. Real Images
2. Stable Diffusion vs. Real Images
3. Stable Diffusion Fine-tunings vs. Real Images

Data preparation process:

- Used Google's Open Images Dataset for real images
- Described real images using BLIP (Bootstrapping Language-Image Pre-training)
- Generated Stable Diffusion images using BLIP descriptions
- Found similar Midjourney images based on BLIP descriptions

This approach ensured real and AI-generated images were as similar as possible, differing only in their origin.

The three models were then distilled into a small ViT model with 11.8 million parameters, combining their learned features for more efficient detection.

## Data Sources

- Google's Open Images Dataset: [link](https://storage.googleapis.com/openimages/web/index.html)
- Ivan Sivkov's Midjourney Dataset: [link](https://www.kaggle.com/datasets/ivansivkovenin/midjourney-prompts-image-part8)
- TANREI(NAMA)'s Stable Diffusion Prompts Dataset: [link](https://www.kaggle.com/datasets/tanreinama/900k-diffusion-prompts-dataset)

## Performance

- Validation Set: 94% accuracy
  - Held out from training data to assess generalization
- Custom Real-World Set: 84% accuracy
  - Composed of self-captured images and online-sourced images
  - Designed to be more representative of internet-based images
- Comparative Analysis:
  - Outperformed other popular AI detection models by 5 percentage points on both sets
  - Other models achieved 89% and 79% on the validation and real-world sets, respectively

## Key Insights

1. Strong generalization on validation data (94% accuracy)
2. Good adaptability to diverse, real-world images (84% accuracy)
3. Consistent outperformance of other popular models
4. The 10-percentage-point accuracy drop from the validation set to the real-world set indicates room for improvement
5. Comprehensive training on multiple AI generation techniques contributes to model versatility
6. Focus on subtle differences in image generation rather than content disparities

## Future Directions

- Expand dataset with more diverse, real-world examples to bridge the performance gap
- Improve generalization to internet-sourced images
- Conduct error analysis on misclassified samples to identify patterns
- Integrate new AI image generation techniques as they emerge
- Consider fine-tuning for specific domains where detection accuracy is critical
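
The error analysis mentioned above could start with a per-source accuracy breakdown and a tally of which mistakes occur most often. The following is a minimal sketch under stated assumptions, not the procedure used for this model: the `error_breakdown` helper, the `(source, true_label, predicted_label)` record layout, the source names, and the `"real"`/`"fake"` label strings are all hypothetical, and the sample records are toy data rather than measured results.

```python
from collections import Counter


def error_breakdown(records):
    """Summarize classifier mistakes.

    `records` is an iterable of (source, true_label, predicted_label) tuples.
    This layout is an assumption made for illustration only.
    Returns (accuracy_by_source, confusions), where `confusions` counts each
    (source, true_label, predicted_label) combination that was misclassified.
    """
    totals = Counter()
    correct = Counter()
    confusions = Counter()
    for source, true_label, predicted_label in records:
        totals[source] += 1
        if true_label == predicted_label:
            correct[source] += 1
        else:
            confusions[(source, true_label, predicted_label)] += 1
    accuracy_by_source = {s: correct[s] / totals[s] for s in totals}
    return accuracy_by_source, confusions


if __name__ == "__main__":
    # Toy records with hypothetical source names, not real evaluation output.
    sample = [
        ("self-captured", "real", "real"),
        ("self-captured", "real", "fake"),
        ("online", "fake", "fake"),
        ("online", "real", "fake"),
    ]
    accuracy_by_source, confusions = error_breakdown(sample)
    for source, acc in accuracy_by_source.items():
        print(f"{source}: {acc:.2f}")
    # List the most frequent mistakes first.
    for (source, true_label, predicted), count in confusions.most_common():
        print(f"{source}: {true_label} predicted as {predicted} x{count}")
```

Grouping mistakes by image source is one way to check whether the drop on the real-world set is concentrated in a particular kind of image, such as real photos flagged as generated.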