WD ViT Tagger v3

Supports ratings, characters and general tags.

Trained using https://github.com/SmilingWolf/JAX-CV.
The TPUs used for training were kindly provided by the TRC program.

Dataset

Last image ID: 7220105
Trained on Danbooru images with IDs modulo 0000-0899.
Validated on images with IDs modulo 0950-0999.
Images with fewer than 10 general tags were filtered out.
Tags with fewer than 600 images were filtered out.
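The split and filtering rules above can be sketched in Python. This is a minimal sketch, not the actual dataset pipeline, and it makes several assumptions. First, it reads "IDs modulo 0000-0899" as post_id % 1000 falling in 0-899. Second, it treats posts as a hypothetical mapping from post ID to that post's set of general tags. Third, it applies the image filter before counting tags. The card does not state the modulus or the order of the two filters.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

LAST_IMAGE_ID = 7220105
MIN_GENERAL_TAGS_PER_IMAGE = 10
MIN_IMAGES_PER_TAG = 600


def split_of(post_id: int) -> Optional[str]:
    """Assign a post to a split by its ID bucket (ASSUMPTION: modulus 1000)."""
    bucket = post_id % 1000
    if bucket <= 899:
        return "train"
    if 950 <= bucket <= 999:
        return "val"
    return None  # 0900-0949 are not assigned to either split in the card


def filter_dataset(
    posts: Dict[int, Set[str]],
    min_tags: int = MIN_GENERAL_TAGS_PER_IMAGE,
    min_images: int = MIN_IMAGES_PER_TAG,
) -> Tuple[Dict[int, Set[str]], Set[str]]:
    """Return (kept posts, tag vocabulary) after the two count filters."""
    # Drop posts above the last image ID and posts with too few general tags.
    kept = {
        pid: tags
        for pid, tags in posts.items()
        if pid <= LAST_IMAGE_ID and len(tags) >= min_tags
    }
    # Count how many kept images carry each tag, then drop the rare tags.
    counts = Counter(tag for tags in kept.values() for tag in tags)
    vocab = {tag for tag, n in counts.items() if n >= min_images}
    return kept, vocab
```

Under these assumptions, IDs whose bucket falls in 0900-0949 end up in neither split.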

Validation results

Thresholds are taken where precision equals recall (P=R); F1 is measured at that threshold.
v2.0: threshold = 0.2614, F1 = 0.4402
v1.0: threshold = 0.2547, F1 = 0.4278
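Below is a sketch, in pure Python, of one plausible way to compute macro-averaged precision, recall and F1 at a threshold and then search for the P=R threshold. It is not the evaluation code used for these numbers. The card does not say how precision and recall are averaged for the P=R point, how empty tags are scored, or how finely thresholds are swept. The function names, the grid sweep and the zero-division convention are all assumptions.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def macro_prf(
    y_true: Sequence[Sequence[int]],
    y_prob: Sequence[Sequence[float]],
    threshold: float,
) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over tags (columns).

    ASSUMED convention: a tag with no predicted positives gets precision 0,
    and a tag with no positive examples gets recall 0.
    """
    n_tags = len(y_true[0])
    ps: List[float] = []
    rs: List[float] = []
    fs: List[float] = []
    for j in range(n_tags):
        tp = fp = fn = 0
        for truth, prob in zip(y_true, y_prob):
            predicted = prob[j] >= threshold
            if predicted and truth[j]:
                tp += 1
            elif predicted:
                fp += 1
            elif truth[j]:
                fn += 1
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(2 * p * r / (p + r) if p + r else 0.0)
    return mean(ps), mean(rs), mean(fs)


def find_p_eq_r_threshold(
    y_true: Sequence[Sequence[int]],
    y_prob: Sequence[Sequence[float]],
    steps: int = 1000,
) -> float:
    """Sweep thresholds on a uniform grid; keep the first one minimizing |P - R|."""
    best_t, best_gap = 0.0, float("inf")
    for i in range(1, steps):
        t = i / steps
        p, r, _ = macro_prf(y_true, y_prob, t)
        gap = abs(p - r)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```

Tagging a single image at a reported threshold such as 0.2614 then reduces to keeping every tag whose probability is >= that threshold.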

What's new

Model v2.0/Dataset v3:
Trained for a few more epochs.
Used tag frequency-based loss scaling to combat class imbalance.
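The card does not describe the exact loss-scaling scheme. As an illustration only, here is one common form of frequency-based scaling: each tag's binary cross-entropy term is weighted by count ** -power and rescaled so the weights average 1. The power knob and both function names are hypothetical. This is not necessarily what was used for v2.0.

```python
import math
from typing import List, Sequence


def frequency_weights(tag_counts: Sequence[int], power: float = 0.5) -> List[float]:
    """Per-tag weights proportional to count ** -power, rescaled to mean 1.

    power=0 gives uniform weights; power=1 is plain inverse frequency.
    (Hypothetical scheme, not taken from the model card.)
    """
    raw = [c ** -power for c in tag_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_bce(
    probs: Sequence[float],
    targets: Sequence[int],
    weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Mean per-tag binary cross-entropy, each tag scaled by its weight."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)
```

Rare tags get weights above 1 and common tags get weights below 1. Because the weights are rescaled to mean 1, the overall loss magnitude stays comparable to the unweighted loss.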

Model v1.1/Dataset v3:
Amended the JAX model config file to add the image size.
No change to the trained weights.

Model v1.0/Dataset v3:
More training images; more tags, with tag data current as of 2024-02-28.
Now timm compatible! Load it up and give it a spin using the canonical one-liner!
ONNX model is compatible with code developed for the v2 series of models.
The batch dimension of the ONNX model is not fixed to 1 anymore. Now you can go crazy with batch inference.
Switched to Macro-F1 to measure model performance since it gives me a better gauge of overall training progress.

Runtime deps

The ONNX model requires onnxruntime >= 1.17.0.
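A small fail-fast guard for this requirement could look like the following. It relies only on onnxruntime exposing __version__. The helper names are hypothetical, and the version parsing is a simplified sketch that reads only the leading numeric major.minor.patch parts.

```python
from typing import Tuple

MIN_ONNXRUNTIME = (1, 17, 0)


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse the leading numeric major.minor.patch parts of a version string."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "+cu118" or "rc1"
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat "1.17" as "1.17.0"
    return (parts[0], parts[1], parts[2])


def onnxruntime_is_supported(v: str) -> bool:
    return parse_version(v) >= MIN_ONNXRUNTIME


def check_onnxruntime() -> None:
    """Raise if the installed onnxruntime is older than 1.17.0."""
    import onnxruntime  # imported lazily so the helpers above work without it

    if not onnxruntime_is_supported(onnxruntime.__version__):
        raise RuntimeError(
            f"onnxruntime {onnxruntime.__version__} found; "
            "the ONNX model needs >= 1.17.0"
        )
```

Calling check_onnxruntime() before creating an onnxruntime.InferenceSession for the model surfaces a version mismatch with a clear message instead of a less obvious load-time error.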

Inference code examples

For timm: https://github.com/neggles/wdv3-timm
For ONNX: https://huggingface.co/spaces/SmilingWolf/wd-tagger
For JAX: https://github.com/SmilingWolf/wdv3-jax

Final words

Subject to change and updates. Downstream users are encouraged to use tagged releases rather than relying on the head of the repo.

Model details

Format: Safetensors
Model size: 94.6M params
Tensor type: F32