Update README.md
The models are trained at 256x256 image resolution (384 variants are in progress).
At 256x256, the ConvNeXt-Large-D used roughly 1/2 the training FLOPs to achieve accuracy greater than the previous L/14 model trained on LAION-2B. The L/14 model has ~1.65x more GMACs, 1.45x more activations, and 1.22x more parameters. The ConvNeXt was trained with 26B samples seen, the L/14 with 34B.
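The "roughly 1/2 the training FLOPs" figure follows from combining the per-sample compute ratio with the samples-seen ratio. A minimal sketch of that arithmetic, using only the numbers stated above:

```python
# Sketch: total training compute scales as (per-sample FLOPs) x (samples seen).
# All numbers are taken from the paragraph above.
l14_gmac_ratio = 1.65      # L/14 uses ~1.65x the GMACs per sample vs ConvNeXt-Large-D
convnext_samples = 26e9    # samples seen by ConvNeXt-Large-D
l14_samples = 34e9         # samples seen by L/14

relative_flops = (1 / l14_gmac_ratio) * (convnext_samples / l14_samples)
print(f"ConvNeXt training FLOPs relative to L/14: {relative_flops:.2f}")  # ~0.46
```

At ~0.46x, the ConvNeXt run used a little under half the total training compute of the L/14 run.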
All models in this series were trained for 13B samples and have an ImageNet zero-shot top-1 of >= 70.8%. Comparing to ViT-B/16 at 34B samples seen with a zero-shot of 70.2% (68.1% for 13B samples seen), this suggests the ConvNeXt architecture may be more sample-efficient at this range of model scale. More experiments are needed to confirm.
| Model | Dataset | Resolution | AugReg | Top-1 ImageNet Zero-Shot (%) |
| ----- | ------- | ---------- | ------ | ---------------------------- |