---
datasets:
  - ILSVRC/imagenet-1k
license: mit
language:
  - en
base_model:
  - xuantonglll/ELM
---

This is the model release of the paper

Elucidating the design space of language models for image generation

Paper: arXiv. Code: GitHub.

We provide 4 Binary-Autoencoder (BAE) tokenizers, following Binary Latent Diffusion, with code dimensions 16 (two variants), 20, and 24, each trained for 1,000,000 iterations with a batch size of 256.

| Code Dim | Bernoulli Sampling | Link | Size |
|----------|--------------------|------|-------|
| 16       |                    | link | 332MB |
| 16       |                    | link | 332MB |
| 20       |                    | link | 332MB |
| 24       |                    | link | 332MB |
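
As a rough illustration of what the Bernoulli Sampling column refers to, the sketch below binarizes a vector of real-valued encoder outputs in two ways: by sampling each bit from a Bernoulli distribution, or by deterministic thresholding. It is not the released tokenizer code. The helper names (`sigmoid`, `binarize`, `code_to_index`) are hypothetical, and the sketch assumes that, as in Binary Latent Diffusion, encoder outputs are squashed to probabilities with a sigmoid before being turned into bits.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(logits, bernoulli=True, rng=None):
    """Turn real-valued encoder outputs into a binary code (hypothetical helper).

    bernoulli=True  -> each bit is sampled with probability sigmoid(logit)
    bernoulli=False -> each bit is 1 when sigmoid(logit) > 0.5
    """
    rng = rng or random.Random()
    probs = [sigmoid(x) for x in logits]
    if bernoulli:
        return [1 if rng.random() < p else 0 for p in probs]
    return [1 if p > 0.5 else 0 for p in probs]


def code_to_index(bits):
    """Read a list of bits as one integer, most significant bit first."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


# Toy 4-dimensional example; the released tokenizers use 16, 20, or 24 dimensions.
logits = [-2.0, 3.0, 0.5, -0.1]
hard_code = binarize(logits, bernoulli=False)
sampled_code = binarize(logits, bernoulli=True, rng=random.Random(0))
```

With thresholding the same input always yields the same code, while Bernoulli sampling can flip bits whose probability is close to 0.5.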

The generation model architecture is adapted from Llama2, following LlamaGen.

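| Model   | Link                        | Size          |
|---------|-----------------------------|---------------|
| AR-L    | [1-16] [2-8] [2-10] [2-12]  | 1.25GB~1.77GB |
| AR-XL   | [1-16] [2-8] [2-10] [2-12]  | 2.95GB~3.6GB  |
| AR-XXL  | [1-16] [2-10] [2-12]        | 5.49GB~6.25GB |
| AR-2B   | [2-12]                      | 7.64GB        |
| MLM-L   | [1-16]                      | 1.51GB        |
| MLM-XL  | [1-16]                      | 3.27GB        |
| MLM-XXL | [1-16]                      | 5.86GB        |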
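
This README does not define the bracketed labels in the Link column. The sketch below assumes that a label `k-b` means each binary code is split into `k` sub-codes of `b` bits, so that each sub-code becomes one token with a vocabulary of `2**b` entries. Under that assumption the labels multiply out to the tokenizer code dimensions listed above (1×16, 2×8, 2×10, 2×12). The function names `parse_label` and `split_into_tokens` are hypothetical and not part of the released code.

```python
def parse_label(label):
    """Parse a label such as "[2-10]" into (num_subcodes, bits_per_subcode)."""
    num_subcodes, bits = (int(part) for part in label.strip("[]").split("-"))
    return num_subcodes, bits


def split_into_tokens(code_bits, num_subcodes):
    """Split one binary code into equal sub-codes and read each as an integer token."""
    if len(code_bits) % num_subcodes != 0:
        raise ValueError("code length must be divisible by the number of sub-codes")
    width = len(code_bits) // num_subcodes
    tokens = []
    for start in range(0, len(code_bits), width):
        index = 0
        for bit in code_bits[start:start + width]:
            index = (index << 1) | bit  # most significant bit first
        tokens.append(index)
    return tokens


# Summarize each label under the assumed "k-b" reading.
summary = {}
for label in ("[1-16]", "[2-8]", "[2-10]", "[2-12]"):
    k, b = parse_label(label)
    summary[label] = {"code_dim": k * b, "vocab_per_token": 2 ** b}
    print(label, summary[label])
```

For example, `split_into_tokens([0, 1, 1, 0, 1, 0, 0, 1], 2)` treats an 8-bit toy code as two 4-bit sub-codes and returns one token index per sub-code.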