---
datasets:
- ILSVRC/imagenet-1k
license: mit
language:
- en
base_model:
- xuantonglll/ELM
---
This is the model release of the paper **Elucidating the Design Space of Language Models for Image Generation**.

You may check the paper on arXiv and the code on GitHub.
We provide four Binary Autoencoder (BAE) tokenizers, following Binary Latent Diffusion: one each with code dimensions 20 and 24, and two with code dimension 16 (with and without Bernoulli sampling). Each tokenizer was trained for 1,000,000 iterations with a batch size of 256.
| Code Dim | Bernoulli Sampling | Link | Size |
|---|---|---|---|
| 16 | ✅ | link | 332MB |
| 16 | ❌ | link | 332MB |
| 20 | ✅ | link | 332MB |
| 24 | ✅ | link | 332MB |
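To illustrate the distinction in the table, here is a minimal NumPy sketch of binary quantization with and without Bernoulli sampling. The function name, shapes, and thresholding details are assumptions for illustration; the released BAE checkpoints define the actual operation.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(probs, bernoulli_sampling=True, rng=rng):
    # Quantize per-bit probabilities in [0, 1] to binary codes.
    # Bernoulli sampling draws each bit stochastically from its
    # probability; the deterministic variant thresholds at 0.5.
    # (Illustrative sketch, not the released BAE implementation.)
    if bernoulli_sampling:
        return (rng.random(probs.shape) < probs).astype(np.uint8)
    return (probs >= 0.5).astype(np.uint8)

# Hypothetical latent of shape (code_dim=16, H, W)
probs = rng.random((16, 16, 16))
stochastic = binarize(probs, bernoulli_sampling=True)
deterministic = binarize(probs, bernoulli_sampling=False)
```

Stochastic binarization regularizes training; at inference the deterministic threshold gives reproducible codes.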
The generation model architecture is adapted from Llama 2, following LlamaGen.