---
datasets:
- ILSVRC/imagenet-1k
license: mit
language:
- en
base_model:
- xuantonglll/ELM
---

This is the model release of the paper

# [Elucidating the design space of language models for image generation](https://arxiv.org/abs/2410.16257)

You can find the paper on [arXiv](https://arxiv.org/abs/2410.16257) and the code on [GitHub](https://github.com/Pepper-lll/LMforImageGeneration).

We provide 4 Binary Autoencoder (BAE) tokenizers, following [Binary Latent Diffusion](https://github.com/ZeWang95/BinaryLatentDiffusion), with code dimensions 16, 20, and 24 (the 16-dimensional tokenizer comes in both Bernoulli-sampling and deterministic variants), each trained for 1,000,000 iterations with a batch size of 256.

| Code Dim | Bernoulli Sampling | Link | Size |
| ------------- | ------------- | ------------- | ------------- |
| 16 | ✅ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16/binaryae_ema.th?download=true) | 332MB |
| 16 | ❌ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16_deter/binaryae_ema.th?download=true) | 332MB |
| 20 | ✅ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_20/binaryae_ema.th?download=true) | 332MB |
| 24 | ✅ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_24/binaryae_ema.th?download=true) | 332MB |

The generation model architecture is adapted from [Llama2](https://github.com/meta-llama/llama), following [LlamaGen](https://github.com/FoundationVision/LlamaGen).
| Model | Link | Size |
| ------------- | ------------- | ------------- |
| AR-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-12.pth?download=true) | 1.25GB~1.77GB |
| AR-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-12.pth?download=true) | 2.95GB~3.6GB |
| AR-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-1-16.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-12.pth?download=true) | 5.49GB~6.25GB |
| AR-2B | [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/2B-2-12.pth?download=true) | 7.64GB |
| MLM-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmL-1-16.pth?download=true) | 1.51GB |
| MLM-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXL-1-16.pth?download=true) | 3.27GB |
| MLM-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXXL-1-16.pth?download=true) | 5.86GB |
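All checkpoints above live in the same Hugging Face repo, so their direct-download URLs follow one pattern. As a minimal sketch (the helper name `resolve_url` is ours, not part of the official codebase), the link for any listed file can be built from its in-repo path:

```python
# Build direct-download URLs for files in the xuantonglll/ELM repo.
# Assumption: all checkpoints are on the `main` branch, as in the links above.
REPO_ID = "xuantonglll/ELM"

def resolve_url(path: str) -> str:
    """Return the direct-download URL for a file in the ELM model repo."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{path}?download=true"

# Example paths taken from the tables above: the code-dim-16 BAE tokenizer
# and the AR-L [1-16] generation model.
bae_url = resolve_url("bae/bae_16/binaryae_ema.th")
gpt_url = resolve_url("gpt/L-1-16.pth")
```

Alternatively, `huggingface_hub.hf_hub_download(repo_id="xuantonglll/ELM", filename="gpt/L-1-16.pth")` downloads and caches a checkpoint locally.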