---
datasets:
- ILSVRC/imagenet-1k
license: mit
language:
- en
base_model:
- xuantonglll/ELM
---

This is the model release of the paper

# [Elucidating the design space of language models for image generation](https://arxiv.org/abs/2410.16257)

You can find the paper on [arXiv](https://arxiv.org/abs/2410.16257) and the code on [GitHub](https://github.com/Pepper-lll/LMforImageGeneration).

We provide 4 Binary Autoencoder (BAE) tokenizers, following [Binary Latent Diffusion](https://github.com/ZeWang95/BinaryLatentDiffusion), with code dimensions 16, 20, and 24 (the 16-dimensional tokenizer comes in both Bernoulli-sampling and deterministic variants), each trained for 1,000,000 iterations with a batch size of 256.

| Code Dim | Bernoulli Sampling | Link | Size |
| ------------- | ------------- | ------------- | ------------- |
| 16 | ✅ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16/binaryae_ema.th?download=true) | 332MB |
| 16 | ❌ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16_deter/binaryae_ema.th?download=true) | 332MB |
| 20 | ✅ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_20/binaryae_ema.th?download=true) | 332MB |
| 24 | ✅ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_24/binaryae_ema.th?download=true) | 332MB |

The generation model architecture is adapted from [Llama2](https://github.com/meta-llama/llama), following [LlamaGen](https://github.com/FoundationVision/LlamaGen).
| Model | Link | Size |
| ------------- | ------------- | ------------- |
| AR-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-12.pth?download=true) | 1.25GB~1.77GB |
| AR-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-12.pth?download=true) | 2.95GB~3.6GB |
| AR-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-1-16.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-12.pth?download=true) | 5.49GB~6.25GB |
| AR-2B | [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/2B-2-12.pth?download=true) | 7.64GB |
| MLM-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmL-1-16.pth?download=true) | 1.51GB |
| MLM-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXL-1-16.pth?download=true) | 3.27GB |
| MLM-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXXL-1-16.pth?download=true) | 5.86GB |
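All checkpoints above live in the same Hugging Face repo, so their direct-download URLs follow one pattern. As a minimal sketch (the helper name `resolve_url` is ours, not part of the official codebase), the link for any listed file can be built from its in-repo path:

```python
# Build direct-download URLs for files in the xuantonglll/ELM repo.
# Assumption: all checkpoints are on the `main` branch, as in the links above.
REPO_ID = "xuantonglll/ELM"

def resolve_url(path: str) -> str:
    """Return the direct-download URL for a file in the ELM model repo."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{path}?download=true"

# Example paths taken from the tables above: the code-dim-16 BAE tokenizer
# and the AR-L [1-16] generation model.
bae_url = resolve_url("bae/bae_16/binaryae_ema.th")
gpt_url = resolve_url("gpt/L-1-16.pth")
```

Alternatively, `huggingface_hub.hf_hub_download(repo_id="xuantonglll/ELM", filename="gpt/L-1-16.pth")` downloads and caches a checkpoint locally.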