---
datasets:
- ILSVRC/imagenet-1k
license: mit
language:
- en
base_model:
- xuantonglll/ELM
---

This is the model release of the paper

# [Elucidating the design space of language models for image generation](https://arxiv.org/abs/2410.16257)

You can find the paper on [arXiv](https://arxiv.org/abs/2410.16257) and the code on [GitHub](https://github.com/Pepper-lll/LMforImageGeneration).

We provide four Binary Autoencoder (BAE) tokenizers, following [Binary Latent Diffusion](https://github.com/ZeWang95/BinaryLatentDiffusion), with code dimensions 16, 20, and 24 (the 16-dimensional tokenizer comes in Bernoulli-sampling and deterministic variants), each trained for 1,000,000 iterations with a batch size of 256.

| Code Dim | Bernoulli Sampling | Link | Size |
| ------------- | ------------- | ------------- | ------------- |
| 16 | ✓ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16/binaryae_ema.th?download=true) | 332MB |
| 16 | ✗ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16_deter/binaryae_ema.th?download=true) | 332MB |
| 20 | ✓ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_20/binaryae_ema.th?download=true) | 332MB |
| 24 | ✓ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_24/binaryae_ema.th?download=true) | 332MB |
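The tokenizer download links all follow one URL pattern, so they can be reconstructed programmatically. A minimal sketch, assuming nothing beyond the links listed above — the `bae_url` helper and its `(code_dim, bernoulli)` keying are our own naming; the repo id and the `bae/<variant>/binaryae_ema.th` layout come from the table:

```python
# Reconstruct the BAE tokenizer download URLs from the table above.
# NOTE: the helper name and the (code_dim, bernoulli) keying are our own;
# the repo id and file layout are taken from the listed links.
BASE_URL = "https://huggingface.co/xuantonglll/ELM/resolve/main"

BAE_VARIANTS = {
    (16, True): "bae_16",         # 16-dim, Bernoulli sampling
    (16, False): "bae_16_deter",  # 16-dim, deterministic
    (20, True): "bae_20",
    (24, True): "bae_24",
}

def bae_url(code_dim: int, bernoulli: bool = True) -> str:
    """Return the direct-download URL for one tokenizer checkpoint."""
    folder = BAE_VARIANTS[(code_dim, bernoulli)]
    return f"{BASE_URL}/bae/{folder}/binaryae_ema.th?download=true"

# e.g. the deterministic 16-dim tokenizer:
print(bae_url(16, bernoulli=False))
# -> https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16_deter/binaryae_ema.th?download=true
```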

The generation model architecture is adapted from [Llama 2](https://github.com/meta-llama/llama), following [LlamaGen](https://github.com/FoundationVision/LlamaGen).

| Model | Link | Size |
| ------------- | ------------- | ------------- |
| AR-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-12.pth?download=true) | 1.25GB~1.77GB |
| AR-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-12.pth?download=true) | 2.95GB~3.6GB |
| AR-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-1-16.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-12.pth?download=true) | 5.49GB~6.25GB |
| AR-2B | [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/2B-2-12.pth?download=true) | 7.64GB |
| MLM-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmL-1-16.pth?download=true) | 1.51GB |
| MLM-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXL-1-16.pth?download=true) | 3.27GB |
| MLM-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXXL-1-16.pth?download=true) | 5.86GB |
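The generation-model checkpoints likewise follow a uniform `gpt/<prefix><size>-<config>.pth` naming scheme. A minimal sketch of fetching one by name — the `gpt_filename` helper and its argument names are our own; only the repo id and the file-name pattern come from the table above:

```python
# Build the in-repo path of a generation-model checkpoint from the table above.
# NOTE: the helper and its argument names are our own; the file-name pattern
# (e.g. gpt/L-1-16.pth, gpt/mlmXL-1-16.pth) is taken from the listed links.
PREFIX = {"AR": "", "MLM": "mlm"}  # MLM checkpoints carry an "mlm" prefix

def gpt_filename(family: str, size: str, config: str) -> str:
    """family: 'AR' or 'MLM'; size: 'L', 'XL', 'XXL' or '2B'; config: e.g. '1-16'."""
    return f"gpt/{PREFIX[family]}{size}-{config}.pth"

# With huggingface_hub installed, a checkpoint can then be downloaded, e.g.:
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download(repo_id="xuantonglll/ELM",
#                          filename=gpt_filename("AR", "XL", "2-10"))
print(gpt_filename("AR", "L", "1-16"))     # gpt/L-1-16.pth
print(gpt_filename("MLM", "XXL", "1-16"))  # gpt/mlmXXL-1-16.pth
```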