---
datasets:
- ILSVRC/imagenet-1k
license: mit
language:
- en
base_model:
- xuantonglll/ELM
---

This is the model release of the paper

# [Elucidating the design space of language models for image generation](https://arxiv.org/abs/2410.16257)

You can find the paper on [arXiv](https://arxiv.org/abs/2410.16257) and the code on [GitHub](https://github.com/Pepper-lll/LMforImageGeneration).

We provide four Binary Autoencoder (BAE) tokenizers, following [Binary Latent Diffusion](https://github.com/ZeWang95/BinaryLatentDiffusion), with code dimensions 16, 20, and 24 (the 16-dimensional tokenizer comes in Bernoulli-sampling and deterministic variants), each trained for 1,000,000 iterations with a batch size of 256.

| Code Dim | Bernoulli Sampling | Link | Size |
| ------------- | ------------- | ------------- | ------------- |
| 16 | ✓ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16/binaryae_ema.th?download=true) | 332MB |
| 16 | ✗ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16_deter/binaryae_ema.th?download=true) | 332MB |
| 20 | ✓ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_20/binaryae_ema.th?download=true) | 332MB |
| 24 | ✓ | [link](https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_24/binaryae_ema.th?download=true) | 332MB |
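The tokenizer download links all follow one URL pattern, so they can be reconstructed programmatically. A minimal sketch, assuming nothing beyond the links listed above — the `bae_url` helper and its `(code_dim, bernoulli)` keying are our own naming; the repo id and the `bae/<variant>/binaryae_ema.th` layout come from the table:

```python
# Reconstruct the BAE tokenizer download URLs from the table above.
# NOTE: the helper name and the (code_dim, bernoulli) keying are our own;
# the repo id and file layout are taken from the listed links.
BASE_URL = "https://huggingface.co/xuantonglll/ELM/resolve/main"

BAE_VARIANTS = {
    (16, True): "bae_16",         # 16-dim, Bernoulli sampling
    (16, False): "bae_16_deter",  # 16-dim, deterministic
    (20, True): "bae_20",
    (24, True): "bae_24",
}

def bae_url(code_dim: int, bernoulli: bool = True) -> str:
    """Return the direct-download URL for one tokenizer checkpoint."""
    folder = BAE_VARIANTS[(code_dim, bernoulli)]
    return f"{BASE_URL}/bae/{folder}/binaryae_ema.th?download=true"

# e.g. the deterministic 16-dim tokenizer:
print(bae_url(16, bernoulli=False))
# -> https://huggingface.co/xuantonglll/ELM/resolve/main/bae/bae_16_deter/binaryae_ema.th?download=true
```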

The generation model architecture is adapted from [Llama 2](https://github.com/meta-llama/llama), following [LlamaGen](https://github.com/FoundationVision/LlamaGen).

| Model | Link | Size |
| ------------- | ------------- | ------------- |
| AR-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/L-2-12.pth?download=true) | 1.25GB~1.77GB |
| AR-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-1-16.pth?download=true) [[2-8]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-8.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XL-2-12.pth?download=true) | 2.95GB~3.6GB |
| AR-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-1-16.pth?download=true) [[2-10]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-10.pth?download=true) [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/XXL-2-12.pth?download=true) | 5.49GB~6.25GB |
| AR-2B | [[2-12]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/2B-2-12.pth?download=true) | 7.64GB |
| MLM-L | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmL-1-16.pth?download=true) | 1.51GB |
| MLM-XL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXL-1-16.pth?download=true) | 3.27GB |
| MLM-XXL | [[1-16]](https://huggingface.co/xuantonglll/ELM/resolve/main/gpt/mlmXXL-1-16.pth?download=true) | 5.86GB |
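The generation-model checkpoints likewise follow a uniform `gpt/<prefix><size>-<config>.pth` naming scheme. A minimal sketch of fetching one by name — the `gpt_filename` helper and its argument names are our own; only the repo id and the file-name pattern come from the table above:

```python
# Build the in-repo path of a generation-model checkpoint from the table above.
# NOTE: the helper and its argument names are our own; the file-name pattern
# (e.g. gpt/L-1-16.pth, gpt/mlmXL-1-16.pth) is taken from the listed links.
PREFIX = {"AR": "", "MLM": "mlm"}  # MLM checkpoints carry an "mlm" prefix

def gpt_filename(family: str, size: str, config: str) -> str:
    """family: 'AR' or 'MLM'; size: 'L', 'XL', 'XXL' or '2B'; config: e.g. '1-16'."""
    return f"gpt/{PREFIX[family]}{size}-{config}.pth"

# With huggingface_hub installed, a checkpoint can then be downloaded, e.g.:
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download(repo_id="xuantonglll/ELM",
#                          filename=gpt_filename("AR", "XL", "2-10"))
print(gpt_filename("AR", "L", "1-16"))     # gpt/L-1-16.pth
print(gpt_filename("MLM", "XXL", "1-16"))  # gpt/mlmXXL-1-16.pth
```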