Model description

The MDIGAN-Characters model was proposed at SBGames 2024 (paper on arXiv, demo page). It is trained for the task of generating a character in a missing pose: for instance, given images of a character facing back, left, and right, it can generate the character facing front (a missing data imputation task).

The model's architecture is based on that of CollaGAN, a model trained to impute images for missing domains in a multi-domain scenario. In our case, the domains are the directions a character might face, i.e., back, left, front, and right.

We tested providing 3 images to the model to generate the missing one, but we also evaluated the quality of the generated images when the model receives only 2 or 1 input image.

The inputs to the model are the target (missing) domain and 4 image-like tensors of size 64x64x4, in the order back, left, front, and right. The input images should be floating-point tensors in the range [-1, 1]. In place of each missing image, we must provide a 64x64x4 tensor filled with zeros.
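The sketch below illustrates how inputs in this format could be prepared, assuming a Keras model that receives the target domain index and the four 64x64x4 tensors; the exact call signature (list vs. stacked tensor, how the target domain is encoded) is an assumption and should be checked against the released code.

```python
import numpy as np

# Domain order described above: back, left, front, right.
DOMAINS = ["back", "left", "front", "right"]
TARGET = DOMAINS.index("front")  # the pose we want the model to generate


def to_model_range(rgba_uint8):
    """Scale a 64x64x4 uint8 RGBA image to float32 in [-1, 1]."""
    return rgba_uint8.astype(np.float32) / 127.5 - 1.0


# Replace these random arrays with real character images facing back, left, and right.
available = {
    "back": np.random.randint(0, 256, (64, 64, 4), dtype=np.uint8),
    "left": np.random.randint(0, 256, (64, 64, 4), dtype=np.uint8),
    "right": np.random.randint(0, 256, (64, 64, 4), dtype=np.uint8),
}

inputs = []
for name in DOMAINS:
    if name in available:
        inputs.append(to_model_range(available[name]))
    else:
        # The missing pose is replaced by an all-zeros 64x64x4 tensor.
        inputs.append(np.zeros((64, 64, 4), dtype=np.float32))

# Add a batch dimension so each tensor is 1x64x64x4 before calling the model, e.g.:
# generated = model.predict([np.array([TARGET])] + [x[None, ...] for x in inputs])
```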

Intended uses & limitations

The model is intended for research purposes only. The quality of the generated images varies a lot, and a post-processing step that quantizes the colors of the generated image to the intended palette is beneficial.
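The following is a minimal sketch of such a post-processing step: each generated pixel is snapped to the nearest color of the character's palette (here assumed to be the set of colors found in the provided input images). The palette extraction and the Euclidean distance metric are assumptions, not the method used in the paper.

```python
import numpy as np


def quantize_to_palette(generated, palette):
    """Snap each pixel of an HxWx4 image in [-1, 1] to the nearest color in an Nx4 palette."""
    pixels = generated.reshape(-1, 4)                                          # (H*W, 4)
    dists = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=-1)  # (H*W, N)
    nearest = palette[dists.argmin(axis=1)]                                    # nearest palette color per pixel
    return nearest.reshape(generated.shape)


# Example usage, building the palette from the unique colors of the input images:
# colors = np.concatenate([img.reshape(-1, 4) for img in inputs], axis=0)
# palette = np.unique(colors, axis=0)
# cleaned = quantize_to_palette(generated[0], palette)
```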

Training and evaluation data

The model was trained on the PAC dataset, which features 12,074 paired images of pixel art characters in 4 directions: back, left, front, and right. Compared to StarGAN and Pix2Pix-based baselines, the MDIGAN-Characters model yielded much better images when it received 3 input images, and still good images when only 2 were provided.
