fal/AuraFlow-v0.3 is now here with support for different aspect ratios and resolutions (w/h up to 1536px!) and much nicer aesthetics! Make sure to install the latest diffusers to get support for it.
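If you want to try it quickly, here is a minimal sketch with the diffusers AuraFlowPipeline (the prompt, resolution, and sampler settings below are just illustrative):

```python
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a close-up photo of a hummingbird in flight, soft bokeh background",
    width=1536,   # non-square aspect ratios are supported up to 1536px
    height=768,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("auraflow_v03.png")
```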
You can create long prompts from images or simple words, and enhance your short prompts with the prompt enhancer. You can configure various settings such as artform, photo type, character details, scene details, style, and artist to create tailored prompts.
And you can combine all of them with custom prompts using LLMs (Mixtral, Mistral, Llama 3, and Mistral-Nemo).
The UI is a bit complex, but it includes almost everything you need. Choosing the random option is the most fun!
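The Space wires all of this up in the UI, but the core idea is simple; here is a rough sketch (not the Space's actual code) using huggingface_hub's InferenceClient, with made-up category values as placeholders:

```python
from huggingface_hub import InferenceClient

# Placeholder category values; in the Space these come from the UI dropdowns.
fields = {
    "artform": "photography",
    "photo_type": "portrait",
    "character_details": "an elderly fisherman with a weathered face",
    "scene_details": "foggy harbor at dawn",
    "style": "cinematic, shallow depth of field",
    "artist": "in the style of Steve McCurry",
}
base_prompt = ", ".join(v for v in fields.values() if v)

# Any of the listed LLMs works here; Mixtral shown as an example.
client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")
response = client.chat_completion(
    messages=[
        {"role": "system", "content": "Expand the user's keywords into one rich, detailed text-to-image prompt."},
        {"role": "user", "content": base_prompt},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```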
And I've created some other spaces for using FLUX models with captioners and enhancers.
- New tiling strategy: it's now closer to Clarity Upscaler.
- It has more parameters to play with, and more room to fail because of that.
- You should try different resolutions, strength, and ControlNet strength (see the sketch below).
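A rough sketch of what playing with those knobs looks like, using a tile ControlNet img2img pass in diffusers (this illustrates the parameters, not the Space's exact pipeline):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = load_image("input.png").resize((1536, 1536))  # try different target resolutions
result = pipe(
    prompt="high quality, highly detailed",
    image=image,
    control_image=image,
    strength=0.4,                        # how much the pass is allowed to repaint
    controlnet_conditioning_scale=0.6,   # "ControlNet strength"
    num_inference_steps=30,
).images[0]
result.save("upscaled.png")
```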
I've fine-tuned three types of PaliGemma image captioner models for generating prompts for Text2Image models. They generate captions similar to the prompts we give to image generation models. I used the google/docci and google/imageinwords datasets for fine-tuning.
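Using them follows the standard PaliGemma inference path in transformers; a minimal sketch (the model id below is a placeholder for one of the fine-tuned captioners):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "your-username/paligemma-prompt-captioner"  # placeholder id
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")
inputs = processor(text="caption en", images=image, return_tensors="pt").to("cuda", torch.bfloat16)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens and decode only the generated caption.
caption = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(caption)
```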
One-shot evaluation is hard. That is honestly what I learnt throughout the last couple of weeks trying to make imgsys.org data more and more relevant. There is just so much diversity in these models that saying one is better than another, even within a particular domain, is impossible.
If you have any suggestions on how we can make one-shot, single-question image model testing easier, please share them under this thread so we can provide a more meaningful data point to the community!
What is the current SOTA in terms of fast personalized image generation? Most of the techniques that produce great results (which is hard to measure objectively, but with a subject similarity index close to 80-90%) either take too much time (full DreamBooth fine-tuning of the base model) or lose the auxiliary properties (high-rank LoRAs).
We have also been testing face embeddings, but even with multiple samples the quality is nowhere close to what we expect. Even with the techniques that work, high-quality (studio-level) pictures seem to be a must, so another avenue I'm curious about is whether people have looked into automatically filtering/segmenting the input samples in the past.
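One common way to get a "subject similarity" number like the 80-90% mentioned above is cosine similarity between image embeddings of reference and generated images; here is a rough sketch using CLIP embeddings (an assumption about the metric, not necessarily how the figure above was computed):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def image_embedding(path: str) -> torch.Tensor:
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

ref = image_embedding("reference_subject.jpg")   # placeholder paths
gen = image_embedding("generated_sample.png")
similarity = (ref @ gen.T).item()  # cosine similarity; ~0.8-0.9 matches the range above
print(f"subject similarity: {similarity:.2f}")
```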