StabilityAI_HuggingFace

Recent Activity

sai-hf's activity

multimodalart posted an update 6 months ago

multimodalart posted an update 8 months ago
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multilingual CLIP + multilingual T5 text-encoders for English 🤝 Chinese understanding

Try it out yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)

In the paper they claim to be SOTA open source based on human preference evaluation!
multimodalart posted an update 10 months ago
The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
šŸ“ 2 base model variants mentioned: 2B and 8B sizes

šŸ“ New architecture in all abstraction levels:
- šŸ”½ UNet; ā¬†ļø Multimodal Diffusion Transformer, bye cross attention šŸ‘‹
- šŸ†• Rectified flows for the diffusion process
- šŸ§© Still a Latent Diffusion Model

📄 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

🗃️ Dataset was deduplicated with SSCD, which helped reduce memorization (no more details about the dataset tho)

Variants
šŸ” A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
āœļø An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
multimodalart posted an update 11 months ago
It seems February started with a fully open source AI renaissance 🌟

Models released with fully open dataset, training code, weights ✅

LLM - allenai/olmo-suite-65aeaae8fe5b6b2122b46778 🧠
Embedding - nomic-ai/nomic-embed-text-v1 📚 (sota!)

And it's literally February 1st - can't wait to see what else the community will bring 👀