่ฐข้›†

sanaka87

AI & ML interests

Image Generation

Recent Activity

reacted to m-ric's post with 👀 about 18 hours ago
๐— ๐—ถ๐—ป๐—ถ๐— ๐—ฎ๐˜…'๐˜€ ๐—ป๐—ฒ๐˜„ ๐— ๐—ผ๐—˜ ๐—Ÿ๐—Ÿ๐—  ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—น๐—ฒ๐˜ƒ๐—ฒ๐—น ๐˜„๐—ถ๐˜๐—ต ๐Ÿฐ๐—  ๐˜๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ ๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐—น๐—ฒ๐—ป๐—ด๐˜๐—ต ๐Ÿ’ฅ This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach. ๐—ž๐—ฒ๐˜† ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€: ๐Ÿ—๏ธ MoE with novel hybrid attention: โ€ฃ Mixture of Experts with 456B total parameters (45.9B activated per token) โ€ฃ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers ๐Ÿ† Outperforms leading models across benchmarks while offering vastly longer context: โ€ฃ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks โ€ฃ Can efficiently handle 4M token contexts (vs 256K for most other LLMs) ๐Ÿ”ฌ Technical innovations enable efficient scaling: โ€ฃ Novel expert parallel and tensor parallel strategies cut communication overhead in half โ€ฃ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%) ๐ŸŽฏ Thorough training strategy: โ€ฃ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge! Overall, not only is the model impressive, but the technical paper is also really interesting! ๐Ÿ“ It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs. Read it in full here ๐Ÿ‘‰ https://huggingface.co/papers/2501.08313 Model here, allows commercial use <100M monthly users ๐Ÿ‘‰ https://huggingface.co/MiniMaxAI/MiniMax-Text-01
updated a model about 18 hours ago
sanaka87/3DIS

Organizations

None yet

Posts 1

🚀 Excited to Share Our Latest Work: 3DIS & 3DIS-FLUX for Multi-Instance Layout-to-Image Generation! ❤️❤️❤️

🎨 Daily Paper: 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering (2501.05131)
🔓 Code is now open source!
🌐 Project Website: https://limuloo.github.io/3DIS/
🏠 GitHub Repository: https://github.com/limuloo/3DIS
📄 3DIS Paper: https://arxiv.org/abs/2410.12669
📄 3DIS-FLUX Tech Report: https://arxiv.org/abs/2501.05131

🔥 Why 3DIS & 3DIS-FLUX?
Current SOTA multi-instance generation methods are typically adapter-based: they require additional control modules to be trained on top of a pre-trained model in order to control layout and instance attributes. As more powerful base models such as FLUX and SD3.5 keep emerging, these methods must be retrained for each new backbone, which demands extensive resources.

✨ Our Solution: 3DIS
We introduce a decoupled approach that only requires training a low-resolution Layout-to-Depth model to convert layouts into coarse-grained scene depth maps. By leveraging widely available pre-trained models such as ControlNet and SAM2, we then enable training-free, controllable image generation on high-resolution models such as SDXL and FLUX.
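As a rough, hypothetical illustration of this two-stage idea (not the released 3DIS code), the sketch below fakes the stage-1 Layout-to-Depth step with a placeholder `layout_to_depth` helper and then renders the scene training-free with a publicly available depth ControlNet on SDXL via diffusers; the FLUX variant would swap in a FLUX pipeline instead.

```python
# Minimal sketch of the decoupled two-stage pipeline; NOT the released 3DIS code.
import torch
from PIL import Image, ImageDraw
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

def layout_to_depth(boxes, size=(1024, 1024)):
    """Hypothetical stand-in for the stage-1 Layout-to-Depth model:
    it simply paints each instance box with a distinct gray level as fake depth."""
    depth = Image.new("L", size, 0)
    draw = ImageDraw.Draw(depth)
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        draw.rectangle([x0, y0, x1, y1], fill=min(255, 60 + 40 * i))
    return depth.convert("RGB")

# Stage 1: convert the instance layout into a coarse scene depth map.
depth_map = layout_to_depth([(100, 400, 500, 900), (600, 300, 950, 900)])

# Stage 2: training-free rendering with an off-the-shelf depth ControlNet,
# reusing the frozen SDXL weights without any additional training.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a cozy living room with a sofa, a floor lamp, and a cat",
    image=depth_map,                       # coarse depth from stage 1 drives layout
    controlnet_conditioning_scale=0.7,
).images[0]
image.save("rendered_scene.png")
```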

🌟 Benefits of Our Decoupled Multi-Instance Generation:
1. Enhanced Control: By constructing scenes using depth maps in the first stage, the model focuses on coarse-grained scene layout, improving control over instance placement.
2. Flexibility & Preservation: The second stage employs training-free rendering methods, allowing seamless integration with various models (e.g., fine-tuned weights, LoRA) while maintaining the generative capabilities of pre-trained models, as sketched right after this list.
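To make point 2 concrete, here is a hedged continuation of the earlier stage-2 sketch: a community LoRA is dropped into the same renderer without retraining anything. The LoRA repository id is a placeholder, not an official 3DIS asset.

```python
# Continuing the `pipe` and `depth_map` from the previous sketch:
# load a LoRA into the training-free renderer (hypothetical repo id).
pipe.load_lora_weights("some-user/sdxl-watercolor-lora")
styled = pipe(
    prompt="a cozy living room with a sofa, a floor lamp, and a cat, watercolor style",
    image=depth_map,
    controlnet_conditioning_scale=0.7,
).images[0]
styled.save("rendered_scene_lora.png")
```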

Join us in advancing Layout-to-Image Generation! Follow and star our repository to stay updated! ⭐

datasets

None public yet