Xargs Lynx
xargs01
1 follower · 14 following
AI & ML interests
None yet
Recent Activity
liked a model 1 day ago
tencent/Hunyuan3D-2
liked a Space 1 day ago
ginipick/QR-Canvas
reacted to m-ric's post with 👀 5 days ago
𝗠𝗶𝗻𝗶𝗠𝗮𝘅'𝘀 𝗻𝗲𝘄 𝗠𝗼𝗘 𝗟𝗟𝗠 𝗿𝗲𝗮𝗰𝗵𝗲𝘀 𝗖𝗹𝗮𝘂𝗱𝗲-𝗦𝗼𝗻𝗻𝗲𝘁 𝗹𝗲𝘃𝗲𝗹 𝘄𝗶𝘁𝗵 𝟰𝗠 𝘁𝗼𝗸𝗲𝗻𝘀 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵 💥

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:

🏗️ MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

🏆 Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝 It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here 👉 https://huggingface.co/papers/2501.08313
Model here, allows commercial use <100M monthly users 👉 https://huggingface.co/MiniMaxAI/MiniMax-Text-01
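The hybrid attention pattern described in the post (lightning attention for most layers, softmax attention every 8 layers) can be pictured with a minimal Python sketch. The function name `hybrid_attention_schedule`, the layer count of 16, and the choice to place the softmax layer last in each group of 8 are illustrative assumptions, not details taken from the post or from MiniMax's code. The parameter counts are the ones quoted above.

```python
def hybrid_attention_schedule(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Return a label for the attention type used by each layer, in order.

    Assumption: the softmax layer is the last one in every group of
    `softmax_every` layers; the post only says "every 8 layers".
    """
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


# num_layers=16 is a hypothetical value chosen to keep the example small.
schedule = hybrid_attention_schedule(num_layers=16)

# Share of parameters activated per token, using the figures from the post:
# 45.9B activated out of 456B total.
active_fraction = 45.9 / 456

if __name__ == "__main__":
    print(schedule)
    print(f"activated share of parameters: {active_fraction:.1%}")
```

Under these assumptions, 7 of every 8 layers use the linear-complexity attention, and only about a tenth of the total parameters are active for any given token.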
Organizations
None yet
xargs01's activity
liked a model 1 day ago
tencent/Hunyuan3D-2 • Text-to-3D • Updated 1 day ago • 1.1k • 203
liked a Space 1 day ago
Running • 35 • 💻 QR Canvas
liked a model 9 days ago
mradermacher/Phi-4-AbliteratedRP-i1-GGUF • Updated 11 days ago • 2.18k • 5
liked a Space about 1 month ago
Running on Zero • 273 • 🚀 Video Dubbing
liked 3 models about 1 month ago
mradermacher/MambaHermes-3B-i1-GGUF • Updated Dec 20, 2024 • 91 • 1
mradermacher/Llama-3.2-3B-Instruct-abliterated-i1-GGUF • Updated Nov 20, 2024 • 558 • 3
mradermacher/Llama3.2-3B-ShiningValiant2-i1-GGUF • Updated Nov 18, 2024 • 198 • 2
liked 2 Spaces about 1 month ago
Running on A10G • 191 • 🏃 CharacterGen • Gradio demo of CharacterGen (SIGGRAPH 2024)
Running • 519 • 👁 Edge TTS Text To Speech
liked a model about 2 months ago
OuteAI/OuteTTS-0.1-350M-GGUF • Text-to-Speech • Updated Nov 27, 2024 • 210 • 34
liked a Space 3 months ago
Running • 4 • 🧠 Mistral Small 22B (2409) • Mistral Small 22B snapshot from Sep 2024
liked a Space about 1 year ago
Running on A10G • 4.72k • 🎵 MusicGen
liked a model about 1 year ago
facebook/musicgen-stereo-large • Text-to-Audio • Updated Mar 6, 2024 • 1.2k • 70
liked a Space about 1 year ago
Runtime error • 516 • 📞 Seamless M4T v2
liked 2 models over 1 year ago
lllyasviel/sd_control_collection • Updated Sep 9, 2023 • 1.85k
dreamlike-art/dreamlike-anime-1.0 • Text-to-Image • Updated Mar 13, 2023 • 12.4k • 247
liked a model almost 2 years ago
lllyasviel/ControlNet-v1-1 • Updated Apr 25, 2023 • 3.7k
liked a Space almost 2 years ago
Runtime error • 447 • 🦙 Alpaca-LoRA
liked a model almost 2 years ago
Anashel/rpg • Text-to-Image • Updated Sep 4, 2024 • 43 • 294
liked a model about 2 years ago
darkstorm2150/Protogen_x3.4_Official_Release • Text-to-Image • Updated May 10, 2023 • 573 • 350