Hugging Face
Tudor M
tudorizer
1 follower · 20 following
https://paragraph.xyz/@tudorizer
tudorizer
tudormunteanu
tudorm
tudorizer.bsky.social
AI & ML interests
Hardware, GPUs, renewable energy; enverge.ai
Recent Activity
Upvoted an article (7 days ago): Energy Scores for AI Models
Reacted with 🔥 to m-ric's post (15 days ago):
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧!

We talk a lot about ✨Generative AI✨, meaning the decoder version of the Transformer architecture, but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are possibly even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs). It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs. Many applications use BERT-family models: the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn has introduced BERT's replacement: ModernBERT.

𝗧𝗟;𝗗𝗥:

Architecture changes:
First, standard modernizations:
- Rotary positional embeddings (RoPE)
- GeGLU instead of GeLU
- Flash Attention 2

✨ The team also introduced innovative techniques like alternating attention (interleaving global attention layers with local sliding-window ones) instead of full attention in every layer, and sequence packing to get rid of padding overhead.

As a result, the model tops the encoder-model game: it beats the previous standard, DeBERTaV3, with one-fifth of the memory footprint, and runs 4x faster!

Read the blog post: https://huggingface.co/blog/modernbert
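The post only names sequence packing. As a rough illustration of why packing removes padding overhead, here is a minimal sketch that assumes a simple greedy first-fit-decreasing strategy over token counts. The function names, the 512-token row size, and the "pad every sequence to max_len" baseline are hypothetical choices for illustration; this is not ModernBERT's actual data pipeline.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Greedily pack sequences (given by token length) into rows of at most max_len tokens.

    Hypothetical first-fit-decreasing sketch: the longest sequences are placed first,
    each into the first row that still has room. Returns, for each row, the indices
    of the sequences it contains.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows: List[List[int]] = []
    used: List[int] = []  # number of tokens already placed in each row
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has {n} tokens, more than max_len={max_len}")
        for r in range(len(rows)):
            if used[r] + n <= max_len:
                rows[r].append(i)
                used[r] += n
                break
        else:
            # No existing row has room: open a new one.
            rows.append([i])
            used.append(n)
    return rows


def padding_without_packing(lengths: List[int], max_len: int) -> int:
    # Assumed baseline: one row per sequence, each padded up to max_len.
    return len(lengths) * max_len - sum(lengths)


def padding_with_packing(lengths: List[int], max_len: int) -> int:
    # Only the leftover space at the end of each packed row is padding.
    return len(pack_sequences(lengths, max_len)) * max_len - sum(lengths)


if __name__ == "__main__":
    lengths = [100, 400, 30, 250, 120, 60]  # hypothetical token counts
    print(pack_sequences(lengths, 512))           # [[1, 0], [3, 4, 5, 2]]
    print(padding_without_packing(lengths, 512))  # 2112
    print(padding_with_packing(lengths, 512))     # 64
```

With these made-up lengths, six padded rows waste 2112 token slots while two packed rows waste 64. A real implementation also has to stop packed sequences from attending to each other (for example with a block-diagonal mask or a variable-length attention kernel), which this sketch does not cover.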
Updated a Space (about 2 months ago): tudorizer/tiny-coder
Spaces (1)
Tiny Coder (Running): Hold me closer, tiny coder!
Models (1)
tudorizer/tiny-training: Updated Nov 18, 2024 • 5
Datasets
None public yet