Jia-Ying Lin

linekin

AI & ML interests

None yet

Recent Activity

liked a model about 4 hours ago
AdaptLLM/Adapt-MLLM-to-Domains
reacted to m-ric's post with 👍 12 days ago

Organizations

None yet

linekin's activity

reacted to m-ric's post with 👍 12 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: Welcome ModernBERT! 🤗

We talk a lot about ✨Generative AI✨, meaning the decoder version of the Transformer architecture, but this is only one way to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.
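To make "turn a sentence into a vector" concrete, here is a minimal sketch, assuming the `transformers` library and the standard `bert-base-uncased` checkpoint; mean pooling over token states is one common convention for getting a single sentence vector, not the only one.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Encoders turn sentences into vectors.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (1, seq_len, 768)

# Mean-pool the token embeddings (weighted by the attention mask)
# to collapse the sequence into one fixed-size sentence vector.
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_vec = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_vec.shape)                         # torch.Size([1, 768])
```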

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of the Honda Civic of LLMs.

Many applications use BERT-family models; the top models in this category have accumulated millions of downloads on the Hub.

โžก๏ธ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

๐—ง๐—Ÿ;๐——๐—ฅ:
๐Ÿ›๏ธ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU activations with GeGLU (see the sketch below)
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
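As a rough illustration of one of those modernizations, here is a minimal GeGLU feed-forward block in PyTorch. The layer sizes are illustrative, not ModernBERT's actual configuration; the point is the gating that plain GeLU MLPs lack.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """Gated GeLU feed-forward: out = W_out(value * GELU(gate))."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 2 * d_ff)  # value and gate in one matmul
        self.out = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj(x).chunk(2, dim=-1)
        return self.out(value * F.gelu(gate))     # gated activation replaces plain GeLU

x = torch.randn(1, 8, 768)            # (batch, seq_len, d_model)
print(GeGLU(768, 3072)(x).shape)      # torch.Size([1, 8, 768])
```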

🥇 As a result, the model tops the encoder-model leaderboard:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
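For the curious, here is a quick way to try it as a drop-in BERT-style encoder. The `answerdotai/ModernBERT-base` checkpoint name is an assumption (the post itself doesn't name one), and this needs a `transformers` release recent enough to include the ModernBERT architecture.

```python
from transformers import pipeline

# ModernBERT keeps the familiar [MASK] interface of BERT-style encoders.
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
print(fill("The capital of France is [MASK].")[0]["token_str"])
```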
reacted to m-ric's post with ❤️ 12 days ago
Since I published it on GitHub a few days ago, Hugging Face's new agentic library smolagents has gathered nearly 4k stars 🤯
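For context, here is a minimal sketch of what using smolagents looks like, based on the library's early API; `CodeAgent` and `HfApiModel` are the names from around the time of this post and may have changed in later releases.

```python
from smolagents import CodeAgent, HfApiModel

# A CodeAgent writes and executes Python code to solve the task,
# here with no extra tools and a Hub-hosted model as the backend.
agent = CodeAgent(tools=[], model=HfApiModel())
agent.run("How many seconds are in a leap year?")
```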

โžก๏ธ But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/