Akarshan Biswas's picture

Akarshan Biswas

qnixsynapse

·

qnixsynapse

AI & ML interests

NLP, models, quantization

Recent Activity

new activity 4 days ago

google/gemma-2-9b-it:Tool calling support in Gemma 2

liked a Space 4 days ago

webml-community/attention-visualization

reacted to suayptalha's post with 👀 7 days ago

🚀 Introducing 𝐅𝐢𝐫𝐬𝐭 𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐈𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐦𝐢𝐧𝐆𝐑𝐔 𝐌𝐨𝐝𝐞𝐥𝐬 from the paper 𝐖𝐞𝐫𝐞 𝐑𝐍𝐍𝐬 𝐀𝐥𝐥 𝐖𝐞 𝐍𝐞𝐞𝐝𝐞𝐝? 🖥 I have integrated 𝐧𝐞𝐱𝐭-𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐑𝐍𝐍𝐬, specifically minGRU, which offer faster performance compared to Transformer architectures, into HuggingFace. This allows users to leverage the lighter and more efficient minGRU models with the "𝐭𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬" 𝐥𝐢𝐛𝐫𝐚𝐫𝐲 for both usage and training. 💻 I integrated two main tasks: 𝐌𝐢𝐧𝐆𝐑𝐔𝐅𝐨𝐫𝐒𝐞𝐪𝐮𝐞𝐧𝐜𝐞𝐂𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 and 𝐌𝐢𝐧𝐆𝐑𝐔𝐅𝐨𝐫𝐂𝐚𝐮𝐬𝐚𝐥𝐋𝐌. 𝐌𝐢𝐧𝐆𝐑𝐔𝐅𝐨𝐫𝐒𝐞𝐪𝐮𝐞𝐧𝐜𝐞𝐂𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧: You can use this class for 𝐒𝐞𝐪𝐮𝐞𝐧𝐜𝐞 𝐂𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 tasks. I also trained a Sentiment Analysis model with stanfordnlp/imdb dataset. 𝐌𝐢𝐧𝐆𝐑𝐔𝐅𝐨𝐫𝐂𝐚𝐮𝐬𝐚𝐥𝐋𝐌: You can use this class for 𝐂𝐚𝐮𝐬𝐚𝐥 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥 tasks such as GPT, Llama. I also trained an example model with roneneldan/TinyStories dataset. You can fine-tune and use it! 🔗 𝐋𝐢𝐧𝐤𝐬: Models: https://huggingface.co/collections/suayptalha/mingru-676fe8d90760d01b7955d7ab GitHub: https://github.com/suayptalha/minGRU-hf LinkedIn Post: https://www.linkedin.com/posts/suayp-talha-kocabay_mingru-a-suayptalha-collection-activity-7278755484172439552-wNY1 📰 𝐂𝐫𝐞𝐝𝐢𝐭𝐬: Paper Link: https://arxiv.org/abs/2410.01201 I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio and Hossein Hajimirsadeghi for their papers.

View all activity

Organizations

None yet

qnixsynapse's activity

upvoted a paper about 1 month ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 50

upvoted a paper 4 months ago

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published Sep 16, 2024 • 39

upvoted a paper 5 months ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

upvoted an article 5 months ago

Article

Tool Use, Unified

Aug 12, 2024

• 70

upvoted 2 papers 5 months ago

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 37

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 110

upvoted a collection 5 months ago

Gemma 2 2B Release

The 2.6B parameter version of Gemma 2. • 6 items • Updated 24 days ago • 78

upvoted 3 papers 6 months ago

Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12, 2024 • 60

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Paper • 2407.03963 • Published Jul 4, 2024 • 15

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published Jul 2, 2024 • 23

upvoted a collection 7 months ago

SSMs

A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers. • 5 items • Updated about 13 hours ago • 26

upvoted 2 papers 8 months ago

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

Paper • 2404.05904 • Published Apr 8, 2024 • 8

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30, 2024 • 108

upvoted 2 papers 9 months ago

SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Paper • 2404.14408 • Published Apr 22, 2024 • 6

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19, 2024 • 38

upvoted a collection 9 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated about 1 month ago • 698

upvoted 2 papers 9 months ago

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11, 2024 • 43

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 52

upvoted an article 9 months ago

Article

CodeGemma - an official Google release for code LLMs

Apr 9, 2024

• 99

upvoted a collection 9 months ago

Gemma release

Groups the Gemma models released by the Google team. • 40 items • Updated 24 days ago • 328