Yingfa Chen

chen-yingfa

https://chen-yingfa.github.io

AI & ML interests

Long-context modeling, continual learning, architectures

Organizations

None yet

chen-yingfa's activity

updated a dataset about 1 month ago

chen-yingfa/CFDBench-raw

Viewer • Updated Dec 12, 2024 • 5.13B • 52

upvoted a paper 2 months ago

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Paper • 2411.10438 • Published Nov 15, 2024 • 13

authored 2 papers 3 months ago

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

Paper • 2411.02335 • Published Nov 4, 2024 • 11

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

Paper • 2410.07145 • Published Oct 9, 2024 • 2

upvoted a paper 3 months ago

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

Paper • 2410.07145 • Published Oct 9, 2024 • 2

commented on a paper 3 months ago

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

Paper • 2410.07145 • Published Oct 9, 2024 • 2

updated 2 datasets 5 months ago

chen-yingfa/CFDBench

Updated Sep 4, 2024 • 66 • 1

chen-yingfa/CHUBS

Viewer • Updated Aug 20, 2024 • 2.22k • 60

upvoted an article 5 months ago

A failed experiment: Infini-Attention, and why we should keep trying?

Article • Aug 14, 2024 • 57

authored 4 papers 5 months ago

CFDBench: A Large-Scale Benchmark for Machine Learning Methods in Fluid Dynamics

Paper • 2310.05963 • Published Sep 13, 2023

Robust and Scalable Model Editing for Large Language Models

Paper • 2403.17431 • Published Mar 26, 2024

∞Bench: Extending Long Context Evaluation Beyond 100K Tokens

Paper • 2402.13718 • Published Feb 21, 2024 • 1

Sub-Character Tokenization for Chinese Pretrained Language Models

Paper • 2106.00400 • Published Jun 1, 2021

upvoted a paper 7 months ago

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Paper • 2406.15718 • Published Jun 22, 2024 • 14

authored a paper 7 months ago

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Paper • 2406.15718 • Published Jun 22, 2024 • 14

New activity in fla-hub/rwkv6-7B-finch 8 months ago

Can you add some details about this model

#1 opened 8 months ago by

New activity in xiaol/RWKV-v5-12B-one-state-chat-16k 8 months ago

Please can you provide a example of use the model weights?

#1 opened about 1 year ago by

liked a model 8 months ago

xiaol/RWKV-5-world-v2-7B-0.4-300k

Updated Nov 11, 2023 • 10

reacted to akhaliq's post with ❤️ 9 months ago

Post • 4413

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (2404.07143)

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
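The bounded-memory idea in the abstract above can be illustrated with a small sketch. It covers only a compressive memory read and written with linear attention, not the masked local attention or the way the two are combined in one Transformer block. The class name `CompressiveMemory`, the ELU+1 feature map, and the additive update rule are assumptions chosen for illustration; the post does not specify them, and this is not the authors' implementation.

```python
import math


def feature_map(x):
    """ELU(x) + 1, a positive feature map (an assumed choice, not from the post)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class CompressiveMemory:
    """Hypothetical fixed-size associative memory updated segment by segment."""

    def __init__(self, d_key, d_value):
        # The state is a d_key x d_value matrix plus a d_key normalizer.
        # Its size does not depend on how many tokens have been stored.
        self.M = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def update(self, keys, values):
        """Fold one segment of key/value vectors into the memory."""
        for k, v in zip(keys, values):
            phi = feature_map(k)
            for i, p in enumerate(phi):
                self.z[i] += p
                row = self.M[i]
                for j, vj in enumerate(v):
                    row[j] += p * vj

    def retrieve(self, query):
        """Read a value estimate for one query vector (linear attention)."""
        phi = feature_map(query)
        d_value = len(self.M[0])
        denom = sum(p * zi for p, zi in zip(phi, self.z))
        if denom == 0.0:
            # Nothing stored yet.
            return [0.0] * d_value
        return [
            sum(phi[i] * self.M[i][j] for i in range(len(phi))) / denom
            for j in range(d_value)
        ]


if __name__ == "__main__":
    memory = CompressiveMemory(d_key=2, d_value=3)
    # Stream two segments through the memory, then query it.
    memory.update([[0.5, -1.0], [1.0, 0.2]], [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
    memory.update([[-0.3, 0.8]], [[2.0, 2.0, 2.0]])
    print(memory.retrieve([0.2, 0.3]))
```

Because every segment is folded into the same matrix and normalizer, the state stays the same size however long the input stream is, which is the sense in which memory is bounded in this sketch.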

liked a dataset 9 months ago

togethercomputer/RedPajama-Data-1T-Sample

Viewer • Updated Jul 19, 2023 • 850k • 9.48k • 124