Ankit Kumar
ankitprasad
AI & ML interests
None yet
Recent Activity
Updated a model 2 days ago: ankitprasad/small_gpt2_model
Published a model 2 days ago: ankitprasad/small_gpt2_model
Reacted to singhsidhukuldeep's post with ❤️ 7 days ago:
Exciting Research Alert: Revolutionizing Long-Context Language Models!

A groundbreaking paper from researchers at the University of Edinburgh and Apple introduces ICR² (In-context Retrieval and Reasoning), addressing a critical challenge in long-context language models (LCLMs).

Key Innovations:
- A novel benchmark that realistically evaluates LCLMs' ability to process and reason with extended contexts
- Three innovative approaches that significantly improve LCLM performance:
  - Retrieve-then-generate fine-tuning
  - Retrieval-attention probing
  - Joint retrieval head training

The most impressive result? Their best approach, implemented on Mistral-7B with just a 32K token limit, achieves performance comparable to GPT-4 while using significantly fewer parameters.

Technical Deep Dive:
The team's approach leverages attention head mechanisms to filter and denoise long contexts during decoding. Their retrieve-then-generate method implements a two-step process in which the model first identifies relevant passages and then generates a response. The architecture includes dedicated retrieval heads working alongside generation heads, enabling joint optimization during training. What sets this apart is their use of the Gumbel-TopK trick for differentiable retrieval and an attention probing mechanism that identifies and utilizes retrieval-focused attention heads.

Impact:
This research fundamentally changes how we approach long-context processing in LLMs, offering a more efficient alternative to traditional RAG pipelines while maintaining high performance.
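The Gumbel-TopK trick mentioned in the post can be illustrated in isolation. The following is a minimal sketch, not the paper's implementation: it shows only the sampling step (add independent Gumbel noise to each score and keep the k largest, which is equivalent to sampling k items without replacement from the softmax distribution over the scores) and leaves out the differentiable relaxation that training would require. The function name `gumbel_top_k`, its parameters, and the toy passage scores are assumptions made for illustration.

```python
import math
import random


def gumbel_top_k(scores, k, rng=None):
    """Return k distinct indices chosen by perturbing scores with Gumbel noise.

    Hypothetical helper for illustration only; not taken from the ICR² paper.
    """
    rng = rng or random.Random()
    perturbed = []
    for index, score in enumerate(scores):
        # Clamp the uniform draw away from 0 so both logarithms stay finite.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, index))
    # Keep the indices of the k largest perturbed scores.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


# Toy relevance scores for four candidate passages (made-up values).
passage_scores = [0.2, 1.5, 3.0, 0.1]
selected = gumbel_top_k(passage_scores, k=2, rng=random.Random(0))
print(selected)
```

Higher-scoring passages are selected more often, but the noise keeps lower-scoring ones reachable, which is what makes the selection stochastic rather than a plain arg-top-k.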
Organizations
None yet
Models (2)
ankitprasad/small_gpt2_model • Text Generation • Updated 2 days ago • 14
ankitprasad/spacy-resume-ner • Updated 30 days ago
Datasets
None public yet