view article Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context Jul 23, 2024 • 225
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 64
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published May 30, 2024 • 22
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published May 30, 2024 • 31
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents Paper • 2310.19923 • Published Oct 30, 2023 • 14
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings Paper • 2402.17016 • Published Feb 26, 2024 • 5
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20, 2024 • 46
view article Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7, 2024 • 42
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon Paper • 2401.03462 • Published Jan 7, 2024 • 27