Collections including paper arxiv:2401.00368

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 183
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 67
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 62
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 30

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 104
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 40
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 19
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 36

- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 37
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 104
- FiT: Flexible Vision Transformer for Diffusion Model
  Paper • 2402.12376 • Published • 48

- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 54
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
  Paper • 2403.13447 • Published • 18
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 61

- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 44

- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- Metadata Might Make Language Models Better
  Paper • 2211.10086 • Published • 4
- DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
  Paper • 2310.03686 • Published • 3

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 29
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 21
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 66