GEONTT's Collections
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper
•
2402.15627
•
Published
•
35
Beyond Language Models: Byte Models are Digital World Simulators
Paper
•
2402.19155
•
Published
•
50
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper
•
2403.00522
•
Published
•
45
Stealing Part of a Production Language Model
Paper
•
2403.06634
•
Published
•
91
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper
•
2403.06504
•
Published
•
53
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Paper
•
2403.06764
•
Published
•
26
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper
•
2404.08801
•
Published
•
65
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper
•
2404.16710
•
Published
•
77
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Paper
•
2406.18629
•
Published
•
42
Simulating Classroom Education with LLM-Empowered Agents
Paper
•
2406.19226
•
Published
•
31
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Paper
•
2406.19215
•
Published
•
30
Aligning Teacher with Student Preferences for Tailored Training Data Generation
Paper
•
2406.19227
•
Published
•
25
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
Paper
•
2406.19223
•
Published
•
9
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Paper
•
2406.18676
•
Published
•
6
Direct Preference Knowledge Distillation for Large Language Models
Paper
•
2406.19774
•
Published
•
22
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper
•
2407.01284
•
Published
•
76
Unveiling Encoder-Free Vision-Language Models
Paper
•
2406.11832
•
Published
•
51
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
Paper
•
2407.04363
•
Published
•
27
Human-like Episodic Memory for Infinite Context LLMs
Paper
•
2407.09450
•
Published
•
60
GAVEL: Generating Games Via Evolution and Language Models
Paper
•
2407.09388
•
Published
•
16
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Paper
•
2407.10969
•
Published
•
22
Better Alignment with Instruction Back-and-Forth Translation
Paper
•
2408.04614
•
Published
•
15
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Paper
•
2408.04840
•
Published
•
34