Collections including paper arxiv:2405.11157

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 22
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 82
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- LLaMA Pro: Progressive LLaMA with Block Expansion
  Paper • 2401.02415 • Published • 53
- Datasheets for Datasets
  Paper • 1803.09010 • Published • 2
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 19
- PockEngine: Sparse and Efficient Fine-tuning in a Pocket
  Paper • 2310.17752 • Published • 12

- Towards Modular LLMs by Building and Reusing a Library of LoRAs
  Paper • 2405.11157 • Published • 27
- Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
  Paper • 2406.12034 • Published • 14
- FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
  Paper • 2407.04051 • Published • 35
- OLMoE: Open Mixture-of-Experts Language Models
  Paper • 2409.02060 • Published • 78

- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
  Paper • 2311.17049 • Published • 1
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 14
- A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
  Paper • 2303.17376 • Published
- Sigmoid Loss for Language Image Pre-Training
  Paper • 2303.15343 • Published • 6

- Iterative Reasoning Preference Optimization
  Paper • 2404.19733 • Published • 47
- Towards Modular LLMs by Building and Reusing a Library of LoRAs
  Paper • 2405.11157 • Published • 27
- Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching
  Paper • 2405.11252 • Published • 12
- FIFO-Diffusion: Generating Infinite Videos from Text without Training
  Paper • 2405.11473 • Published • 53

- AtP*: An efficient and scalable method for localizing LLM behaviour to components
  Paper • 2403.00745 • Published • 12
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 605
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
  Paper • 2402.16840 • Published • 23
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 114

- PockEngine: Sparse and Efficient Fine-tuning in a Pocket
  Paper • 2310.17752 • Published • 12
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 28
- Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
  Paper • 2311.06243 • Published • 17
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 28

- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 26
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- Personality Traits in Large Language Models
  Paper • 2307.00184 • Published • 20
- An Emulator for Fine-Tuning Large Language Models using Small Language Models
  Paper • 2310.12962 • Published • 14

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 36
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 44
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 21