Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2301.13688

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 50
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 16
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 12

LLM_architectures

Nemotron-4 15B Technical Report

Paper • 2402.16819 • Published Feb 26, 2024 • 43
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 53
RWKV: Reinventing RNNs for the Transformer Era

Paper • 2305.13048 • Published May 22, 2023 • 15
Reformer: The Efficient Transformer

Paper • 2001.04451 • Published Jan 13, 2020

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 50
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 16
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 1
Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

Paper • 2308.13259 • Published Aug 25, 2023 • 2
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Paper • 2309.05653 • Published Sep 11, 2023 • 10
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 19

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Paper • 2310.13961 • Published Oct 21, 2023 • 4
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

Paper • 2309.09582 • Published Sep 18, 2023 • 4
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Paper • 2310.13127 • Published Oct 19, 2023 • 11
Evaluating the Robustness to Instructions of Large Language Models

Paper • 2308.14306 • Published Aug 28, 2023 • 1

Flan-T5 release

The Flan-T5 covers 4 checkpoints of different sizes each time. It also includes upgrades versions trained using Universal sampling

google/flan-t5-small

Text2Text Generation • Updated Oct 10, 2023 • 537k • 305
google/flan-t5-base

Text2Text Generation • Updated Jul 17, 2023 • 703k • 831
google/flan-t5-large

Text2Text Generation • Updated Jul 17, 2023 • 1.62M • • 665
google/flan-t5-xxl

Text2Text Generation • Updated Jul 27, 2023 • 642k • 1.23k

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 34
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 31
Agents: An Open-source Framework for Autonomous Language Agents

Paper • 2309.07870 • Published Sep 14, 2023 • 42
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 47

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs