-
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Paper • 2311.12022 • Published • 27 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 189 -
gorilla-llm/APIBench
Updated • 133 • 66 -
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Paper • 2312.04724 • Published • 21
Collections
Discover the best community collections!
Collections including paper arxiv:2311.12983
-
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 27 -
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
Paper • 2311.07361 • Published • 13 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 189 -
teknium/openhermes
Viewer • Updated • 243k • 501 • 205
-
Efficient Streaming Language Models with Attention Sinks
Paper • 2309.17453 • Published • 13 -
Simple and Controllable Music Generation
Paper • 2306.05284 • Published • 149 -
FinGPT: Large Generative Models for a Small Language
Paper • 2311.05640 • Published • 28 -
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185 • Published • 9
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 28 -
CsFEVER and CTKFacts: Acquiring Czech data for fact verification
Paper • 2201.11115 • Published -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 16 -
FinGPT: Large Generative Models for a Small Language
Paper • 2311.05640 • Published • 28
-
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Paper • 2310.15123 • Published • 8 -
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Paper • 2310.13227 • Published • 13 -
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper • 2309.08172 • Published • 11 -
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 5 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4 -
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Paper • 2310.13127 • Published • 12 -
Evaluating the Robustness to Instructions of Large Language Models
Paper • 2308.14306 • Published • 1
-
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
Paper • 2310.15511 • Published • 5 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 26 -
SmartPlay : A Benchmark for LLMs as Intelligent Agents
Paper • 2310.01557 • Published • 12 -
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Paper • 2310.03214 • Published • 18
-
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 96 -
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 76 -
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43 -
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 42