Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2310.17631

Foundation AI Papers

Curated List of Must-Reads on LLM reasoning at Temus AI team

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 8
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 104
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14, 2024 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 114

Moral Foundations of Large Language Models

Paper • 2310.15337 • Published Oct 23, 2023 • 1
Specific versus General Principles for Constitutional AI

Paper • 2310.13798 • Published Oct 20, 2023 • 2
Contrastive Prefence Learning: Learning from Human Feedback without RL

Paper • 2310.13639 • Published Oct 20, 2023 • 24
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 47

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

Paper • 2310.15511 • Published Oct 24, 2023 • 4
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 25
SmartPlay : A Benchmark for LLMs as Intelligent Agents

Paper • 2310.01557 • Published Oct 2, 2023 • 12
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Paper • 2310.03214 • Published Oct 5, 2023 • 18

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 37
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 13
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 30
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20, 2024 • 47

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 32
JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 33
BAAI/JudgeLM-7B-v1.0

Text Generation • Updated Oct 27, 2023 • 415 • 14

Tailor-made LLM evaluations: custom evaluations for your LLM

Collection of articles and resources focusing on automatic evaluation for LLM's and their role as unbiased judges in assessing other LLMs' outputs

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 32
Generative Judge for Evaluating Alignment

Paper • 2310.05470 • Published Oct 9, 2023 • 1
Humans or LLMs as the Judge? A Study on Judgement Biases

Paper • 2402.10669 • Published Feb 16, 2024
JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 33

Papers: LLM as a Judge

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 33
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 32
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

Paper • 2303.16634 • Published Mar 29, 2023 • 3
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 53

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 26
On the Societal Impact of Open Foundation Models

Paper • 2403.07918 • Published Feb 27, 2024 • 16
JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 33
Instruction Tuning for Large Language Models: A Survey

Paper • 2308.10792 • Published Aug 21, 2023 • 1

Evaluation of LLM agents paper

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

Paper • 2401.15391 • Published Jan 27, 2024 • 6
Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27, 2024 • 24
JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 33
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 53

Large Language Model Alignment: A Survey

Paper • 2309.15025 • Published Sep 26, 2023 • 2
Aligning Large Language Models with Human: A Survey

Paper • 2307.12966 • Published Jul 24, 2023 • 1
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 50
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

Paper • 2310.05344 • Published Oct 9, 2023 • 1

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs