Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.12195

Papers - Surge Global

OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18, 2024 • 11

Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18, 2024 • 11
SurgeGlobal/OpenBezoar-SFT

Text Generation • Updated Apr 26, 2024 • 33 • 3
SurgeGlobal/OpenBezoar-HH-RLHF-SFT

Text Generation • Updated Apr 27, 2024 • 33
SurgeGlobal/OpenBezoar-HH-RLHF-DPO

Text Generation • Updated Apr 27, 2024 • 39

Papers - Fine-tuning - SFT

InternLM2 Technical Report

Paper • 2403.17297 • Published Mar 26, 2024 • 30
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28, 2024 • 40
Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15, 2024 • 82
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18, 2024 • 11

Papers - Fine-tuning - Prompts

Toolformer: Language Models Can Teach Themselves to Use Tools

Paper • 2302.04761 • Published Feb 9, 2023 • 11
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18, 2024 • 11

Papers - Fine-tuning - DPO

Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 50
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14, 2024 • 6
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28, 2024 • 40
Dueling RL: Reinforcement Learning with Trajectory Preferences

Paper • 2111.04850 • Published Nov 8, 2021 • 2

Dataset Processing Technique

CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 84
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

Paper • 2404.02893 • Published Apr 3, 2024 • 20
Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11, 2024 • 29
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18, 2024 • 11

Papers - Fine-tuning - LoRA

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Paper • 2310.20587 • Published Oct 31, 2023 • 16
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data

Paper • 2304.08247 • Published Apr 14, 2023 • 2
S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Paper • 2311.03285 • Published Nov 6, 2023 • 28
WavLLM: Towards Robust and Adaptive Speech Large Language Model

Paper • 2404.00656 • Published Mar 31, 2024 • 10

Papers - Fine-tuning

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Paper • 2310.20587 • Published Oct 31, 2023 • 16
SELF: Language-Driven Self-Evolution for Large Language Model

Paper • 2310.00533 • Published Oct 1, 2023 • 2
QLoRA: Efficient Finetuning of Quantized LLMs

Paper • 2305.14314 • Published May 23, 2023 • 46
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 44

QLoRA: Efficient Finetuning of Quantized LLMs

Paper • 2305.14314 • Published May 23, 2023 • 46
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18, 2024 • 11

Simple and Controllable Music Generation

Paper • 2306.05284 • Published Jun 8, 2023 • 146
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18, 2024 • 11

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs