Björn Bebensee

bebensee

https://bebens.ee

AI & ML interests

Large language model pre-training, tool augmentation, conversational AI

Recent Activity

liked a dataset about 2 months ago

OpenCoder-LLM/opc-fineweb-math-corpus

Organizations

None yet

bebensee's activity

upvoted a paper 5 months ago

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 41

upvoted 3 papers 6 months ago

Vision language models are blind

Paper • 2407.06581 • Published Jul 9, 2024 • 83

On Leakage of Code Generation Evaluation Datasets

Paper • 2407.07565 • Published Jul 10, 2024 • 5

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10, 2024 • 68

upvoted 2 papers 10 months ago

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

Paper • 2403.04746 • Published Mar 7, 2024 • 22

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 605

upvoted 3 papers 11 months ago

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Paper • 2402.10466 • Published Feb 16, 2024 • 17

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

Paper • 2401.17377 • Published Jan 30, 2024 • 35

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29, 2024 • 48

upvoted 2 papers 12 months ago

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Paper • 2401.10774 • Published Jan 19, 2024 • 54

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 26

upvoted 4 papers about 1 year ago

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 57

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96

Llemma: An Open Language Model For Mathematics

Paper • 2310.10631 • Published Oct 16, 2023 • 50

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Paper • 2310.10638 • Published Oct 16, 2023 • 29

upvoted 5 papers over 1 year ago

Baichuan 2: Open Large-scale Language Models

Paper • 2309.10305 • Published Sep 19, 2023 • 19

CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 84

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 37

Uncovering mesa-optimization algorithms in Transformers

Paper • 2309.05858 • Published Sep 11, 2023 • 12

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

Paper • 2309.05793 • Published Sep 11, 2023 • 50