Redwood Research

non-profit

AI & ML interests

None defined yet.

Recent Activity

FabienRoger authored a paper about 1 month ago

Alignment faking in large language models

bshlgrs authored a paper about 1 month ago

Alignment faking in large language models

FabienRoger updated a model 7 months ago

redwoodresearch/llama-3-8b-cb

redwoodresearch's activity

FabienRoger

authored a paper about 1 month ago

Alignment faking in large language models

Paper • 2412.14093 • Published Dec 18, 2024 • 7

bshlgrs

authored a paper about 1 month ago

Alignment faking in large language models

Paper • 2412.14093 • Published Dec 18, 2024 • 7

FabienRoger

updated a model 7 months ago

redwoodresearch/llama-3-8b-cb

Text Generation • Updated Jul 9, 2024 • 3

FabienRoger

updated 2 datasets 7 months ago

redwoodresearch/tiny_questions

Viewer • Updated Jul 3, 2024 • 94.7k • 37

redwoodresearch/tiny_question_assistant

Viewer • Updated Jul 1, 2024 • 79.4k • 34

FabienRoger

updated a model 7 months ago

redwoodresearch/mtd_func_correct_codegen2b_untrusted

Text Generation • Updated Jun 24, 2024 • 3

FabienRoger

updated a model 8 months ago

redwoodresearch/math_pwd_lock_deepseek_math7b_on_weak_pythia1b

Text Generation • Updated Jun 6, 2024 • 5

FabienRoger

updated a dataset 9 months ago

redwoodresearch/math_generations

Viewer • Updated Apr 29, 2024 • 25k • 35

FabienRoger

updated 2 datasets 11 months ago

redwoodresearch/wmdp-cyber-deduped

Viewer • Updated Mar 14, 2024 • 630 • 41

redwoodresearch/history-mcq

Viewer • Updated Mar 14, 2024 • 791 • 42

bshlgrs

authored 3 papers about 1 year ago

AI Control: Improving Safety Despite Intentional Subversion

Paper • 2312.06942 • Published Dec 12, 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

Paper • 2211.00593 • Published Nov 1, 2022 • 2

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 27

ksachan

authored a paper about 1 year ago

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 27

FabienRoger

updated 6 datasets over 1 year ago

Team members: 5
