Alex Makelov

amakelov

https://amakelov.github.io

AI & ML interests

Interpretability

Recent Activity

authored a paper about 2 months ago

Towards Deep Learning Models Resistant to Adversarial Attacks

authored a paper 6 months ago

Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching

authored a paper 6 months ago

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Organizations

None yet

Papers 3

arxiv:2405.08366

arxiv:2311.17030

arxiv:1706.06083

Models

None public yet

Datasets

None public yet