I’m currently an independent researcher working on mechanistic interpretability of large language models. Recently, I finished the SERI MATS research program under the mentorship of Neel Nanda. Before that, I did research in adversarial machine learning (on defenses against adversarial examples and data poisoning) and theoretical computer science (on spectral graph theory).

I got my PhD in computer science from MIT, where I was advised by Prof. Aleksander Mądry. Before starting at MIT, I did Part III of the Mathematical Tripos at Cambridge University, with coursework in combinatorics and algebra. Before that, I earned a BA with a joint concentration in math and computer science at Harvard College, where I worked with Prof. Salil Vadhan.

I’m also broadly interested in the research, design, and implementation of tools that make the work of scientists and practitioners in computational fields easier and their code simpler and more maintainable. As part of this, I’m currently working on mandala, a Python library that simplifies scientific data management.

CV | Google Scholar | Semantic Scholar | Twitter

Publications

Sparse Autoencoders Match Supervised Features for Model Steering on the IOI Task
A. Makelov
Spotlight, ICML 2024 Workshop on Mechanistic Interpretability
See also: AI Alignment Forum post

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
A. Makelov*, G. Lange*, N. Nanda
SeT-LLM Workshop @ ICLR 2024

mandala: Compositional Memoization for Simple & Powerful Scientific Data Management
A. Makelov
SciPy 2024 Proceedings (to appear)

Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
A. Makelov*, G. Lange*, N. Nanda
ICLR 2024

Backdoor or Feature? A New Perspective on Data Poisoning
A. Khaddaj*, G. Leclerc*, A. Makelov*, K. Georgiev, A. Ilyas, H. Salman, A. Mądry
ICML 2023

Towards Deep Learning Models Resistant to Adversarial Attacks
A. Mądry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu
ICLR 2018

Expansion in Lifts of Graphs
A. Makelov
Undergraduate Thesis, Harvard College 2015

Personal Blog

Practical dependency tracking for Python function calls (June ’23)
Mandala: Python programs that save, query and version themselves (April ’23)