I’m currently an independent researcher working on mechanistic interpretability of large language models. Recently, I finished the SERI MATS research program under the mentorship of Neel Nanda. Before that, I did research in adversarial machine learning (on defenses against adversarial examples and data poisoning) and theoretical computer science (on spectral graph theory).

I got my PhD in computer science from MIT, where I was advised by Prof. Aleksander Mądry. Before starting at MIT, I did Part III of the Mathematical Tripos at Cambridge University, with coursework in combinatorics and algebra. Before that, I earned a BA with a joint concentration in math and computer science at Harvard College, where I worked with Prof. Salil Vadhan.

I’m also broadly interested in the research, design, and implementation of tools that make the work of scientists and practitioners in computational fields easier and their code simpler and more maintainable. As part of this, I’m currently working on mandala, a Python library to simplify scientific data management.

CV | Google Scholar

Publications

Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
A. Makelov*, G. Lange*, N. Nanda
ICLR 2024

Backdoor or Feature? A New Perspective on Data Poisoning
A. Khaddaj*, G. Leclerc*, A. Makelov*, K. Georgiev, A. Ilyas, H. Salman, A. Mądry
ICML 2023

Towards Deep Learning Models Resistant to Adversarial Attacks
A. Mądry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu
ICLR 2018

Expansion in Lifts of Graphs
A. Makelov
Undergraduate Thesis, Harvard College 2015

Blog

Mandala: Python programs that save, query and version themselves (April ’23)
Practical dependency tracking for Python function calls (June ’23)