I’m currently an independent researcher working on mechanistic interpretability of large language models. Recently, I finished the SERI MATS research program under the mentorship of Neel Nanda. Before that, I did research in adversarial machine learning (on defenses against adversarial examples and data poisoning) and theoretical computer science (on spectral graph theory).
I got my PhD in computer science from MIT, where I was advised by Prof. Aleksander Mądry. Before starting at MIT, I did Part III of the Mathematical Tripos at Cambridge University, with coursework in combinatorics and algebra. And before that, I earned a BA, with a joint concentration in math and computer science, at Harvard College, where I worked with Prof. Salil Vadhan.
I’m also broadly interested in the research, design and implementation of tools that make the work of scientists and practitioners in computational fields easier, and their code simpler and more maintainable. As part of this, I’m currently working on mandala, a Python library to simplify scientific data management.
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
A. Makelov*, G. Lange*, N. Nanda
Backdoor or Feature? A New Perspective on Data Poisoning
A. Khaddaj*, G. Leclerc*, A. Makelov*, K. Georgiev, A. Ilyas, H. Salman, A. Mądry
Towards Deep Learning Models Resistant to Adversarial Attacks
A. Mądry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu
Expansion in Lifts of Graphs
Undergraduate Thesis, Harvard College 2015