About Me
Hi, I’m Neel! I run the Google DeepMind mechanistic interpretability team. Our job is to take a trained neural network and try to reverse engineer the algorithms and structures it has learned. If you want to learn more about the field, see my appearance on the Machine Learning Street Talk podcast.
I see the main goal of my work as reducing existential risk from AI, and I consider myself part of the Effective Altruism and rationality communities. Prior to this, I did independent mechanistic interpretability research, and I worked at Anthropic as a language model interpretability researcher under Chris Olah. You can see my papers here.
The main way I currently mentor people is via my MATS stream, a full-time research program that happens twice a year (over the summer and over the winter). You can read more about the process and how to apply here.
Before all that, I did a pure maths undergrad at Cambridge (graduated in 2020) and interned in quant finance roles (Jane Street and Jump Trading). I decided that quant finance wasn’t for me, and took the year after graduating to explore AI Safety and figure out what was going on in that space (interning at the Future of Humanity Institute, DeepMind and the Centre for Human-Compatible AI). After that year, I decided that existential risk from powerful AI is one of the most important problems of this century, and one worth spending my career trying to help with.
If you have thoughts on anything I’ve written, or otherwise want to contact me, you can email me at neelnanda27@gmail.com. (Though I don’t have capacity to respond to everyone who reaches out, so apologies in advance!)
About my blog
This blog is a collection of my thoughts on various ideas I’ve found valuable for being happy, improving my life, or understanding the world. (You can see my more technical blog posts in the mechanistic interpretability section, which is what I mostly write about nowadays.) A lot of them focus on self-improvement and rationality, but they also cover topics such as emotions, friendships, social skills, teaching, agency, motivation, achieving goals and altruism. See the Top Posts page to get an idea of where to start. I blog according to my internal sense of fun and whimsy, and accordingly don’t blog on any fixed schedule. Depending on how busy I am, I will sometimes have bursts of many posts with long breaks in between. You can subscribe to hear about new posts here.
I started this blog as an exercise in being less of a bloody perfectionist, so each post is deliberately written as a rough first draft, with minimal editing. Accordingly, these are highly, and deliberately, not quality controlled! Please don’t take these posts as a perfect representation of what I believe, or the best representation of these ideas - I’m cutting a lot of the nuance and caveats I’d give in a proper treatment! But I feel very happy with how some of them have come out, and I hope my unfiltered ramblings are interesting and useful to you! You can see my retrospective on this experiment here (and my case for why you should start a blog yourself!).
If you have any feedback, positive or negative, to help me become a better writer, I’d really appreciate hearing about it! I have a feedback form here (and talk about why feedback is awesome in this post).
Stuff I’ve made:
The TransformerLens library for mechanistic interpretability of language models
A Comprehensive Mechanistic Interpretability Explainer - a searchable and detailed glossary of jargon in mech interp
A YouTube channel about mechanistic interpretability, with a bunch of paper walkthroughs and live walkthroughs of research
Podcasts
Future of Life Podcast about Mechanistic Interpretability (3-parter)
Machine Learning Street Talk about Mechanistic Interpretability
How can we optimise for a meaningful life? - An introduction to Effective Altruism (Not Overthinking, hosted by Ali Abdaal and Taimur Abdaal)
An interview on habits & planning (Hear This Idea, hosted by Fin Moorhouse and Luca Righetti)
Older Stuff I’ve Made
3 hours of talks on machine learning intuitions
A 2-hour introductory talk on Reinforcement Learning
10 hours of revision lectures on Linear Algebra, Groups, Rings & Modules and Complex Analysis, focusing on a high-level overview, motivations and intuitions
Rationality workshops
Notes & lesson plans for 4 workshops based on Centre for Applied Rationality classes - on good planning, forming useful habits, having productive disagreements, and building systems
Writeup on this project, impact assessment, and thoughts on teaching rationality
Collecting student reviews for Cambridge maths courses (almost at 1000 reviews!)
Notes taken for the Cambridge maths undergrad
Comprehensive intuition-focused notes for most second year courses
Covers uniform convergence, normed spaces, formal multivariable differentiation and the inverse function theorem
Intuition-focused notes for a handful of third year courses
Linear Analysis - Introductory functional analysis
Number Fields - Introductory Algebraic Number Theory
Probability & Measure - Introductory Measure Theory
Logic and Set Theory - Introduction to formal logic, ordinals and cardinals
Rough, but hopefully still useful
Maths of Machine Learning - Introduction to Statistical Learning Theory
Anki flashcards for second year maths courses