A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3)
109 likes
4,379 views
Mar 12, 2023
A coding tutorial on how to reverse-engineer a model trained to grok modular addition! I'm joined by Jess Smith in this replication of our paper, Progress Measures for Grokking via Mechanistic Interpretability. In this part, we train the model to perform modular addition, and see that it groks!

Code: https://neelnanda.io/modular-addition...
Part 2: https://neelnanda.io/modular-addition...
Part 3: https://neelnanda.io/modular-addition...
The paper: https://neelnanda.io/grokking
Getting started in mechanistic interpretability: https://neelnanda.io/getting-started
TransformerLens: https://github.com/neelnanda-io/Trans...
Transformer tutorial: https://neelnanda.io/transformer-tuto...
Original grokking paper: https://arxiv.org/abs/2201.02177

OUTLINE:
0:00 - Intro
0:52 - What even is grokking?
5:09 - Define the tasks
7:23 - Training data fraction rationale
9:46 - Define the model
14:41 - Define optimizer and loss function
17:51 - Training the model
19:30 - Discussion on model size and interpretability
23:46 - What even is mechanistic interpretability?
27:09 - Interlude on the slingshot mechanism
32:55 - The results and conclusion
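For readers who want a concrete picture of the task before watching, here is a minimal sketch of how a modular addition dataset with a held-out split might be built. It is not the code from the video. The function name, the modulus `p=113`, the training fraction `0.3`, and the use of an extra token id for "=" are assumptions chosen for illustration; check the linked code for the actual setup.

```python
import random


def make_modular_addition_data(p=113, train_frac=0.3, seed=0):
    """Return (train, test) lists of ((a, b, equals_token), label) examples.

    Assumptions (illustrative, not taken from the video's code):
    - p is the modulus, and the labels are (a + b) % p
    - the "=" sign is represented by one extra token id, equal to p
    - train_frac of all p * p pairs is used for training, the rest held out
    """
    equals_token = p  # hypothetical token id for "="
    pairs = [(a, b) for a in range(p) for b in range(p)]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(pairs)

    examples = [((a, b, equals_token), (a + b) % p) for a, b in pairs]
    n_train = int(train_frac * len(examples))
    return examples[:n_train], examples[n_train:]


train_data, test_data = make_modular_addition_data()
```

Because every possible pair `(a, b)` is enumerated, the held-out set is exactly the set of pairs the model never saw. This matters for grokking: train loss and test loss are then measured on disjoint inputs, so a late drop in test loss reflects generalization, not memorization.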


Neel Nanda

6.76K subscribers

Reverse-Engineering Modular Addition (playlist by Neel Nanda)

1. A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3)
2. A Walkthrough of Reverse-Engineering Modular Addition: The Fourier Multiplication Algorithm Part 2/3
3. A Walkthrough of Reverse-Engineering Modular Addition: Why does it grok? (Part 3/3)