
A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3)

Neel Nanda · 6.76K subscribers

4,379 views · 109 likes · Mar 12, 2023

A coding tutorial on how to reverse-engineer a model trained to grok modular addition! I'm joined by Jess Smith in this replication of our paper, Progress Measures for Grokking via Mechanistic Interpretability. In this part, we train the model to perform modular addition, and see that it groks!

Code: https://neelnanda.io/modular-addition...
Part 2: https://neelnanda.io/modular-addition...
Part 3: https://neelnanda.io/modular-addition...
The paper: https://neelnanda.io/grokking
Getting started in mechanistic interpretability: https://neelnanda.io/getting-started
TransformerLens: https://github.com/neelnanda-io/Trans...
Transformer tutorial: https://neelnanda.io/transformer-tuto...
Original grokking paper: https://arxiv.org/abs/2201.02177

OUTLINE:
0:00 - Intro
0:52 - What even is grokking?
5:09 - Define the tasks
7:23 - Training data fraction rationale
9:46 - Define the model
14:41 - Define optimizer and loss function
17:51 - Training the model
19:30 - Discussion on model size and interpretability
23:46 - What even is mechanistic interpretability?
27:09 - Interlude on the slingshot mechanism
32:55 - The results and conclusion
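The "Define the tasks" and "Training data fraction rationale" chapters set up the task: predict (a + b) mod p for every pair (a, b) with 0 ≤ a, b < p, training on only a fraction of the p² pairs and evaluating on the rest. A minimal sketch of such a dataset in plain Python follows. The modulus p = 113, the 30% training fraction, using token id p for "=", the seed, and the function name `make_modular_addition_data` are illustrative assumptions here, not values copied from the tutorial's code.

```python
import random


def make_modular_addition_data(p: int = 113, frac_train: float = 0.3, seed: int = 0):
    """Return (train, test) lists of ((a, b, eq_token), (a + b) % p) examples."""
    # Hypothetical convention: token ids 0..p-1 are the numbers, and id p stands for "=".
    eq_token = p
    # Enumerate all p * p input pairs with their modular-sum labels.
    data = [((a, b, eq_token), (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle with a fixed seed so the train/test split is reproducible.
    random.Random(seed).shuffle(data)
    n_train = int(frac_train * len(data))
    return data[:n_train], data[n_train:]


train, test = make_modular_addition_data()
print(len(train), len(test))  # 3830 8939
```

Holding most pairs out of training is what leaves a test set on which a delayed jump in accuracy can show up; if every pair were in the training set, there would be nothing left to generalize to.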


Playlist: Reverse-Engineering Modular Addition

A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3)
A Walkthrough of Reverse-Engineering Modular Addition: The Fourier Multiplication Algorithm Part 2/3
A Walkthrough of Reverse-Engineering Modular Addition: Why does it grok? (Part 3/3)