Mech Interp Project Advising Call: Memorisation in GPT-2 Small

Feb 4

I've recently been having advising calls with REMIX teams (Redwood's interpretability sprint), trying to give advice & feedback on projects. As an experiment, I've published a recording of one advising call (with Tessa Barton & Kushal Jain, on memorisation in GPT-2 Small). I'm curious whether this is useful to anyone! IMO getting detailed feedback from a more experienced researcher is one of the best ways to improve at research, but I have no idea whether feedback given to someone else is comparably useful, or whether my advice is good enough lol. Thanks to the team for being down to publish this, and for the work!

Neel Nanda
