Post 50: Good Research Takes are Not Sufficient for Good Strategic Takes

Mar 22

TL;DR Having a good research track record is some evidence of good big-picture takes about AGI, but it's weak evidence. Strategic thinking is hard, and requires different skills. But people often conflate these skills, leading to excessive deference to researchers in the field, without evidence that those researchers are good at strategic thinking specifically.

Introduction

I often find myself giving talks or Q&As about mechanistic interpretability research. But inevitably, I'll get questions about the big picture: "What's the theory of change for interpretability?", "Is this really going to help with alignment?", "Does any of this matter if we can’t ensure all labs take alignment seriously?". And I think people take my answers to these way too seriously.

These are great questions, and I'm happy to try answering them. But I've noticed a bit of a pathology: people seem to assume that because I'm (hopefully!) good at the research, I'm automatically well-qualified to answer these broader strategic questions. I think this is a mistake, a form of undue deference that is both incorrect and unhelpful. I certainly try to have good strategic takes, and I think this makes me better at my job, but this is far from sufficient. Being good at research and being good at high-level strategic thinking are just fairly different skillsets!

But isn’t someone being good at research strong evidence they’re also good at strategic thinking? I personally think it’s moderate evidence, but far from sufficient. One key factor is that a very hard part of strategic thinking is the lack of feedback. Your reasoning about confusing long-term factors needs to extrapolate from past trends and draw analogies from things you understand better, and it can be quite hard to tell if what you're saying is complete bullshit or not. In an empirical science like mechanistic interpretability, however, you can get a lot more feedback. I think there's a certain kind of researcher who thrives in environments where they can get lots of feedback, but performs much worse in domains without it, where they e.g. form bad takes about the strategic picture and just never correct them because there's never enough evidence to convince them otherwise. It's just a much harder and rarer skill set to be good at something in the absence of good feedback.

Having good strategic takes is hard, especially in a field as complex and uncertain as AGI safety. It requires clear thinking about deeply conceptual issues, in a space where there are many confident yet contradictory takes, and a lot of superficially compelling yet simplistic models. So what does it take?

Factors of Good Strategic Takes

As discussed above, the ability to think clearly about thorny issues is crucial, and is a rare skill that is only somewhat used in empirical research. Lots of the research projects I do feel more like plucking the low-hanging fruit. I do think someone doing ground-breaking research is better evidence here, like Chris Olah’s original circuits work, especially if done multiple times (once could just be luck!). Though even then, it's evidence of the ability to correctly pursue ambitious research goals, but not necessarily to identify which ones will actually matter come AGI.

Domain knowledge of the research area is important. However, the key thing is not necessarily deep technical knowledge, but rather enough competence to tell when you're saying something deeply confused. Or at the very least, enough ready access to experts that you can calibrate yourself. You also need some sense of what the technique is likely to eventually be capable of and what limitations it will face.

But you don't necessarily need deep knowledge of all the recent papers so you can combine all the latest tricks. Skills like writing efficient inference code or iterating quickly in a Colab notebook are crucial to research but just aren't that relevant to strategic thinking, except insofar as they potentially build intuitions.

Time spent thinking about the issue definitely helps, and correlates with research experience. Having my day job be hanging out with other people who think about the AGI safety problem is super useful. Though note that people's opinions are often substantially reflections of the people they speak to most, rather than what’s actually true.

It’s also useful to just know what people in the field believe, so I can present an aggregate view - this is something where deferring to experienced researchers makes sense.

I think there's also diverse domain expertise that's needed for good strategic takes that isn't needed for good research takes, and most researchers (including me) haven't been selected for having, e.g.:

A good understanding of what the capabilities and psychology of future AI will look like
Economic and political situations likely to surround AI development - e.g. will there be a Manhattan Project for AGI?
What kind of solutions are likely to be implemented by labs and governments – e.g. how much willingness will there be to pay an alignment tax?
The economic situation determining which labs are likely to get there first
Whether it's sensible to reason about AGI in terms of who gets there first, or as a staggered multi-polar thing where there's no singular "this person has reached AGI and it's all over" moment
The comparative likelihood for x-risk to come from loss of control, misuse, accidents, structural risks, all of the above, something we’re totally missing, etc.
And many, many more

Conclusion

Having good strategic takes is important, and I think that researchers, especially those in research leadership positions, should spend a fair amount of time trying to cultivate them, and I’m trying to do this myself. But regardless of the amount of effort, there is a certain amount of skill required to be good at this, and people vary a lot in this skill.

Going forwards, if you hear someone's take about the strategic picture, please ask yourself, "What evidence do I have that this person is actually good at the skill of strategic takes?" And don't just equate this with them having written some impressive papers!

Practically, I recommend just trying to learn about lots of people's views, aiming for a deep and nuanced understanding of them (to the point that you can argue them coherently to someone else), and trying to reach some kind of overall aggregated perspective. Trying to form your own views can also be valuable, though I think it is also somewhat overrated.

Thanks to Jemima Jones for poking me to take agency and write a blog post for the first time in forever.

Neel Nanda
