Mechanistic interpretability seeks to reverse-engineer neural networks by mapping low-level components (neurons, attention heads, etc.) to…Continue reading on Medium »