Mechanistic Unlearning: A New AI Method that Uses Mechanistic Interpretability to Localize and Edit Specific Model Components Associated with Factual Recall Mechanisms
Large language models (LLMs) sometimes learn the things that we don’t want them to learn and understand knowledge. It’s important to find ways to remove or adjust this knowledge to…