Evaluating the Vulnerabilities of Unlearning Techniques in Large Language Models: A Comprehensive White-Box Analysis
Large language models (LLMs) have gained immense capabilities from training on vast internet-based datasets. However, this broad exposure has inadvertently brought harmful content into their training data, enabling LLMs to generate toxic,…