Compositional GSM: A New AI Benchmark for Evaluating Large Language Models’ Reasoning Capabilities in Multi-Step Problems
Natural language processing (NLP) has experienced rapid advancements, with large language models (LLMs) being used to tackle various challenging problems. Among the diverse applications of LLMs, mathematical problem-solving has emerged…