RealHumanEval: A Web Interface to Measure the Ability of LLMs to Assist Programmers
The growing reliance on large language models for coding support poses a significant problem: how best to assess their real-world impact on programmer productivity? Current approaches, such as static benchmarking based…