MedHELM: A Comprehensive Healthcare Benchmark to Evaluate Language Models on Real-World Clinical Tasks Using Real Electronic Health Records
Large Language Models (LLMs) are widely used in medicine, supporting diagnostic decision-making, patient triage, clinical reporting, and medical research workflows. Although they perform very well on controlled medical tests, such…