WorFBench: A Benchmark for Evaluating Complex Workflow Generation in Large Language Model Agents
Large Language Models (LLMs) have shown remarkable potential in solving complex real-world problems, from function calls to embodied planning and code generation. A critical capability for LLM agents is decomposing…