Neural Magic Unveils Machete: A New Mixed-Input GEMM Kernel for NVIDIA Hopper GPUs
The rapid growth of large language models (LLMs) and their rising computational demands have created a pressing need for optimized solutions that reduce memory usage and speed up inference. As models…