ChatRex: A Multimodal Large Language Model (MLLM) with a Decoupled Perception Design
Multimodal Large Language Models (MLLMs) have shown impressive capabilities in visual understanding. However, they face significant challenges in fine-grained perception tasks such as object detection, which is critical for applications…