SHOW-O: A Single Transformer Uniting Multimodal Understanding and Generation
Significant advancements in large language models (LLMs) have inspired the development of multimodal large language models (MLLMs). Early MLLM efforts, such as LLaVA, MiniGPT-4, and InstructBLIP, demonstrate notable multimodal understanding…