Abstract: Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural ...
Official repository for the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs". The encoder-free 3D LMM directly utilizes a token embedding module to convert point cloud data ...
T2I models aim to create images that accurately align with the text and showcase high perceptual quality. Therefore, the proposed A-Bench includes two parts to diagnose whether LMMs are masters at ...
Abstract: The emerging video LMMs (Large Multimodal Models) have achieved significant performance on generic video understanding in the form of VQA (Visual Question Answering), which mainly focuses on ...