¹Huazhong University of Science & Technology, ²Wuhan University.
Image matching across both views and modalities plays a critical role in multimodal perception. In practice, the modality gap ...