Abstract: Vision-language foundation models (VLMs) have shown great potential for feature transfer and generalization across a wide spectrum of medical downstream tasks. However, fine-tuning ...
Abstract: Vision-language models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage the potential of VLMs in adapting to downstream tasks, context ...