Speaker
Description
Vision foundation models have demonstrated significant potential in many multimedia applications, yet they remain underutilized in the natural sciences. This is primarily due to a mismatch between domain-specific scientific data and the data typically used to train foundation models, which leads to distribution shifts. Scientific data often differ substantially from that training data in structure and characteristics, and researchers frequently must optimize model performance with limited labeled data of only a few hundred or a few thousand images. Adapting foundation models effectively therefore requires customized approaches to preprocessing, data augmentation, and training. Additionally, each vision foundation model exhibits its own strengths and limitations, shaped by differences in architecture, training procedure, and training datasets. In this work, we evaluate several vision foundation models on astrophysics data, specifically images from optical and radio astronomy. Our results show that using features extracted by specific foundation models improves the classification accuracy of optical galaxy images compared to conventional supervised training. Similarly, these models achieve equivalent or better performance in object detection tasks with radio images. However, their performance in classifying radio galaxy images is generally poor and often inferior to conventional supervised training. These findings suggest that selecting a suitable vision foundation model for astrophysics applications requires careful consideration of the model's characteristics and how well they align with the requirements of the downstream task.