Vision-Language Integration for Enhanced Locomotion Mode Prediction