Learning To Align Multimodal Data For Static And Dynamic Tasks