Continual Learning on Speech and Audio: Towards Data, Model and Metrics