Investigating self-supervised architectures for learning speech representations

1 minute

May 31, 2020

In this project, my research group and I explored various audio encoders.

The findings of the project have helped improve state-of-the-art results on multi-modal video captioning.

Explored various pre-training techniques for learning audio embeddings.
Used the PASE architecture to better handle low-resource settings with rich audio features.
The work was accepted at Interspeech 2020.
Technologies: Python and PyTorch

Hi, I'm Jayaprakash👋