Investigating self-supervised architectures for learning speech representations
Under guidance of Prof. Ganesh Ramakrishnan
In this project, my research group and I explored various audio encoders.
The findings of the project have helped improve state-of-the-art results on multi-modal video captioning.
Responsibilities:
- Explored various pre-training techniques for learning audio embeddings.
- Used the PASE architecture to better handle low-resource settings with rich audio features
- Work accepted at Interspeech 2020
- Technologies: Python and PyTorch