Self-supervised audio-visual learning
Under the guidance of Prof. Preethi Jyothi and Prof. Ganesh Ramakrishnan
The thesis revolved around multi-modal learning and how self-supervised objectives learn better embeddings. During the thesis work, we explored video-caption retrieval tasks and the novel audio-visual video parsing task. We explain and critique various related works and propose new models for improvement.
Responsibilities:
- Investigating various techniques to learn joint audio-visual-linguistic embeddings for video-text retrieval
- Inspecting new losses that can help improve retrieval performance
- Exploring various heuristics to form the augmented supervision required by the new loss, for better ranking of videos given a text query and vice versa
- Implementing the retrieval task with different losses on the MSRVTT, Charades, and TFT datasets
- Technologies: Python and PyTorch
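To illustrate the kind of retrieval loss referred to above, here is a minimal sketch of a standard bidirectional max-margin ranking loss over paired video and text embeddings. This is a generic formulation, not the specific loss developed in the thesis; the function name, the use of cosine similarity, and the margin value are illustrative assumptions, and NumPy stands in for PyTorch to keep the sketch self-contained.

```python
import numpy as np

def cosine_sim(a, b):
    # Row-normalize both sets of embeddings, then take pairwise dot products.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def bidirectional_ranking_loss(video_emb, text_emb, margin=0.2):
    """Max-margin ranking loss for a batch of matched (video, text) pairs.

    video_emb, text_emb: (B, D) arrays where row i of each is a matched pair.
    """
    sims = cosine_sim(video_emb, text_emb)   # (B, B) similarity matrix
    pos = np.diag(sims)                      # matched-pair similarities

    # Video -> text: other captions should score at least `margin` below the match.
    cost_v2t = np.maximum(0.0, margin - pos[:, None] + sims)
    # Text -> video: other videos should score at least `margin` below the match.
    cost_t2v = np.maximum(0.0, margin - pos[None, :] + sims)

    # Matched pairs incur no cost against themselves.
    np.fill_diagonal(cost_v2t, 0.0)
    np.fill_diagonal(cost_t2v, 0.0)
    return (cost_v2t.sum() + cost_t2v.sum()) / video_emb.shape[0]
```

With perfectly aligned, well-separated embeddings the loss is zero; mismatched pairs drive it up, which is what lets the loss rank videos given a text query and vice versa.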