Posture Guided Human Action Recognition for Fitness Applications
ICVGIP 2022
Brief Abstract: A multi-stage, deep-learning-based method for action recognition that predicts both upright and non-upright actions with high accuracy.
Cross Lingual Video and Text Retrieval: A New Benchmark Dataset and Algorithm
ACM ICMI 2021
Brief Abstract: Video retrieval using natural language queries requires learning semantically meaningful joint embeddings between the text and the audio-visual input.
Cross-Modal learning for Audio-Visual Video Parsing
Interspeech 2021
Brief Abstract: In this paper, we present a novel approach to the audio-visual video parsing (AVVP) task, which demarcates events in a video separately for the audio and visual modalities.
Caption Alignment for Low Resource Audio-Visual Data
Interspeech 2020
Brief Abstract: Understanding videos via captioning has gained a lot of traction recently.