Fine-Grained Video Retrieval
CS Distinguished Seminar Series
September 11, 2025
11:00 AM - 12:00 PM
Presenter: Mubarak Shah, University of Central Florida
Abstract: The goal of video retrieval is to learn robust representations such that a query's representation can effectively retrieve relevant items from a video gallery. While traditional methods typically return semantically related results, they often fail to ensure temporal alignment or capture fine-grained temporal nuances.
To address these limitations, Shah will first introduce Alignable Video Retrieval (AVR), a novel task that tackles the previously unexplored challenge of identifying temporally alignable videos within large datasets.
Next, he will present Composed Video Retrieval (CoVR), which focuses on retrieving a target video based on a query video and a modification text describing the desired change. Existing CoVR benchmarks largely focus on appearance variations or coarse-grained events, falling short in evaluating models’ ability to handle subtle, fast-paced temporal changes and complex compositional reasoning. To bridge this gap, he introduces two new datasets—Dense-WebVid-CoVR and TF-CoVR—which capture fine-grained and compositional actions across diverse video segments, enabling more detailed and nuanced retrieval tasks.
Shah will conclude the talk with his recent work on ViLL-E: Video LLM Embeddings for Retrieval. ViLL-E extends VideoLLMs by introducing a joint training framework that supports both generative tasks (e.g., VideoQA) and embedding-based tasks such as video retrieval. This dual capability enables VideoLLMs to generate embeddings for retrieval, a functionality lacking in current models, without sacrificing generative performance.
Speaker bio: Mubarak Shah is the UCF Trustee Chair Professor and founding director of the Center for Research in Computer Vision at the University of Central Florida. Shah is a fellow of ACM, IEEE, AAAS, NAI, IAPR, AAIA and SPIE. He has published extensively on topics related to human activity and action recognition, visual tracking, geo-localization, visual crowd analysis, object detection and categorization, shape from shading, and other technologies. Shah has served as an ACM and IEEE Distinguished Visitor Program speaker. He is a recipient of the 2022 PAMI Mark Everingham Prize for pioneering human action recognition datasets; the 2019 ACM SIGMM Technical Achievement award; the 2020 ACM SIGMM Test of Time Honorable Mention Award for his paper “Visual attention detection in video sequences using spatiotemporal cues”; the 2020 International Conference on Pattern Recognition (ICPR) Best Scientific Paper Award; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; the 2013 NGA Best Research Poster Presentation; 2nd place in the Grand Challenge at the ACM Multimedia 2013 conference; and runner-up for the best paper award at the ACM Multimedia Conference in 2005 and 2010. At the University of Central Florida he has received the Pegasus Professor Award; University Distinguished Research Award; Faculty Excellence in Mentoring Doctoral Students award; Faculty Excellence in Mentoring Postdoctoral Scholars award; Scholarship of Teaching and Learning award; Teaching Incentive Program award; and Research Incentive Award.
Faculty host: Associate Professor Yan Yan
Date posted: Sep 2, 2025
Date updated: Sep 2, 2025