Guided Querying over Videos using Autocompletion Suggestions

Hojin Yoo, Arnab Nandi

Proliferation of Video Data
• Mainstream & social media: includes diverse content like television, live streams, user-generated videos, and viral promotional material
• Vehicles: equipped with cameras to capture video for navigation, safety, and security
• IoT devices: collect video data for industrial analytics, retail performance, home protection, and environment monitoring

Searching through large volumes of video is hard.

Specifying Video Search Queries: Challenges
• Unknown phrasing & spelling: users may not know the right terms to refer to objects in the scene, or how to spell them
• Unknown content: users may not know the contents of the video, e.g., what objects are in the scenes
• Unknown structure & cardinality: users may not know how many times each object occurs in the video, or how the objects relate to each other

Specifying Video Search Queries
• Unknown phrasing & spelling
• Unknown content
• Unknown structure
[Diagram labels: "?", "See the data", "Ask the question"]

Problem Statement
Given a collection of videos, guide the user in a way that best assists in specifying precise and contextually relevant queries over video data.
[Diagram labels: User, Image or Video Search Query, Database]

Specifying Video Search Queries: Our Approach
[Diagram labels: "?", "See the data", "Ask the question"]

Specifying Video Search Queries: Our Approach
Autocompletion
[Diagram labels: "See the data", "Ask the question"]

But wait: won’t you need to annotate videos?

LLMs to the rescue!
Zero-shot annotation of videos

Video Query Suggestions: Our Approach
• Provide autocompletions for retrieval of video content based on zero-shot video understanding
• Goal: enhance user interactivity with the multifaceted information extracted from video data
Database: images and videos (NOT annotated)
User’s input: pedestri
Suggestions:
• pedestrian walking to an ego vehicle
• pedestrian crossing a crosswalk
• pedestrian running at a sidewalk
• pedestrian pulling a trolley
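The suggestion step on this slide can be illustrated with a small prefix lookup over a pool of phrases. This is only a sketch under stated assumptions: the `PhraseIndex` class, its `suggest` method, and the limit `k` are hypothetical names introduced here, and the slides do not say how the actual system matches or ranks suggestions.

```python
from bisect import bisect_left


class PhraseIndex:
    """Hypothetical prefix index over generated search phrases (illustration only)."""

    def __init__(self, phrases):
        # Lower-case, de-duplicate, and sort once so lookups can use binary search.
        self._phrases = sorted({p.lower() for p in phrases})

    def suggest(self, prefix, k=4):
        """Return up to k phrases that start with the typed prefix."""
        needle = prefix.lower()
        start = bisect_left(self._phrases, needle)
        matches = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(needle) or len(matches) == k:
                break
            matches.append(phrase)
        return matches


index = PhraseIndex([
    "pedestrian walking to an ego vehicle",
    "pedestrian crossing a crosswalk",
    "pedestrian running at a sidewalk",
    "pedestrian pulling a trolley",
    "white sedan on busy highway",
])

for suggestion in index.suggest("pedestri"):
    print(suggestion)
```

Sorting once and using binary search keeps each lookup cheap as the phrase pool grows; a trie would be an equally reasonable choice for the same job.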

Implementation
Pipeline: images and videos (NOT hand-annotated) -> VLMs* (PLLaVA) -> driving scene descriptions -> LLMs (Mistral AI) -> search phrases -> LLMs (Mistral AI) -> segmented phrases -> VDBMS
Driving scene description (example): The image shows a bustling city street scene. There are several pedestrians walking on the sidewalks, some carrying handbags and backpacks...
Search phrases:
1. Busy sidewalks with pedestrians and bags
2. Construction and urban life in a busy city
3. Everyday life in a bustling metropolis video
4. Late afternoon urban environment with shops
5. Video of a city street with several pedestrians
Segmented phrases:
1. “Busy sidewalks”, “with pedestrians”, “and bags”
2. “Construction”, “and”, “urban life”, “in a busy city”
3. “Everyday life”, “in a bustling metropolis”, “video”
4. “Late afternoon”, “urban environment”, “with shops”
5. “Video”, “of a city street”, “with several pedestrians”
* Vision Language Models
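The stages on this slide can be read as a chain of three transformations whose output is stored for lookup. The sketch below wires such a chain together with injected callables; it is an assumption-laden illustration, not the authors' code. `build_phrase_store` and the three `fake_*` stand-ins are hypothetical: in the real pipeline a VLM (PLLaVA) and an LLM (Mistral AI) would fill those roles, and a plain dictionary stands in for the VDBMS.

```python
import re
from typing import Callable, Iterable


def build_phrase_store(
    video_ids: Iterable[str],
    describe: Callable[[str], str],          # stand-in for the VLM step
    to_phrases: Callable[[str], list[str]],  # stand-in for the first LLM step
    segment: Callable[[str], list[str]],     # stand-in for the second LLM step
) -> dict[str, list[list[str]]]:
    """Map each video id to its segmented search phrases (dict stands in for the VDBMS)."""
    store: dict[str, list[list[str]]] = {}
    for video_id in video_ids:
        description = describe(video_id)
        store[video_id] = [segment(phrase) for phrase in to_phrases(description)]
    return store


# Hypothetical stand-ins so the sketch runs without any model.
def fake_describe(video_id: str) -> str:
    return "Busy sidewalks with pedestrians and bags."


def fake_to_phrases(description: str) -> list[str]:
    return [description.rstrip(".")]


def fake_segment(phrase: str) -> list[str]:
    # Naive rule: start a new segment before "with" or "and".
    return re.split(r"\s+(?=(?:with|and)\b)", phrase)


store = build_phrase_store(["clip-001"], fake_describe, fake_to_phrases, fake_segment)
print(store)
```

Passing the model calls in as parameters keeps the orchestration testable without a GPU or model weights, and lets each stage be swapped independently.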

Example of Description & Phrases
Scene Description: In the current driving scene, we are on a multi-lane highway with a clear view of the road ahead. The sky is a bright blue, suggesting it's a sunny day. The road is divided by a central divider, and there are multiple lanes in each direction. On the left side of the road, there are several cars in motion. The first car is a white sedan, followed by a black SUV, and then a silver hatchback. The cars are moving in the same direction as us, indicating that we are on the correct side of the road. On the right side of the road, there are more cars. The first car is a white sedan, followed by a black SUV, and then a silver hatchback. These cars are also moving in the same direction as us. In the distance, there are buildings and trees, providing a scenic backdrop to the highway. The buildings suggest that we are near a city or urban area. Overall, the scene is a typical day on a busy highway with clear visibility and good traffic flow.
Search Phrases:
1. Sunny day multi-lane highway video with clear view and good traffic flow
2. White sedan, black SUV, silver hatchback on busy highway
3. Bright blue sky highway scene with urban backdrop
4. Multi-lane highway with cars in each direction
5. White sedan, black SUV, silver hatchback moving in same direction
6. Busy highway scene with good visibility
7. Clear view of multi-lane highway with city buildings
8. Multiple cars on highway, including white sedans and black SUVs
9. Silver hatchback on multi-lane highway
10. White sedan and black SUV in the same lane on highway

Prototype Example

Scene Description: [...] scene, we see a large construction vehicle, specifically a crane, driving down a narrow street. The crane is painted in a combination of orange and white, and it has a number "40" prominently displayed on its side. The street is lined with buildings on both sides, and there are various signs and awnings visible. The sky is clear and blue, suggesting it's a sunny day. The crane is moving towards the right side of the frame, and there are no other vehicles or pedestrians visible in the immediate vicinity. The overall scene suggests a typical day in a city or town where construction work might be taking place.
Search Phrases:
1. Crane driving scene orange and white truck with number 40
2. Construction crane in narrow street orange and white with number 40
3. Large vehicle crane driving on city street orange and white number 40
4. Orange and white crane driving down street with number 40
5. Construction crane scene orange white number 40
6. Crane moving right side narrow street orange white number 40
7. Crane driving in city clear sky orange white number 40
8. Large construction crane number 40 narrow street
9. Orange and white crane driving sunny day
10. Crane number 40 driving through city street

Scene Description: [...] scene, we see a bustling city street. The main focus is a green and white bus with a white license plate, which is driving on the right side of the road. The bus is adorned with a sticker on the back that reads "TBS Hybrid", indicating it's an environmentally friendly vehicle. The bus is passing by a black car with a white license plate, which is parked on the side of the road. The car is stationary, adding to the dynamic nature of the scene. The street itself is lined with buildings, suggesting an urban setting. There are also trees visible, adding a touch of nature to the cityscape. In the background, there's a red and white sign that reads "TBS Hybrid", possibly indicating the location or the company associated with the bus. Overall, the scene captures a typical day in a busy city, with various vehicles and elements contributing to the urban environment.
Search Phrases:
1. Bus with "TBS Hybrid" sticker driving on city street
2. Urban scene with green and white bus passing black car
3. Environmentally-friendly bus with white license plate
4. Bus labeled "TBS Hybrid" on right side of road
5. City street scene with parked black car and moving bus
6. White license plate bus passing by stationary black car
7. Dynamic city scene with "TBS Hybrid" bus and parked car
8. Bus with "TBS Hybrid" sticker and urban backdrop
9. Bus driving past car with white license plates in city
10. Green and white bus with "TBS Hybrid" sign in busy street

Example of Description & Phrases
Scene Description: The image shows a bustling city street scene. There are several pedestrians walking on the sidewalks, some carrying handbags and backpacks. The architecture suggests a modern urban environment with a mix of buildings, including what appears to be a construction site with scaffolding. The street is lined with shops and restaurants, as indicated by the signage and storefronts. The sky is overcast, and the lighting suggests it might be late afternoon or early evening. There are no vehicles in motion on the street, and the overall atmosphere is one of everyday urban life.
Search Phrases:
1. Modern city street video without cars
2. Late afternoon urban hustle and bustle scene
3. Overcast evening street with construction site
4. Busy sidewalks with pedestrians and bags
5. Scaffolding and shops in a mixed-use city setting
6. Video of a modern city street scene at dusk
7. Construction and urban life in a busy city
8. Everyday life in a bustling metropolis video
9. Late afternoon urban environment with shops
10. Video of a city street with several pedestrians

Scene Description: [...] street scene with a focus on a person walking down the street. The individual appears to be a man dressed in a white shirt and dark pants, carrying a black bag. He is walking on a sidewalk that is lined with various shops and signs, suggesting a commercial or shopping area. The architecture of the buildings and the style of the signs indicate that this location is likely in Japan, possibly in a city like Tokyo. The street is lined with buildings that have signs with Japanese characters, indicating that the businesses are likely Japanese. The signs are colorful and feature various designs, which is typical for Japanese commercial districts. The street itself is paved with cobblestones, and there are no visible vehicles, which gives the scene a calm and quiet atmosphere. The lighting suggests it is daytime, and the weather appears to be clear. There are no people visible in the background, which gives the impression that the man is walking alone. The overall scene is a typical day in a Japanese urban area, with a focus on the pedestrian experience.
Search Phrases:
1. Man walking in Japanese commercial area wearing white shirt and dark pants with black bag
2. Daytime street scene in Japan with cobblestone roads
3. Calm and quiet Japanese urban area with no vehicles
4. Pedestrian experience in Tokyo or similar city
5. Japanese shopping district with colorful signs
6. Man carrying bag in Japanese commercial area
7. White shirt and dark pants Japanese street scene
8. Quiet Japanese urban area with cobblestone streets
9. Japanese architectural buildings with signs in Japanese characters
10. Japanese man walking alone in commercial district

Looks good. But does it work?

Evaluation Metric: Minimal Keystrokes (MKS)
Duan & Hsu [13], Kharitonov et al. [24]
Minimum number of keystrokes required to achieve a target query
• Given partial queries from the target query:
• If part of the target query is within the suggestions, use the suggestion
• If not, use the next partial query
Note: MKS is not the only metric, and it does not capture everything

Evaluation Metric: Minimal Keystrokes
Target query: white van driving on a larger load
Keystrokes: 2 + 2 + 7
Total: 11
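The MKS procedure (use a suggestion when it covers part of the target query, otherwise move to the next partial query) can be simulated in a few lines. This is a sketch, not the evaluation code behind the slides. The cost model is an assumption: each typed character costs one keystroke, and accepting a suggestion costs one keystroke. The phrase pool and the `top2` suggester are made up for illustration, so the count it prints is not expected to match the slide's total of 11.

```python
from typing import Callable


def minimal_keystrokes(target: str, suggest: Callable[[str], list[str]]) -> int:
    """Simulate a user who accepts any suggestion that extends the typed text toward the target."""
    typed, keystrokes = "", 0
    while typed != target:
        # Usable suggestions cover more of the target than what is already typed.
        usable = [s for s in suggest(typed)
                  if len(s) > len(typed) and target.startswith(s)]
        if usable:
            typed = max(usable, key=len)      # accept the longest usable suggestion
        else:
            typed = target[: len(typed) + 1]  # type the next character instead
        keystrokes += 1                       # assumed cost: one keystroke either way
    return keystrokes


# Hypothetical phrase pool and suggester that shows at most two matches.
POOL = sorted([
    "white sedan on busy highway",
    "white suv in the same lane",
    "white van driving on a larger load",
    "white van parked",
])


def top2(prefix: str) -> list[str]:
    if not prefix:
        return []
    return [p for p in POOL if p.startswith(prefix)][:2]


target = "white van driving on a larger load"
print(minimal_keystrokes(target, top2), "keystrokes vs", len(target), "characters")
```

With this toy pool the target only surfaces once "white v" has been typed, which shows how the number of displayed suggestions affects the metric.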

Initial Results: Number of Suggestions
[Chart; surviving labels: "10", "fewer", "40%"]

Insights
• Guiding User Interaction: unlocking the chicken-and-egg problem by guiding video search queries using autocompletion
• Zero-shot Annotation using VLMs: surprisingly good quality annotations for open-domain use cases
• Enhancing Autocomplete Functionality: post-processing with LLMs enhances autocomplete without relying on search logs

Limitations
• Generation Bias: phrases often center around a primary object
• Significant new computational resources are required to power this
• Users can get distracted by the completions

Takeaways
• Video search is a challenging problem
  • Unknown phrasing & spelling
  • Unknown contents
  • Unknown content structure
• Query guidance system for video search using VLMs and LLMs

Thank You
Demo Video: https://go.osu.edu/hilda2024demo
