Arnab Nandi
June 14, 2024

Guided Querying over Videos using Autocompletion Suggestions

A critical challenge with querying video data is that the user is often unaware of the contents of the video, its structure, and the exact terminology to use in the query. While these problems also arise in exploratory querying over traditional structured data, they are exacerbated for video data, where the information comes from human-annotated metadata or from computer vision models run over the video. Without guidance, the user does not know where to begin the query session or how to construct the query. Here, autocompletion-based user interfaces have become a popular and pervasive approach to interactive, keystroke-level query guidance. To guide the user through the query construction process, we develop methods that combine Vision Language Models and Large Language Models to generate query suggestions suited to autocompletion-based user interfaces. Through quantitative assessments over real-world datasets, we demonstrate that our approach provides a meaningful benefit to query construction for video queries.

Transcript

  1. Proliferation of Video Data

    • Mainstream & social media: includes diverse content like television, live streams, user-generated videos, and viral promotional material
    • Vehicles: equipped with cameras to capture video for navigation, safety, and security
    • IoT devices: collect video data for industrial analytics, retail performance, home protection, and environment monitoring
  2. Specifying Video Search Queries: Challenges

    • Unknown phrasing & spelling: users may not know the right terms to refer to objects in the scene, or how to spell them
    • Unknown content: users may not know the contents of the video, e.g., what objects are in the scenes
    • Unknown structure & cardinality: users may not know how many times each object occurs in the video, or how the objects relate to each other
  3. Specifying Video Search Queries

    [Diagram: unknown phrasing & spelling, unknown content, and unknown structure leave a gap ("?") between seeing the data and asking the question.]
  4. Problem Statement

    Given a collection of videos, guide the user in a way that best assists in specifying precise and contextually relevant queries over video data.
    [Diagram: User → image or video search query → database]
  5. Video Query Suggestions: Our Approach

    [Diagram: the user's input "pedestri" is completed to "pedestrian" and then to suggestions derived from a database of images and videos (NOT annotated): "pedestrian walking to an ego vehicle", "pedestrian crossing a crosswalk", "pedestrian running at a sidewalk", "pedestrian pulling a trolley". The selected suggestion becomes the query.]
    • Provide autocompletions for retrieval of video content based on zero-shot video understanding
    • Goal: enhance user interactivity with the multifaceted information extracted from video data
  6. Implementation

    Pipeline: images and videos (NOT hand-annotated) → VLMs* (PLLaVA) → driving scene descriptions → LLMs (Mistral AI) → search phrases → LLMs (Mistral AI) → segmented phrases → VDBMS
    • Example scene description: "The image shows a bustling city street scene. There are several pedestrians walking on the sidewalks, some carrying handbags and backpacks..."
    • Example search phrases: (1) Busy sidewalks with pedestrians and bags; (2) Construction and urban life in a busy city; (3) Everyday life in a bustling metropolis video; (4) Late afternoon urban environment with shops; (5) Video of a city street with several pedestrians
    • Example segmented phrases: (1) "Busy sidewalks", "with pedestrians", "and bags"; (2) "Construction", "and", "urban life", "in a busy city"; (3) "Everyday life", "in a bustling metropolis", "video"; (4) "Late afternoon", "urban environment", "with shops"; (5) "Video", "of a city street", "with several pedestrians"
    * VLMs: Vision Language Models
  7. Example of Description & Phrases

    Scene Description: In the current driving scene, we are on a multi-lane highway with a clear view of the road ahead. The sky is a bright blue, suggesting it's a sunny day. The road is divided by a central divider, and there are multiple lanes in each direction. On the left side of the road, there are several cars in motion. The first car is a white sedan, followed by a black SUV, and then a silver hatchback. The cars are moving in the same direction as us, indicating that we are on the correct side of the road. On the right side of the road, there are more cars. The first car is a white sedan, followed by a black SUV, and then a silver hatchback. These cars are also moving in the same direction as us. In the distance, there are buildings and trees, providing a scenic backdrop to the highway. The buildings suggest that we are near a city or urban area. Overall, the scene is a typical day on a busy highway with clear visibility and good traffic flow.
  8. Example of Description & Phrases

    Scene Description: same as slide 7.
    Search Phrases:
    1. Sunny day multi-lane highway video with clear view and good traffic flow
    2. White sedan, black SUV, silver hatchback on busy highway
    3. Bright blue sky highway scene with urban backdrop
    4. Multi-lane highway with cars in each direction
    5. White sedan, black SUV, silver hatchback moving in same direction
    6. Busy highway scene with good visibility
    7. Clear view of multi-lane highway with city buildings
    8. Multiple cars on highway, including white sedans and black SUVs
    9. Silver hatchback on multi-lane highway
    10. White sedan and black SUV in the same lane on highway
  9. Example of Description & Phrases

    Scene Description: In the current driving scene, we see a large construction vehicle, specifically a crane, driving down a narrow street. The crane is painted in a combination of orange and white, and it has a number "40" prominently displayed on its side. The street is lined with buildings on both sides, and there are various signs and awnings visible. The sky is clear and blue, suggesting it's a sunny day. The crane is moving towards the right side of the frame, and there are no other vehicles or pedestrians visible in the immediate vicinity. The overall scene suggests a typical day in a city or town where construction work might be taking place.
  10. Example of Description & Phrases

    Scene Description: same as slide 9.
    Search Phrases:
    1. Crane driving scene orange and white truck with number 40
    2. Construction crane in narrow street orange and white with number 40
    3. Large vehicle crane driving on city street orange and white number 40
    4. Orange and white crane driving down street with number 40
    5. Construction crane scene orange white number 40
    6. Crane moving right side narrow street orange white number 40
    7. Crane driving in city clear sky orange white number 40
    8. Large construction crane number 40 narrow street
    9. Orange and white crane driving sunny day
    10. Crane number 40 driving through city street
  11. Example of Description & Phrases

    Scene Description: In the current driving scene, we see a bustling city street. The main focus is a green and white bus with a white license plate, which is driving on the right side of the road. The bus is adorned with a sticker on the back that reads "TBS Hybrid", indicating it's an environmentally friendly vehicle. The bus is passing by a black car with a white license plate, which is parked on the side of the road. The car is stationary, adding to the dynamic nature of the scene. The street itself is lined with buildings, suggesting an urban setting. There are also trees visible, adding a touch of nature to the cityscape. In the background, there's a red and white sign that reads "TBS Hybrid", possibly indicating the location or the company associated with the bus. Overall, the scene captures a typical day in a busy city, with various vehicles and elements contributing to the urban environment.
  12. Example of Description & Phrases

    Scene Description: same as slide 11.
    Search Phrases:
    1. Bus with "TBS Hybrid" sticker driving on city street
    2. Urban scene with green and white bus passing black car
    3. Environmentally-friendly bus with white license plate
    4. Bus labeled "TBS Hybrid" on right side of road
    5. City street scene with parked black car and moving bus
    6. White license plate bus passing by stationary black car
    7. Dynamic city scene with "TBS Hybrid" bus and parked car
    8. Bus with "TBS Hybrid" sticker and urban backdrop
    9. Bus driving past car with white license plates in city
    10. Green and white bus with "TBS Hybrid" sign in busy street
  13. Example of Description & Phrases

    Scene Description: The image shows a bustling city street scene. There are several pedestrians walking on the sidewalks, some carrying handbags and backpacks. The architecture suggests a modern urban environment with a mix of buildings, including what appears to be a construction site with scaffolding. The street is lined with shops and restaurants, as indicated by the signage and storefronts. The sky is overcast, and the lighting suggests it might be late afternoon or early evening. There are no vehicles in motion on the street, and the overall atmosphere is one of everyday urban life.
  14. Example of Description & Phrases

    Scene Description: same as slide 13.
    Search Phrases:
    1. Modern city street video without cars
    2. Late afternoon urban hustle and bustle scene
    3. Overcast evening street with construction site
    4. Busy sidewalks with pedestrians and bags
    5. Scaffolding and shops in a mixed-use city setting
    6. Video of a modern city street scene at dusk
    7. Construction and urban life in a busy city
    8. Everyday life in a bustling metropolis video
    9. Late afternoon urban environment with shops
    10. Video of a city street with several pedestrians
  15. Example of Description & Phrases

    Scene Description: The image shows a street scene with a focus on a person walking down the street. The individual appears to be a man dressed in a white shirt and dark pants, carrying a black bag. He is walking on a sidewalk that is lined with various shops and signs, suggesting a commercial or shopping area. The architecture of the buildings and the style of the signs indicate that this location is likely in Japan, possibly in a city like Tokyo. The street is lined with buildings that have signs with Japanese characters, indicating that the businesses are likely Japanese. The signs are colorful and feature various designs, which is typical for Japanese commercial districts. The street itself is paved with cobblestones, and there are no visible vehicles, which gives the scene a calm and quiet atmosphere. The lighting suggests it is daytime, and the weather appears to be clear. There are no people visible in the background, which gives the impression that the man is walking alone. The overall scene is a typical day in a Japanese urban area, with a focus on the pedestrian experience.
  16. Example of Description & Phrases

    Scene Description: same as slide 15.
    Search Phrases:
    1. Man walking in Japanese commercial area wearing white shirt and dark pants with black bag
    2. Daytime street scene in Japan with cobblestone roads
    3. Calm and quiet Japanese urban area with no vehicles
    4. Pedestrian experience in Tokyo or similar city
    5. Japanese shopping district with colorful signs
    6. Man carrying bag in Japanese commercial area
    7. White shirt and dark pants Japanese street scene
    8. Quiet Japanese urban area with cobblestone streets
    9. Japanese architectural buildings with signs in Japanese characters
    10. Japanese man walking alone in commercial district
  17. Evaluation Metric: Minimal Keystrokes (MKS)

    Duan & Hsu [13], Kharitonov et al. [24]: the minimum number of keystrokes required to reach a target query.
    • Given partial queries (prefixes) of the target query:
      • if part of the target query is among the suggestions, use the suggestion;
      • if not, type the next partial query.
    Note: MKS is not the only metric and does not capture everything.
  18. Evaluation Metric: Minimal Keystrokes (Example)

    Target query: "white van driving on a larger load"
    Keystrokes: 2 + 2 + 7 = 11 total
  19. Insights

    • Guiding user interaction: addressing the chicken-and-egg problem of needing to know the data in order to query it, by guiding video search queries with autocompletion
    • Zero-shot annotation using VLMs: surprisingly good-quality annotations for open-domain use cases
    • Enhancing autocomplete functionality: post-processing with LLMs enhances autocomplete without relying on search logs
  20. Limitations

    • Generation bias: phrases often center on a primary object
    • Significant additional computational resources are required to power this approach
    • Users can get distracted by the completions
  21. Takeaways

    • Video search is a challenging problem:
      • Unknown phrasing & spelling
      • Unknown contents
      • Unknown content structure
    • Query guidance system for video search using VLMs and LLMs
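Slide 5 describes offering autocompletions from a pool of suggestions as the user types (e.g., "pedestri" completing to "pedestrian crossing a crosswalk"). The deck does not say how candidates are looked up or ranked, so the following Python sketch assumes a simple case-insensitive prefix match over a sorted phrase list, using binary search to locate the first candidate. The class name `PrefixIndex` and the default of four suggestions are illustrative choices, not part of the described system.

```python
import bisect
from typing import Iterable, List


class PrefixIndex:
    """Lower-cased, sorted phrase list that answers prefix queries by binary search."""

    def __init__(self, phrases: Iterable[str]) -> None:
        self._keys = sorted({p.lower() for p in phrases})

    def complete(self, partial: str, k: int = 4) -> List[str]:
        """Return up to k stored phrases starting with `partial` (case-insensitive)."""
        p = partial.lower()
        i = bisect.bisect_left(self._keys, p)  # index of the first key >= p
        out: List[str] = []
        # All keys sharing the prefix p are contiguous from index i onward.
        while i < len(self._keys) and len(out) < k and self._keys[i].startswith(p):
            out.append(self._keys[i])
            i += 1
        return out


if __name__ == "__main__":
    index = PrefixIndex([
        "pedestrian walking to an ego vehicle",
        "pedestrian crossing a crosswalk",
        "pedestrian running at a sidewalk",
        "pedestrian pulling a trolley",
    ])
    for suggestion in index.complete("pedestri"):
        print(suggestion)
```

Results come back in alphabetical order; a deployed system would presumably rank them by relevance, and a trie is a common alternative when the phrase pool is large.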
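Slide 6 outlines the offline pipeline: a VLM (PLLaVA) describes each image or video, an LLM (Mistral AI) turns the description into search phrases, a second LLM pass splits each phrase into segments, and the results are stored in a VDBMS. The sketch below shows only that data flow, assuming hypothetical callables in place of the model calls and an in-memory dictionary in place of the VDBMS; `PhraseStore`, `ingest`, and `complete_next_segment` are illustrative names. It also assumes, purely as an illustration, that segments let the interface complete the user's input one segment at a time rather than jumping to the end of a whole phrase; the deck does not specify how segmented phrases are consumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical signatures standing in for model calls. In the deck these steps
# are served by a VLM (PLLaVA) and an LLM (Mistral AI); no real API is used here.
DescribeFn = Callable[[str], str]        # video id -> scene description
PhrasesFn = Callable[[str], List[str]]   # scene description -> search phrases
SegmentFn = Callable[[str], List[str]]   # search phrase -> ordered segments


@dataclass
class PhraseStore:
    """In-memory stand-in for the VDBMS: segmented phrases grouped by video id."""
    rows: Dict[str, List[List[str]]] = field(default_factory=dict)

    def add(self, video_id: str, segments: List[str]) -> None:
        self.rows.setdefault(video_id, []).append(segments)

    def all_segmented(self) -> Iterable[List[str]]:
        for per_video in self.rows.values():
            yield from per_video


def ingest(video_ids: Iterable[str], describe: DescribeFn, to_phrases: PhrasesFn,
           segment: SegmentFn, store: PhraseStore) -> PhraseStore:
    """Run each video through describe -> search phrases -> segments, then store."""
    for vid in video_ids:
        description = describe(vid)
        for phrase in to_phrases(description):
            store.add(vid, segment(phrase))
    return store


def complete_next_segment(partial: str, store: PhraseStore) -> List[str]:
    """Suggest completions that extend the input only to the next segment boundary."""
    p = partial.lower()
    out: List[str] = []
    for segments in store.all_segmented():
        text = ""
        for seg in segments:
            text = f"{text} {seg}".strip()
            # Take the shortest segment-aligned prefix that matches and extends the input.
            if len(text) > len(p) and text.lower().startswith(p):
                out.append(text)
                break
    return list(dict.fromkeys(out))  # drop duplicates, keep first-seen order


if __name__ == "__main__":
    # Canned outputs copied from the slide 6 example, replacing model calls.
    canned = {
        "Busy sidewalks with pedestrians and bags":
            ["Busy sidewalks", "with pedestrians", "and bags"],
        "Everyday life in a bustling metropolis video":
            ["Everyday life", "in a bustling metropolis", "video"],
    }
    store = ingest(
        ["clip-001"],  # hypothetical video id
        describe=lambda vid: "The image shows a bustling city street scene.",
        to_phrases=lambda desc: list(canned),
        segment=lambda phrase: canned[phrase],
        store=PhraseStore(),
    )
    print(complete_next_segment("busy", store))            # ['Busy sidewalks']
    print(complete_next_segment("busy sidewalks", store))  # ['Busy sidewalks with pedestrians']
```

Keeping the model calls behind plain callables makes it easy to swap in real VLM/LLM clients or canned outputs for testing without touching the storage or completion logic.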
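Slide 17 defines Minimal Keystrokes (MKS) as the fewest keystrokes needed to reach a target query when the user may either keep typing or accept a suggestion that covers part of the target. The sketch below computes that minimum by dynamic programming over prefixes of the target. The cost model is an assumption for illustration: typing a character costs 1, and accepting the suggestion at rank r costs r (r - 1 arrow presses plus Enter). `make_suggester` is a hypothetical stand-in for the system under test. Because the sketch minimizes over all strategies, it may report fewer keystrokes than a step-by-step simulation of the procedure on the slide, and it does not attempt to reproduce the 2 + 2 + 7 breakdown shown on slide 18.

```python
from typing import Callable, List

SuggestFn = Callable[[str], List[str]]  # typed text -> ranked suggestions


def minimal_keystrokes(target: str, suggest: SuggestFn) -> int:
    """Fewest keystrokes needed to end up with exactly `target` in the search box.

    Assumed cost model (illustrative only):
      * typing one character costs 1 keystroke;
      * accepting the suggestion at rank r (1-based) costs r keystrokes.
    A suggestion is usable only if it is a prefix of the target that is longer
    than the text typed so far, so every reachable state is a target prefix.
    """
    n = len(target)
    best = list(range(n + 1))  # best[i]: cost to reach target[:i]; typing-only bound
    for i in range(n + 1):     # edges only go forward, so best[i] is final here
        if i < n:
            best[i + 1] = min(best[i + 1], best[i] + 1)
        for rank, s in enumerate(suggest(target[:i]), start=1):
            if len(s) > i and target.startswith(s):
                best[len(s)] = min(best[len(s)], best[i] + rank)
    return best[n]


def make_suggester(pool: List[str], k: int = 4) -> SuggestFn:
    """Hypothetical suggester: the first k pool phrases that start with the input."""
    def suggest(typed: str) -> List[str]:
        if not typed:  # assume nothing is suggested before the first keystroke
            return []
        return [p for p in pool if p.startswith(typed)][:k]
    return suggest


if __name__ == "__main__":
    pool = [
        "pedestrian walking to an ego vehicle",
        "pedestrian crossing a crosswalk",
        "pedestrian running at a sidewalk",
        "pedestrian pulling a trolley",
    ]
    # Type "p" (1 keystroke), then accept the rank-2 suggestion (2 keystrokes).
    print(minimal_keystrokes("pedestrian crossing a crosswalk", make_suggester(pool)))  # 3
```

Passing the suggester in as a function keeps the metric independent of how suggestions are produced, so the same routine can score full-phrase and segment-level completion alike.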