Arnab Nandi
June 22, 2025

Vibe Querying Keynote: From Interactive Data Analytics to Vibe Querying - A Decade of HILDA

In my HILDA 2025 keynote, I trace the decade-long journey of the Human-in-the-Loop Data Analytics (HILDA) community, showing how our work over the past years set the stage for today’s AI-empowered workflows. I unpack the rise of "vibe coding" (agentic, conversational software development where LLMs act as collaborative IDE copilots) and chart its natural transition into "vibe querying", a paradigm that lets users begin with an imprecise "data vibe" and iteratively converge, via natural interfaces, on precise database queries and results.

The talk distills four pillars underpinning this future: high-level abstractions, natural interfaces, iteration & interaction, and refining from ambiguity. It surveys current techniques from the database community that already power this vision, ranging from approximate query processing to predictive interaction. I then highlight three open challenges in this space: cognitive awareness ("sunglasses for data"), continuous verification loops, and multimodal data analysis. Together, these threads outline a research agenda for building human-centric data infrastructure that keeps humans in the loop as LLMs become the interaction substrate.

Transcript

  1. From Interactive Data Analytics to Vibe Querying: A DECADE OF HILDA

    Arnab Nandi, Professor, Computer Science & Engineering
  2. The Three Community Opportunity: Database Systems, Human-Computer Interaction, Visualization

    Growing observation that including the human is crucial towards building data systems. What if we made them the focus?
    Related workshops & conversations: DSIA @ Vis, IDEAS @ KDD
  3. HILDA’S FOCUS

    “how data management can be done with awareness of the people who form part of the processes”
  4. First HILDA: 2016

    • Thank you Alan Fekete for getting us together! (and Anastasia Ailamaki and Barzan Mozafari, SIGMOD16 Workshop Chairs)
  5. HILDA Paper Themes

    Theme                          2016 2017 2018 2019 2020 2022 2023 2024 2025
    User Studies                      1    3    4    0    2    1    1    2    3
    Data Visualization                4    1    2    2    2    2    1    1    0
    Query Interfaces                  3    2    2    0    2    1    1    2    1
    Data Cleaning                     2    3    1    3    0    0    0    0    0
    Data Exploration                  3    2    4    1    1    0    1    0    1
    Data Integration                  0    0    3    1    1    0    2    2    1
    Systems & Infrastructure          4    2    0    1    0    0    1    0    0
    Machine Learning                  3    4    1    3    3    2    1    1    2
    Explainability / XAI              1    3    1    3    1    4    0    2    2
    Large Language Models (LLMs)      0    0    0    0    0    0    1    4    3
    NL to SQL                         1    1    0    0    2    0    0    2    2
    Data Discovery                    1    1    1    0    0    0    1    2    1
    Provenance                        2    1    0    0    0    1    2    1    0
    Video Querying                    0    0    2    0    1    0    0    1    2

    Opening ML/AI Black boxes = more pubs; User Interfaces; Data Pipelines; Performance
  6. THEMES: Interactive Data Analytics

    • Focus on the end-user human-in-the-loop
    • User Studies and more
    • Data Exploration
    • Visualizations
    • Novel Query Interfaces
    • Systems Performance towards Interactivity
    • Consistent focus across all years
  7. THEMES: End-to-end Data Pipelines

    • Beyond just data consumption & querying
    • All stages of the data pipeline
    • Discovery, Cleaning, Integration
    • Real-world data is messy
    • Several points that need human expertise and interaction
  8. THEMES: Opening Black Boxes: AI/ML

    • Rise of AI/ML
    • AI systems to assist Data Management
    • Data management to assist AI systems
    • Human involvement in both aspects
    • AI/ML systems tend to be black boxes
    • Observability and Explainability become critical needs
  9. THEMES: LLMs take over

    • Pivotal moment across all fields, not just computer science
    • In the HILDA world
    • Chat interfaces making a comeback
    • Using LLMs to unlock new capabilities or quality, e.g., NL2SQL
    • Not just querying: All parts of data management, e.g., schema design, data discovery, benchmarking
  10. TRENDS: LLMs take over

    • Rapid pace of improvements
    • Breakthrough zero-shot abilities to understand unseen data
    • Broad enough to bridge the “open-domain vs closed domain” gap
    • LLMs as the new interaction substrate
    • New ability to offload and augment human interaction
    • New challenge: how to integrate
    ARNAB NANDI | VIBE QUERYING | HILDA 2025
  11. Example: 10 minutes with Claude

    • Problem: Students asking if there are resources to study for Intro to DB class
    • Solution: Vibe code an interactive quiz game backed by a question bank
  12. Conversational Coding

    After a while – the right pane (chat) becomes the ONLY interaction, even if you know how to edit!
    Slightly more involved but a closed human interaction loop
    Cursor.com
  13. Popularity

    • Interest in “vibe coding”
    • AI-assisted IDE Cursor.com has grown to over 1 million users in ~24 months
    • Not just “no-code / low-code”: enabled experts, beginners, and a new population of builders
  14. Why is this working so well?

    • Easier to look at end-to-end and say “something’s off”
    • Hypothesis: For complex tasks:
    • Gestalt-level evaluation is easier than focusing on details
    • Smaller pieces of the complex task are somewhat deterministic
  15. Impact on Databases

    “In 2024, something shifted: AI-native apps started taking off… within a few months, over 80% of databases were being created by AI agents rather than humans.” – Nikita Shamgunov, Neon
    https://neon.com/blog/neon-and-databricks
  16. • Natural language instructions as an abstraction layer above code

    • Voice-based interface (optional)
    • Even minor edits are done at high abstraction level
    • Accept-all
    • Provide errors as feedback till fixed
    • Ask for random changes if blocked
    “It’s not really coding – I just see stuff, say stuff, run stuff…”
  17. Vibe Coding Nuances

    • Indirect Manipulation: Keep big-picture context in a “Plan”
    • Preamble: Breaks down request into actionable parts
    • Iterates till it is unstuck
    • Summarize what was done
    • Suggest next steps
  18. Pillars of Vibe Coding

    Natural Interface; Interactive & Iterative; High-level Abstraction; Refine from (initial) ambiguity
  19. Pillars of Vibe Querying

    THE HILDA (AND DB) COMMUNITY HAS BEEN WORKING TOWARDS THIS FUTURE!
    Natural Interface; Interactive & Iterative; High-level Abstraction; Refine from (initial) ambiguity
  20. Vibe Querying

    • Example: User begins with loose intent instead of schema-precise query
    • “I’m looking for senior employees”
    • Instinctively knows when the answer “feels right” or “feels wrong”
    • Strong portfolio of work in this space
    • Early faceted browsing & keyword-to-SQL
    • Query by example, query by output, grand tours, dimension discovery
    • Mixed-initiative UIs, query recommendation
    • Vision: a system that fluidly alternates between suggesting, explaining, and executing queries and their results
  21. Why build a Vibe Querying Stack?

    • Build on (impressive) human ability to very quickly recognize if something “looks right”, even in complex settings
    • Democratizing data management ideas is hard
    • Vibe coding has so quickly unlocked a massive new population of builders
    • And with it, a whole new wave of systems being built
  22. Natural Language to SQL

    • Most popular modality for queries with active leaderboards
    • Spans both DB and NLP research, and industry
    • Rapidly approaching human level (but not yet!)
    BIRD-SQL Leaderboard; Spider 2.0 Leaderboard
  23. Non-keyboard UIs: Touch & Gestures

    DBTouch, Idreos & Liarou, CIDR 2013
    • Touch-based UIs generate a very different set of workloads for DBs
    • Rethink not just frontend, but also storage and execution engine
  24. Grammars for Query Specification

    • Closed grammar that maps gestures to query operators
    • Interactive, Visual Feedback
    GestureQuery, Jiang & Nandi, VLDB 2014
  25. Data-driven Query Guidance

    Specifying queries with gestures can be inaccurate. Use data distribution to “Snap” to the right query.
    SnapToQuery, Jiang & Nandi, VLDB 2015

    Excerpt shown on slide:

    […] it: Firstly, it can’t fit high-dimensional datasets because as the number of dimensions increases, the number of tuples left will increase exponentially. Secondly, it does not take advantage of the characteristics of the dataset. Actually, the queries aligned with the contour of the cluster is where the cardinality change is great, which is shown in Figure 4. Since the number of clusters is much less than the number of bins in each dimension, it will also vastly improve the interactive performance. There are several other options for detecting the contour of the clusters: one is the edge detection method in Computer Vision and the other is the grid-based clustering method, which suits this problem very well. However, both methods can only work in low-dimensional datasets. In order to handle the high-dimensional datasets, we adopt a KMeans + histogram method.

    Backend: In the backend, the dataset is clustered first. In order to give a more accurate approximation of the cardinality, for each dimension of each cluster, we divide it into a set of bins and assume in each bin the data points are uniformly distributed. The backend algorithm is shown in Algorithm 3.

    Algorithm 3: Data Contour Method - Backend
      1: Normalization
      2: Select K points as initial centroids
      3: Form K clusters by assigning each point to its closest centroid
      4: Remove outliers
      5: Form K clusters by assigning each point to its closest centroid
      6: Divide each dimension of each cluster into bins
      7: Remove empty bins for each dimension of each cluster

    The number of clusters is determined by the gap statistic [18]. We use a cluster-based method [38] to remove outliers: whether a point is an outlier or not is based on the ratio of the distance between the point and the centroid to the distance between the median point and the centroid. After outliers are removed, we re-cluster the dataset.

    Frontend: For a given filter range, Algorithm 4 shows how to update the frequency of the bin in the frontend. Specifically, for the filtered dimension, we attain the overlapping […]

    […] queries is represented by its left representative query. The snapping will occur if the manipulated value is aligned with the contour of the cluster. Figure 4 gives an example to illustrate how the data contour method groups the queries, selects the representative query, and where the snapping occurs. Assume v11, v12, ..., v1i, ... are the contour of clusters in one dimension, v12 ≤ v < v13, and the users is dragging the handle h1 from v13 to v12, all the queries issued in the process is represented by the query q_r2. For example, q_n21 and q_n22 are represented by the query q_r2. And once the handle h1 is moved onto v12, it means q_r2 is specified and the manipulated value is aligned with the contour. Figure 4 shows there exists a big cardinality difference between q_r1 and q_r2 since another cluster will be involved if we move the handle h1 a little bit left, so the snapping will occur on the value v12 and always occur on the contour of the cluster.

    Figure 4: Determination of Intended Queries under Data Contour Method (x-axis: start station longitude). There are six clusters in the dataset and each cluster is bounded by a box. The boundary of the box is the data contour of the cluster. v1i is the contour of the cluster in one dimension. q_n21 and q_n22 are the neighborhood queries […]
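
The seven backend steps of Algorithm 3 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the SnapToQuery authors' implementation: the function names, the fixed `k` (the excerpt chooses it with the gap statistic), the equal-width binning, and the `outlier_ratio` threshold are all hypothetical choices made for the example.

```python
import math
import random
from statistics import median

def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [closest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [closest(p, centroids) for p in points]

def data_contour_backend(rows, k, n_bins=8, outlier_ratio=3.0):
    """Return, per cluster and per dimension, a list of (low, high, count) bins."""
    # Step 1: min-max normalize every dimension to [0, 1].
    lows = [min(col) for col in zip(*rows)]
    spans = [(max(col) - lo) or 1.0 for col, lo in zip(zip(*rows), lows)]
    points = [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]
    # Steps 2-3: pick K initial centroids and form K clusters.
    centroids, labels = kmeans(points, k)
    # Step 4: drop points whose distance to their centroid is large relative
    # to the cluster's median distance (threshold is an assumption).
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    med = {c: median([d for d, l in zip(dists, labels) if l == c] or [0.0])
           for c in range(k)}
    points = [p for p, d, l in zip(points, dists, labels)
              if med[l] == 0 or d / med[l] <= outlier_ratio]
    # Step 5: re-cluster the cleaned data.
    centroids, labels = kmeans(points, k)
    # Steps 6-7: equal-width bins per dimension per cluster; empty bins are
    # never materialized, so only non-empty bins are returned.
    summary = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        dims = []
        for values in zip(*members):
            lo, width = min(values), (max(values) - min(values)) / n_bins or 1.0
            counts = {}
            for v in values:
                b = min(int((v - lo) / width), n_bins - 1)
                counts[b] = counts.get(b, 0) + 1
            dims.append([(lo + b * width, lo + (b + 1) * width, n)
                         for b, n in sorted(counts.items())])
        summary.append(dims)
    return summary
```

The bin edges of each cluster are the candidate "contours" a frontend could snap a slider handle to; bin boundaries are reported in normalized coordinates.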
  26. Augmented Reality

    ARQuery, Burley & Nandi, CIDR 2019
    Invisible UIs: Headsets, Tablets
    Analytics = “Hallucinations”
  27. Voice/Speech-based Querying

    SpeakQL, Chandrana et al., HILDA 2017
    CiceroDB, Trummer, CIDR 2019
    Voice Summarization, Trummer & Anderson, ICDE 2021
    • Automatic Speech Recognition is getting dramatically better
    • Some errors still remain
    • Ambiguity in spoken language
    • Length and complexity of vocalized results = bottleneck
    • Listening time and human attention
    • Exacerbated by LLMs generating data
  28. Accelerating Data Exploration

    Solves chicken-vs-egg / cold start problem
    SEE THE DATA, ASK THE QUESTION, FASTER!
  29. Accelerating Exploration: Interactivity

    Guided Interaction, Nandi & Jagadish, VLDB 2011
    Crossfilter, Square 2012
    • If response times are under a threshold, UI feels “instantaneous”
    • JavaScript speedup in browsers + UInt Arrays = data-intensive interfaces on the web
    • Accelerates getting to the right question
  30. Crossfiltering, Resolution Awareness

    • Linked Visualizations = materialized views
    • Asynchronous computation and view updates on interaction
    • Tradeoff: Interactive Response Times vs Result Resolution
    • Resolution awareness = rewrite to simpler, faster queries
    Falcon, Moritz et al., 2019
    M4, Jugel et al., VLDB 2014
    DataSpread, Bendre et al., VLDB 2015
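
To make "resolution awareness" concrete, here is a small sketch in the spirit of M4-style reduction: a line chart that is only `width_px` pixels wide needs at most four points per pixel column (first, last, minimum, maximum). The function name and signature are assumptions for illustration; the cited systems express this idea as query rewriting inside the database rather than as client-side Python.

```python
def m4_reduce(series, t_start, t_end, width_px):
    """Reduce (t, v) points to at most 4 per pixel column of a chart.

    Assumes t_start <= t < t_end for every point in series.
    """
    columns = {}
    for t, v in series:
        # Map the timestamp to the pixel column it would be drawn in.
        col = min(int((t - t_start) / (t_end - t_start) * width_px), width_px - 1)
        columns.setdefault(col, []).append((t, v))

    reduced = []
    for col in sorted(columns):
        points = columns[col]
        keep = {
            min(points),                      # first point (earliest t)
            max(points),                      # last point (latest t)
            min(points, key=lambda p: p[1]),  # lowest value in the column
            max(points, key=lambda p: p[1]),  # highest value in the column
        }
        reduced.extend(sorted(keep))
    return reduced
```

For example, `m4_reduce(series, 0, 1000, 10)` returns at most 40 points no matter how many rows `series` holds, which is the sense in which the query gets "simpler and faster" as the display resolution drops.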
  31. Beyond Query Recommendations

    • Analyze relationships in data (dependencies, cardinality, clusters)
    • Recommend aggregations and visualizations
    Utopia, Fariha et al., VLDB 2024
    Lux, SeeDB, Parameswaran et al.
  32. Caching, prefetching, think time

    [Diagram: user query session (Query1, Query2, Query3); perusing results; speculative queries (SQ a to SQ f); results stored in LRU cache]
    Forecache, Battle et al., SIGMOD 2016
    IDEBench, Eichmann et al., SIGMOD 2020
    DICE, Kamat et al., ICDE 2014
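
The idea of running speculative queries during user think time and keeping their results in an LRU cache can be sketched as follows. This is an illustrative toy rather than the design of any cited system: `SpeculativeCache`, `run_query`, and the way speculative queries are supplied are all assumed names, and a real system would run prefetching asynchronously and predict the speculative queries from the session.

```python
from collections import OrderedDict

class SpeculativeCache:
    """LRU cache of query results, filled both on demand and speculatively."""

    def __init__(self, run_query, capacity=4):
        self.run_query = run_query      # callable: query string -> result
        self.capacity = capacity
        self.results = OrderedDict()    # least recently used entry first
        self.hits = 0
        self.misses = 0

    def _store(self, query, result):
        self.results[query] = result
        self.results.move_to_end(query)
        while len(self.results) > self.capacity:
            self.results.popitem(last=False)   # evict least recently used

    def prefetch(self, speculative_queries):
        """Run likely follow-up queries while the user peruses results."""
        for query in speculative_queries:
            if query not in self.results:
                self._store(query, self.run_query(query))

    def execute(self, query):
        """Answer a user query, from cache when a speculation paid off."""
        if query in self.results:
            self.hits += 1
            self.results.move_to_end(query)
            return self.results[query]
        self.misses += 1
        result = self.run_query(query)
        self._store(query, result)
        return result
```

A session would alternate `execute(user_query)` with `prefetch(predicted_next_queries)`; the hit rate then measures how well the predictions match what the user actually asks next.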
  33. Auto-generating Interfaces

    Precision Interfaces, Zhang et al., SIGMOD 2018
    Flux Capacitor, Khan & Nandi, IUI 2019
    • Parse query workloads and derive interaction graphs
    • Map to interface widgets best suited for a modality
    • Since UIs have predictable workloads (e.g., physics), build in caching and performance layers
  34. Beyond Chat: Predictive Interaction

    • Lift interactions to a higher abstraction / grammar
    • Observe and predict interactions
    “Predictive Interaction”, Heer et al., CIDR 2015
  35. Copilots and CoWranglers

    • Understand both data and intent
    • Generate wrangling code
    • Reduces need for exploratory data analysis code
    • Expect all query UIs to be copiloted
    CoWrangler, Chopra, Fariha et al., SIGMOD 2023
  36. Autocompletion

    SEE THE DATA, ASK THE QUESTION
    SnipSuggest, Khoussainova, VLDB 2010
    Guided Interaction, Nandi & Jagadish, VLDB 2011
    AUTOCOMPLETION: Iterative Query Refinement at each keystroke
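
"Iterative query refinement at each keystroke" can be illustrated with a tiny prefix completer. This is a generic sketch, not the method of SnipSuggest or Guided Interaction: the class name, the popularity weights, and the example vocabulary are made up for illustration.

```python
from bisect import bisect_left

class Autocompleter:
    """Suggest schema terms for a typed prefix, most popular first."""

    def __init__(self, weighted_terms):
        self.weights = dict(weighted_terms)   # term -> popularity score
        self.terms = sorted(self.weights)     # sorted for prefix range scans

    def suggest(self, prefix, limit=3):
        # All terms sharing the prefix are contiguous in the sorted list.
        start = bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return sorted(matches, key=lambda t: (-self.weights[t], t))[:limit]

def keystroke_trace(completer, typed):
    """Suggestions the user would see after every keystroke of `typed`."""
    return [(typed[:i], completer.suggest(typed[:i]))
            for i in range(1, len(typed) + 1)]
```

Each keystroke narrows the candidate set, so the user sees the data (here, the available terms) while still formulating the question.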
  37. Perception-aware analytics

    Perceptvis, Wu & Nandi, DSIA 2015
    • Can results be “good enough”?
    • Refine after
    • Model human perception as functions
    • Optimize queries to stay within bounds
  38. Approximate Query Processing

    • Tradeoff: Accuracy vs Response Time
    • Online and Offline Sampling
    • Exploit Locality in Query Sessions
    BlinkDB, Agarwal et al., VLDB 2014
    Sesame, Kamat & Nandi, TKDD 2016
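
The accuracy versus response time tradeoff can be seen in a few lines: answer an aggregate from a uniform random sample and report an error bound alongside it. This is a textbook sampling estimator offered as a sketch, not the technique of BlinkDB or Sesame; the function name and the normal-approximation interval (`z = 1.96` for roughly 95% confidence) are assumptions.

```python
import math
import random

def approximate_mean(values, sample_size, seed=0, z=1.96):
    """Estimate the mean of `values` from a uniform sample.

    Returns (estimate, half_width), where the true mean is expected to lie
    within estimate +/- half_width at roughly 95% confidence for z = 1.96.
    Requires 2 <= sample_size <= len(values).
    """
    sample = random.Random(seed).sample(values, sample_size)
    mean = sum(sample) / sample_size
    variance = sum((v - mean) ** 2 for v in sample) / (sample_size - 1)
    # Finite population correction: the error shrinks to zero as the sample
    # approaches the full table.
    fpc = (len(values) - sample_size) / (len(values) - 1)
    half_width = z * math.sqrt(variance / sample_size * fpc)
    return mean, half_width
```

Scanning fewer rows answers sooner but widens the interval; an interactive system can return the coarse answer first and refine it as more of the data is read.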
  39. We’re making excellent progress towards Vibe Querying!

    Natural Interface; Interactive & Iterative; High-level Abstraction; Refine from (initial) ambiguity
  40. Vibe Querying Limitations

    • Performance impacts
    • Read locks, compute
    • Critical use cases
    • Where there is no room for ambiguity or approximation
    • Private and Secure data
    • Data leakage by asking the wrong questions
  41. Open Challenge: Sunglasses for Data

    • Humans have been the bottleneck for a while
    • LLMs have exacerbated this problem
    • Overwhelmed by not just data, but also “thinking” results
    • Result summarization and personalization that is aware of human cognitive limits
  42. Open Challenge: Sunglasses for Data

    We already had a Data Deluge
    Followed by the GenAI content and decision deluge
    Mistakes and misinterpretations are now even easier
  43. Open Challenge: Sunglasses for Data

    Need: “Sunglasses for Data”
    Not just summarization and personalization, but account for cognitive limitations, your intent, and your overall task (time- and context-sensitive)
    All results are consumed through your data sunglasses
  44. Open Challenge: Continuous Verification

    • LLMs are great but fail in inexplicable and surprising ways
    • Hallucinations, misunderstandings, bias, overalignment
    • You’re looking at a dataset, and your gut says “the vibes are off”
    • Humans also fail: misunderstand problem, data, or query
    • Automating this step: Continuous Verification loops
    • Build alongside interaction layer
    • Running continuously throughout interaction session
  45. Open Challenge: Continuous Verification

    • Verification is a (somewhat) blocking interaction step
    • Opportunity for formal methods to verify, repair
    • Speed and ease will matter
    “Software in the Era of AI”, Andrej Karpathy, 2025
    QR-Hint, Roy & Yang, SIGMOD 2024
  46. Open Challenge: Continuous Verification

    • Key ingredient: Well-designed grammars and DSLs
    • Human interaction occurs across a “contract”
    • Reduced space: Verification restricted to checking contract violations
  47. Open Challenge: Multimodal Data

    Drastic acceleration in zero-shot image & video understanding
    “Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis,” arXiv 2025
  48. Multimodal Data: Video

    EVA: Exploratory Video Analysis, SIGMOD 2022
    VIVA: Kang et al., CIDR 2022
    • Content-wise, same level playing field as JSON
    • Need new modules to better process video data
  49. End-to-end Video Pipelines

    VOCAL, Daum et al.
    V2V, Winecki & Nandi, ICDE 2024
    • Cleaning, labeling, exploration, compositional querying
    • Efficient synthesis of video results as a single video
  50. Takeaways

    • A big thank you to this community: keep being awesome!
    • The vibes are great
    • We’re making excellent progress towards a Vibe Querying vision
    • The HILDA problem space is increasingly relevant
    • HILDA work is the pathway to unlock DBs for new populations
    • Open Challenges
    • Cognitive Limits, Continuous Verification, Multimodal Data
  51. Questions

    [email protected]
    Thank you to all the students, colleagues, collaborators, communities, and funding agencies that have been part of this work.