Introduction - Lecture 1 - Advanced Topics in Big Data (4023256FNR)

This lecture forms part of a seminar on Advanced Topics in Big Data given at the Vrije Universiteit Brussel.

Beat Signer

February 14, 2023

Transcript

  1. 2 December 2005
    Advanced Topics in Big Data
    Introduction
    Prof. Beat Signer
    Department of Computer Science
    Vrije Universiteit Brussel
    beatsigner.com

  2. Beat Signer - Department of Computer Science - [email protected] 2
    February 14, 2023
    Seminar Organisation
    ▪ Prof. Beat Signer
    Vrije Universiteit Brussel
    PL9.3.60 (Pleinlaan 9)
    +32 2 629 1239
    [email protected]
    wise.vub.ac.be/beat-signer
    ▪ Prof. Bas Ketsman
    Vrije Universiteit Brussel
    F.10.741
    +32 2 629 3480
    [email protected]
    https://www.basketsman.com

  3. Seminar Organisation …
    ▪ Dr. Audrey Sanctorum
    Vrije Universiteit Brussel
    PL9.3.56 (Pleinlaan 9)
    +32 2 629 3749
    [email protected]
    wise.vub.ac.be/audrey-sanctorum
    ▪ Tim Baccaert
    Vrije Universiteit Brussel
    F.10.735
    [email protected]
    soft.vub.ac.be/soft/members/tbaccaer

  4. Seminar Organisation …
    ▪ Yoshi Malaise
    Vrije Universiteit Brussel
    PL9.3.58 (Pleinlaan 9)
    +32 2 629 3487
    [email protected]
    wise.vub.ac.be/yoshi-malaise
    ▪ Ingela Rossing
    Vrije Universiteit Brussel
    PL9.3.56 (Pleinlaan 9)
    +32 2 629 3749
    [email protected]
    wise.vub.ac.be/ingela-rossing

  5. Seminar Organisation …
    ▪ Xuyao Zhang
    Vrije Universiteit Brussel
    PL9.3.64 (Pleinlaan 9)
    +32 2 629 3713
    [email protected]
    wise.vub.ac.be/xuyao-zhang
    ▪ Isaac Valadez
    Vrije Universiteit Brussel
    PL9.3.56 (Pleinlaan 9)
    +32 2 629 3749
    [email protected]
    wise.vub.ac.be/isaac-valadez

  6. Seminar Organisation …
    ▪ Ekene Attoh
    Vrije Universiteit Brussel
    PL9.3.64 (Pleinlaan 9)
    +32 2 629 3713
    [email protected]
    wise.vub.ac.be/ekene-attoh
    ▪ Arun Sojan
    Vrije Universiteit Brussel
    Virtual office space
    [email protected]
    wise.vub.ac.be/arun-sojan

  7. Seminar Organisation …
    ▪ Kushal Soni
    Vrije Universiteit Brussel
    PL9.3.64 (Pleinlaan 9)
    +32 2 629 3713
    [email protected]
    wise.vub.ac.be/member/kushal-soni

  8. Prerequisites
    ▪ Students who want to enrol in this course must
    have passed or be enrolled in Scalable Analytics
    and Information Visualisation

  9. Course Goals
    ▪ In this seminar the student gains insight into recent
    developments in the field of Big Data systems. They will
    deepen their knowledge of specific topics in Big Data
    systems and are required to communicate the outcome to
    other course participants. The student should be able to
    critically review the assigned research papers, identify the
    main contributions and communicate the content in the
    form of a presentation as well as in a written report.
    ▪ The student is required to identify the contributions as well
    as the strengths and weaknesses of a given research paper.
    They should further gain insight into how to evaluate and
    position a research paper in the context of related work.

  10. Course Goals
    ▪ As part of the seminar the student is required to
    clearly communicate about the assigned research topic.
    The attendee shows that they can reflect on a given
    research topic and discuss it with colleagues by asking and
    answering scientific questions.

  11. Course Material
    ▪ All material will be available on Canvas
    ▪ lecture slides, papers, presentations, links, ...
    ▪ Make sure that you are subscribed to the
    Advanced Topics in Big Data course on Canvas
    ▪ https://canvas.vub.be/courses/30369

  12. Main Domains of the Seminar
    [Diagram: Big Data systems, with the labels Data Management, DAMA, Data Processing and Discovery, and Human-Data Interaction]
    ▪ scalable data management
    ▪ advanced query processing (e.g. approximate query processing)
    ▪ large-scale analytical database systems
    ▪ data integration and interoperability
    ▪ innovative data storage
    ▪ exploratory search
    ▪ complex data exploration and analysis
    ▪ multimodal information retrieval
    ▪ visual data discovery
    ▪ data mining
    ▪ interactive data processing
    ▪ data physicalisation
    ▪ mixed reality and TUIs
    ▪ cross-media information management and interaction
    ▪ information visualisation
    ▪ context-awareness and personalisation
    ▪ hypermedia and linked data

  13. "As We May Think" (1945) …
    ▪ Vannevar Bush's article
    "As We May Think" (1945) is
    often seen as the "origin" of
    Information Science
    ▪ Article introduces the Memex
    ▪ memory extender
    ▪ store and access information
    ▪ follow cross-references in the form
    of associative trails between pieces
    of information (microfilms)
    ▪ prototypical hypertext machine
    ▪ trail blazers are those who find delight in
    the task of establishing useful trails
    Memex

  14. The Mother of All Demos (1968)
    ▪ Douglas Engelbart and his colleagues
    at the Stanford Research Institute
    developed the oNLine System (NLS) as
    part of the Augment Project
    ▪ vision about the future of interactive computing
    ▪ NLS was demonstrated at the Fall
    Joint Computer Conference in 1968
    ▪ showed first practical use of hypertext
    ▪ computer mouse
    ▪ remote collaboration (connected computers)
    ▪ raster-scan video monitors
    ▪ screen windows
    ▪ ...
    Douglas Engelbart

  15. Video: Apple Knowledge Navigator (1987)

  16. Video: Microsoft Productivity Vision (2015)

  17. Cross-Media Technologies
    [Diagram with the labels Information Systems & Management, Information Visualisation & Navigation, Human-Machine & Human-Information Interaction, and CISA]
    ▪ RSL [4]
    ▪ MindXpres [8]
    ▪ OC2 [5]
    ▪ associative file system [2]
    ▪ data-driven storytelling [11]
    ▪ XIMA [13]
    ▪ cross-device interaction [9]
    ▪ iServer/iPaper [3]
    ▪ ViDaX [16]
    ▪ data physicalisation [32]
    ▪ OpenHPS [18]
    ▪ non-linear storytelling [27]
    ▪ PaperProof [29]
    ▪ iGesture [23]
    ▪ PaperSketch [30]
    ▪ TangHo [31]
    ▪ Context Modelling Toolkit [10]
    ▪ Midas [21]
    ▪ SpeeG2 [24]
    ▪ PaperPoint [26]
    ▪ Print-n-Link [17]
    ▪ digital libraries [12]
    ▪ Mudra [22]
    ▪ mixed reality [28]
    ▪ ArtVis [15]
    ▪ open cross-media linking [1]
    ▪ PimVis [14]
    ▪ DocTr [6]
    ▪ EdFest [25]
    ▪ source code visualisation [19]
    ▪ INFEX [7]
    ▪ technology-enhanced learning [20]

  18. Seminar Topics
    1. Document Management
    ▪ Passages: Interacting with Text Across Documents, Han L. Han,
    Junhang Yu, Raphael Bournet, Alexandre Ciorascu, Wendy E. Mackay
    and Michel Beaudouin-Lafon, Proceedings of CHI 2022, ACM
    Conference on Human Factors in Computing Systems, New Orleans,
    April 2022. [https://doi.org/10.1145/3491102.3502052]
    2. Data Integration
    ▪ Wikxhibit: Using HTML and Wikidata to Author Applications that Link
    Data Across the Web, Tarfah Alrashed, Lea Verou and David Karger,
    Proceedings of UIST 2022, ACM Symposium on User Interface Software
    and Technology, Bend USA, October 2022.
    [https://doi.org/10.1145/3526113.3545706]

  19. Seminar Topics ...
    3. Knowledge Management
    ▪ Reframing a Novel Decentralized Knowledge Management Concept as a
    Desirable Vision: As We May Realize the Memex, Ulrich Schmitt,
    Sustainability 2021, 13(7), April 2021.
    [https://doi.org/10.3390/su13074038]
    4. Knowledge Graphs
    ▪ FeedLens: Polymorphic Lenses for Personalizing Exploratory Search
    Over Knowledge Graphs, Harmanpreet Kaur, Doug Downey, Amanpreet
    Singh, Evie Yu-Yen Cheng, Daniel Weld and Jonathan Bragg,
    Proceedings of UIST 2022, ACM Symposium on User Interface Software
    and Technology, Bend USA, October 2022.
    [https://doi.org/10.1145/3526113.3545631]

  20. Seminar Topics ...
    5. Skills Analysis
    ▪ Towards an Automatic Approach for Assessing Program Competencies,
    Xinyuan Chang, Bingxin Wang and Bowen Hui, Proceedings of LAK
    2022, International Learning Analytics and Knowledge Conference,
    Virtual Conference, March 2022.
    [https://doi.org/10.1145/3506860.3506875]
    6. Generative Design
    ▪ BO as Assistant: Using Bayesian Optimization for Asynchronously
    Generating Design Suggestions, Yuki Koyama and Masataka Goto,
    Proceedings of UIST 2022, ACM Symposium on User Interface Software
    and Technology, Bend USA, October 2022.
    [https://doi.org/10.1145/3526113.3545664]

  21. Seminar Topics ...
    7. Augmented Reality
    ▪ MechARspace: An Authoring System Enabling Bidirectional Binding of
    Augmented Reality with Toys in Real-time, Zhengzhe Zhu, Ziyi Liu,
    Tianyi Wang, Youyou Zhang, Xun Qian, Pashin Farsak Raja, Ana
    Villanueva and Karthik Ramani, Proceedings of UIST 2022, ACM
    Symposium on User Interface Software and Technology, Bend USA,
    October 2022. [https://doi.org/10.1145/3526113.3545668]
    8. Tangible User Interfaces
    ▪ Tangible Globes for Data Visualisation in Augmented Reality, Kadek
    Ananta Satriadi, Jim Smiley, Barrett Ens, Maxime Cordeil,
    Tobias Czauderna, Benjamin Lee, Ying Yang, Tim Dwyer and Bernhard
    Jenny, Proceedings of CHI 2022, ACM Conference on Human Factors in
    Computing Systems, New Orleans, April 2022.
    [https://doi.org/10.1145/3491102.3517715]

  22. Seminar Topics ...
    9. Haptic Interfaces
    ▪ Prolonging VR Haptic Experiences by Harvesting Kinetic Energy from the
    User, Shan-Yuan Teng, K.D. Wu, Jacqueline Chen and Pedro Lopes,
    Proceedings of UIST 2022, ACM Symposium on User Interface Software
    and Technology, Bend USA, October 2022.
    [https://doi.org/10.1145/3526113.3545635]
    10. Shape Changing Interfaces
    ▪ PITAS: Sensing and Actuating Embedded Robotic Sheet for Physical
    Information Communication, Tingyu Cheng, Jung Wook Park, Jiachen Li,
    Charles Ramey, Hongnan Lin, Gregory D. Abowd, Carolina Brum
    Medeiros, HyunJoo Oh and Marcello Giordano, Proceedings of UIST
    2022, ACM Symposium on User Interface Software and Technology,
    Bend USA, October 2022. [https://doi.org/10.1145/3491102.3517532]

  23. Seminar Topics ...
    11. Digital Twins
    ▪ Extended Reality Application Framework for a Digital-Twin-Based Smart
    Crane, Chao Yang, Xinyi Tu, Juuso Autiosalo, Riku Ala-Laurinaho, Joel
    Mattila, Pauli Salminen and Kari Tammi, Applied Sciences 2022, 12(12),
    June 2022. [https://doi.org/10.3390/app12126030]
    12. Data Visualisation
    ▪ A Design Space for Data Visualisation Transformations Between 2D and
    3D In Mixed-Reality Environments, Benjamin Lee, Maxime Cordeil,
    Arnaud Prouzeau, Bernhard Jenny and Tim Dwyer, Proceedings of CHI
    2022, ACM Conference on Human Factors in Computing Systems, New
    Orleans, April 2022. [https://doi.org/10.1145/3491102.3501859]

  24. Seminar Topics ...
    13. Data Physicalisation
    ▪ Making Data Tangible: A Cross-disciplinary Design Space for Data
    Physicalization, S. Sandra Bae, Clement Zheng, Mary Etta West, Ellen
    Yi-Luen Do, Samuel Huron and Danielle Albers Szafir, Proceedings of
    CHI 2022, ACM Conference on Human Factors in Computing Systems,
    New Orleans, April 2022. [https://doi.org/10.1145/3491102.3501939]
    14. DB Architectures - OLAP
    ▪ Building An Elastic Query Engine on Disaggregated Storage, Midhul
    Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish
    Motivala and Thierry Cruanes, Proceedings of NSDI 2020, USENIX
    Symposium on Networked Systems Design and Implementation, Santa
    Clara, USA, February 2020.
    [https://www.usenix.org/conference/nsdi20/presentation/vuppalapati]

  25. Seminar Topics ...
    15. DB Architectures - OLTP
    ▪ Is Scalable OLTP in the Cloud a Solved Problem? Analyzing Data Access
    for Distributed OLTP Architectures, Tobias Ziegler, Philip A. Bernstein,
    Viktor Leis and Carsten Binnig, Proceedings of CIDR 2023, International
    Conference on Innovative Data Systems Research, Amsterdam, The
    Netherlands, January 2023.
    [https://www.cidrdb.org/cidr2023/papers/p50-ziegler.pdf]
    16. Joins - Multi-Core Joins
    ▪ Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited, Cagri Balkesen,
    Gustavo Alonso, Jens Teubner and M. Tamer Ozsu, Proceedings of the
    VLDB Endowment, 7(1), September 2013.
    [https://doi.org/10.14778/2732219.2732227]

  26. Seminar Topics ...
    17. Joins - Worst-Case Optimal Joins
    ▪ Adopting Worst-Case Optimal Joins in Relational Database Systems,
    Michael Freitag, Maximilian Bandle, Tobias Schmidt, Alfons Kemper and
    Thomas Neumann, Proceedings of the VLDB Endowment, 13(12),
    September 2020. [https://doi.org/10.14778/3407790.3407797]
    18. OLAP Indexing - Column Imprints
    ▪ Column Imprints: A Secondary Index Structure, Lefteris Sidirourgos and
    Martin Kersten, Proceedings of SIGMOD 2013, International Conference
    on Management of Data, New York, USA, June 2013.
    [https://doi.org/10.1145/2463676.2465306]

  27. Seminar Topics ...
    19. OLAP Indexing - Column Sketches
    ▪ Column Sketches: A Scan Accelerator for Rapid and Robust Predicate
    Evaluation, Brian Hentschel, Michael S. Kester and Stratos Idreos,
    Proceedings of SIGMOD 2018, International Conference on Management
    of Data, Houston, USA, June 2018.
    [https://doi.org/10.1145/3183713.3196911]
    20. OLTP Indexing - Concurrent BTrees
    ▪ Building a Bw-Tree Takes More Than Just Buzz Words, Ziqi Wang,
    Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael
    Kaminsky and David G. Andersen, Proceedings of SIGMOD 2018,
    International Conference on Management of Data, Houston, USA, June
    2018. [https://doi.org/10.1145/3183713.3196895]

  28. Seminar Topics ...
    21. OLTP Indexing - Concurrent Tries
    ▪ HOT: A Height Optimized Trie Index for Main-Memory Database Systems,
    Robert Binna, Eva Zangerle, Martin Pichl, Günther Specht and Viktor
    Leis, Proceedings of SIGMOD 2018, International Conference on
    Management of Data, Houston, USA, June 2018.
    [https://doi.org/10.1145/3183713.3196896]
    22. Transactions - Concurrency Avoidance
    ▪ Opportunities for Optimism in Contended Main-Memory Multicore
    Transactions, Yihe Huang, William Qian, Eddie Kohler, Barbara Liskov
    and Liuba Shrira, Proceedings of the VLDB Endowment, 13(5), 2020.
    [https://doi.org/10.14778/3377369.3377373]

  29. Seminar Topics ...
    23. Transactions - Dynamic Partitioning
    ▪ Handling Highly Contended OLTP Workloads Using Fast Dynamic
    Partitioning, Guna Prasaad, Alvin Cheung and Dan Suciu, Proceedings of
    SIGMOD 2020, International Conference on Management of Data,
    Portland, USA, June 2020. [https://doi.org/10.1145/3318464.3389764]
    24. Transactions - Multi-Versioning
    ▪ Cicada: Dependably Fast Multi-Core In-Memory Transactions, Hyeontaek
    Lim, Michael Kaminsky and David G. Andersen, Proceedings of SIGMOD
    2017, International Conference on Management of Data, Chicago, USA,
    May 2017. [https://doi.org/10.1145/3035918.3064015]

  30. Seminar Topics ...
    25. Transactions - Robustness
    ▪ Robustness Against Read Committed for Transaction Templates, Brecht
    Vandevoort, Bas Ketsman, Christoph Koch and Frank Neven,
    Proceedings of the VLDB Endowment, 14(11), October 2021.
    [https://doi.org/10.14778/3476249.3476268]
    26. Transactions - Tail Latency
    ▪ Plor: General Transactions with Predictable, Low Tail Latency, Youmin
    Chen, Xiangyao Yu, Paraschos Koutris, Andrea C. Arpaci-Dusseau,
    Remzi H. Arpaci-Dusseau and Jiwu Shu, Proceedings of SIGMOD 2022,
    International Conference on Management of Data, Philadelphia, USA,
    June 2022. [https://doi.org/10.1145/3514221.3517879]

  31. Assignment of Topics
    ▪ Select 3 topics/papers from the presented list and mark
    them (with A, B and C) according to your preferences
    ▪ Send an email with your choices (e.g. 6A, 8B, 14C) to
    [email protected] no later than February 18
    ▪ Each student will be assigned a paper that has to be
    presented in the seminar and the final seminar schedule
    will be made available by next week

  32. Seminar Organisation
    ▪ Presentation should be 30 minutes long (not longer but
    also not shorter!)
    ▪ make use of the available time
    ▪ have some backup slides/material in case you finish too early and
    for the Q&A
    ▪ Structure of your presentation
    ▪ introduction of topic and problem statement (5-10 mins)
    ▪ proposed approach (15-20 mins)
    ▪ review (5 mins)
    - critical analysis
    - at least two positive and two negative points about the paper

  33. Seminar Organisation …
    ▪ Send a draft of your presentation to your supervisor no
    later than one week before the presentation and arrange
    a meeting with your supervisor
    ▪ you will get feedback about the structure and content of your
    presentation
    ▪ Immediately after your presentation, please send us
    ([email protected]) your slides so that we can
    make them available to your colleagues on Canvas

  34. Seminar Organisation …
    ▪ Each student has to write a report about their presented
    paper/topic
    ▪ same structure as presentation
    - introduction of topic and problem statement
    - proposed approach
    - review
    ▪ no longer than 5 pages
    ▪ send a draft to your supervisor to get some feedback
    - arrange a meeting with your supervisor
    ▪ deadline for final report: May 23

  35. Seminar Organisation …
    ▪ Each student will be
    assigned as a reviewer
    for two additional papers
    ▪ hand in a review via the
    conference system
    ▪ deadline: at least a week
    before the paper is presented
    ▪ Each student is assigned as a metareviewer for one paper
    ▪ hand in a metareview via the conference system
    ▪ based on the two reviews and the metareviewer's knowledge
    ▪ deadline: no later than Sunday (midnight) before the paper is presented
    ▪ prepare at least two questions to open the discussion round
    ▪ template and example (meta)reviews are available on Canvas

  36. Seminar Organisation …
    ▪ Each student has to read the papers to be presented
    each week before the seminar takes place and submit
    two questions via an online form no later than Sunday
    (midnight) before the lecture
    ▪ https://wise.vub.ac.be/atobi/

  37. Seminar Organisation …
    ▪ Final grade is based on
    ▪ presentation (70%)
    ▪ written report
    ▪ reviews and metareview
    ▪ active participation in the seminar and submitted questions
    ▪ Everybody is expected to read the papers before the
    lecture takes place!
    ▪ after each presentation, there is enough time for questions and
    a discussion about the topic and content of the paper
    ▪ Attendance at all presentations is mandatory!
    ▪ Schedule will be made available on Canvas
    ▪ first presentations: March 14

  38. References
    ▪ Vannevar Bush, As We May Think,
    Atlantic Monthly, July 1945
    ▪ https://www.theatlantic.com/doc/194507/bush/
    ▪ https://www.youtube.com/watch?v=c539cK58ees
    ▪ Apple Knowledge Navigator (1987)
    ▪ https://www.youtube.com/watch?v=umJsITGzXd0
    ▪ Microsoft Productivity Future Vision (2015)
    ▪ https://www.youtube.com/watch?v=w-tFdreZB94

  39. 2 December 2005
    Next Week
    Assign Topics and Answer Questions
    Some Tips for the Presentation
    Conference System