Beat Signer
February 09, 2025

Introduction - Lecture 1 - Advanced Topics in Big Data (4023256FNR)

This lecture forms part of a seminar on Advanced Topics in Big Data given at the Vrije Universiteit Brussel.

Transcript

  1. Advanced Topics in Big Data: Introduction

    Prof. Beat Signer, Department of Computer Science, Vrije Universiteit Brussel, beatsigner.com
  2. Beat Signer - Department of Computer Science - [email protected] 2

    February 11, 2025
    Seminar Organisation
    ▪ Prof. Beat Signer, Vrije Universiteit Brussel, PL9.3.60 (Pleinlaan 9), +32 2 629 1239, [email protected], wise.vub.ac.be/beat-signer
    ▪ Prof. Bas Ketsman, Vrije Universiteit Brussel, F.10.741, +32 2 629 3480, [email protected], https://www.basketsman.com
  3. Seminar Organisation …

    ▪ Prof. Pieter Libin, Vrije Universiteit Brussel, PL9.3 (Pleinlaan 9), +32 2 629 2964, [email protected], ai.vub.ac.be/team/pieter-libin/
    ▪ In addition, several teaching assistants and postdocs help with the individual supervision
  4. Prerequisites

    ▪ Students who want to enrol in this course must have passed or be enrolled in Scalable Analytics and Information Visualisation
  5. Course Goals

    ▪ In this seminar the student gains insight into recent developments in the field of Big Data systems. They will deepen their knowledge of specific topics in Big Data systems and are required to communicate the outcome to other course participants. The student should be able to critically review the assigned research papers, identify the main contributions and communicate the content in the form of a presentation as well as in a written report.
    ▪ The student is required to identify the contributions as well as the strengths and weaknesses of a given research paper. They should further gain insight into how to evaluate and position a research paper in the context of related work.
  6. Course Goals

    ▪ As part of the seminar the student is required to clearly communicate about the assigned research topic. The attendee shows that they can reflect on a given research topic and discuss it with colleagues by asking and answering scientific questions.
  7. Course Material

    ▪ All material will be available on Canvas
      ▪ lecture slides, papers, presentations, links, ...
    ▪ Make sure that you are subscribed to the Advanced Topics in Big Data course on Canvas
      ▪ https://canvas.vub.be/courses/39491
  8. Main Domains of the Seminar

    ▪ Diagram labels: Data Management, Big Data systems, DAMA, Human-Data Interaction, Data Processing and Discovery
    ▪ Topics shown in the diagram:
      ▪ scalable data management
      ▪ advanced query processing (e.g. approximate query processing)
      ▪ large-scale analytical database systems
      ▪ data integration and interoperability
      ▪ innovative data storage
      ▪ exploratory search
      ▪ complex data exploration and analysis
      ▪ multimodal information retrieval
      ▪ visual data discovery
      ▪ data mining
      ▪ interactive data processing
      ▪ data physicalisation
      ▪ mixed reality and TUIs
      ▪ cross-media information management and interaction
      ▪ information visualisation
      ▪ context-awareness and personalisation
      ▪ hypermedia and linked data
  9. Seminar Topics

    1. Transactions
      ▪ Using Read Promotion and Mixed Isolation Levels for Performant Yet Serializable Execution of Transaction Programs, Brecht Vandevoort, Alan Fekete, Bas Ketsman, Frank Neven and Stijn Vansummeren, CoRR, January 2025. [https://doi.org/10.48550/arXiv.2501.18377]
    2. Columnar Storage
      ▪ An Empirical Evaluation of Columnar Storage Formats, Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney and Huanchen Zhang, Proceedings of the VLDB Endowment 17(2), October 2023. [https://doi.org/10.14778/3626292.3626298]
  10. Seminar Topics ...

    3. OLTP
      ▪ OLTP Through the Looking Glass 16 Years Later: Communication is the New Bottleneck, Xinjing Zhou, Viktor Leis, Xiangyao Yu and Michael Stonebraker, Proceedings of CIDR 2025, 15th Annual Conference on Innovative Data Systems Research, Amsterdam, The Netherlands, January 2025. [https://vldb.org/cidrdb/papers/2025/p17-zhou.pdf]
    4. Yannakakis Algorithm
      ▪ Instance-Optimal Acyclic Join Processing Without Regret: Engineering the Yannakakis Algorithm in Column Stores, Liese Bekkers, Frank Neven, Stijn Vansummeren and Yisu Remy Wang, CoRR, November 2024. [https://doi.org/10.48550/arXiv.2411.04042]
  11. Seminar Topics ...

    5. Bloom Filters
      ▪ Optimizing Collections of Bloom Filters within a Space Budget, Gabriel Mersy, Zhuo Wang, Stavros Sintos and Sanjay Krishnan, Proceedings of the VLDB Endowment 17(11), July 2024. [https://doi.org/10.14778/3681954.3682020]
    6. Benchmarks
      ▪ Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet, Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf and Tim Kraska, Proceedings of the VLDB Endowment 17(11), July 2024. [https://doi.org/10.14778/3681954.3682031]
  12. Seminar Topics ...

    7. Dynamic Media
      ▪ MyWebstrates: Webstrates as Local-first Software, Clemens Nylandsted Klokmose, James R. Eagan and Peter van Hardenberg, Proceedings of UIST 2024, 37th Annual ACM Symposium on User Interface Software and Technology, Pittsburgh, USA, October 2024. [https://doi.org/10.1145/3654777.3676445]
    8. Data Physicalisation
      ▪ That's Rough! Encoding Data into Roughness for Physicalizations, Xiaojiao Du, Kadek Ananta Satriadi, Adam Drogemuller, Brandon Matthews, Ross T. Smith, James Walsh and Andrew Cunningham, Proceedings of CHI 2024, International Conference on Human Factors in Computing Systems, Honolulu, USA, May 2024. [https://doi.org/10.1145/3613904.3641900]
  13. Seminar Topics ...

    9. Augmented Object Intelligence
      ▪ Augmented Object Intelligence with XR-Objects, Mustafa Doga Dogan, Eric J. Gonzalez, Karan Ahuja, Ruofei Du, Andrea Colaço, Johnny Lee, Mar Gonzalez-Franco and David Kim, Proceedings of UIST 2024, 37th Annual ACM Symposium on User Interface Software and Technology, Pittsburgh, USA, October 2024. [https://doi.org/10.1145/3654777.3676379]
    10. Human-AI Interaction
      ▪ The Metacognitive Demands and Opportunities of Generative AI, Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen and Sean Rintel, Proceedings of CHI 2024, International Conference on Human Factors in Computing Systems, Honolulu, USA, May 2024. [https://doi.org/10.1145/3613904.3642902]
  14. Seminar Topics ...

    11. Scholarly Sensemaking
      ▪ Patterns of Hypertext-Augmented Sensemaking, Siyi Zhu, Robert Haisfield, Brendan Langen and Joel Chan, Proceedings of UIST 2024, 37th Annual ACM Symposium on User Interface Software and Technology, Pittsburgh, USA, October 2024. [https://doi.org/10.1145/3654777.3676338]
    12. Visual Recommendations
      ▪ Too Many Cooks: Exploring How Graphical Perception Studies Influence Visualization Recommendations in Draco, Zehua Zeng, Junran Yang, Dominik Moritz, Jeffrey Heer and Leilani Battle, IEEE Transactions on Visualization and Computer Graphics, 30, January 2024. [https://doi.org/10.1109/TVCG.2023.3326527]
  15. Seminar Topics ...

    13. AI in Bioinformatics - ESM3
      ▪ Simulating 500 Million Years of Evolution with a Language Model, Thomas Hayes et al., Science, January 2025. [https://doi.org/10.1126/science.ads0018]
    14. AI in Bioinformatics - AlphaFold
      ▪ Highly Accurate Protein Structure Prediction with AlphaFold, John Jumper et al., Nature 596, July 2021. [https://doi.org/10.1038/s41586-021-03819-2]
    15. DeepSeek
      ▪ DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, DeepSeek-AI et al., CoRR, January 2025. [https://doi.org/10.48550/arXiv.2501.12948]
  16. Seminar Topics ...

    16. Epidemic Modelling
      ▪ A Physics-Informed Neural Network Approach for Compartmental Epidemiological Models, Caterina Millevoi, Damiano Pasetto and Massimiliano Ferronato, PLoS Computational Biology 20(9), September 2024. [https://doi.org/10.1371/journal.pcbi.1012387]
    17. Reinforcement Learning - Pareto Front Identification
      ▪ Pareto Front Identification from Stochastic Bandit Feedback, Peter Auer, Chao-Kai Chiang, Ronald Ortner and Madalina M. Drugan, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, May 2016. [https://proceedings.mlr.press/v51/auer16.html]
  17. Seminar Topics ...

    18. Reinforcement Learning - Thompson Sampling Analysis
      ▪ Analysis of Thompson Sampling for the Multi-armed Bandit Problem, Shipra Agrawal and Navin Goyal, Proceedings of the 25th Annual Conference on Learning Theory, Edinburgh, Scotland, June 2012. [https://proceedings.mlr.press/v23/agrawal12.html]
  18. Assignment of Topics

    ▪ Select 6 topics/papers from the presented list and mark them (with A to F) according to your preferences
    ▪ Send an email with your choices (e.g. 3A, 7B, 12C, 4D, 1E, 10F) to [email protected] no later than February 15
    ▪ Each student will be assigned a paper that has to be presented in the seminar, and the final seminar schedule will be made available by next week
  19. Seminar Organisation

    ▪ Presentation should be 30 minutes long (not longer but also not shorter!)
      ▪ make use of the available time
      ▪ have some backup slides/material in case you finish too early and for the Q&A
    ▪ Structure of your presentation
      ▪ introduction of topic and problem statement (5-10 mins)
      ▪ proposed approach (15-20 mins)
      ▪ review (5 mins)
        - critical analysis
        - at least two positive and two negative points about the paper
  20. Seminar Organisation …

    ▪ Send a draft of your presentation to your supervisor no later than one week before the presentation and arrange a meeting with your supervisor
      ▪ you will get feedback about the structure and content of your presentation
    ▪ Immediately after your presentation, please send us ([email protected]) your slides so that we can make them available to your colleagues on Canvas
  21. Seminar Organisation …

    ▪ Each student has to write a report about their presented paper/topic
      ▪ same structure as presentation
        - introduction of topic and problem statement
        - proposed approach
        - review
      ▪ no longer than 5 pages
      ▪ send a draft to your supervisor to get some feedback
        - arrange a meeting with your supervisor
      ▪ deadline for final report: May 20
  22. Seminar Organisation …

    ▪ Each student will be assigned as a reviewer for two additional papers
      ▪ hand in a review via the conference system
      ▪ deadline: at least a week before the paper is presented
    ▪ Each student is assigned as a metareviewer for one paper
      ▪ hand in a metareview via the conference system
      ▪ based on the two reviews and the metareviewer's knowledge
      ▪ deadline: Sunday (midnight) before the paper is presented at the latest
      ▪ prepare at least two questions to open the discussion round
      ▪ template and example (meta)reviews are available on Canvas
  23. Seminar Organisation …

    ▪ Each student has to read the papers to be presented each week before the seminar takes place and submit two questions via an online form by Sunday (midnight) before the lecture at the latest
      ▪ https://wise.vub.ac.be/atobi/
  24. Seminar Organisation …

    ▪ Final grade is based on
      ▪ presentation (70%)
      ▪ written report
      ▪ reviews and metareview
      ▪ active participation in the seminar and submitted questions
    ▪ Everybody is expected to read the papers before the lecture takes place!
      ▪ after each presentation, there is enough time for questions and a discussion about the topic and content of the paper
    ▪ Attendance at all presentations is mandatory!
    ▪ Schedule will be made available on Canvas
      ▪ first presentations: March 11
  25. Next Week

    ▪ Assign Topics and Answer Questions
    ▪ Some Tips for the Presentation
    ▪ Conference System