Information Retrieval and Text Mining 2020 - Introduction

University of Stavanger, DAT640, 2020 fall

Krisztian Balog

August 24, 2020

Transcript

  1. Introduction
    [DAT640] Information Retrieval and Text Mining
    Krisztian Balog, University of Stavanger
    August 24, 2020
  2. About me
    • Professor at the University of Stavanger
    • Professor II at NTNU
    • Former Visiting Staff Faculty Researcher at Google (2018-2020)
    • More: https://krisztianbalog.com/
  3. About the course
    • Techniques and methods for processing, mining, and searching in massive text collections
    • Information retrieval (search engines): analysis, organization, storage, and retrieval of information
      ◦ Search engine architecture
      ◦ Retrieval models
      ◦ Search engine evaluation
      ◦ Knowledge graphs and semantic search
    • Text mining (text analytics): deriving high-quality information from textual data by analyzing trends and patterns
      ◦ Text classification
      ◦ Text clustering
      ◦ Topic analysis
  4. Prerequisites/requirements
    • No formal prerequisites, but you are expected to know
      ◦ git
      ◦ Python
      ◦ Databases (basic concepts)
      ◦ A bit of statistics
  5. COVID-19 prevention measures
    • Wash your hands often
    • Keep a safe distance from others
    • Clean your desk after use
    • Stay at home if you are not feeling well
    • See the uis.no pages for details and updates
  6. Course organization - COVID-19 Edition
    • Students are divided into Groups A and B
      ◦ Group A: last names starting A-L
      ◦ Group B: last names starting M-Z (M-Å)
      ◦ Group assignments can be found on Canvas
    • Each class/lab is given in two identical editions, for the two groups
    • Class attendance is logged (for COVID)
    • You will only be permitted to enter the classroom in your assigned timeslot
  7. Course organization - COVID-19 Edition
    • Lecture period (weeks 35-42)
      ◦ Mondays and Tuesdays are classes for discussion and exercises (led by me)
        • Video lectures are made available at least 24 hours before the class. You are expected to watch them before the class!
      ◦ Wednesdays are labs for getting help on the obligatory assignments (led by the TA)
    • Trial exam (week 43)
    • Group project work (weeks 44-46)
      ◦ Complete a project in groups of 2-3 and write a report that will be graded
    • Bring your own device (laptop)
      ◦ Python 3.6+ (Anaconda distribution)
      ◦ GitHub user and git client (e.g., GitHub Desktop)
  8. Grading

                    weight   mark
    Project work    40%      A-F
    Written exam    60%      A-F

    • Project work
      ◦ 50% from individual assignments
      ◦ 50% from group project work
      ◦ Needs to be >F in order to pass the course!
    • Written exam
      ◦ Digital exam (Inspera)
      ◦ Open book
      ◦ Mixture of exercises, multiple choice, and essay questions
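
The two levels of weighting above can be combined into overall weights per component. The following is a minimal sketch of that arithmetic only; the names are illustrative, and the slide does not say how letter marks (A-F) are aggregated, so no grade computation is attempted.

```python
# Percentages as stated on the slide.
PROJECT_WORK_PCT = 40          # project work share of the final grade
WRITTEN_EXAM_PCT = 60          # written exam share of the final grade
ASSIGNMENTS_PCT_OF_PROJECT = 50    # individual assignments share of project work
GROUP_PROJECT_PCT_OF_PROJECT = 50  # group project share of project work


def overall_weights():
    """Return each component's share of the final grade, in percent."""
    return {
        "individual assignments": PROJECT_WORK_PCT * ASSIGNMENTS_PCT_OF_PROJECT // 100,
        "group project": PROJECT_WORK_PCT * GROUP_PROJECT_PCT_OF_PROJECT // 100,
        "written exam": WRITTEN_EXAM_PCT,
    }


if __name__ == "__main__":
    for component, pct in overall_weights().items():
        print(f"{component}: {pct}%")
```

Under these stated weights, the individual assignments and the group project each contribute a fifth of the final grade, and the written exam the remaining three fifths.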
  9. Assignments
    • Usually, a new assignment is announced each week during the lecture period, with a deadline 2 weeks in the future (there may be exceptions)
    • Points vary based on difficulty
    • Single delivery (no resubmissions, corrections, etc.)
    • Deadlines are strict: no extensions, no exceptions!
    • Assignments account for 50% of the project work final grade
    • Wednesday labs are dedicated to working on assignments; this is the time and place to get help
  10. Assignments workflow
    • A private GitHub repository is created for you
      ◦ You need to fill out the sign-up form, if you haven't already done so!
    • Starter files are pushed to your private repository
    • You need to complete the tasks; you know you're done when your code passes all the tests
      ◦ Some tasks may have additional "hidden" tests for grading
    • You need to commit and push your changes
      ◦ The easiest way to verify that your latest version is submitted is to check the GitHub web interface
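
The "done when your code passes all the tests" criterion can be pictured with a minimal sketch using Python's standard `unittest` module. The task function `word_count` and its tests are hypothetical, invented for illustration; the actual starter files and tests in the course repository will differ.

```python
import unittest


def word_count(text):
    """Hypothetical task: count lowercased, whitespace-separated tokens."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts


class TestWordCount(unittest.TestCase):
    """Tests of the kind that might ship with the starter files."""

    def test_counts_repeated_words(self):
        self.assertEqual(
            word_count("to be or not to be"),
            {"to": 2, "be": 2, "or": 1, "not": 1},
        )

    def test_empty_text(self):
        self.assertEqual(word_count(""), {})


def run_tests():
    """Run the test case and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWordCount)
    return unittest.TextTestRunner(verbosity=2).run(suite).wasSuccessful()


if __name__ == "__main__":
    run_tests()
```

Passing the visible tests locally is a necessary check before committing and pushing, but, as noted above, hidden tests may still apply during grading.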
  11. Group project work
    • Work in groups of 2-3
    • There will be a pool of options to select a project from
    • The task will be to tackle a problem, perform experiments, and write a report about the findings
    • Groups can get weekly feedback during the group project period
      ◦ Dedicated 15-minute weekly slots with the lecturer during class hours (Mon/Tue) to discuss progress/ideas
      ◦ Feedback on the draft report from the teaching assistant during lab hours (Wed)
      ◦ Both in-person and remote (Zoom) options will be available
    • More information will follow later
    • Accounts for 50% of the project work final grade
  12. Resources
    • Canvas is used only for announcements
    • One-stop shop GitHub repo: http://bit.ly/dat640uis
      ◦ Course schedule and curriculum
      ◦ Lecture slides and links to videos
      ◦ Example code
      ◦ Assignments
      ◦ etc.
  13. Textbook 1
    Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining (Zhai and Massung), ACM and Morgan & Claypool Publishers, 2016.
  14. Contact
    • For all course-related matters, the primary contact email is [email protected]
    • Wednesday labs are for working on the assignments. This is the time to get help!
    • If you need to talk to the lecturer, make an appointment via email. No unannounced drop-ins!
  15. Discussion and Exercises
    • Mondays and Tuesdays are dedicated to discussion and related exercises
    • See the exercise workflow on GitHub under exercises: https://github.com/kbalog/ir-course/tree/master/exercises
    • Make sure you have Python 3.6+ (Anaconda distribution highly recommended) and the ipython-unittest package installed
    • Complete today's exercise (under exercises/20200824)
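
The setup requirements above can be checked with a short script. This is a minimal sketch, not part of the course material; it assumes the ipython-unittest package is importable under the module name `ipython_unittest`, and the helper names are illustrative.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)
# Assumption: import name of the ipython-unittest package.
UNITTEST_EXTENSION = "ipython_unittest"


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def package_is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python 3.6+:", python_is_recent_enough())
    print("ipython-unittest installed:", package_is_installed(UNITTEST_EXTENSION))
```

If either check prints `False`, fix the environment before starting the exercise.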