Slide 22
Crawling
● get the front pages of Hacker News and Reddit
● parse the HTML and build a list of links with their votes, comment counts and the current date
● for each new link, fetch its HTML, compress it and save it
● for each existing link, update its meta information
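The four crawling steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the project's actual crawler. The front-page markup it parses (`<a class="story" href=... data-votes=... data-comments=...>`) is hypothetical, because real Hacker News and Reddit pages use their own site-specific markup. The in-memory `Store`, the names `Link`, `FrontPageParser`, `http_fetch` and `crawl`, and the choice of `zlib` for compression are all illustrative assumptions.

```python
import datetime
import urllib.request
import zlib
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Dict, List


@dataclass
class Link:
    url: str
    votes: int
    comments: int
    crawled_on: datetime.date  # the "current date" at crawl time


@dataclass
class Store:
    # Hypothetical in-memory stand-in for the real storage backend.
    meta: Dict[str, Link] = field(default_factory=dict)    # url -> latest meta information
    pages: Dict[str, bytes] = field(default_factory=dict)  # url -> zlib-compressed HTML


class FrontPageParser(HTMLParser):
    """Collects links from a HYPOTHETICAL simplified markup:
    <a class="story" href="..." data-votes="N" data-comments="M">.
    Real Hacker News and Reddit pages would need site-specific parsing."""

    def __init__(self, today: datetime.date) -> None:
        super().__init__()
        self.today = today
        self.links: List[Link] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("class") == "story" and a.get("href"):
            self.links.append(Link(
                url=a["href"],
                votes=int(a.get("data-votes") or 0),
                comments=int(a.get("data-comments") or 0),
                crawled_on=self.today,
            ))


def http_fetch(url: str) -> str:
    # Plain stdlib download; a real crawler would add headers, retries and rate limiting.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(front_pages: List[str], fetch: Callable[[str], str],
          store: Store, today: datetime.date) -> None:
    for front_url in front_pages:
        # Step 1: get the front page.
        parser = FrontPageParser(today)
        parser.feed(fetch(front_url))
        parser.close()
        # Step 2: parser.links now holds (url, votes, comments, date) records.
        for link in parser.links:
            if link.url not in store.pages:
                # Step 3: new link, so fetch its HTML, compress it and save it.
                html = fetch(link.url)
                store.pages[link.url] = zlib.compress(html.encode("utf-8"))
            # Step 4 (and new links): store the latest meta information.
            store.meta[link.url] = link
```

Passing `fetch` in as a parameter keeps the pipeline testable offline with a fake fetcher; `http_fetch` is just one possible implementation. Existing links are never re-downloaded, only their votes, comment counts and date are refreshed.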
PIA - Tech Trends | Raphael Brand | Thomas Uhrig | Hannes Pernpeintner
[Figure: annotated example with "Votes" and "Comments" labels]