Workflow
• Parse HTML (HtmlCleaner)
• Convert to an org.w3c.dom.Document object
• Run XPath queries on this object
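The three steps above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code from the talk: the JDK's built-in XML parser stands in for HtmlCleaner, so the input must be well-formed XHTML (HtmlCleaner's job is to repair real-world HTML first). The class name, the helper names toDocument and query, and the sample markup are all hypothetical; the markup is not reddit's actual page structure.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathWorkflowSketch {

    // Steps 1 and 2: parse the markup into an org.w3c.dom.Document.
    // Assumption: the input is already well-formed XHTML, so the JDK
    // parser is enough here; messy HTML would need a cleaner first.
    static Document toDocument(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    // Step 3: run an XPath query and collect the trimmed text of each match.
    static List<String> query(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            results.add(nodes.item(i).getTextContent().trim());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup for illustration only.
        String page =
              "<html><body><div class='entry'>"
            + "<a class='title' href='#'>Example article</a>"
            + "<span class='submitter'>alice</span>"
            + "<span class='comments'>12 comments</span>"
            + "</div></body></html>";

        Document doc = toDocument(page);
        System.out.println("titles:     " + query(doc, "//a[@class='title']"));
        System.out.println("submitters: " + query(doc, "//span[@class='submitter']"));
        System.out.println("comments:   " + query(doc, "//span[@class='comments']"));
    }
}
```

Keeping parsing and querying in separate methods mirrors the workflow: only toDocument would change if a different HTML parser produced the Document.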
Demo
• Extract article metadata from reddit (title, submitter,
number of comments)
CSS Selectors
• Pick the HTML elements we want to style
• Some examples:
• Select a div with class hello:
• div.hello
• Select a div with id hello:
• div#hello
• Most CSS selectors have an equivalent XPath query (XPath is the more
expressive of the two)
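To make the relationship concrete, the two selectors above can be written as XPath queries. This is a hedged sketch using only the JDK: the class and helper names are hypothetical, and the sample markup is made up. Note that the class test has to match whole space-separated tokens, so that class="hello world" matches but class="helloworld" does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SelectorEquivalenceSketch {

    // XPath counterpart of the CSS selector "tag.cls".
    // Padding @class with spaces lets us match one whole class token.
    static String classSelectorToXPath(String tag, String cls) {
        return "//" + tag
             + "[contains(concat(' ', normalize-space(@class), ' '), ' " + cls + " ')]";
    }

    // XPath counterpart of the CSS selector "tag#id".
    static String idSelectorToXPath(String tag, String id) {
        return "//" + tag + "[@id='" + id + "']";
    }

    // Count the nodes an XPath expression selects in well-formed XHTML.
    static int count(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup for illustration only.
        String page =
              "<html><body>"
            + "<div class='hello world'>a</div>"
            + "<div id='hello'>b</div>"
            + "<div class='helloworld'>c</div>"
            + "</body></html>";

        String byClass = classSelectorToXPath("div", "hello"); // div.hello
        String byId = idSelectorToXPath("div", "hello");       // div#hello
        System.out.println(byClass + " selects " + count(page, byClass) + " node(s)");
        System.out.println(byId + " selects " + count(page, byId) + " node(s)");
    }
}
```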
Further Reading
• Enlive can be used for HTML templates and more
complicated scraping:
• https://github.com/swannodette/enlive-tutorial
• More on XPath, CSS Selectors:
• http://www.w3.org/TR/CSS2/selector.html
• Enlive-style selector syntax in the browser (using cljs):
• https://github.com/prismatic/dommy/