Training Phase
Clean the HTML / DOM sub-trees / CSS class co-occurrence / Value Constraints / CSS Selectors

We could determine patterns for emails, for example:
vcard:email  mailto : ALPHA PUNCTUATION ALL_LOWERCASE . ALL_LOWERCASE
vcard:email  mailto : ALPHA PUNCTUATION ALL_LOWERCASE . com
vcard:email  mailto : ALPHA @ ALPHANUMERIC . ALL_LOWERCASE
vcard:email  mailto : ALPHA @ ALPHANUMERIC . com
vcard:email  mailto : ALL_UPPERCASE @ ALL_LOWERCASE . ALL_LOWERCASE

or even for birthdays:
vcard:bday  NUMBER - SMALL_NUMBER - SMALL_NUMBER
vcard:bday  MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER
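The value-constraint patterns on this slide can be read as sequences of literal tokens (mailto, com, @) and token classes (ALPHA, SMALL_NUMBER, and so on). The following is a minimal sketch of how a candidate value might be checked against such a pattern. The slides do not define the token classes, so the predicates and numeric thresholds in TOKEN_CLASSES, as well as the helper names tokenize and matches, are assumptions made for illustration only and are not μRaptor's actual implementation.

```python
import re

# Assumed class definitions: the predicates and thresholds below are
# illustrative guesses, not taken from the slides.
TOKEN_CLASSES = {
    "NUMBER": lambda t: t.isdigit(),
    "SMALL_NUMBER": lambda t: t.isdigit() and int(t) <= 99,
    "MEDIUM_NUMBER": lambda t: t.isdigit() and 100 <= int(t) <= 9999,
    "ALPHA": lambda t: t.isalpha(),
    "ALL_LOWERCASE": lambda t: t.isalpha() and t.islower(),
    "ALL_UPPERCASE": lambda t: t.isalpha() and t.isupper(),
    "ALPHANUMERIC": lambda t: t.isalnum(),
    "PUNCTUATION": lambda t: len(t) == 1 and not t.isalnum(),
}


def tokenize(value):
    """Split a value into ASCII alphanumeric runs and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", value)


def matches(pattern, value):
    """Return True if every token of value satisfies the pattern token at the same position."""
    expected = pattern.split()
    actual = tokenize(value)
    if len(expected) != len(actual):
        return False
    for want, tok in zip(expected, actual):
        check = TOKEN_CLASSES.get(want)
        if check is None:
            # Not a known class name: treat the pattern token as a literal.
            if want != tok:
                return False
        elif not check(tok):
            return False
    return True


if __name__ == "__main__":
    print(matches("mailto : ALPHA @ ALPHANUMERIC . com", "mailto:John@example42.com"))
    print(matches("MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER", "1985-04-12"))
```

Under these assumed definitions, a more general pattern (NUMBER, ALL_LOWERCASE) accepts everything a more specific one (MEDIUM_NUMBER, the literal com) accepts, which is one way to read the pairs of patterns listed for the same property.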
Clean the HTML / DOM sub-trees / CSS class co-occurrence / Value Constraints
Pattern Detection / Elements Qualification / Models Validation
CSS Selectors
Extraction Phase
RDF Model From μRaptor / RDF Model / Test set
? = 0.94 = 0.7 = 0.8
We made the discovery of the new μRaptor species and I am very pleased some researchers helped us understand its feeding habits.
Godzilla is a doll compared to μRaptor! I am currently working on a script for an upcoming movie.
As a kid I always wanted to see an actual dinosaur. Today my dream comes true.
Damn, he is better than me!