Slide 1

A Public Service Announcement on RFC-Validation of Email Addresses
@allenfair - http://girders.org http://biglist.com
philly.rb - March 12, 2013
Tuesday, March 12, 13

Slide 2

For your consideration: Opinionated Email Addresses
• Registration
• Attributes
• Spam Detection
• SMTP (Sending Email)
• “Friending” / Social Expansion

Slide 3

User Parts (Good)
• Basic: [email protected]
• Dots: [email protected], [email protected]
• Apostrophes: miles.o'[email protected]
• Other: allen_fair,[email protected]
• Address Tags: [email protected] - Keep

Slide 4

User Parts (Bad)
• Case-Sensitive: [email protected] - Confusing!
• Spaces: "Allen Fair"@biglist.com, " "@biglist.com - Ugh!
• Line Noise: !#$%&'*+-/=?^_`{}|[email protected] - No way!
• Non-ASCII: résumé@company.com - Invalid per the RFC and SMTP
• These make up about 0.01%* or less of real user addresses
• If you MUST retain these cases, consider using double quotes to show exactness: "Allen"@biglist.com, "!#$"@example.com
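If the odd user parts must be retained, the quoting advice can be applied mechanically. A minimal Ruby sketch, assuming hypothetical helpers `quote_local_part` and `display_address`, and treating ASCII letters, digits, `_ . - + '` as the "plain" set (an assumption drawn from the Good list, not the RFC's full definition). Anything else is wrapped in double quotes, with `"` and `\` backslash-escaped as RFC 5322 quoted strings require.

```ruby
# Assumed "plain" user-part characters: ASCII letters, digits, _ . + ' -
# (Ruby's \w is ASCII-only). Anything else gets quoted to show exactness.
PLAIN_LOCAL_PART = /\A[\w.+'-]+\z/

# Hypothetical helper: return the user part unchanged if it is plain,
# otherwise as a quoted string with backslashes and double quotes escaped.
def quote_local_part(local)
  return local if PLAIN_LOCAL_PART.match?(local)

  escaped = local.gsub(/["\\]/) { |char| "\\#{char}" }
  %("#{escaped}")
end

# Hypothetical helper: join the (possibly quoted) user part to its domain.
def display_address(local, domain)
  "#{quote_local_part(local)}@#{domain}"
end

puts display_address("allen", "biglist.com")      # prints allen@biglist.com
puts display_address("Allen Fair", "biglist.com") # prints "Allen Fair"@biglist.com
puts display_address('!#$', "example.com")        # prints "!#$"@example.com
```

Note the single quotes around `'!#$'`: in a Ruby double-quoted string, `#$` starts global-variable interpolation.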

Slide 5

RFC Email Address Regular Expression
• (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
• /^\w[\w\.\-\+\']*@(\w[\w\-]*\.)+\w{2,15}$/
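The short pattern is the practical choice, with one Ruby-specific caveat: `^` and `$` match at line boundaries in Ruby, so the pattern as written accepts a multi-line string if any single line looks like an address. A minimal sketch using the same character classes but anchored with `\A` and `\z`; the helper name `plausible_email?` is hypothetical, and this is a cheap plausibility check, not full RFC validation.

```ruby
# The practical pattern, anchored to the whole string with \A...\z
# instead of ^...$ (which in Ruby match at any line boundary).
PRACTICAL_EMAIL = /\A\w[\w.\-+']*@(?:\w[\w\-]*\.)+\w{2,15}\z/

# The same pattern with line anchors, for comparison.
LINE_ANCHORED = /^\w[\w.\-+']*@(\w[\w\-]*\.)+\w{2,15}$/

# Hypothetical helper: "looks like a normal address", not RFC validation.
def plausible_email?(address)
  PRACTICAL_EMAIL.match?(address)
end

p plausible_email?("allen.fair+news@mail.biglist.com") # prints true
p plausible_email?(%("Allen Fair"@biglist.com))         # prints false
p plausible_email?("junk\nallen@biglist.com")           # prints false
p LINE_ANCHORED.match?("junk\nallen@biglist.com")       # prints true
```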

Slide 6

Domain Names
• Case Insensitive: [email protected]
• Subdomains: [email protected], [email protected]
• IP: allen@[127.0.0.1], allen@[IPv6:0:0:1] - Bad!
• Unicode: allen@‘→❄→‚→‗→☺→’→☹→✝.ws - Bad
• Punycode: [email protected]
• Bang Notation: Bad!
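The domain rules above (ignore case, allow subdomains and Punycode, reject IP literals, raw Unicode and bang paths) reduce to simple string checks. A minimal Ruby sketch; `acceptable_domain?` is a hypothetical name, and it assumes any Unicode domain has already been converted to Punycode elsewhere, since the sketch does no IDNA conversion itself.

```ruby
# One DNS label: ASCII letters, digits and hyphens, no leading or trailing
# hyphen, at most 63 characters. Punycode labels (xn--...) fit this shape.
DOMAIN_LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/

# Hypothetical helper implementing the slide's domain policy.
def acceptable_domain?(domain)
  d = domain.downcase                   # domains are case-insensitive
  return false if d.start_with?("[")    # IP literals: allen@[127.0.0.1]
  return false if d.include?("!")       # bang-path (UUCP) routing
  return false unless d.ascii_only?     # Unicode must be Punycode-encoded first
  return false if d.length > 253        # overall DNS name length limit

  labels = d.split(".", -1)             # -1 keeps empty labels ("a..b", "a.")
  labels.size >= 2 && labels.all? { |label| DOMAIN_LABEL.match?(label) }
end
```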

Slide 7

Unique Email Accounts, Multiple Addresses
• Case *should* be ignored: convert to lower case
• Gmail: ‘.’ removed from the user part
• Address tags should be kept, BUT ignored here
• Non-essential subdomains (email, www) dropped (make sure example.co.uk is not cut down to co.uk)
• Collapse spaces; allow only: \w _ - + . , '
• Transliterate Unicode characters (é -> e) & convert domains to Punycode
• Save as a “canonical address” for searching and lookup
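The canonicalization steps combine into one function whose result is stored next to the address as typed. A minimal Ruby sketch with labeled assumptions: the provider list and droppable subdomains are hypothetical constants; a subdomain is dropped only when at least two labels remain (a crude guard, since protecting names like example.co.uk properly needs the Public Suffix List); and Punycode conversion of the domain is left out.

```ruby
# Hypothetical configuration; adjust for your own users.
DOT_INSENSITIVE_DOMAINS = %w[gmail.com googlemail.com].freeze
NONESSENTIAL_SUBDOMAINS = %w[www email].freeze

# é -> e: decompose (NFKD), then drop the combining marks.
def transliterate(text)
  text.unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
end

# Hypothetical helper: canonical form used only for lookup and duplicate
# detection; the address as entered (tag included) is stored separately.
def canonical_address(address)
  local, _at, domain = address.strip.rpartition("@")
  local = transliterate(local).downcase
  domain = domain.downcase

  local = local.split("+", 2).first.to_s   # tags kept in storage, ignored here

  labels = domain.split(".")
  # Drop www/email only if at least two labels would remain.
  labels.shift if labels.size > 2 && NONESSENTIAL_SUBDOMAINS.include?(labels.first)
  domain = labels.join(".")

  local = local.delete(".") if DOT_INSENSITIVE_DOMAINS.include?(domain)
  local = local.gsub(/[^\w\-+.,']/, "")    # collapse spaces, drop other chars
  "#{local}@#{domain}"
end

puts canonical_address("Allen.Fair+news@GMail.com")  # prints allenfair@gmail.com
puts canonical_address("Résumé@email.example.co.uk") # prints resume@example.co.uk
```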

Slide 8

Validating Addresses
• Look up the domain’s MX record
• Look up the domain’s A record (fallback for badly configured DNS)
• Don’t check individual users over SMTP (it looks like a dictionary attack!)
• Third-party address validation services
• Send a confirmation email; process undeliverable (bounced) email
• Temporary/Disposable Email Address Domains (hard to identify)
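The two DNS checks can be done with Ruby's standard `resolv` library: prefer MX records, and fall back to an A record, which is how mail servers treat a domain with no MX (the "implicit MX" rule). A minimal sketch; `mail_hosts` and `deliverable_domain?` are hypothetical names, the resolver is passed in so a stub can replace it in tests, and it stops at DNS rather than probing mailboxes over SMTP.

```ruby
require "resolv"

# Hypothetical helper: the hosts that would accept mail for +domain+.
# MX targets come first, ordered by preference (lowest first); if there is
# no MX record but there is an A record, the domain itself receives mail;
# otherwise the result is empty. AAAA-only domains are ignored here.
def mail_hosts(domain, dns: Resolv::DNS.new)
  mx = dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
  return mx.sort_by(&:preference).map { |rec| rec.exchange.to_s } unless mx.empty?

  a = dns.getresources(domain, Resolv::DNS::Resource::IN::A)
  a.empty? ? [] : [domain]
end

# Hypothetical policy: accept the domain only if something will take its mail.
def deliverable_domain?(domain, dns: Resolv::DNS.new)
  !mail_hosts(domain, dns: dns).empty?
end
```

With the default resolver, `mail_hosts("biglist.com")` queries the system's configured DNS, so results depend on live records; an empty array means "probably undeliverable", and the confirmation email remains the real test.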

Slide 9

Sane Email Addresses
• User errors, confusion
• Customer Support
• Internationalization (account name -> email name)
• Normalize (edit, sanitize, validate) incoming addresses
• Store canonical (unique) addresses for lookup

Slide 10

Thank you for your consideration
• http://girders.org/
• @allenfair
• [email protected]