RFC-Validation of Email Addresses

For your consideration: Opinionated Email Addresses. Presenting the case for not following the RFC2822 email address specification completely.

Allen Fair

March 13, 2013

Transcript

  1. A Public Service Announcement on RFC-Validation of Email Addresses @allenfair

    - http://girders.org http://biglist.com philly.rb - March 12, 2013 Tuesday, March 12, 13
  2. For your consideration: Opinionated Email Addresses

    • Registration
    • Attributes
    • Spam Detection
    • SMTP (Sending Email)
    • “Friending” / Social Expansion
  3. User Parts (Good)

    • Basic: [email protected]
    • Dots: [email protected], [email protected]
    • Apostrophes: miles.o’[email protected]
    • Other: allen_fair,[email protected]
    • Address Tags: [email protected] - Keep
  4. User Parts (Bad)

    • Case-Sensitive: [email protected] - Confusing!
    • Spaces: “Allen Fair”@biglist.com, “ “@biglist.com - Uggh!
    • Line Noise: !#$%&'*+-/=?^_`{}|[email protected] - No way!
    • Non-ASCII: résumé@company.com - Invalid RFC, SMTP
    • These make up about 0.01%* or less of real user addresses
    • If you MUST retain these cases, consider using double quotes to show exactness: “Allen”@biglist.com, “!#$”@example.com
  5. RFC Email Address Regular Expression

    • (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
    • /^\w[\w\.\-\+\']*@(\w[\w\-]*\.)+\w{2,15}$/
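
To make the contrast between the two patterns concrete, here is a minimal Python sketch of the short pattern from this slide. The function name `looks_sane` is hypothetical, and the port assumes that `re.ASCII` plus `fullmatch()` is a fair stand-in for the original `^...$` anchors and ASCII-only `\w`.

```python
import re

# Port of the short pattern from the slide. re.ASCII limits \w to
# [A-Za-z0-9_]; fullmatch() takes the place of the ^ and $ anchors.
SIMPLE_EMAIL = re.compile(r"\w[\w.\-+']*@(\w[\w\-]*\.)+\w{2,15}", re.ASCII)


def looks_sane(address: str) -> bool:
    """Return True if the address fits the opinionated short pattern."""
    return SIMPLE_EMAIL.fullmatch(address) is not None
```

Under this sketch, ordinary addresses with dots, apostrophes, and `+` tags pass, while quoted local parts, non-ASCII user names, and IP-literal domains are rejected.
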
  6. Domain Names

    • Case Insensitive: [email protected]
    • Subdomains: [email protected], [email protected]
    • IP: allen@[127.0.0.1], allen@[IPv6:0:0:1] - Bad!
    • Unicode: allen@‘→❄→‚→‗→☺→’→☹→✝.ws - Bad
    • Punycode: [email protected]
    • Bang Notation: Bad!
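
The domain rules above could be applied roughly as follows. This is a sketch only: `normalize_domain` is a hypothetical helper, and it assumes Python's built-in `idna` codec (IDNA 2003) is an acceptable way to turn a Unicode domain into its Punycode form.

```python
def normalize_domain(domain: str) -> str:
    """Lower-case a domain, reject IP literals, and convert Unicode labels to Punycode."""
    domain = domain.strip().lower()
    if domain.startswith("["):
        # Covers forms such as [127.0.0.1] and [IPv6:...], which the slide marks as bad.
        raise ValueError("IP-literal domains are not accepted")
    # The built-in "idna" codec encodes each non-ASCII label as xn--...
    return domain.encode("idna").decode("ascii")
```

For example, `normalize_domain("BigList.COM")` gives `biglist.com`, and a Unicode domain such as `bücher.example` becomes `xn--bcher-kva.example`.
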
  7. Unique Email Accounts, Multiple Addresses

    • Case *should* be ignored: lower case
    • Gmail ‘.’ removed
    • Address tags should be kept, BUT ignored here
    • Non-essential subdomains (email, www) dropped (Make sure example.co.uk is not)
    • Collapse spaces, allow: \w _ - + . , ‘
    • Transliterate Unicode characters (é -> e) & Punycode domains
    • Save as “canonical address” for searching, lookup
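
One way the canonicalization rules on this slide might fit together is sketched below. The names `canonical_address` and `transliterate` are hypothetical, and several details are assumptions: "collapse spaces" is read as removing whitespace from the user part, only `gmail.com` is treated as dot-insensitive, and a leading `www` or `email` label is dropped only when at least two labels remain. A production version would likely need a public suffix list to handle cases like `example.co.uk` safely.

```python
import unicodedata

NONESSENTIAL_SUBDOMAINS = {"www", "email"}  # the two examples named on the slide
DOT_INSENSITIVE_DOMAINS = {"gmail.com"}     # assumption: only Gmail ignores dots


def transliterate(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def canonical_address(address: str) -> str:
    """Build a lookup key; the address as entered should still be stored separately."""
    local, _, domain = address.strip().rpartition("@")
    # Remove whitespace, strip accents, and lower-case the user part.
    local = transliterate("".join(local.split())).lower()
    # The address tag is ignored for the canonical form only.
    local = local.split("+", 1)[0]
    labels = domain.lower().split(".")
    # Drop leading non-essential labels, but never shrink below two labels.
    while len(labels) > 2 and labels[0] in NONESSENTIAL_SUBDOMAINS:
        labels.pop(0)
    domain = ".".join(labels)
    if domain in DOT_INSENSITIVE_DOMAINS:
        local = local.replace(".", "")
    return f"{local}@{domain}"
```

The canonical form is meant only as a key for searching and duplicate detection; mail would still be sent to the address the user supplied, tag included.
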
  8. Validating Addresses

    • Look up domain MX record
    • Look up domain A record (badly configured DNS)
    • Don’t SMTP check user records (Dictionary attack!)
    • Third-party address validation services
    • Send a confirmation Email, process undeliverable emails
    • Temporary/Disposable Email Address Domains (hard to identify)
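
The first two checks, MX lookup with an A-record fallback, can be sketched as below. The DNS lookups are passed in as callables because the Python standard library has no MX resolver; `lookup_mx`, `lookup_a`, and `domain_accepts_mail` are hypothetical names, and in practice the callables would wrap whatever DNS library is in use.

```python
from typing import Callable, Iterable

# A lookup takes a domain name and returns the matching record values (possibly none).
Lookup = Callable[[str], Iterable[str]]


def domain_accepts_mail(domain: str, lookup_mx: Lookup, lookup_a: Lookup) -> bool:
    """Check for an MX record first, then fall back to an A record."""
    if any(lookup_mx(domain)):
        return True
    # Badly configured domains may have no MX record but still receive mail via A.
    return any(lookup_a(domain))
```

A passing result says only that the domain could receive mail. It says nothing about the individual mailbox, which is why the slide still recommends sending a confirmation email.
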
  9. Sane Email Addresses

    • User errors, confusion
    • Customer Support
    • Internationalization (account name -> email name)
    • Normalize (edit, sanitize, validate) incoming addresses
    • Store canonical (unique) addresses for lookup