
Modelling universal values

Why domain modelling doesn’t stop at the attribute level

Modelling tends to focus on what’s specific to your domain, which means starting from the beginning, all the way down to the level of individual number and text values. But data such as telephone numbers and people’s names have been around for longer than computers, and don’t need modelling all over again. Reinventing models for these universal values leads to weak models that don’t accommodate real-world complexity, and fail to benefit from international standards.

Everyone involved in detailed software design needs to know what telephone numbers, house numbers and aircraft tail numbers have in common. Attendees will discover different kinds of numbers, learn about validating email addresses and bank account numbers, and realise how unoriginal some of their bugs are. And more important than bugs that are easy to fix, we’ll see why modelling with familiar data can lead to software that fails to be inclusive.

Peter Hilton

June 23, 2022

Transcript

  1. Don’t forget the attribute values

    Data modelling often focuses on attribute lists and relationships within a bounded context. You also need to model single values, such as:

    1. customer name
    2. address
    3. gender
    4. bank account number

    @PeterHilton
  2. Model universal values in your domain

    Your domain model includes universal values that don’t have application-specific definitions and meanings, for example:

    1. names and other identifiers
    2. (parts of) addresses
    3. genders
    4. numbers
  3. Annemiek van Vleuten 🚴

    [Photo: Theo Stikkelman / CC BY 2.0, with the parts of the name labelled: given name (Annemiek), tussenvoegsel (van), last name (Vleuten)]
  4. Falsehoods Programmers Believe About Names, Patrick McKenzie (@patio11)

    24. My system will never have to deal with names from China.
    25. Or Japan.
    26. Or Korea.
    27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.

    https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
  5. Naming conventions depend on the country: first name + last name is basically racist

    https://hilton.org.uk/blog/respect-personal-names
  6. Your customers take this personally

    For more examples of domain model name acceptance failures, follow @yournameisvalid on Twitter.
  7. Guidelines

    W3C offers examples and guidance for modelling:

    1. Allow any character → letters, spaces, punctuation, digits, etc.
    2. Allow long names
    3. Don’t parse or split names (first name + last name)
    4. Collect variants for different purposes → Sort by, What should we call you?

    https://www.w3.org/International/questions/qa-personal-names
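The W3C guidance above could be sketched as a value type along these lines. This is a minimal illustration, not a recommended schema: the class name `PersonalName` and the fields `sort_as` and `call_me` are hypothetical names chosen here for the "Sort by" and "What should we call you?" variants.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonalName:
    """A personal name kept exactly as entered, never split into parts."""

    full_name: str                 # any characters, any length
    sort_as: Optional[str] = None  # hypothetical "Sort by" variant
    call_me: Optional[str] = None  # hypothetical "What should we call you?" variant

    def __post_init__(self) -> None:
        # The only check is that something was entered at all.
        if not self.full_name.strip():
            raise ValueError("full_name must not be empty")

    def sort_key(self) -> str:
        # Fall back to the full name when no sort variant was collected.
        return (self.sort_as or self.full_name).casefold()

    def greeting(self) -> str:
        # Use the preferred form of address when the person supplied one.
        return f"Hello, {self.call_me or self.full_name}"


name = PersonalName(
    "Annemiek van Vleuten",
    sort_as="Vleuten, Annemiek van",
    call_me="Annemiek",
)
print(name.sort_key())
print(name.greeting())
```

The variants are supplied by the person, so the software never has to guess which part of a name is the "last name".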
  8. Standard text identifiers, a.k.a. codes

    ISO 639-1 language codes (lower-case!): → en nl fr

    ISO 3166-1 alpha-2 country codes: → GB NL FR

    ISO 4217 currency codes (country code plus one letter): → GBP NLG FRF EUR
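As a rough sketch of how the letter case and length of these codes could be checked, the helper below tests only the shape of a value. The function name `has_code_shape` and the `kind` labels are assumptions made for this example, and a matching shape does not prove that a code is actually assigned.

```python
import re

# Shape of each kind of code; these patterns check case and length only.
_CODE_SHAPES = {
    "language": re.compile(r"[a-z]{2}"),  # ISO 639-1, lower case
    "country": re.compile(r"[A-Z]{2}"),   # ISO 3166-1 alpha-2, upper case
    "currency": re.compile(r"[A-Z]{3}"),  # ISO 4217, upper case
}


def has_code_shape(kind: str, value: str) -> bool:
    """Return True when value has the right shape for the given kind of code."""
    return _CODE_SHAPES[kind].fullmatch(value) is not None


print(has_code_shape("language", "nl"))
print(has_code_shape("language", "NL"))
print(has_code_shape("currency", "EUR"))
```

To reject unassigned codes such as `ZZ`, the value would also have to be looked up in the published list.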
  9. Standard names and their translations

    | English | Dutch | French | Ukrainian |
    |---|---|---|---|
    | Ukrainian | Oekraïens | ukrainien | українська |
    | Cyrillic | Cyrillisch | cyrillique | кирилиця |
    | Ukraine | Oekraïne | Ukraine | Україна |
    | January | januari | janvier | січня |
    | Monday | maandag | lundi | понеділок |
    | meters | meter | mètres | метри |
  10. Unicode Common Locale Data Repository (CLDR)

    Unicode CLDR provides standard lists and translations of: territories (including countries), currencies, time zones, languages, calendar names (quarters, months & weekdays), scripts (writing systems), units of measurement, etc.

    ⚠ You have to use validity data to filter the lists

    📄 Unicode CLDR publishes XML and JSON on GitHub

    https://hilton.org.uk/blog/l10n-cldr-names
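A minimal sketch of reading localised territory names and filtering them, assuming the nested layout used by the CLDR JSON `territories.json` files. The `EXCERPT` below is a tiny hand-made sample in that shape, not real CLDR output, and the `valid_codes` set stands in for the validity data.

```python
import json
from typing import Dict, Set

# Hand-made excerpt in the assumed shape of CLDR's Dutch territories.json.
EXCERPT = """
{"main": {"nl": {"localeDisplayNames": {"territories": {
    "001": "wereld",
    "FR": "Frankrijk",
    "NL": "Nederland",
    "UA": "Oekraïne"
}}}}}
"""


def country_names(cldr_json: str, locale: str, valid_codes: Set[str]) -> Dict[str, str]:
    """Return localised names for the territory codes listed in valid_codes."""
    data = json.loads(cldr_json)
    territories = data["main"][locale]["localeDisplayNames"]["territories"]
    # The raw list mixes countries with other regions, so filter it.
    return {code: name for code, name in territories.items() if code in valid_codes}


names = country_names(EXCERPT, "nl", {"FR", "NL", "UA"})
print(names["UA"])
print("001" in names)
```

Without the filter, entries such as `001` (the whole world) would show up in a country drop-down.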
  11. Overthinking: Modelling text values

    1. Natural language → phrases and prose in English, French, Dutch, etc.
    2. Identifiers - structured, unique → country codes, HTTP header names
    3. Formal language - parsed, executable (or renderable) → Java, XML, regular expressions
    4. Data - parsed → CSV, JSON, base 64 encoded PNG image
  12. Countries

    By country, you probably mean sovereign state.

    ISO 3166-1 alpha-2 code, Unicode CLDR localised name

    The closest to a standard are the United Nations members, but 18 states are not universally recognised: 🇮🇱 🇰🇷 🇰🇵 🇨🇳 🇨🇾 🇦🇲 🇵🇸 🇹🇼 🇪🇭 🏳 🏳 🇽🇰 🏳 🏳

    https://hilton.org.uk/blog/country-lists
  13. Post codes

    117 of 190 Universal Postal Union countries use post codes. Most countries only use (3-10) digits. Some countries use letters as well. Only 🇮🇪 Ireland has a unique post code per address.

    🚀 Yet another country-specific value model

    😅 Ideally, you have each country’s up-to-date actual list

    https://en.wikipedia.org/wiki/Postal_code
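One way to picture a country-specific value model for post codes is a lookup of patterns per country, as in the sketch below. The two patterns (Dutch and German) and the name `post_code_is_plausible` are illustrative assumptions. A pattern only tests the format, so it is weaker than checking against each country's actual list.

```python
import re

# Hypothetical per-country formats; a real system would maintain these
# (or, ideally, the actual lists) for every country it supports.
POST_CODE_PATTERNS = {
    "NL": re.compile(r"[1-9][0-9]{3} ?[A-Z]{2}"),  # four digits plus two letters
    "DE": re.compile(r"[0-9]{5}"),                 # five digits
}


def post_code_is_plausible(country: str, post_code: str) -> bool:
    """Check the format of a post code for the given ISO 3166-1 alpha-2 country."""
    cleaned = post_code.strip().upper()
    pattern = POST_CODE_PATTERNS.get(country)
    if pattern is None:
        # No model for this country: accept any non-empty text
        # rather than reject codes that are real.
        return bool(cleaned)
    return pattern.fullmatch(cleaned) is not None


print(post_code_is_plausible("NL", "3011 AB"))
print(post_code_is_plausible("DE", "1011"))
```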
  14. Email addresses

    There’s a standard for email addresses. So what’s the problem?

    1. Several updated/replaced standards: RFC 822 → RFC 2822 → RFC 5322 → RFC 6854
    2. Four levels of email address validation: RFC format + domain + mailbox exists + correct person
    3. Security risks of supporting the whole standard

    https://hilton.org.uk/blog/mail-address-validation
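The four validation levels can be sketched as a short pipeline. This is an outline under assumptions, not a full implementation:

- The format check is deliberately loose and does not implement RFC 5322.
- `domain_accepts_mail` is a hypothetical callback standing in for a DNS lookup.
- The last two levels (mailbox exists, correct person) are covered by sending a confirmation token to the address, which the sketch only generates.

```python
import secrets
from typing import Callable, Optional


def looks_like_address(address: str) -> bool:
    """Level 1, simplified: something@domain.tld with no whitespace."""
    local, at, domain = address.rpartition("@")
    return (
        at == "@"
        and local != ""
        and "." in domain
        and not domain.startswith(".")
        and not domain.endswith(".")
        and not any(ch.isspace() for ch in address)
    )


def start_validation(
    address: str, domain_accepts_mail: Callable[[str], bool]
) -> Optional[str]:
    """Run levels 1 and 2; return a confirmation token for levels 3 and 4."""
    if not looks_like_address(address):
        return None
    domain = address.rpartition("@")[2]
    if not domain_accepts_mail(domain):  # level 2: the domain can receive mail
        return None
    # Levels 3 and 4 happen when the recipient returns this token.
    return secrets.token_urlsafe(16)


token = start_validation("user@example.org", lambda domain: domain == "example.org")
print(token is not None)
```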
  15. (((?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E- \x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?: [ \t]*\r\n)?[

    \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\ [\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0-9! #-'*+\-/=?^-`{-~.\[]]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01- \x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)? \))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E- \x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?: [ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01- \x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)? \))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E- \x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?: [ \t]*\r\n)?[ \t]+)))?"(?>(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\ \[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[\t]+)?"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?: (?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E- \x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)? [\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)? 
[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))(?:(?:(?:[ \t]*\r\n)?[ \t]+)(?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01- \x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?: [ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E- \x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0-9!#-'*+\-/=?^-`{-~.\[]]+ (?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\ [\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)? [ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01- \x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\
  16. The gender trap

    There is more than one thing wrong with this form field!
  17. Article 5 (General Data Protection Regulation): Principles relating to processing of personal data

    1. Personal data shall be: (c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’)

    https://eur-lex.europa.eu/eli/reg/2016/679/oj#d1e1807-1-1
  18. Article 9 (General Data Protection Regulation): Processing of special categories of personal data

    1. Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of […] data concerning a natural person’s sex life or sexual orientation shall be prohibited.

    https://eur-lex.europa.eu/eli/reg/2016/679/oj#d1e2051-1-1
  19. Build unisex software

    1. Don’t ask people their gender (or require gendered personal titles)
    2. Learn about the GDPR restrictions on personal data, and data minimisation
    3. Don’t limit input to two options if you do need to know

    https://hilton.org.uk/blog/build-unisex-software
    https://hilton.org.uk/blog/refactor-boolean-enumeration
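A small sketch of the third point, for the case where the domain really does need the value: an optional enumeration instead of a boolean or a two-value field. The option names in `Gender` and the `Customer` class are illustrative assumptions. The right set of options depends on the domain and on legal requirements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Gender(Enum):
    # Illustrative options only; not a recommended or complete list.
    FEMALE = "female"
    MALE = "male"
    NON_BINARY = "non-binary"
    SELF_DESCRIBED = "self-described"


@dataclass
class Customer:
    name: str
    # Absent by default: data minimisation means not asking unless necessary.
    gender: Optional[Gender] = None


customer = Customer("Annemiek van Vleuten")
print(customer.gender)
print(Gender("non-binary"))
```

An enumeration can gain options later without a schema change, which a boolean column cannot.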
  20. ☎ Telephone numbers

    Telephone numbers only use digits… except for the punctuation. And they’re numbers… except for the significant leading zeroes.

    Different formats for the same telephone number:

    (010)-790 0185
    0031 10 790 01 85
    +31107900185
    tel:+31107900185
  21. Telephone number standards

    E.123 Notation for national and international… (2001): ‘9.1 Grouping of digits in a telephone number should be accomplished by means of spaces’

    RFC 3966 The tel URI for Telephone Numbers (2004): ‘even though ITU-T E.123 recommends the use of space characters as visual separators […] "tel" URIs MUST NOT use spaces in visual separators to avoid excessive escaping’

    https://hilton.org.uk/blog/telephone-number-formats
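To illustrate how formats like the four Dutch examples in this talk can describe one number, here is a minimal normalisation sketch. It hard-codes Dutch dialling rules (country code 31, trunk prefix 0, international prefix 00) as an assumption. The function name `normalise_nl` is made up for this example, and a production system would more likely use a dedicated library.

```python
import re


def normalise_nl(raw: str) -> str:
    """Normalise a number dialled from the Netherlands to +<country><number> text."""
    text = raw.strip()
    if text.lower().startswith("tel:"):
        text = text[4:]  # drop the tel URI scheme
    international = text.startswith("+")
    digits = re.sub(r"\D", "", text)  # remove punctuation and spaces
    if international:
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]  # 00 is the international prefix
    if digits.startswith("0"):
        return "+31" + digits[1:]  # 0 is the national trunk prefix
    raise ValueError(f"cannot interpret {raw!r} as a Dutch telephone number")


for example in ["(010)-790 0185", "0031 10 790 01 85", "+31107900185", "tel:+31107900185"]:
    print(normalise_nl(example))
```

The result stays a string throughout, so the leading `+` and any significant zeroes are preserved.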
  22. International Bank Account Number (IBAN)

    ISO 13616:1997

    😀 15-32 letters (A-Z) and digits
    😀 Starts with an ISO 3166-1 alpha-2 country code
    😀 Two check digits prevent the most common errors
    😀 Includes bank code and account number
    😀 Used in Europe, North Africa, Middle East, Caribbean…

    https://en.wikipedia.org/wiki/International_Bank_Account_Number
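The check digits can be verified with the standard mod-97 calculation, sketched below. The shape pattern is a simplification made for this example: it accepts up to 34 characters overall and does not enforce the fixed length each country defines. The sample value is the fictitious British IBAN commonly used in documentation.

```python
import re

# Country code, two check digits, then 11 to 30 letters or digits
# (an assumed overall shape; country-specific lengths are not checked).
_IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def iban_is_valid(raw: str) -> bool:
    """Check the overall shape and the mod-97 check digits of an IBAN."""
    iban = re.sub(r"\s", "", raw).upper()
    if _IBAN_SHAPE.fullmatch(iban) is None:
        return False
    # Move the country code and check digits to the end...
    rearranged = iban[4:] + iban[:4]
    # ...replace letters with numbers (A=10 … Z=35)...
    as_digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # ...and the remainder after dividing by 97 must be 1.
    return int(as_digits) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))
print(iban_is_valid("GB83 WEST 1234 5698 7654 32"))
```

The function takes and returns text, never a number: an IBAN contains letters and is far too long for most integer column types.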
  23. Other standard numbers

    📗 International Standard Book Number (ISBN)
    🎼 ISO 10957 International Standard Music Number (ISMN)
    🍷 International Standard Wine Number (ISWN)
    🎁 International/European Article Number (EAN)
  24. Overthinking: Modelling numeric values

    1. Cardinal numbers (nl getal, fr nombre, de Zahl)
       → quantities (multitudes), e.g. size of a set: 1, 2, 3, …
       → quantities (magnitudes), e.g. 42.38 metres
    2. Ordinal numbers
       → positions/rankings: 1st, 2nd, 3rd, …
    3. Nominal numbers (nl nummer, fr numéro, de Nummer)
       → house/tail/account numbers, and other identifiers
  25. Any attribute with the word number in its name must be modelled as text

    https://hilton.org.uk/blog/non-numeric-numbers
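A quick way to see why is to test whether an identifier survives a round trip through an integer type. The helper name `survives_numeric_storage` and the sample values are made up for this illustration.

```python
def survives_numeric_storage(identifier: str) -> bool:
    """Return True only if storing the identifier as an integer loses nothing."""
    try:
        return str(int(identifier)) == identifier
    except ValueError:
        # Letters, spaces or punctuation: not an integer at all.
        return False


for value in ["42", "0107900185", "+31107900185", "221B", "PH-BXA"]:
    print(value, survives_numeric_storage(value))
```

Only the plain `42` survives. The leading zero, the plus sign, the house number suffix and the aircraft registration are all lost or rejected, which is the case for modelling nominal numbers as text.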
  26. Value type modelling approaches

    1. Stringly-typed → command line, HTTP, XML (everything is text)
    2. Primitive types → JSON, SQL (number, date, etc)
    3. Constrained values → XML Schema, JSON Schema (min/max, regex, etc)
    4. Published standards → ISO standards
  27. Guidelines

    1. Don’t try to standardise or validate personal names
    2. Use ISO codes for languages, countries and currencies
    3. Use Unicode CLDR for localised names, for ISO codes
    4. Choose country lists carefully
    5. Build unisex software - remove gender from your models
    6. Validate email addresses in multiple steps, not just regex
    7. Model identifiers called ‘numbers’ as text
  28. Summary

    1. There’s more to modelling than entity relationships
    2. There’s more to value types than JSON types
    3. Many universal values remain messy and standards-free
    4. Model using enumerations of standardised lists, if you can
    5. Some standards are more useful than others
  29. Summary (the important parts)

    6. Model universal values according to their purpose
       → different versions of personal names in each context
       → gender based on legal requirements or not at all
    7. Model universal values according to your domain
       → gender in retail vs passport application vs dating app