Slide 1

Slide 1 text

@PeterHilton http://hilton.org.uk/ Modelling universal values http://hilton.org.uk/tag/ddd

Slide 2

Slide 2 text

What is a customer? What is a customer number? Dung Anh

Slide 3

Slide 3 text

Don’t forget the attribute values
Data modelling often focuses on attribute lists and relationships within a bounded context.
You also need to model single values, such as:
1. customer name
2. address
3. gender
4. bank account number

Slide 4

Slide 4 text

Model universal values in your domain
Your domain model includes universal values that don’t have application-specific definitions and meanings, for example →
1. names and other identifiers
2. (parts of) addresses
3. genders
4. numbers

Slide 5

Slide 5 text

Names & other identifiers

Slide 6

Slide 6 text

Annemiek van Vleuten 🚴
 → given name: Annemiek
 → tussenvoegsel (Dutch surname prefix): van
 → last name: Vleuten
Photo: Theo Stikkelman / CC BY 2.0

Slide 7

Slide 7 text

Falsehoods Programmers Believe About Names, Patrick McKenzie (@patio11)
24. My system will never have to deal with names from China.
25. Or Japan.
26. Or Korea.
27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Slide 8

Slide 8 text

Naming conventions depend on the country: first name + last name is basically racist
https://hilton.org.uk/blog/respect-personal-names

Slide 9

Slide 9 text

Your customers take this personally
For more examples of domain model name acceptance failures, follow @yournameisvalid on Twitter

Slide 10

Slide 10 text

Guidelines
W3C offers examples and guidance for modelling →
1. Allow any character
 → letters, spaces, punctuation, digits, etc.
2. Allow long names
3. Don’t parse or split names (first name + last name)
4. Collect variants for different purposes
 → Sort by, What should we call you?
https://www.w3.org/International/questions/qa-personal-names
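
The guidelines above can be sketched as a small value type. This is only an illustration, not the W3C's or the speaker's model: the class name `PersonalName`, its field names, and the single "not blank" check are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonalName:
    """Hypothetical value type: one unsplit name plus variants per purpose."""

    full_name: str                          # stored exactly as entered, never parsed or split
    sort_key: Optional[str] = None          # answer to "Sort by", supplied by the person
    preferred_name: Optional[str] = None    # answer to "What should we call you?"

    def __post_init__(self) -> None:
        # Assumed minimal rule: reject blank input only.
        # There are deliberately no character or length restrictions.
        if not self.full_name.strip():
            raise ValueError("full_name must not be blank")

    def greeting_name(self) -> str:
        """Name to use when addressing the person."""
        return self.preferred_name or self.full_name

    def sorting_name(self) -> str:
        """Name to use when ordering a list of people."""
        return self.sort_key or self.full_name
```

The variants are collected from the person rather than derived, so the code never has to guess which part of a name is the family name.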

Slide 11

Slide 11 text

Standard text identifiers a.k.a. codes
ISO 639-1 language codes (lower-case!): → en nl fr
ISO 3166-1 alpha-2 country codes: → GB NL FR
ISO 4217 currency codes (country code plus one letter): → GBP NLG FRF EUR
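
As a rough sketch, the letter case and length of these codes can be captured as patterns. The constant and function names below are invented for the example, and the checks only test the shape of a code: they do not show that a code is actually assigned in ISO 639-1, ISO 3166-1 or ISO 4217.

```python
import re

# Shape checks only: a matching string is not necessarily an assigned code.
LANGUAGE_CODE = re.compile(r"[a-z]{2}")   # ISO 639-1: two lower-case letters
COUNTRY_CODE = re.compile(r"[A-Z]{2}")    # ISO 3166-1 alpha-2: two upper-case letters
CURRENCY_CODE = re.compile(r"[A-Z]{3}")   # ISO 4217: three upper-case letters


def has_language_code_shape(text: str) -> bool:
    return LANGUAGE_CODE.fullmatch(text) is not None


def has_country_code_shape(text: str) -> bool:
    return COUNTRY_CODE.fullmatch(text) is not None


def has_currency_code_shape(text: str) -> bool:
    return CURRENCY_CODE.fullmatch(text) is not None
```

Keeping the case distinction means `nl` (the language) and `NL` (the country) cannot be confused for each other.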

Slide 12

Slide 12 text

Standard names and their translations
English     Dutch        French       Ukrainian
Ukrainian   Oekraïens    ukrainien    українська
Cyrillic    Cyrillisch   cyrillique   кирилиця
Ukraine     Oekraïne     Ukraine      Україна
January     januari      janvier      січня
Monday      maandag      lundi        понеділок
meters      meter        mètres       метри

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Unicode Common Locale Data Repository (CLDR)
Unicode CLDR provides standard lists and translations of: territories (including countries), currencies, time zones, languages, calendar names (quarters, months & weekdays), scripts (writing systems), units of measurement, etc.
⚠ You have to use validity data to filter the lists
📄 Unicode CLDR publishes XML and JSON on GitHub
https://hilton.org.uk/blog/l10n-cldr-names
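
One way to use such data is to store only the ISO code in the model and look up the localised name for display. The sketch below assumes a hand-copied excerpt (the four names for Ukraine from the translations slide) in place of real CLDR data. The dictionary layout and the function name are hypothetical and do not reflect the structure of CLDR's published files.

```python
# Hand-copied excerpt for illustration; a real application would load CLDR data.
# Outer key: ISO 3166-1 alpha-2 country code. Inner key: ISO 639-1 language code.
COUNTRY_NAMES = {
    "UA": {"en": "Ukraine", "nl": "Oekraïne", "fr": "Ukraine", "uk": "Україна"},
}


def localised_country_name(country_code: str, language_code: str,
                           fallback_language: str = "en") -> str:
    """Return the country name in the requested language, else the fallback."""
    names = COUNTRY_NAMES[country_code]   # KeyError for countries not in the excerpt
    return names.get(language_code, names[fallback_language])
```

The stored value stays the same (`UA`) whatever language the user interface is rendered in.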

Slide 15

Slide 15 text

Overthinking Modelling text values
1. Natural language
 → phrases and prose in English, French, Dutch, etc.
2. Identifiers - structured, unique
 → country codes, HTTP header names
3. Formal language - parsed, executable (or renderable)
 → Java, XML, regular expressions
4. Data - parsed
 → CSV, JSON, Base64-encoded PNG image

Slide 16

Slide 16 text

Addresses

Slide 17

Slide 17 text

Countries
Visible Earth, NASA

Slide 18

Slide 18 text

Countries
By country, you probably mean sovereign state
ISO 3166-1 alpha-2 code, Unicode CLDR localised name
The closest to a standard are the United Nations members, but 18 states are not universally recognised:
🇮🇱 🇰🇷 🇰🇵 🇨🇳 🇨🇾 🇦🇲 🇵🇸 🇹🇼 🇪🇭 🏳 🏳 🇽🇰 🏳 🏳
https://hilton.org.uk/blog/country-lists

Slide 19

Slide 19 text

Stadscykel / CC0

Slide 20

Slide 20 text

Post codes
117 of 190 Universal Postal Union countries use post codes
Most countries only use (3-10) digits
Some countries use letters as well
Only 🇮🇪 Ireland has a unique post code per address 🚀
Yet another country-specific value model 😅
Ideally, you have each country’s up-to-date actual list
https://en.wikipedia.org/wiki/Postal_code
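
A country-specific value model could look roughly like the sketch below. The two patterns are simplified illustrations of the Dutch and US formats, not complete rules, and the names `POST_CODE_PATTERNS` and `is_plausible_post_code` are invented for the example. Countries without an entry are accepted unchecked, which is an assumption of this sketch rather than a recommendation from the slide.

```python
import re

# Simplified, illustrative patterns keyed by ISO 3166-1 alpha-2 country code.
# Real rules are stricter, and an actual list of codes is better than any pattern.
POST_CODE_PATTERNS = {
    "NL": re.compile(r"[1-9][0-9]{3} ?[A-Z]{2}"),   # four digits and two letters, e.g. 3011 AA
    "US": re.compile(r"[0-9]{5}(-[0-9]{4})?"),      # ZIP code, optionally ZIP+4
}


def is_plausible_post_code(country_code: str, post_code: str) -> bool:
    """Check a post code against its country's pattern, if one is modelled."""
    pattern = POST_CODE_PATTERNS.get(country_code)
    if pattern is None:
        return True   # no model for this country: accept rather than wrongly reject
    return pattern.fullmatch(post_code.strip().upper()) is not None
```

Note that the post code is handled as text throughout, since some formats contain letters, spaces or leading zeroes.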

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Email addresses
There’s a standard for email addresses. So what’s the problem?
1. Several updated/replaced standards:
 RFC 822 → RFC 2822 → RFC 5322 → RFC 6854
2. Four levels of email address validation:
 RFC format + domain + mailbox exists + correct person
3. Security risks of supporting the whole standard
https://hilton.org.uk/blog/mail-address-validation
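
The four validation levels can be sketched as an ordered enumeration, with a function that reports how far an address has got. Everything here is an assumption for illustration: the format check is deliberately loose and is not an RFC 5322 parser, `domain_resolves` is a hypothetical callable standing in for a DNS lookup, and `confirmed` stands for the person having followed a link in a confirmation message, which this sketch treats as evidence for both the mailbox and the person.

```python
from enum import IntEnum
from typing import Callable


class EmailCheck(IntEnum):
    """Highest validation level an address has passed (hypothetical names)."""
    NONE = 0
    FORMAT = 1
    DOMAIN = 2
    MAILBOX = 3
    PERSON = 4


def has_plausible_format(address: str) -> bool:
    # Loose check: something before the last "@", and a dotted domain after it.
    local, at, domain = address.rpartition("@")
    if not at or not local or " " in address:
        return False
    return "." in domain and not domain.startswith(".") and not domain.endswith(".")


def validation_level(address: str,
                     domain_resolves: Callable[[str], bool],
                     confirmed: bool) -> EmailCheck:
    """Run the checks in order and stop at the first one that fails."""
    if not has_plausible_format(address):
        return EmailCheck.NONE
    if not domain_resolves(address.rpartition("@")[2]):
        return EmailCheck.FORMAT
    if not confirmed:
        return EmailCheck.DOMAIN
    return EmailCheck.PERSON   # a confirmed message implies the mailbox exists too
```

Passing the domain lookup in as a parameter keeps the sketch runnable without network access and makes each step testable on its own.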

Slide 23

Slide 23 text

(((?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E- \x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?: [ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\ [\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0-9! #-'*+\-/=?^-`{-~.\[]]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01- \x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)? \))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E- \x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?: [ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01- \x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)? \))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E- \x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?: [ \t]*\r\n)?[ \t]+)))?"(?>(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\ \[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[\t]+)?"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?: (?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E- \x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)? [\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)? 
[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))(?:(?:(?:[ \t]*\r\n)?[ \t]+)(?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01- \x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?: [ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E- \x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0-9!#-'*+\-/=?^-`{-~.\[]]+ (?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\ [\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)? [ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01- \x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\

Slide 24

Slide 24 text

Genders

Slide 25

Slide 25 text

The gender trap
There is more than one thing wrong with this form field!

Slide 26

Slide 26 text

Photo @QuietMisdreavus, design by telegraham

Slide 27

Slide 27 text

Article 5 (General Data Protection Regulation)
Principles relating to processing of personal data
1. Personal data shall be:
(c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’)
https://eur-lex.europa.eu/eli/reg/2016/679/oj#d1e1807-1-1

Slide 28

Slide 28 text

Article 9 (General Data Protection Regulation)
Processing of special categories of personal data
1. Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of […] data concerning a natural person’s sex life or sexual orientation shall be prohibited.
https://eur-lex.europa.eu/eli/reg/2016/679/oj#d1e2051-1-1

Slide 29

Slide 29 text

Build unisex software
1. Don’t ask people their gender
 (or require gendered personal titles)
2. Learn about the GDPR restrictions on personal data,
 and data minimisation
3. Don’t limit input to two options if you do need to know
https://hilton.org.uk/blog/build-unisex-software
https://hilton.org.uk/blog/refactor-boolean-enumeration
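
For the case where a domain genuinely needs the value, the third point can be sketched as replacing a boolean with an optional enumeration. The option names and the `Customer` type below are hypothetical; the right set of options depends on the domain and its legal requirements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Gender(Enum):
    """Hypothetical options; a real list depends on the domain."""
    FEMALE = "female"
    MALE = "male"
    NON_BINARY = "non-binary"
    SELF_DESCRIBED = "self-described"


@dataclass(frozen=True)
class Customer:
    name: str
    # Optional and absent by default: None means not asked or not answered.
    # A boolean such as is_male could express neither case, nor a third option.
    gender: Optional[Gender] = None
```

Making the attribute optional keeps the default in line with the first point: most of the model works without ever asking.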

Slide 30

Slide 30 text

Numbers

Slide 31

Slide 31 text

House numbers Peter Hilton

Slide 32

Slide 32 text

☎ Telephone numbers
Telephone numbers only use digits… except for the punctuation.
And they’re numbers… except for the significant leading zeroes.
Different formats for the same telephone number:
(010)-790 0185
0031 10 790 01 85
+31107900185
tel:+31107900185
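
The four formats above can be reduced to one canonical text value. The sketch below is a minimal illustration that assumes Dutch numbers only (country code 31, national trunk prefix 0, international call prefix 00); the function name is invented, and a production system would more likely rely on a dedicated telephone number library.

```python
import re


def normalise_dutch_number(text: str) -> str:
    """Return the number as '+' followed by digits, e.g. +31107900185."""
    text = text.strip()
    if text.lower().startswith("tel:"):      # strip the tel URI scheme
        text = text[4:]
    has_plus = text.startswith("+")
    digits = re.sub(r"[^0-9]", "", text)     # drop spaces, brackets and hyphens
    if has_plus:
        return "+" + digits                  # already in international form
    if digits.startswith("00"):              # assumed international call prefix
        return "+" + digits[2:]
    if digits.startswith("0"):               # assumed national trunk prefix
        return "+31" + digits[1:]
    raise ValueError(f"cannot interpret telephone number: {text!r}")
```

The result is still text: the leading `+` and any leading zeroes would not survive storage in a numeric column.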

Slide 33

Slide 33 text

Telephone number standards
E.123 Notation for national and international… (2001):
‘9.1 Grouping of digits in a telephone number should be accomplished by means of spaces’
RFC 3966 The tel URI for Telephone Numbers (2004):
‘even though ITU-T E.123 recommends the use of space characters as visual separators […] “tel” URIs MUST NOT use spaces in visual separators to avoid excessive escaping’
https://hilton.org.uk/blog/telephone-number-formats

Slide 34

Slide 34 text

Aircraft tail numbers CardMapr

Slide 35

Slide 35 text

Aircraft tail numbers → PH-BXD
Not an ISO 3166-1 country code 😢
Not even a number 😭

Slide 36

Slide 36 text

International Bank Account Number (IBAN)
ISO 13616:1997
😀 15-32 letters (A-Z) and digits
😀 Starts with an ISO 3166-1 alpha-2 country code
😀 Two check digits prevent the most common errors
😀 Includes bank code and account number
😀 Used in Europe, North Africa, Middle East, Caribbean…
https://en.wikipedia.org/wiki/International_Bank_Account_Number
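
The check digits can be verified with the mod 97 calculation described on the linked Wikipedia page. The sketch below is a minimal version with an invented function name. It checks only the general shape and the check digits, not the country-specific length or bank code, and its upper length limit of 34 characters is the maximum the standard allows rather than the figure on the slide.

```python
import re

# Country code, two check digits, then 11 to 30 letters or digits (15 to 34 in total).
IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def is_valid_iban(text: str) -> bool:
    """Check the general shape and the mod 97 check digits of an IBAN."""
    iban = text.replace(" ", "").upper()
    if IBAN_SHAPE.fullmatch(iban) is None:
        return False
    rearranged = iban[4:] + iban[:4]   # move country code and check digits to the end
    # Replace each letter by two digits (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(character, 36)) for character in rearranged)
    return int(digits) % 97 == 1
```

The IBAN itself stays a text value; the conversion to an integer exists only inside the check.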

Slide 37

Slide 37 text

😢 Not used in North America, Asia, Australasia, etc…

Slide 38

Slide 38 text

Other standard numbers
📗 International Standard Book Number (ISBN)
🎼 ISO 10957 International Standard Music Number (ISMN)
🍷 International Standard Wine Number (ISWN)
🎁 International/European Article Number (EAN)

Slide 39

Slide 39 text

Overthinking Modelling numeric values
1. Cardinal numbers (nl getal, fr nombre, de Zahl)
 → quantities (multitudes), e.g. size of a set: 1, 2, 3, …
 → quantities (magnitudes), e.g. 42.38 metres
2. Ordinal numbers
 → positions/rankings: 1st, 2nd, 3rd, …
3. Nominal numbers (nl nummer, fr numéro, de Nummer)
 → house/tail/account numbers, and other identifiers

Slide 40

Slide 40 text

Any attribute with the word number in its name must be modelled as text
https://hilton.org.uk/blog/non-numeric-numbers
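
A small experiment shows why. The helper below (a hypothetical name, written for this illustration) reports whether an identifier would come back unchanged after being stored as an integer.

```python
def survives_integer_round_trip(identifier: str) -> bool:
    """Return True only if converting to int and back gives the same text."""
    try:
        return str(int(identifier)) == identifier
    except ValueError:
        return False   # not parseable as an integer at all


examples = {
    "0107900185": survives_integer_round_trip("0107900185"),   # leading zero is lost
    "PH-BXD": survives_integer_round_trip("PH-BXD"),           # tail number is not numeric
    "12a": survives_integer_round_trip("12a"),                 # house number with a suffix
    "42": survives_integer_round_trip("42"),                   # happens to survive
}
```

Only the last example survives, and even then nothing is gained: nobody adds or averages house numbers.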

Slide 41

Slide 41 text

Summary

Slide 42

Slide 42 text

Value type modelling approaches
1. Stringly-typed → command line, HTTP, XML
 (everything is text)
2. Primitive types → JSON, SQL
 (number, date, etc.)
3. Constrained values → XML Schema, JSON Schema
 (min/max, regex, etc.)
4. Published standards → ISO standards
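
The difference between approaches 3 and 4 can be sketched for a currency value. The enumeration below holds only the four codes shown on the earlier identifiers slide, as an excerpt of the published list; all names are invented for the example.

```python
import re
from enum import Enum

# Approach 3, constrained value: any three upper-case letters are accepted.
CURRENCY_SHAPE = re.compile(r"[A-Z]{3}")


class Currency(Enum):
    """Approach 4, published standard: an excerpt of the currency code list."""
    GBP = "GBP"
    NLG = "NLG"
    FRF = "FRF"
    EUR = "EUR"


def parse_currency(text: str) -> Currency:
    """Return the matching member, or raise ValueError for codes not in the excerpt."""
    return Currency(text)
```

A string such as `ABC` passes the pattern but is rejected by the enumeration, which is the extra safety a standardised list provides.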

Slide 43

Slide 43 text

Guidelines
1. Don’t try to standardise or validate personal names
2. Use ISO codes for languages, countries and currencies
3. Use Unicode CLDR for localised names, for ISO codes
4. Choose country lists carefully
5. Build unisex software - remove gender from your models
6. Validate email addresses in multiple steps, not just regex
7. Model identifiers called ‘numbers’ as text

Slide 44

Slide 44 text

Summary
1. There’s more to modelling than entity relationships
2. There’s more to value types than JSON types
3. Many universal values remain messy and standards-free
4. Model using enumerations of standardised lists, if you can
5. Some standards are more useful than others

Slide 45

Slide 45 text

Summary (the important parts)
6. Model universal values according to their purpose
 → different versions of personal names in each context
 → gender based on legal requirements or not at all
7. Model universal values according to your domain
 → gender in retail vs passport application vs dating app

Slide 46

Slide 46 text

https://hilton.org.uk/tag/ddd
https://hilton.org.uk/blog/respect-personal-names
https://hilton.org.uk/blog/l10n-cldr-names
https://hilton.org.uk/blog/country-lists
https://hilton.org.uk/blog/mail-address-validation
https://hilton.org.uk/blog/build-unisex-software
https://hilton.org.uk/blog/refactor-boolean-enumeration
https://hilton.org.uk/blog/telephone-number-formats
https://hilton.org.uk/blog/non-numeric-numbers

Slide 47

Slide 47 text

@PeterHilton http://hilton.org.uk/ http://hilton.org.uk/tag/ddd