Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Dallas PHP - April 2016 - Database Theory, Mode...

Dallas PHP - April 2016 - Database Theory, Models and Abstractions

Proper database theory and design seems to be an increasingly lost art. As abstraction layers, migration tools and ORM become more prevalent, less focus is being spent on properly architecting the database, relationships, indexes and constraints. As a result, many projects suffer a crisis of data or scalability as data sets grow.

In this talk we'll cover: Table relationships and relationship tables Database Normalization with a focus on Third Normal Form (3NF) Database Indexes and Use Cases Database Keys and Composite Keys Database Optimization. The difference between schema and schema-less databases, and finding the proper database to suit your requirements.

E.T.Cook

April 12, 2016
Tweet

More Decks by E.T.Cook

Other Decks in Technology

Transcript

  1. A BIT ABOUT ME Transactional and Start Up Attorney MBA

    - IT and 
 Operations Management Polymathic Technologist JD - Rule of Law in
 Developing Nations Shameless Plugs
  2. TWO PRIMARY TRENDS IN DB DESIGN SQL/SCHEMA NOSQL (SCHEMALESS)* MySQL

    MongoDB MariaDB RethinkDB PostgreSQL Redis Google Big Table * THERE ARE EXCEPTIONS TO MANY OF THE CHARACTERISTICS OF TYPICAL SCHEMALESS STORES
  3. DATA MODELING - STRUCTURE PROJECT SCOPE YOU’VE BEEN ASKED BY

    HEAD MISTRESS GERTRUDE TO CREATE A SYSTEM FOR HER NEW BOARDING SCHOOL. SHE WANTS TO BE ABLE TO TRACK CLASSES, THEIR STUDENTS AND ALSO KNOW THEIR LOCATION WITHIN THE SCHOOL!
  4. MONOLITHIC TABLE COURSEID COURSE HALL MONITOR STUDENT1 STUDENT2 STUDENT3 101

    LAW NORTH SAM ERICA 102 SCIENCE WEST JOSEPH JOE JAKE ERICA 103 GEO SOUTH WILMA JANE MAX 104 COMP SOUTH WILMA CHUCK
  5. DATABASE MODEL DATABASE MODEL "A DATABASE MODEL IS A TYPE

    OF DATA MODEL THAT DETERMINES THE LOGICAL STRUCTURE OF A DATABASE AND FUNDAMENTALLY DETERMINES IN WHICH MANNER DATA CAN BE STORED, ORGANIZED, AND MANIPULATED.” WIKIPEDIA
  6. CONCEPT • Identifying the distinct entities that comprise your data’s

    architecture • Conceptualizing your data objects • Identifying the relationships between the data objects DATABASE MODELING DESIGN • Conversion of the data objects into an actual schema • Naming convention of the data and relationships • Deciding what constraints you should have
  7. MONOLITHIC TABLE COURSEID COURSE HALL MONITOR STUDENT1 STUDENT2 STUDENT3 101

    LAW NORTH SAM ERICA 102 SCIENCE WEST JOSEPH JOE JAKE ERICA 103 GEO SOUTH WILMA JANE MAX 104 COMP SOUTH WILMA CHUCK NORMALIZATION LOGICALLY ORGANIZING MODEL INTO TABLES AND COLUMNS CARDINALITY THE FUNDAMENTAL RELATION OF ONE TABLE TO ANOTHER
  8. WHY NORMALIZE? MINIMIZE DUPLICATION MODIFICATION ANOMALIES (FIDELITY) SIMPLIFY QUERIES COURSEID

    COURSE HALL STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK
  9. MODIFICATION ANOM. INSERT ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3

    101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK ? EAST ?
  10. UPDATE ANOMALY UPDATE ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3

    101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK
  11. FIDELITY ANOMALY FIDELITY ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3

    101 LAW NORTH ERIKA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK
  12. DELETE ANOMALY DELETE ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3

    101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK
  13. NORMALIZATION COURSEID COURSE HALL 101 LAW NORTH 102 SCIENCE WEST

    103 GEO SOUTH 104 COMP SOUTH FIRST NORMAL FORM - 1NF STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK PRIMARY KEY ONLY ATOMIC VALUES NO REPEATED COLUMNS
  14. NORMALIZATION COURSEID COURSE HALL 101 LAW NORTH 102 SCIENCE WEST

    103 GEO SOUTH 104 COMP SOUTH 1NF - WHAT DID WE GAIN? STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK UNLIMITED STUDENTS SIMPLER QUERY OF STUDENT FIXED STUDENT ANOMALIES
  15. NORMALIZATION COURSEID COURSE 101 LAW 102 SCIENCE 103 GEO 104

    COMP SECOND NORMAL FORM - 2NF STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK TABLE MUST BE 1NF ALL NON-KEY COLUMNS (ATTRIBUTES) ARE PK DEPENDENT HALLID HALL N NORTH W WEST S SOUTH E EAST
  16. NORMALIZATION 2NF - WHAT DID WE GAIN? ANOMALIES HAVE BEEN

    RESOLVED COURSEID COURSE 101 LAW 102 SCIENCE 103 GEO 104 COMP STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK HALLID HALL N NORTH W WEST S SOUTH E EAST
  17. NORMALIZATION WE NEED MORE DATA STUDENTID STUDENT CITY ZIP S1

    ERICA ADDISON 75001 S2 JAKE JACKSON 25343 S3 MAX HARTLETT 53335 S4 JOE ADDISON 75001 S5 JANE WAYNE 12345 S6 CHUCK DALLAS 75243
  18. NORMALIZATION THIRD NORMAL FORM - 3NF STUDENTID STUDENT ZIP S1

    ERICA 75001 S2 JAKE 25343 S3 MAX 53335 S4 JOE 75001 S5 JANE 12345 S6 CHUCK 75243 TABLE IS IN 2NF NO TRANSITIVELY DEPENDENT ATTRIBUTES ZIP CITY 75001 ADDISON 25343 JACKSON 53335 HARTLETT 12345 WAYNE 75243 DALLAS
  19. ONE TO ONE CARDINALITIES (RELATIONSHIPS) BELONGS TO (FK) / HAS

    ONE ONE TO MANY BELONGS TO (FK) / HAS MANY MANY TO MANY HAS MANY THROUGH / HAS MANY THROUGH
  20. RELATIONSHIPS ONE TO MANY RELATIONSHIPS COURSEID COURSE HALLID 101 LAW

    N 102 SCIENCE W 103 GEO S 104 COMP S HALLID HALL N NORTH W WEST S SOUTH E EAST
  21. RELATIONSHIPS MANY TO MANY RELATIONSHIPS COURSEID COURSE 101 LAW 102

    SCIENCE 103 GEO 104 COMP STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK
  22. RELATIONSHIPS RELATIONAL / INTERSECTION TABLE COURSEID COURSE 101 LAW 102

    SCIENCE 103 GEO 104 COMP STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK COURSEID STUDENTID 101 S1 102 S1 103 S3 104 S3
  23. INDEXES DATA OPTIMIZATIONS HIGHLY EFFICIENT COPY OF SELECT COLUMNS WHICH

    TYPICALLY INCLUDE DIRECT ADDRESSING STORED PROCEDURES ROUTINES RUN WITHIN THE DATABASE THAT CAN SOMETIMES MORE EFFICIENTLY MANIPULATE DATA OR REDUCE DB TRAFFIC VIEWS STORED QUERIES OF DATA WHICH CAN BE INDEXED SEPARATELY AND IN SOME DATABASES, THE RESULTING DATASET IS INDEXED AS A CLUSTER, DRAMATICALLY INCREASING PERFORMANCE
  24. INDEXES CREATE AN INDEX FOR ATTRIBUTES THAT YOU QUERY AGAINST

    THE MOST. INDEXES ARE UPDATED DYNAMICALLY, AND WILL RESULT IN DRAMATICALLY IMPROVED QUERIES.
  25. STORED PROCEDURES CREATE STORED PROCEDURES FOR STATICALLY CALLED SQL THAT

    MANIPULATES OR QUERIES LARGE DATASETS FOR SUBSETS VIEWS USE VIEWS FOR COMMON JOINS, WHICH MANY RDBMS’ KEEP IN THE A SEPARATE TABLE CACHE. THIS WILL ALLOW DIRECT QUERIES TO THE JOINED TABLE.
  26. PROCESS INITIAL ‘COMMIT’ ANTICIPATE WHAT THE COMMON QUERIES WILL BE

    ITERATE LOG SLOW QUERIES, RESOLVE AND KEEP OPTIMIZING
  27. SCHEMA-LESS? • NoSQL databases don’t have an internal schema •

    NoSQL databases are dynamically typed, meaning attributes can even have different types • Uses documents instead of records, which can be thought of as objects or arrays • Documents have no inherent schema • Typically no explicitly defined relationships • *** Schemas, constraints and types are all “soft.” They rely on programming or abstractions rather than “hard” in the DB. *** WHAT’S NOSQL?
  28. RELATIONSHIPS DOCUMENT SCHEMAS AREN’T DEFINED COURSEID COURSE COURSE LAW HALL

    NORTH STUDENTS JAKE, JANE, MAX DENORMALIZATION MAY INCLUDE DEP. OR TRANSITIVE INFORMATION (WRITE TO INCREASE EFFICIENCY OF READS)
  29. • Flexible documents • Fast iteration • More agile (yes

    the pm) • No schema “migrations” required • “Web Read” (JSON/BSON) • “Web Scale” (Performance) • “Black Box” (E.g. GIS) • Data fidelity • Soft constraints are programmatic • Most NoSQL not built for joins • Requires optimization of writes for reads • Refactoring, when necessary, is computationally intensive
  30. BRO, DO YOU EVEN WEBSCALE? • “Web Scale” - delivering

    scalable, consistent performance, especially through distributed reads. • Achieved through denormalization. • Relational data is kept in the document itself rather than separated out into separate tables.
  31. RDBS • Sophisticated joins • Fidelity of data is paramount

    • Complex relationships • Static data model • Multi-row transactions • Less sophisticated DBA • Can boost performance with followers (slaves) WHEN TO USE NOSQL • Complex data (E.g. geo-spatial) • Changing data model • Horizontal inconsistency not necessarily an issue • Some great features specifically for webapps (E.g. oplog in MongoDB) • Distributed read performance is critical • More sophisticated DBA willing to maintain, cleanse data and normalize types.
  32. BE CAREFUL WHAT YOU WISH FOR • More flexible does

    not mean simpler! • Structure and sanitization is app’s responsibility • Requires more effort from dev to ensure data fidelity NOSQL IS *NOT* SIMPLER OR EASIER
  33. NOSQL IS NOT EASIER • No right or wrong answer.

    • Not mutually exclusive - can be used together with great effect. • E.g. Comments system might be on NoSQL but your HR system might be on RDBS. • Key / Value stores like Redis = great object or query cache
  34. ABSTRACTIONS • API or library which obfuscates some of the

    raw querying or manipulation of the database. • Can allow for easy drop in of other database types (E.g. MySQL to Postgres) • Some add validation or schema support to databases that might not have them by default. • Some abstractions also offer data migrations, handling the table structure for you and can even build your original data model from an existing schema. ABSTRACTIONS EXAMPLES • Propel • Doctrine • Mongoose • Monk