Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PHPCon Tokyo - 2015 - Database Theory Models and Abstractions

E.T.Cook
October 03, 2015

PHPCon Tokyo - 2015 - Database Theory Models and Abstractions

Proper database theory and design seems to be an increasingly lost art. As abstraction layers, migration tools and ORM become more prevalent, less focus is being spent on properly architecting the database, relationships, indexes and constraints. As a result, many projects suffer a crisis of data or scalability as data sets grow.

In this workshop we'll cover the following: Table relationships and relationship tables Database Normalization with a focus on Third Normal Form (3NF) Database Indexes and Use Cases Database Keys and Composite Keys Database Optimization The difference between schema and schema-less databases The selection of the proper database to suit your requirements (hint: it may not just be one) The current DB landscape and use cases, including Redis, MongoDB, MySQL/MariaDB and Memcache (hint: some frameworks even use relational databases as a key-value store)

E.T.Cook

October 03, 2015
Tweet

More Decks by E.T.Cook

Other Decks in Technology

Transcript

  1. A BIT ABOUT ME Transactional and Start Up Attorney MBA

    - IT and 
 Operations Management Polymathic Technologist JD - Rule of Law in
 Developing Nations Shameless Plugs
  2. TWO PRIMARY TRENDS IN DB DESIGN SCHEMA NOSQL (SCHEMALESS)* MySQL

    MongoDB MariaDB RethinkDB PostgreSQL Redis Google Big Table * THERE ARE EXCEPTIONS TO MANY OF THE CHARACTERISTICS OF TYPICAL SCHEMALESS STORES
  3. DATA MODELING • Identifying the distinct entities that comprise your

    data’s architecture • Conceptualizing your data objects • Identifying the relationships between the data objects DATABASE ARCHITECTURE DATABASE DESIGN • Conversion of the data objects into an actual schema • Naming convention of the data and relationships • Deciding what constraints you should have
  4. DATA MODELING - STRUCTURE PROJECT SCOPE YOU’VE BEEN ASKED BY

    HEAD MISTRESS GERTRUDE TO CREATE A SYSTEM FOR HER NEW BOARDING SCHOOL. SHE WANTS TO BE ABLE TO TRACK STUDENT SCHEDULES AND ALSO THEIR FINANCIAL ACCOUNT STATUS.
  5. MONOLITHIC STUDENT CLASS 1 GRADE 1 CLASS 2 GRADE 2

    DUE LAST PAY JOE LAW A SOCIAL B 500 1 MALCOLM SCIENCE B MATH A 250 1 MINDY GEO C SOCIAL D 350 2 JACK COMP A LAW A 600 12
  6. GERTRUDE DOES NOT APPROVE WHAT ABOUT NEXT SEMESTER? WHAT IF

    THEY TAKE MORE THAN TWO CLASSES? HOW DO WE SAVE MORE INFORMATION ABOUT THE CLASSES?
  7. MONOLITHIC STUDENT CLASS 1 GRADE 1 CLASS 2 GRADE 2

    DUE LAST PAY JOE LAW A SOCIAL B 500 1 MALCOLM SCIENCE B MATH A 250 1 MINDY GEO C SOCIAL D 350 2 JACK COMP A LAW A 600 12 NORMALIZATION ORGANIZATION TABLES AND COLUMNS TO MINIMIZE REDUNDANCY CARDINALITY THE FUNDAMENTAL RELATION OF ONE TABLE TO ANOTHER
  8. NORMALIZATION CLASS 1 GRADE 1 CLASS 2 GRADE 2 DUE

    LAST PAY JOE LAW A SOCIAL B 500 1 MALCOLM SCIENCE B MATH A 250 1 MINDY GEO C SOCIAL D 350 2 JACK COMP A LAW A 600 12 WHAT ABOUT CLASS META INFORMATION? WHAT IF WE CHANGED THE CLASS NAME?
  9. NORMALIZATION STUDENT CLASS 1 GRADE 1 CLASS 2 GRADE 2

    DUE LAST PAY JOE LAW A SOCIAL B 500 1 MALCOLM SCIENCE B MATH A 250 1 MINDY GEO C SOCIAL D 350 2 JACK COMP A LAW A 600 12 CLASS DESC TEACHER LAW … SUSAN SCIENCE … RANDALL GEO … CRAIG COMP … SUSAN WE’VE ADDED META, BUT NOW WHAT?
  10. ONE TO ONE CARDINALITIES (RELATIONSHIPS) BELONGS TO (FK) / HAS

    ONE ONE TO MANY BELONGS TO (FK) / HAS MANY MANY TO MANY HAS MANY THROUGH / HAS MANY THROUGH
  11. TEACHERS CLASSES IDENTIFY THE RELATIONSHIPS CAN A CLASS HAVE MULTIPLE

    TEACHERS? CAN A TEACHER TEACH MULTIPLE CLASSES? M - 1
  12. ONE TO MANY STUDENT CLASS 1 GRADE 1 CLASS 2

    GRADE 2 DUE LAST PAY JOE 1 A 5 B 500 1 MALCOLM 2 A 6 A 250 1 MINDY 3 C 5 D 350 2 JACK 4 A 1 A 600 12 CLASS NAME DESC TEACHER 1 LAW … 1 2 SCIENCE … 2 3 GEO … 3 4 COMP … 1 TEACHER NAME HIRE DATE 1 SUSAN … 2 RANDALL … 3 CRAIG …
  13. CLASSES STUDENTS IDENTIFY THE RELATIONSHIPS CAN A STUDENT BE IN

    MULTIPLE CLASSES? CAN A CLASS HAVE MULTIPLE STUDENTS? M - M
  14. MANY TO MANY STUDENT NAME 1 JOE 2 MALCOLM 3

    MINDY 4 JACK CLASS NAME 1 MATH 2 SCIENCE 3 GYM 4 LANGUAGE ENROLLED_IN STUDENT_ID CLASS_ID 1 1 2 2 1 3 3 4 1 4 2 4 5 3 1 6 2 1
  15. INDEXES DATA OPTIMIZATIONS HIGHLY EFFICIENT COPY OF SELECT COLUMNS WHICH

    TYPICALLY INCLUDE DIRECT ADDRESSING STORED PROCEDURES ROUTINES RUN WITHIN THE DATABASE THAT CAN SOMETIMES MORE EFFICIENTLY MANIPULATE DATA OR REDUCE DB TRAFFIC VIEWS STORED QUERIES OF DATA WHICH CAN BE INDEXED SEPARATELY AND IN SOME DATABASES, THE RESULTING DATASET IS INDEXED AS A CLUSTER, DRAMATICALLY INCREASING PERFORMANCE
  16. INDEXES CREATE AN INDEX FOR ATTRIBUTES THAT YOU QUERY AGAINST

    THE MOST. INDEXES ARE UPDATED DYNAMICALLY, AND WILL RESULT IN DRAMATICALLY IMPROVED QUERIES.
  17. DATA MODELING • Identifying the distinct entities that comprise your

    data’s architecture • Conceptualizing your data objects • Identifying the relationships between the data objects DATABASE ARCHITECTURE DATABASE DESIGN • Conversion of the data objects into an actual schema • Naming convention of the data and relationships • Deciding what constraints you should have
  18. STORED PROCEDURES CREATE STORED PROCEDURES FOR STATICALLY CALLED SQL THAT

    MANIPULATES OR QUERIES LARGE DATASETS FOR SUBSETS VIEWS USE VIEWS FOR COMMON JOINS, WHICH MANY RDBS’ KEEP IN THE A SEPARATE TABLE CACHE. THIS WILL ALLOW DIRECT QUERIES TO THE JOINED TABLE.
  19. SCHEMA-LESS? • NoSQL databases don’t have an internal schema •

    NoSQL databases are dynamically typed, meaning attributes can even have different types • Uses documents instead of records, which can be thought of as objects or arrays • Documents have no inherent schema • Schemas, constraints and types are all “soft.” They rely on programming or abstractions rather than “hard” in the DB. WHAT’S NOSQL?
  20. • Flexible documents • Fast iteration • More agile •

    No “migrations” required • “Web Read” (JSON/BSON) • “Web Scale” (Performance) • Data fidelity • Soft constraints are programmatic • Most NoSQL not built for joins • Requires optimization of writes for reads
  21. BRO, DO YOU EVEN WEBSCALE? • “Web Scale” - delivering

    scalable, consistent performance, especially through distributed reads. • Achieved through denormalization. • Relational data is kept in the document itself rather than separated out into separate tables.
  22. RDBS • Sophisticated joins • Fidelity of data is paramount

    • Complex relationships • Static data model • Multi-row transactions • Less sophisticated DBA • Can boost performance with followers (slaves) WHEN TO USE NOSQL • Complex data (E.g. geo-spatial) • Changing data model • Horizontal inconsistency not necessarily an issue • Some great features specifically for webapps (E.g. oplog in MongoDB) • Distributed read performance is critical • More sophisticated DBA willing to maintain, cleanse data and normalize types.
  23. WHAT’S THE POINT? • No right or wrong answer. •

    Not mutually exclusive - can be used together with great effect. • E.g. Comments system might be on NoSQL but your HR system might be on RDBS.
  24. ABSTRACTIONS • API or library which obfuscates some of the

    raw querying or manipulation of the database. • Can allow for easy drop in of other database types (E.g. MySQL to Postgres) • Some add validation or schema support to databases that might not have them by default. • Some abstractions also offer data migrations, handling the table structure for you and can even build your original data model from an existing schema. ABSTRACTIONS EXAMPLES • Propel • Doctrine • PDO • Mongoose • Monk