Slide 1

Slide 1 text

EUGENE T. COOK LONE STAR PHP 2016 #LSP16 DATABASE THEORY, MODELS AND ABSTRACTIONS

Slide 2

Slide 2 text

A BIT ABOUT ME Transactional and Start Up Attorney MBA - IT and 
 Operations Management Polymathic Technologist JD - Rule of Law in
 Developing Nations Shameless Plugs

Slide 3

Slide 3 text

TWO PRIMARY TRENDS IN DB DESIGN SQL/SCHEMA NOSQL (SCHEMALESS)* MySQL MongoDB MariaDB RethinkDB PostgreSQL Redis Google Big Table * THERE ARE EXCEPTIONS TO MANY OF THE CHARACTERISTICS OF TYPICAL SCHEMALESS STORES

Slide 4

Slide 4 text

QUICK PRIMER DATA MODELING

Slide 5

Slide 5 text

DATA MODELING - STRUCTURE PROJECT SCOPE YOU’VE BEEN ASKED BY HEAD MISTRESS GERTRUDE TO CREATE A SYSTEM FOR HER NEW BOARDING SCHOOL. SHE WANTS TO BE ABLE TO TRACK CLASSES, THEIR STUDENTS AND ALSO KNOW THEIR LOCATION WITHIN THE SCHOOL!

Slide 6

Slide 6 text

MONOLITHIC TABLE COURSEID COURSE HALL MONITOR STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH SAM ERICA 102 SCIENCE WEST JOSEPH JOE JAKE ERICA 103 GEO SOUTH WILMA JANE MAX 104 COMP SOUTH WILMA CHUCK

Slide 7

Slide 7 text

THAT WAS EASY! ANY QUESTIONS?

Slide 8

Slide 8 text

GERTRUDE DOES NOT APPROVE DUPLICATION OF DATA MODIFICATION ANOMALIES NORMALIZE!!!!

Slide 9

Slide 9 text

I THOUGHT I WAS DONE

Slide 10

Slide 10 text

QUICK PRIMER DB MODELING

Slide 11

Slide 11 text

DATABASE MODEL DATABASE MODEL "A DATABASE MODEL IS A TYPE OF DATA MODEL THAT DETERMINES THE LOGICAL STRUCTURE OF A DATABASE AND FUNDAMENTALLY DETERMINES IN WHICH MANNER DATA CAN BE STORED, ORGANIZED, AND MANIPULATED.” WIKIPEDIA

Slide 12

Slide 12 text

CONCEPT • Identifying the distinct entities that comprise your data’s architecture • Conceptualizing your data objects • Identifying the relationships between the data objects DATABASE MODELING DESIGN • Conversion of the data objects into an actual schema • Naming convention of the data and relationships • Deciding what constraints you should have

Slide 13

Slide 13 text

MONOLITHIC TABLE COURSEID COURSE HALL MONITOR STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH SAM ERICA 102 SCIENCE WEST JOSEPH JOE JAKE ERICA 103 GEO SOUTH WILMA JANE MAX 104 COMP SOUTH WILMA CHUCK NORMALIZATION LOGICALLY ORGANIZING MODEL INTO TABLES AND COLUMNS CARDINALITY THE FUNDAMENTAL RELATION OF ONE TABLE TO ANOTHER

Slide 14

Slide 14 text

WHY NORMALIZE? MINIMIZE DUPLICATION MODIFICATION ANOMALIES (FIDELITY) SIMPLIFY QUERIES COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK

Slide 15

Slide 15 text

MODIFICATION ANOM. INSERT ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK ? EAST ?

Slide 16

Slide 16 text

UPDATE ANOMALY UPDATE ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK

Slide 17

Slide 17 text

FIDELITY ANOMALY FIDELITY ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH ERIKA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK

Slide 18

Slide 18 text

DELETE ANOMALY DELETE ANOMALY COURSEID COURSE HALL STUDENT1 STUDENT2 STUDENT3 101 LAW NORTH ERICA 102 SCIENCE WEST JOE JAKE ERICA 103 GEO SOUTH JANE MAX 104 COMP SOUTH CHUCK

Slide 19

Slide 19 text

NORMALIZATION COURSEID COURSE HALL 101 LAW NORTH 102 SCIENCE WEST 103 GEO SOUTH 104 COMP SOUTH FIRST NORMAL FORM - 1NF STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK PRIMARY KEY ONLY ATOMIC VALUES NO REPEATED COLUMNS

Slide 20

Slide 20 text

NORMALIZATION COURSEID COURSE HALL 101 LAW NORTH 102 SCIENCE WEST 103 GEO SOUTH 104 COMP SOUTH 1NF - WHAT DID WE GAIN? STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK UNLIMITED STUDENTS SIMPLER QUERY OF STUDENT FIXED STUDENT ANOMALIES

Slide 21

Slide 21 text

NORMALIZATION COURSEID COURSE 101 LAW 102 SCIENCE 103 GEO 104 COMP SECOND NORMAL FORM - 2NF STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK TABLE MUST BE 1NF ALL NON-KEY COLUMNS (ATTRIBUTES) ARE PK DEPENDENT HALLID HALL N NORTH W WEST S SOUTH E EAST

Slide 22

Slide 22 text

NORMALIZATION 2NF - WHAT DID WE GAIN? ANOMALIES HAVE BEEN RESOLVED COURSEID COURSE 101 LAW 102 SCIENCE 103 GEO 104 COMP STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK HALLID HALL N NORTH W WEST S SOUTH E EAST

Slide 23

Slide 23 text

NORMALIZATION WE NEED MORE DATA STUDENTID STUDENT CITY ZIP S1 ERICA ADDISON 75001 S2 JAKE JACKSON 25343 S3 MAX HARTLETT 53335 S4 JOE ADDISON 75001 S5 JANE WAYNE 12345 S6 CHUCK DALLAS 75243

Slide 24

Slide 24 text

GERTRUDE DOES NOT APPROVE I SENSE AN ANOMALY!!!! NORMALIZE!!!!

Slide 25

Slide 25 text

NORMALIZATION THIRD NORMAL FORM - 3NF STUDENTID STUDENT ZIP S1 ERICA 75001 S2 JAKE 25343 S3 MAX 53335 S4 JOE 75001 S5 JANE 12345 S6 CHUCK 75243 TABLE IS IN 2NF NO TRANSITIVELY DEPENDENT ATTRIBUTES ZIP CITY 75001 ADDISON 25343 JACKSON 53335 HARTLETT 12345 WAYNE 75243 DALLAS

Slide 26

Slide 26 text

BUT THE OBJECTS ARE LONELY!

Slide 27

Slide 27 text

ONE TO ONE CARDINALITIES (RELATIONSHIPS) BELONGS TO (FK) / HAS ONE ONE TO MANY BELONGS TO (FK) / HAS MANY MANY TO MANY HAS MANY THROUGH / HAS MANY THROUGH

Slide 28

Slide 28 text

RELATIONSHIPS ONE TO MANY RELATIONSHIPS COURSEID COURSE HALLID 101 LAW N 102 SCIENCE W 103 GEO S 104 COMP S HALLID HALL N NORTH W WEST S SOUTH E EAST

Slide 29

Slide 29 text

RELATIONSHIPS MANY TO MANY RELATIONSHIPS COURSEID COURSE 101 LAW 102 SCIENCE 103 GEO 104 COMP STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK

Slide 30

Slide 30 text

RELATIONSHIPS RELATIONAL / INTERSECTION TABLE COURSEID COURSE 101 LAW 102 SCIENCE 103 GEO 104 COMP STUDENTID STUDENT S1 ERICA S2 JAKE S3 MAX S4 JOE S5 JANE S6 CHUCK COURSEID STUDENTID 101 S1 102 S1 103 S3 104 S3

Slide 31

Slide 31 text

VISUALIZING THE RELATIONSHIPS

Slide 32

Slide 32 text

QUICK PRIMER OPTIMIZATION

Slide 33

Slide 33 text

INDEXES DATA OPTIMIZATIONS HIGHLY EFFICIENT COPY OF SELECT COLUMNS WHICH TYPICALLY INCLUDE DIRECT ADDRESSING STORED PROCEDURES ROUTINES RUN WITHIN THE DATABASE THAT CAN SOMETIMES MORE EFFICIENTLY MANIPULATE DATA OR REDUCE DB TRAFFIC VIEWS STORED QUERIES OF DATA WHICH CAN BE INDEXED SEPARATELY AND IN SOME DATABASES, THE RESULTING DATASET IS INDEXED AS A CLUSTER, DRAMATICALLY INCREASING PERFORMANCE

Slide 34

Slide 34 text

INDEXES CREATE AN INDEX FOR ATTRIBUTES THAT YOU QUERY AGAINST THE MOST. INDEXES ARE UPDATED DYNAMICALLY, AND WILL RESULT IN DRAMATICALLY IMPROVED QUERIES.

Slide 35

Slide 35 text

STORED PROCEDURES CREATE STORED PROCEDURES FOR STATICALLY CALLED SQL THAT MANIPULATES OR QUERIES LARGE DATASETS FOR SUBSETS VIEWS USE VIEWS FOR COMMON JOINS, WHICH MANY RDBMS’ KEEP IN THE A SEPARATE TABLE CACHE. THIS WILL ALLOW DIRECT QUERIES TO THE JOINED TABLE.

Slide 36

Slide 36 text

PROCESS INITIAL ‘COMMIT’ ANTICIPATE WHAT THE COMMON QUERIES WILL BE ITERATE LOG SLOW QUERIES, RESOLVE AND KEEP OPTIMIZING

Slide 37

Slide 37 text

QUICK PRIMER NOSQL

Slide 38

Slide 38 text

SCHEMA-LESS? • NoSQL databases don’t have an internal schema • NoSQL databases are dynamically typed, meaning attributes can even have different types • Uses documents instead of records, which can be thought of as objects or arrays • Documents have no inherent schema • Typically no explicitly defined relationships • *** Schemas, constraints and types are all “soft.” They rely on programming or abstractions rather than “hard” in the DB. *** WHAT’S NOSQL?

Slide 39

Slide 39 text

RELATIONSHIPS DOCUMENT SCHEMAS AREN’T DEFINED COURSEID COURSE COURSE LAW HALL NORTH STUDENTS JAKE, JANE, MAX DENORMALIZATION MAY INCLUDE DEP. OR TRANSITIVE INFORMATION (WRITE TO INCREASE EFFICIENCY OF READS)

Slide 40

Slide 40 text

• Flexible documents • Fast iteration • More agile (yes the pm) • No schema “migrations” required • “Web Read” (JSON/BSON) • “Web Scale” (Performance) • “Black Box” (E.g. GIS) • Data fidelity • Soft constraints are programmatic • Most NoSQL not built for joins • Requires optimization of writes for reads • Refactoring, when necessary, is computationally intensive

Slide 41

Slide 41 text

WEB SCALE11!!1!1!ELEVENTYONE!

Slide 42

Slide 42 text

BRO, DO YOU EVEN WEBSCALE? • “Web Scale” - delivering scalable, consistent performance, especially through distributed reads. • Achieved through denormalization. • Relational data is kept in the document itself rather than separated out into separate tables.

Slide 43

Slide 43 text

DENORMALIZATION? YOU JUST WASTED THE FIRST 20 MINS.

Slide 44

Slide 44 text

RDBS • Sophisticated joins • Fidelity of data is paramount • Complex relationships • Static data model • Multi-row transactions • Less sophisticated DBA • Can boost performance with followers (slaves) WHEN TO USE NOSQL • Complex data (E.g. geo-spatial) • Changing data model • Horizontal inconsistency not necessarily an issue • Some great features specifically for webapps (E.g. oplog in MongoDB) • Distributed read performance is critical • More sophisticated DBA willing to maintain, cleanse data and normalize types.

Slide 45

Slide 45 text

!!DANGER!!DANGER!!DANGER!!

Slide 46

Slide 46 text

BE CAREFUL WHAT YOU WISH FOR • More flexible does not mean simpler! • Structure and sanitization is app’s responsibility • Requires more effort from dev to ensure data fidelity NOSQL IS *NOT* SIMPLER OR EASIER

Slide 47

Slide 47 text

;)/*+4'#-)'7)654505+"+)*()<'B45&'( ) ,RS+T!8EUVW!X!M#!6'D#!2E!UVW! ! ,RY+T!8EUVW!X!Z2E3!UVW!! *+++T!8EUVW!X!8E!UVW[!! *++\T!8EUVW!X!8E9!E2G(!UVW!! *+,-T!8EUVW!X!8E)!UVW[! N]O^=N.UO ! MARK MADSEN

Slide 48

Slide 48 text

NOSQL IS NOT EASIER • No right or wrong answer. • Not mutually exclusive - can be used together with great effect. • E.g. Comments system might be on NoSQL but your HR system might be on RDBS. • Key / Value stores like Redis = great object or query cache

Slide 49

Slide 49 text

RECOMMENDATION POSTGRES?

Slide 50

Slide 50 text

QUICK PRIMER ABSTRACTIONS

Slide 51

Slide 51 text

ABSTRACTIONS • API or library which obfuscates some of the raw querying or manipulation of the database. • Can allow for easy drop in of other database types (E.g. MySQL to Postgres) • Some add validation or schema support to databases that might not have them by default. • Some abstractions also offer data migrations, handling the table structure for you and can even build your original data model from an existing schema. ABSTRACTIONS EXAMPLES • Propel • Doctrine • PDO • Mongoose • Monk

Slide 52

Slide 52 text

LOOK FORWARD DON’T BE AFRAID TO BROADEN YOUR HORIZONS YOUR APP WILL THANK YOU

Slide 53

Slide 53 text

KEEP IN TOUCH @ETC ECOOK [email protected]

Slide 54

Slide 54 text

QUESTIONS?

Slide 55

Slide 55 text

THANK YOU