Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Big Data - 02 - Lessons Learnt from the past

Big Data - 02 - Lessons Learnt from the past

Lecture given at ETH Zurich on September 20, 2017.

66d5abafc597b670cf6f109e4c278ebc?s=128

Ghislain Fourny

September 20, 2017
Tweet

Transcript

  1. Ghislain Fourny Big Data 2. Lessons learnt from the past

  2. 2 Mr. Databases: Edgar Codd Wikipedia

  3. 3 Data Independence (Edgar Codd) Lorem Ipsum Dolor sit amet

    Consectetur Adipiscing Elit. In Imperdiet Ipsum ante Physical storage Logical data model
  4. 4 Data Independence (Edgar Codd) Physical storage Logical data model

  5. 5 Data Independence (Edgar Codd) Physical storage Logical data model

  6. 6 Data Shapes Lorem ipsum dolor sit amet, consectetur adipiscing

    elit. Etiam vel erat nec dui aliquet vulputate sed quis nulla. Donec eget ultricies magna, eu dignissim elit. Nullam sed urna nec nisl rhoncus ullamcorper placerat et enim. Integer varius ornare libero quis consequat. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean eu efficitur orci. Aenean ac posuere tellus. Ut id commodo turpis. Praesent nec libero metus. Praesent at turpis placerat, congue ipsum eget, scelerisque justo. Ut volutpat, massa ac lacinia cursus, nisl dui volutpat arcu, quis interdum sapien turpis in tellus. Suspendisse potenti. Vestibulum pharetra justo massa, ac venenatis mi condimentum nec. Proin viverra tortor non orci suscipit rutrum. Phasellus sit amet euismod diam. Nullam convallis nunc sit amet diam suscipit dapibus. Integer porta hendrerit nunc. Quisque pharetra congue porta. Suspendisse vestibulum sed mi in euismod. Etiam a purus suscipit, accumsan nibh vel, posuere ipsum. Nulla nec tempor nibh, id venenatis lectus. Duis lobortis id urna eget tincidunt.
  7. 7 Data Shapes Lorem ipsum dolor sit amet, consectetur adipiscing

    elit. Etiam vel erat nec dui aliquet vulputate sed quis nulla. Donec eget ultricies magna, eu dignissim elit. Nullam sed urna nec nisl rhoncus ullamcorper placerat et enim. Integer varius ornare libero quis consequat. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean eu efficitur orci. Aenean ac posuere tellus. Ut id commodo turpis. Praesent nec libero metus. Praesent at turpis placerat, congue ipsum eget, scelerisque justo. Ut volutpat, massa ac lacinia cursus, nisl dui volutpat arcu, quis interdum sapien turpis in tellus. Suspendisse potenti. Vestibulum pharetra justo massa, ac venenatis mi condimentum nec. Proin viverra tortor non orci suscipit rutrum. Phasellus sit amet euismod diam. Nullam convallis nunc sit amet diam suscipit dapibus. Integer porta hendrerit nunc. Quisque pharetra congue porta. Suspendisse vestibulum sed mi in euismod. Etiam a purus suscipit, accumsan nibh vel, posuere ipsum. Nulla nec tempor nibh, id venenatis lectus. Duis lobortis id urna eget tincidunt.
  8. 8 Take-away Concepts

  9. 9 Take-away Concepts Table

  10. 10 Take-away Concepts Table Collection

  11. 11 Take-away Concepts Attribute

  12. 12 Take-away Concepts Attribute Column Field Property

  13. 13 Take-away Concepts Primary Key

  14. 14 Take-away Concepts Primary Key Row ID Name

  15. 15 Take-away Concepts Row

  16. 16 Take-away Concepts Row Business Object Item Entity Document Record

  17. 17 Relational Algebra

  18. 18 Function ∈ → ℬ ∈ ℬ or ℬ ↦

  19. 19 Partial function ∈ ↛ ℬ ℬ ↦

  20. 20 ℬ Relation ∈ (×ℬ) ⊆ ×ℬ or

  21. 21 Relation as a list

  22. 22 Relation as a list, n-way

  23. 23 Relations seen from set theory Given a family of

    sets . /0.01 A relation is a subset of their Cartesian product ⊆ 2 3 = / ×5 × ⋯× 1 37/ 1
  24. 24 Relation as a list, n-way

  25. 25 Relational table A B C

  26. 26 Relational Algebra

  27. 27 Selection Projection Filter queries Cartesian product Natural join Theta

    join Binary queries Relation renaming Attribute renaming Renaming queries Union Intersection Substraction Set queries Summary of relational queries
  28. 28 Selection

  29. 29 Selection queries: duplicates foo 1 true bar 2 false

    foo 3 false foobar 4 true A B C string integer boolean foo 1 true bar 2 false A B C string integer boolean = :05 () R S
  30. 30 Projection

  31. 31 Projection foo 1 true bar 2 false foo 3

    false foobar 4 true A B C string integer boolean foo true bar false foo false foobar true A C string boolean = =,? () R S
  32. 32 Grouping A B C D

  33. 33 Grouping foo 19 bar 28 bar 265 foo 4

    foobar 54 foo 46 bar 245 foobar 3456 bar 139 G A string integer R G A string integer R foo 19 foo 4 foo 46 bar 28 bar 265 bar 245 bar 139 foobar 54 foobar 3456 G A string integer R foo 19 4 46 bar 28 265 245 139 foobar 54 3456 G A string integer R foo 69 bar 677 foobar 3510
  34. 34 Sorting 1 2 3 4 5

  35. 35 Cartesian Product

  36. 36 Cartesian product foo 1 true bar 2 false A

    B C string integer boolean R foo 1 bar 2 foo 3 D E string integer S foo 1 true foo 1 true foo 1 true bar 2 false bar 2 false bar 2 false A B C D E string integer boolean string integer T = × foo 1 bar 2 foo 3 foo 1 bar 2 foo 3
  37. 37 Joining A A B A B A B A

  38. 38 Join foo 1 true bar 2 false A B

    C string integer boolean R bar 1 bar 2 foo 3 D B string integer S foo 1 true bar 2 false A B C D string integer boolean string T = ⨝ bar bar
  39. 39 Normal Forms

  40. 40 From your Bachelor: Normal Forms Consistency Update anomaly Delete

    anomaly Insert anomaly
  41. 41 1st Normal Form (tabular) – The Key

  42. 42 1st Normal Form: counter-example Legi Name Lecture City State

    PLZ 32-000-000 Alan Turing Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor Pfäffikon SZ 8808 25-000-000 Felix Bloch Pfäffikon ZH 8330 Lecture ID Lecture Name xxx-xxxx-xxX Cryptography 263-3010-00L Big Data Lecture ID Lecture Name 263-3010-00L Big Data 123-4567-89L Set theory Lecture ID Lecture Name 123-4567-89L Set theory
  43. 43 1st Normal Form Legi Name Lecture ID Lecture Name

    City State PLZ 32-000-000 Alan Turing xxx-xxxx-xxX Cryptography Bletchley Park UK MK3 6EB 32-000-000 Alan Turing 263-3010-00L Big Data Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor 263-3010-00L Big Data Pfäffikon SZ 8808 62-000-000 Georg Cantor 123-4567-89L Set theory Pfäffikon SZ 8808 25-000-000 Felix Bloch 123-4567-89L Set theory Pfäffikon ZH 8330
  44. 44 2nd Normal Form (not joined) – The Whole Key

  45. 45 2nd Normal Form: Counter-example Legi Name Lecture ID Lecture

    Name City State PLZ 32-000-000 Alan Turing xxx-xxxx-xxX Cryptography Bletchley Park UK MK3 6EB 32-000-000 Alan Turing 263-3010-00L Big Data Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor 263-3010-00L Big Data Pfäffikon SZ 8808 62-000-000 Georg Cantor 123-4567-89L Set theory Pfäffikon SZ 8808 25-000-000 Felix Bloch 123-4567-89L Set theory Pfäffikon ZH 8330
  46. 46 2nd Normal Form: Example Legi Name City State PLZ

    32-000-000 Alan Turing Bletchley Park UK MK3 6EB 32-000-000 Alan Turing Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor Pfäffikon SZ 8808 62-000-000 Georg Cantor Pfäffikon SZ 8808 25-000-000 Felix Bloch Pfäffikon ZH 8330 Lecture ID Lecture Name xxx-xxxx-xxX Cryptography 263-3010-00L Big Data 123-4567-89L Set theory Legi Lecture ID 32-000-000 xxx-xxxx-xxX 32-000-000 263-3010-00L 62-000-000 263-3010-00L 62-000-000 123-4567-89L 25-000-000 123-4567-89L
  47. 47 3rd Normal Form – Nothing But The Key

  48. 48 3rd Normal Form: Counter-Example Legi 32-000-000 32-000-000 62-000-000 62-000-000

    25-000-000 Name City State PLZ Alan Turing Bletchley Park UK MK3 6EB Alan Turing Bletchley Park UK MK3 6EB Georg Cantor Pfäffikon SZ 8808 Georg Cantor Pfäffikon SZ 8808 Felix Bloch Pfäffikon ZH 8330
  49. 49 3rd Normal Form: Example Legi 32-000-000 32-000-000 62-000-000 62-000-000

    25-000-000 Name City State Alan Turing Bletchley Park UK Alan Turing Bletchley Park UK Georg Cantor Pfäffikon SZ Georg Cantor Pfäffikon SZ Felix Bloch Pfäffikon ZH City State PLZ Bletchley Park UK MK3 6EB Bletchley Park UK MK3 6EB Pfäffikon SZ 8808 Pfäffikon SZ 8808 Pfäffikon ZH 8330
  50. 50 Data Denormalization 3NF 2NF

  51. 51 Data Denormalization 3NF 1NF

  52. 52 Data Denormalization 3NF 0NF

  53. 53 SQL Brush-Up

  54. 54 SQL History Don Chamberlin Raymond Boyce

  55. 55 The early days (early 1970s) System R + SEQUEL

    Almaden (San Jose) First commercial relational database First commercial relational query language First customer (1977)
  56. 56 SEQUEL Structured English QUEry Language Declarative language Set-based (Manipulates

    entire relations with a single command)
  57. 57 Renaming SEQUEL SQL (Trademark issue) ESS-kew-EL SEE-kwəl or

  58. 58 SQL is a declarative language Physical execution Logical model

    "What, not how"
  59. 59 SQL is a declarative language Query Plan Logical model

    "What, not how"
  60. 60 SQL is a declarative language Parallelism Logical model "What,

    not how"
  61. 61 SQL is a functional language "Expressions that nest"

  62. 62 The simplest query (almost idempotent) SELECT * FROM persons

    name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 The output relation has no name
  63. 63 A selecting query: WHERE SELECT * FROM persons WHERE

    last_name = 'Crusher' name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea Beverly C Crusher 2324-10-13 FALSE AD234F7 persons
  64. 64 A projecting query SELECT name, birth_date FROM persons name

    middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons name birth_date varchar(30) date James 2233-03-22 Beverly 2324-10-13 Spock 2230-01-06
  65. 65 A projecting query SELECT DISTINCT gender, birth_date FROM persons

    name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2233-03-22 TRUE AD234F7 persons gender birth_date boolean date TRUE 2233-03-22 FALSE 2324-10-13
  66. 66 A renaming query SELECT name AS who, birth_date AS

    when FROM persons name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons who when varchar(30) date James 2233-03-22 Beverly 2324-10-13 Spock 2230-01-06
  67. 67 A sorting query: ORDER BY SELECT * FROM persons

    ORDER BY birth_date name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea Spock NULL NULL 2230-01-06 TRUE AD234F7 James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 persons
  68. 68 Sorting options: ascending (default) SELECT * FROM persons ORDER

    BY birth_date ASC name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea Spock NULL NULL 2230-01-06 TRUE AD234F7 James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 persons
  69. 69 Sorting options: descending SELECT * FROM persons ORDER BY

    birth_date DESC name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea Beverly C Crusher 2324-10-13 FALSE AD234F7 James T Kirk 2233-03-22 TRUE AD10E7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons
  70. 70 Sorting options: NULLs first SELECT * FROM persons ORDER

    BY last_name NULLS FIRST name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea James T Kirk 2233-03-22 TRUE AD10E7 Beverly C Crusher 2324-10-13 FALSE AD234F7 Spock NULL NULL 2230-01-06 TRUE AD234F7 persons name middle_initial last_name birth_date gender passport_scan varchar(30) char(1) text date boolean bytea Spock NULL NULL 2230-01-06 TRUE AD234F7 Beverly C Crusher 2324-10-13 FALSE AD234F7 James T Kirk 2233-03-22 TRUE AD10E7 persons
  71. 71 A grouping query: GROUP BY SELECT century, COUNT(*) AS

    count FROM persons GROUP BY century name middle_initial last_name century gender varchar(30) char(1) text integer boolean James T Kirk 23 TRUE Beverly C Crusher 24 FALSE Spock NULL NULL 23 TRUE Kathryn NULL Janeway 24 FALSE persons century count integer bigint 23 2 24 2 Grouping column
  72. 72 Aggregation operators MAX MIN SUM AVG COUNT The most

    common BOOL_AND/EVERY BOOL_OR STRING_AGG Advanced aggregations
  73. 73 Post-aggregation selection: HAVING SELECT century AS c FROM persons

    GROUP BY century HAVING COUNT(*) > 2 name middle_initial last_name century captain varchar(30) char(1) text integer boolean James T Kirk 23 TRUE Beverly C Crusher 24 FALSE Jean-Luc NULL Picard 24 TRUE Kathryn NULL Janeway 24 TRUE persons century integer 24
  74. 74 Semi-outer joins: left SELECT * FROM persons LEFT OUTER

    JOIN spaceships ON persons.name = spaceships.captain_name name middle_initial last_name captain century warp spaceship_name captain_name code varchar(30) char(1) text boolean integer numeric varchar(30) text integer James T Kirk TRUE 23 5 USS Enterprise A Kirk NCC-1701-A James T Kirk FALSE 24 4 USS Enterprise Kirk NCC-1701 Beverly C Crusher 24 FALSE NULL NULL NULL NULL Jean-Luc NULL Picard TRUE 24 9.2 USS Enterprise D Picard NCC-1701-D Kathryn NULL Janeway TRUE 24 9.975 USS Voyager Janeway NCC-74656 name middle_initial last_name century captain varchar(30) char(1) text integer boolean James T Kirk 23 TRUE Beverly C Crusher 24 FALSE Jean-Luc NULL Picard 24 TRUE Kathryn NULL Janeway 24 TRUE persons warp spaceship_name captain_name code numeric varchar(30) text integer 5 USS Enterprise A Kirk NCC-1701-A 4 USS Enterprise Kirk NCC-1701 9.2 USS Enterprise D Picard NCC-1701-D 9.975 USS Voyager Janeway NCC-74656 spaceships
  75. 75 Semi-outer joins: right SELECT * FROM persons RIGHT OUTER

    JOIN spaceships ON persons.name = spaceships.captain_name name middle_initial last_name captain century warp spaceship_name captain_name code varchar(30) char(1) text boolean integer numeric varchar(30) text integer James T Kirk TRUE 23 5 USS Enterprise A Kirk NCC-1701-A James T Kirk FALSE 24 4 USS Enterprise Kirk NCC-1701 NULL NULL NULL NULL NULL 9.2 USS Enterprise D Picard NCC-1701-D Jean-Luc NULL Picard TRUE 24 9.2 USS Enterprise D Picard NCC-1701-D Kathryn NULL Janeway TRUE 24 9.975 USS Voyager Janeway NCC-74656 name middle_initial last_name century captain varchar(30) char(1) text integer boolean James T Kirk 23 TRUE Beverly C Crusher 24 FALSE Kathryn NULL Janeway 24 TRUE persons warp spaceship_name captain_name code numeric varchar(30) text integer 5 USS Enterprise A Kirk NCC-1701-A 4 USS Enterprise Kirk NCC-1701 9.2 USS Enterprise D Picard NCC-1701-D 9.975 USS Voyager Janeway NCC-74656 spaceships
  76. 76 Full outer joins SELECT * FROM persons FULL OUTER

    JOIN spaceships ON persons.name = spaceships.captain_name name middle_initial last_name century captain varchar(30) char(1) text integer boolean James T Kirk 23 TRUE Beverly C Crusher 24 FALSE Kathryn NULL Janeway 24 TRUE persons warp spaceship_name captain_name code numeric varchar(30) text integer 5 USS Enterprise A Kirk NCC-1701-A 4 USS Enterprise Kirk NCC-1701 9.2 USS Enterprise D Picard NCC-1701-D 9.975 USS Voyager Janeway NCC-74656 spaceships name middle_initial last_name captain century warp spaceship_name captain_name code varchar(30) char(1) text boolean integer numeric varchar(30) text integer James T Kirk TRUE 23 5 USS Enterprise A Kirk NCC-1701-A James T Kirk FALSE 24 4 USS Enterprise Kirk NCC-1701 NULL NULL NULL NULL NULL 9.2 USS Enterprise D Picard NCC-1701-D Beverly C Crusher 24 FALSE NULL NULL NULL NULL Kathryn NULL Janeway TRUE 24 9.975 USS Voyager Janeway NCC-74656
  77. 77 Natural join SELECT * FROM persons NATURAL FULL OUTER

    JOIN spaceships name middle_initial last_name century captain varchar(30) char(1) text integer boolean James T Kirk 23 TRUE Beverly C Crusher 24 FALSE Kathryn NULL Janeway 24 TRUE persons warp spaceship_name last_name code numeric varchar(30) text integer 5 USS Enterprise A Kirk NCC-1701-A 4 USS Enterprise Kirk NCC-1701 9.2 USS Enterprise D Picard NCC-1701-D 9.975 USS Voyager Janeway NCC-74656 spaceships name middle_initial last_name captain century warp spaceship_name code varchar(30) char(1) text boolean integer numeric varchar(30) integer James T Kirk TRUE 23 5 USS Enterprise A NCC-1701-A James T Kirk FALSE 24 4 USS Enterprise NCC-1701 NULL NULL NULL NULL NULL 9.2 USS Enterprise D NCC-1701-D Beverly C Crusher 24 FALSE NULL NULL NULL Kathryn NULL Janeway TRUE 24 9.975 USS Voyager NCC-74656
  78. 78 WHERE vs. HAVING: query plan WHERE GROUP HAVING SELECT

    FROM Pre-grouping Post-grouping σ σ π
  79. 79 A union query: UNION spaceships1 warp name last_name code

    successor numeric varchar(30) text varchar varchar 5 USS Enterprise A Kirk NCC-1701-A NCC-1701-B 6 USS Enterprise B Kirk NCC-1701-B NCC-1701-C 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D spaceships2 warp name last_name code successor numeric varchar(30) text varchar varchar 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D 4 USS Enterprise Kirk NCC-1701 NCC-1701-A 9.2 USS Enterprise D Picard NCC-1701-D NCC-1701-E warp name last_name code successor numeric varchar(30) text varchar varchar 5 USS Enterprise A Kirk NCC-1701-A NCC-1701-B 6 USS Enterprise B Kirk NCC-1701-B NCC-1701-C 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D 4 USS Enterprise Kirk NCC-1701 NCC-1701-A 9.2 USS Enterprise D Picard NCC-1701-D NCC-1701-E SELECT * FROM spaceships1 UNION spaceships2 duplicate elimination
  80. 80 An intersection query: INTERSECT spaceships1 warp name last_name code

    successor numeric varchar(30) text varchar varchar 5 USS Enterprise A Kirk NCC-1701-A NCC-1701-B 6 USS Enterprise B Kirk NCC-1701-B NCC-1701-C 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D spaceships2 warp name last_name code successor numeric varchar(30) text varchar varchar 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D 4 USS Enterprise Kirk NCC-1701 NCC-1701-A warp name last_name code successor numeric varchar(30) text varchar varchar 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D SELECT * FROM spaceships1 INTERSECT spaceships2 duplicate elimination
  81. 81 An set substraction query: EXCEPT spaceships1 warp name last_name

    code successor numeric varchar(30) text varchar varchar 5 USS Enterprise A Kirk NCC-1701-A NCC-1701-B 6 USS Enterprise B Kirk NCC-1701-B NCC-1701-C 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D spaceships2 warp name last_name code successor numeric varchar(30) text varchar varchar 7 USS Enterprise C Kirk NCC-1701-C NCC-1701-D 4 USS Enterprise Kirk NCC-1701 NCC-1701-A warp name last_name code successor numeric varchar(30) text varchar varchar 5 USS Enterprise A Kirk NCC-1701-A NCC-1701-B 6 USS Enterprise B Kirk NCC-1701-B NCC-1701-C SELECT * FROM spaceships1 EXCEPT spaceships2 duplicate elimination
  82. 82 Nesting SELECT pname, warp FROM ( SELECT persons.name AS

    pname, last_name, century, spaceships.name AS sname, warp FROM persons FULL OUTER JOIN spaceships USING last_name ) Table expression Table expression
  83. 83 Three-valued logics: OR OR TRUE FALSE UNKNOWN TRUE TRUE

    TRUE TRUE FALSE TRUE FALSE UNKNOWN UNKNOWN TRUE UNKNOWN UNKNOWN This is common sense. e.g., <anything> OR TRUE = TRUE thus UNKNOWN OR TRUE = TRUE
  84. 84 DML: Data Manipulation Language (Query, insert, remove rows) Some

    terminology Data Schema DDL: Data Definition Language (Create or table/schema, drop it)
  85. 85 Create Read Update Delete Some terminology

  86. 86 Transactions

  87. 87 The good old times of databases: ACID Atomicity Consistency

    Isolation Durability
  88. 88 Atomicity Either the entire transaction is applied, or none

    of it (rollback).
  89. 89 Consistency After a transaction, the database is in a

    consistent state again.
  90. 90 Isolation A transaction feels like nobody else is writing

    to the database.
  91. 91 Durability Updates made do not disappear again.

  92. 92 The new era: the CAP theorem Yet another impossibility

    triangle
  93. 93 The new era: the CAP theorem Consistency Availability Partition

    tolerance
  94. 94 (Atomic) Consistency All nodes see the same data.

  95. 95 Availability It is possible to query the database at

    all times.
  96. 96 Partition tolerance The database continues to function even if

    the network gets partitioned.
  97. 97 Performance

  98. 98 No such thing as "one size fits all" Data

    shapes matter
  99. 99 Optimize for read vs. write intensive OLTP OLAP Write-intensive

    Read-intensive OnLine Transaction Processing OnLine Analytical Processing
  100. 100 Indices

  101. 101 Data Scale-Up

  102. 102 Data can have... Lots of rows

  103. 103 Data can have... Lots of columns

  104. 104 Data can have... Lots of nesting

  105. 105 The rest of the lecture: Scaling up Data in

    the Large Lots of rows Object Storage Lots of rows Distributed File Systems Lots of rows/columns Column storage Lots of rows Massive Parallel Processing Data in the Small Lots of nesting Syntax Lots of nesting Data Models Lots of nesting Querying Lots of nesting Document Stores Data in the Very Small Lots of rows Graph Databases Lots of columns Data Warehousing Lots of columns Business reporting