NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

Slide 1

Slide 1 text

NoSQL Showdown Lightning Talks Edition Dave Stokes @stoker Robert McFrazier @rmcfrazier Jeremy Mikola @jmikola

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Some History Ben Stopford, Thoughts on Big Data http://www.benstopford.com/2012/06/30/thoughts-on-big-data-technologies-part-1

Slide 4

Slide 4 text

What is NoSQL? Key/Value Graph Key/Value Key/Value Document BigTable

Slide 5

Slide 5 text

Key/Value Stores ● Maps arbitrary keys to values ● No knowledge of the value's format ● Completely schema-less ● Implementations ● Eventually consistent, hierarchal, ordered, in-RAM ● Operations ● Get, set, and delete values by key

Slide 6

Slide 6 text

BigTable ● Sparse, distributed data storage ● Multi-dimensional, sorted map ● Indexed by row/column keys and timestamp ● Data processing ● MapReduce ● Bloom filters

Slide 7

Slide 7 text

https://twitter.com/rlazarus/status/640191284469178368

Slide 8

Slide 8 text

Graph Stores ● Nodes are connected by edges ● Index-free adjacency ● Annotate nodes and edges with properties ● Operations ● Create nodes and edges, assign properties ● Lookup nodes and edges by indexable properties ● Query by algorithmic graph traversals

Slide 9

Slide 9 text

Document Stores ● Documents have a unique ID and some fields ● Organized by collections, tags, metadata, etc. ● Formats such as XML, JSON, BSON ● Structure may vary by document (schema-less) ● Operations ● Query by namespace, ID, or field values ● Insert new documents or update existing fields

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

MongoDB's Philosophy ● Document data models are great ● Non-relational model allows horizontal scaling ● Maintain functionality whenever possible ● Minimize the learning curve ● Easy to setup and deploy anywhere ● JavaScript and JSON are ubiquitous ● Automate sharding and replication

Slide 12

Slide 12 text

MongoDB Under the Hood ● Server in C++, supports various architectures ● Idiomatic drivers for over a dozen languages ● Data format and wire protocol use BSON ● Multiple storage engines for different use cases ● MMAPv1, WiredTiger, RocksDB ● B-tree, geospatial, TTL, and text indexes

Slide 13

Slide 13 text

Replica Sets ● Primary, secondaries, and arbiters ● Optionally direct read queries to a secondary ● Automatic failover mongod Primary Application mongod Secondary mongod Arbiter

Slide 14

Slide 14 text

Sharding Application mongod Primary mongod Secondary mongod Secondary mongod Primary mongod Secondary mongod Secondary mongod Primary mongod Secondary mongod Secondary mongos mongos mongod Config 2 mongod Config 3 mongod Config 1

Slide 15

Slide 15 text

Sharding is the tool for scaling a system. Replication is the tool for data safety, high availability, and disaster recovery.

Slide 16

Slide 16 text

It's not just about servers.

Slide 17

Slide 17 text

Scaling Development ● Data format analogous to our domain model ● Embedded documents ● Arrays (of scalars, documents, or nested arrays) ● Schema agility for ever-changing requirements ● Useful features ● Aggregation framework ● MapReduce, integration with Hadoop ● Geospatial queries, GridFS, full-text search

Slide 18

Slide 18 text

Working with Data $ mongo MongoDB shell version: 3.0.6 connecting to: test > db.events.insert({name: "PNWPHP", tags: ["php", "Seattle", "conference"]}) WriteResult({ "nInserted" : 1 }) > db.events.findOne() { "_id" : ObjectId("55f1e7a7816b31e25abd2055"), "name" : "PNWPHP", "tags" : [ "php", "Seattle", "conference" ] } > db.events.update( ... {name: "PNWPHP"}, ... {$set: { name: "WurstCon"}, $push: { tags: "sausage" }}) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Slide 19

Slide 19 text

Working with Data > db.records.aggregate([ ... { $match: { status: "A" }}, ... { $group: { _id: "$cust_id", total: { $sum: "$amount" }}} ... ])

Slide 20

Slide 20 text

MongoDB and PHP ● Legacy driver (PHP 5.3+) ● http://pecl.php.net/package/mongo ● http://php.net/manual/en/book.mongo.php ● http://github.com/mongofill/mongofill ● Next-gen driver (PHP 5.4+, PHP 7, HHVM) ● http://pecl.php.net/package/mongodb ● http://php.net/manual/en/set.mongodb.php

Slide 21

Slide 21 text

Coming Soon to a PHP Runtime Near You

Slide 22

Slide 22 text

Doctrine MongoDB ODM http://doctrine-project.org/ ● Object document-mapper for MongoDB ● Data mapper pattern completely separates domain models from persistence logic ● Model classes, query builders, relationships ● Interoperability with Doctrine ORM for SQL ● Now stable after five years of beta!

Slide 23

Slide 23 text

Questions?

Slide 24

Slide 24 text

Cassandra in 10 minutes Robert McFrazier @rmcfrazier special thanks to Bulat Shakirzyanov for the DataStax reference material

Slide 25

Slide 25 text

Cassandra in 10 minutes Why Cassandra? Linear scaling • Need more throughput • Need more storage capacity Just Add More Nodes !!

Slide 26

Slide 26 text

Cassandra in 10 minutes

Slide 27

Slide 27 text

Cassandra in 10 minutes Why Cassandra? Highly available no SPoF • No master or slave nodes • All nodes are the same… …not really (seed vs. non-seed nodes)

Slide 28

Slide 28 text

Cassandra in 10 minutes Why Cassandra? Able to adjust data consistency • Fast-read/slow-write vs. slow-read/fast-write • Eventual consistency

Slide 29

Slide 29 text

Cassandra in 10 minutes Why Cassandra? Rack and data center aware • Tune replicated data down to the racks in a data center

Slide 30

Slide 30 text

Cassandra in 10 minutes Why Cassandra? Support options • Large and active open source community • Project is backed by commercial company

Slide 31

Slide 31 text

Cassandra in 10 minutes Why Cassandra? Solr and Hadoop out of the box (DataStax Enterprise Analytics and Search)

Slide 32

Slide 32 text

Cassandra in 10 minutes Why Cassandra? Column Family • Schemaless • TTL on the column • Rows do not have to have data in all columns • Great for time series data

Slide 33

Slide 33 text

Cassandra in 10 minutes Problem Gather phone switch data from VoIP callers, and determine where the calls originated. Phone switch created Ids for each VoIP user, and those Ids were recycled every 15 minutes. User base of 20,000 and growing

Slide 34

Slide 34 text

Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 1 Insert Picture Here NoSQL and JSON with MySQL Dave Stokes MySQL Community Manager [email protected] @stoker Slideshare.net/davidmstokes Insert Picture Here

Slide 35

Slide 35 text

Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 2 Safe Harbor The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decision. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 4 Relational Data ● Based on relational calculus, set theory ● Been heavily used for decades ● Many vendors ● Goal: Store data efficiently

Slide 38

Slide 38 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 5 Relational Data ● ACID (from Wikipedia https://en.wikipedia.org/wiki/ACID) ● Atomicity – equires that each transaction be "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged ● Consistency – ensures that any transaction will bring the database from one valid state to another. ● Isolation – the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially ● Durability – means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.

Slide 39

Slide 39 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 6 So why NoSQL?!? ● A NoSQL (often interpreted as Not only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. (https://en.wikipedia.org/wiki/NoSQL) ● Motivations for this approach include simplicity of design, presumed better "horizontal" scaling to clusters of machine, which is a problem for relational databases, and presumed finer control over availability.

Slide 40

Slide 40 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 7 CAP Theorem CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: ● Consistency (all nodes see the same data at the same time) ● Availability (a guarantee that every request receives a response about whether it succeeded or failed) ● Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures) (https://en.wikipedia.org/wiki/CAP_theorem)

Slide 41

Slide 41 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 8 Want BOTH! What to do?!?!?! ● Polyglot Databases – Use best storage approach for the data ● Oracle, Postgresql, MySQL, etc. all adopting some NoSQL features ● NoSQL trying to adopt SQL ● Vendors not dumb! ● Better for consumers

Slide 42

Slide 42 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 9 Access SQL and NoSQL One set of disks Simultaneous access MySQL InnoDB tables and NDB tables Use SQL and/or Key/Value pair 2,000,000,000 writes a minute with MySQL Cluster At the same time!

Slide 43

Slide 43 text

Slide 44

Slide 44 text

Slide 45

Slide 45 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 12 Benefits Raw performance for simple lookups. Direct access to the InnoDB storage engine avoids the parsing and planning overhead of SQL. Running memcached in the same process space as the MySQL server avoids the network overhead of passing requests back and forth. Data is stored in a MySQL database to protect against crashes, outages, and corruption. The transfer between memory and disk is handled automatically, simplifying application logic. Data can be unstructured or structured, depending on the type of application. You can make an all-new table for the data, or map the NoSQL-style processing to one or more existing tables.

Slide 46

Slide 46 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 13 Benefits continued You can still access the underlying table through SQL, for reporting, analysis, ad hoc queries, bulk loading, set operations such as union and intersection, and other operations well suited to the expressiveness and flexibility of SQL. You can ensure high availability of the NoSQL data by using this feature on a master server in combination with MySQL replication. The integration of memcached with MySQL provides a painless way to make the in-memory data persistent, so you can use it for more significant kinds of data. You can put more add, incr, and similar write operations into your application, without worrying that the data could disappear at any moment. You can stop and start the memcached server without losing updates made to the cached data. To guard against unexpected outages, you can take advantage of InnoDB crash recovery, replication, and backup procedures. The way InnoDB does fast primary key lookups is a natural fit for memcached single-item queries. The direct, low-level database access path used by the memcached plugin is much more efficient for key- value lookups than equivalent SQL queries.

Slide 47

Slide 47 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 14 More Benefits Continued The serialization features of memcached, which can turn complex data structures, binary files, or even code blocks into storeable strings, offer a simple way to get such objects into a database. Because you can access the underlying data through SQL, you can produce reports, search or update across multiple keys, and call functions such as AVG() and MAX() on the memcached data. All of these operations are expensive or complicated with the standalone memcached. You do not need to manually load data into memcached at startup. As particular keys are requested by an application, the values are retrieved from the database automatically, and cached in memory using the InnoDB buffer pool. Because memcached consumes relatively little CPU, and its memory footprint is easy to control, it can run comfortably alongside a MySQL instance on the same system. Because data consistency is enforced through the usual mechanism as with regular InnoDB tables, you do not have to worry about stale memcached data or fallback logic to query the database in the case of a missing key

Slide 48

Slide 48 text

Slide 49

Slide 49 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 16 Here is an example using telnet to send memcached commands and receive results through the ASCII protocol: ● telnet 127.0.0.1 11211 ● set a11 10 0 9 ● 123456789 ● STORED ● get a11 ● VALUE a11 0 9 ● 123456789 ● END ● quit Set memory location 'a11' to hold 9 characters '123456789' – 10 & 0 are TTL and flags

Slide 50

Slide 50 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 17 Can it be used with MySQL Replication? Because the InnoDB memcached daemon plugin supports the MySQL binary log, any updates made on a master server through the memcached interface can be replicated for backup, balancing intensive read workloads, and high availability. All memcached commands are supported for binlogging. You do not need to set up the InnoDB memcached plugin on the slave servers. In this configuration, the primary advantage is increased write throughput on the master. The speed of the replication mechanism is not affected

Slide 51

Slide 51 text

Slide 52

Slide 52 text

Slide 53

Slide 53 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 22 Native JSON Data Type mysql> CREATE TABLE employees (data JSON); Query OK, 0 rows affected (0,01 sec) mysql> INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}'); Query OK, 1 row affected (0,00 sec) mysql> INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}'); Query OK, 1 row affected (0,00 sec) mysql> select * from employees; +---------------------------+ | data | +---------------------------+ | {"id": 1, "name": "Jane"} | | {"id": 2, "name": "Joe"} | +---------------------------+ 2 rows in set (0,00 sec)

Slide 54

Slide 54 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 23 JSON Details ● UTF8MB4 Character Set ● Optimized for READ intensive workloads ● Parse & Validation on Insert Only ● Dictionary ● Sorted object keys ● Fast access to array cell by index keys ● Support all JSON Data types plus date, time, timestamp

Slide 55

Slide 55 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 25 JSON Functions ● Manipulating JSON documents: ● jsn_array() ● jsn_object() ● jsn_insert() ● jsn_remove() ● jsn_set() ● jsn_replace() ● jsn_append() ● jsn_merge() ● jsn_extract() ● JSON Query: ● JSN_SEARCH() ● JSN_CONTAINS() ● JSN_CONTAINS_PATH() ● JSN_VALID() ● JSN_TYPE() ● JSN_KEYS() ● JSN_LENGTH() ● JSN_DEPTH() ● JSN_UNQUOTE() ● JSN_QUOTE()

Slide 56

Slide 56 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 26 Quick Example mysql> desc colors; +--------------+----------+------+-----+---------+-------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+----------+------+-----+---------+-------------------+ | popular_name | char(10) | YES | | NULL | | | hue | json | YES | | NULL | | +--------------+----------+------+-----+---------+-------------------+ 2 rows in set (0.00 sec) INSERT INTO `colors` VALUES ('red','{\"value\": \"f00\"}'), ('green','{\"value\": \"0f0\"}'),('blue','{\"value\": \"00f\"}'),('cyan','{\"value\": \"0ff\"}'),('magenta','{\"value\": \"f0f\"}'),('yellow','{\"value\": \"ff0\"}'), ('black','{\"value\": \"000\"}');

Slide 57

Slide 57 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 27 Using jsn_extract mysql> SELECT jsn_extract(hue, '$.value') FROM colors WHERE jsn_extract(hue, '$.value')="f0f"; +-----------------------------+ | jsn_extract(hue, '$.value') | +-----------------------------+ | "f0f" | +-----------------------------+ 1 row in set (0.00 sec)

Slide 58

Slide 58 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 28 Generated Columns ● Can be ● Stored – sales_tax = sales_amount * tax_rate ● Virtual – calculated on fly ● Used to build indexes on JSON columns for speed ● Used to use a default character set collation for case insensative string searches

Slide 59

Slide 59 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 29 JSON or Not to JSON ● Up to you ● Yes ● More flexible, use with data that is hard to model ● Support custom fields ● Easier denormalization ● No painful schema changes, easier prototyping ● Mix and match

Slide 60

Slide 60 text

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 30 For More Information ● Mysql.com ● Labs.mysql.com ● Planet.mysql.com ● Dave Stokes [email protected] @stoker slideshare.net/davidmstokes