Upgrade to Pro — share decks privately, control downloads, hide ads and more …

NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

Jeremy Mikola
September 11, 2015

NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

Presented September 11, 2015 at Pacific Northwest PHP: https://joind.in/talk/view/14921

Jeremy Mikola

September 11, 2015
Tweet

More Decks by Jeremy Mikola

Other Decks in Programming

Transcript

  1. Key/Value Stores • Maps arbitrary keys to values • No

    knowledge of the value's format • Completely schema-less • Implementations • Eventually consistent, hierarchal, ordered, in-RAM • Operations • Get, set, and delete values by key
  2. BigTable • Sparse, distributed data storage • Multi-dimensional, sorted map

    • Indexed by row/column keys and timestamp • Data processing • MapReduce • Bloom filters
  3. Graph Stores • Nodes are connected by edges • Index-free

    adjacency • Annotate nodes and edges with properties • Operations • Create nodes and edges, assign properties • Lookup nodes and edges by indexable properties • Query by algorithmic graph traversals
  4. Document Stores • Documents have a unique ID and some

    fields • Organized by collections, tags, metadata, etc. • Formats such as XML, JSON, BSON • Structure may vary by document (schema-less) • Operations • Query by namespace, ID, or field values • Insert new documents or update existing fields
  5. MongoDB's Philosophy • Document data models are great • Non-relational

    model allows horizontal scaling • Maintain functionality whenever possible • Minimize the learning curve • Easy to setup and deploy anywhere • JavaScript and JSON are ubiquitous • Automate sharding and replication
  6. MongoDB Under the Hood • Server in C++, supports various

    architectures • Idiomatic drivers for over a dozen languages • Data format and wire protocol use BSON • Multiple storage engines for different use cases • MMAPv1, WiredTiger, RocksDB • B-tree, geospatial, TTL, and text indexes
  7. Replica Sets • Primary, secondaries, and arbiters • Optionally direct

    read queries to a secondary • Automatic failover mongod Primary Application mongod Secondary mongod Arbiter
  8. Sharding Application mongod Primary mongod Secondary mongod Secondary mongod Primary

    mongod Secondary mongod Secondary mongod Primary mongod Secondary mongod Secondary mongos mongos mongod Config 2 mongod Config 3 mongod Config 1
  9. Sharding is the tool for scaling a system. Replication is

    the tool for data safety, high availability, and disaster recovery.
  10. Scaling Development • Data format analogous to our domain model

    • Embedded documents • Arrays (of scalars, documents, or nested arrays) • Schema agility for ever-changing requirements • Useful features • Aggregation framework • MapReduce, integration with Hadoop • Geospatial queries, GridFS, full-text search
  11. Working with Data $ mongo MongoDB shell version: 3.0.6 connecting

    to: test > db.events.insert({name: "PNWPHP", tags: ["php", "Seattle", "conference"]}) WriteResult({ "nInserted" : 1 }) > db.events.findOne() { "_id" : ObjectId("55f1e7a7816b31e25abd2055"), "name" : "PNWPHP", "tags" : [ "php", "Seattle", "conference" ] } > db.events.update( ... {name: "PNWPHP"}, ... {$set: { name: "WurstCon"}, $push: { tags: "sausage" }}) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
  12. Working with Data > db.records.aggregate([ ... { $match: { status:

    "A" }}, ... { $group: { _id: "$cust_id", total: { $sum: "$amount" }}} ... ])
  13. MongoDB and PHP • Legacy driver (PHP 5.3+) • http://pecl.php.net/package/mongo

    • http://php.net/manual/en/book.mongo.php • http://github.com/mongofill/mongofill • Next-gen driver (PHP 5.4+, PHP 7, HHVM) • http://pecl.php.net/package/mongodb • http://php.net/manual/en/set.mongodb.php
  14. Doctrine MongoDB ODM http://doctrine-project.org/ • Object document-mapper for MongoDB •

    Data mapper pattern completely separates domain models from persistence logic • Model classes, query builders, relationships • Interoperability with Doctrine ORM for SQL • Now stable after five years of beta!
  15. Cassandra in 10 minutes Robert McFrazier @rmcfrazier special thanks to

    Bulat Shakirzyanov for the DataStax reference material
  16. Cassandra in 10 minutes Why Cassandra? Linear scaling • Need

    more throughput • Need more storage capacity Just Add More Nodes !!
  17. Cassandra in 10 minutes Why Cassandra? Highly available no SPoF

    • No master or slave nodes • All nodes are the same… …not really (seed vs. non-seed nodes)
  18. Cassandra in 10 minutes Why Cassandra? Able to adjust data

    consistency • Fast-read/slow-write vs. slow-read/fast-write • Eventual consistency
  19. Cassandra in 10 minutes Why Cassandra? Rack and data center

    aware • Tune replicated data down to the racks in a data center
  20. Cassandra in 10 minutes Why Cassandra? Support options • Large

    and active open source community • Project is backed by commercial company
  21. Cassandra in 10 minutes Why Cassandra? Solr and Hadoop out

    of the box (DataStax Enterprise Analytics and Search)
  22. Cassandra in 10 minutes Why Cassandra? Column Family • Schemaless

    • TTL on the column • Rows do not have to have data in all columns • Great for time series data
  23. Cassandra in 10 minutes Problem Gather phone switch data from

    VoIP callers, and determine where the calls originated. Phone switch created Ids for each VoIP user, and those Ids were recycled every 15 minutes. User base of 20,000 and growing
  24. Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

    1 Insert Picture Here NoSQL and JSON with MySQL Dave Stokes MySQL Community Manager [email protected] @stoker Slideshare.net/davidmstokes Insert Picture Here
  25. Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

    2 Safe Harbor The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decision. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  26. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    3 MySQL  Most popular database on the web  Ubiquitous  16+ million instances  Feeds 80% of Hadoop installs  20 Years Old
  27. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    4 Relational Data • Based on relational calculus, set theory • Been heavily used for decades • Many vendors • Goal: Store data efficiently
  28. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    5 Relational Data • ACID (from Wikipedia https://en.wikipedia.org/wiki/ACID) • Atomicity – equires that each transaction be "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged • Consistency – ensures that any transaction will bring the database from one valid state to another. • Isolation – the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially • Durability – means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
  29. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    6 So why NoSQL?!? • A NoSQL (often interpreted as Not only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. (https://en.wikipedia.org/wiki/NoSQL) • Motivations for this approach include simplicity of design, presumed better "horizontal" scaling to clusters of machine, which is a problem for relational databases, and presumed finer control over availability.
  30. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    7 CAP Theorem CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: • Consistency (all nodes see the same data at the same time) • Availability (a guarantee that every request receives a response about whether it succeeded or failed) • Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures) (https://en.wikipedia.org/wiki/CAP_theorem)
  31. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    8 Want BOTH! What to do?!?!?! • Polyglot Databases – Use best storage approach for the data • Oracle, Postgresql, MySQL, etc. all adopting some NoSQL features • NoSQL trying to adopt SQL • Vendors not dumb! • Better for consumers
  32. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    9 Access SQL and NoSQL One set of disks Simultaneous access MySQL InnoDB tables and NDB tables Use SQL and/or Key/Value pair 2,000,000,000 writes a minute with MySQL Cluster At the same time!
  33. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    12 Benefits Raw performance for simple lookups. Direct access to the InnoDB storage engine avoids the parsing and planning overhead of SQL. Running memcached in the same process space as the MySQL server avoids the network overhead of passing requests back and forth. Data is stored in a MySQL database to protect against crashes, outages, and corruption. The transfer between memory and disk is handled automatically, simplifying application logic. Data can be unstructured or structured, depending on the type of application. You can make an all-new table for the data, or map the NoSQL-style processing to one or more existing tables.
  34. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    13 Benefits continued You can still access the underlying table through SQL, for reporting, analysis, ad hoc queries, bulk loading, set operations such as union and intersection, and other operations well suited to the expressiveness and flexibility of SQL. You can ensure high availability of the NoSQL data by using this feature on a master server in combination with MySQL replication. The integration of memcached with MySQL provides a painless way to make the in-memory data persistent, so you can use it for more significant kinds of data. You can put more add, incr, and similar write operations into your application, without worrying that the data could disappear at any moment. You can stop and start the memcached server without losing updates made to the cached data. To guard against unexpected outages, you can take advantage of InnoDB crash recovery, replication, and backup procedures. The way InnoDB does fast primary key lookups is a natural fit for memcached single-item queries. The direct, low-level database access path used by the memcached plugin is much more efficient for key- value lookups than equivalent SQL queries.
  35. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    14 More Benefits Continued The serialization features of memcached, which can turn complex data structures, binary files, or even code blocks into storeable strings, offer a simple way to get such objects into a database. Because you can access the underlying data through SQL, you can produce reports, search or update across multiple keys, and call functions such as AVG() and MAX() on the memcached data. All of these operations are expensive or complicated with the standalone memcached. You do not need to manually load data into memcached at startup. As particular keys are requested by an application, the values are retrieved from the database automatically, and cached in memory using the InnoDB buffer pool. Because memcached consumes relatively little CPU, and its memory footprint is easy to control, it can run comfortably alongside a MySQL instance on the same system. Because data consistency is enforced through the usual mechanism as with regular InnoDB tables, you do not have to worry about stale memcached data or fallback logic to query the database in the case of a missing key
  36. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    15 Installation mysql> install plugin daemon_memcached soname "libmemcached.so";
  37. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    16 Here is an example using telnet to send memcached commands and receive results through the ASCII protocol: • telnet 127.0.0.1 11211 • set a11 10 0 9 • 123456789 • STORED • get a11 • VALUE a11 0 9 • 123456789 • END • quit Set memory location 'a11' to hold 9 characters '123456789' – 10 & 0 are TTL and flags
  38. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    17 Can it be used with MySQL Replication? Because the InnoDB memcached daemon plugin supports the MySQL binary log, any updates made on a master server through the memcached interface can be replicated for backup, balancing intensive read workloads, and high availability. All memcached commands are supported for binlogging. You do not need to set up the InnoDB memcached plugin on the slave servers. In this configuration, the primary advantage is increased write throughput on the master. The speed of the replication mechanism is not affected
  39. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    19 Bypass MySQL daemon! • 4.2 billions reads/min • 1.2 billion updates/min • 2 billion writes a min • MySQL Cluster
  40. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    20 Can you have the best of both worlds????
  41. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    22 Native JSON Data Type mysql> CREATE TABLE employees (data JSON); Query OK, 0 rows affected (0,01 sec) mysql> INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}'); Query OK, 1 row affected (0,00 sec) mysql> INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}'); Query OK, 1 row affected (0,00 sec) mysql> select * from employees; +---------------------------+ | data | +---------------------------+ | {"id": 1, "name": "Jane"} | | {"id": 2, "name": "Joe"} | +---------------------------+ 2 rows in set (0,00 sec)
  42. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    23 JSON Details • UTF8MB4 Character Set • Optimized for READ intensive workloads • Parse & Validation on Insert Only • Dictionary • Sorted object keys • Fast access to array cell by index keys • Support all JSON Data types plus date, time, timestamp
  43. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    25 JSON Functions • Manipulating JSON documents: • jsn_array() • jsn_object() • jsn_insert() • jsn_remove() • jsn_set() • jsn_replace() • jsn_append() • jsn_merge() • jsn_extract() • JSON Query: • JSN_SEARCH() • JSN_CONTAINS() • JSN_CONTAINS_PATH() • JSN_VALID() • JSN_TYPE() • JSN_KEYS() • JSN_LENGTH() • JSN_DEPTH() • JSN_UNQUOTE() • JSN_QUOTE()
  44. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    26 Quick Example mysql> desc colors; +--------------+----------+------+-----+---------+-------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+----------+------+-----+---------+-------------------+ | popular_name | char(10) | YES | | NULL | | | hue | json | YES | | NULL | | +--------------+----------+------+-----+---------+-------------------+ 2 rows in set (0.00 sec) INSERT INTO `colors` VALUES ('red','{\"value\": \"f00\"}'), ('green','{\"value\": \"0f0\"}'),('blue','{\"value\": \"00f\"}'),('cyan','{\"value\": \"0ff\"}'),('magenta','{\"value\": \"f0f\"}'),('yellow','{\"value\": \"ff0\"}'), ('black','{\"value\": \"000\"}');
  45. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    27 Using jsn_extract mysql> SELECT jsn_extract(hue, '$.value') FROM colors WHERE jsn_extract(hue, '$.value')="f0f"; +-----------------------------+ | jsn_extract(hue, '$.value') | +-----------------------------+ | "f0f" | +-----------------------------+ 1 row in set (0.00 sec)
  46. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    28 Generated Columns • Can be • Stored – sales_tax = sales_amount * tax_rate • Virtual – calculated on fly • Used to build indexes on JSON columns for speed • Used to use a default character set collation for case insensative string searches
  47. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    29 JSON or Not to JSON • Up to you • Yes • More flexible, use with data that is hard to model • Support custom fields • Easier denormalization • No painful schema changes, easier prototyping • Mix and match
  48. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

    30 For More Information • Mysql.com • Labs.mysql.com • Planet.mysql.com • Dave Stokes [email protected] @stoker slideshare.net/davidmstokes