NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

NoSQL Showdown Lightning Talks Edition Dave Stokes @stoker Robert McFrazier
@rmcfrazier Jeremy Mikola @jmikola

Some History Ben Stopford, Thoughts on Big Data http://www.benstopford.com/2012/06/30/thoughts-on-big-data-technologies-part-1

What is NoSQL? Key/Value Graph Key/Value Key/Value Document BigTable

Key/Value Stores • Maps arbitrary keys to values • No
knowledge of the value's format • Completely schema-less • Implementations • Eventually consistent, hierarchal, ordered, in-RAM • Operations • Get, set, and delete values by key

BigTable • Sparse, distributed data storage • Multi-dimensional, sorted map
• Indexed by row/column keys and timestamp • Data processing • MapReduce • Bloom filters

https://twitter.com/rlazarus/status/640191284469178368

Graph Stores • Nodes are connected by edges • Index-free
adjacency • Annotate nodes and edges with properties • Operations • Create nodes and edges, assign properties • Lookup nodes and edges by indexable properties • Query by algorithmic graph traversals

Document Stores • Documents have a unique ID and some
fields • Organized by collections, tags, metadata, etc. • Formats such as XML, JSON, BSON • Structure may vary by document (schema-less) • Operations • Query by namespace, ID, or field values • Insert new documents or update existing fields

MongoDB's Philosophy • Document data models are great • Non-relational
model allows horizontal scaling • Maintain functionality whenever possible • Minimize the learning curve • Easy to setup and deploy anywhere • JavaScript and JSON are ubiquitous • Automate sharding and replication

MongoDB Under the Hood • Server in C++, supports various
architectures • Idiomatic drivers for over a dozen languages • Data format and wire protocol use BSON • Multiple storage engines for different use cases • MMAPv1, WiredTiger, RocksDB • B-tree, geospatial, TTL, and text indexes

Replica Sets • Primary, secondaries, and arbiters • Optionally direct
read queries to a secondary • Automatic failover mongod Primary Application mongod Secondary mongod Arbiter

Sharding Application mongod Primary mongod Secondary mongod Secondary mongod Primary
mongod Secondary mongod Secondary mongod Primary mongod Secondary mongod Secondary mongos mongos mongod Config 2 mongod Config 3 mongod Config 1

Sharding is the tool for scaling a system. Replication is
the tool for data safety, high availability, and disaster recovery.

It's not just about servers.

Scaling Development • Data format analogous to our domain model
• Embedded documents • Arrays (of scalars, documents, or nested arrays) • Schema agility for ever-changing requirements • Useful features • Aggregation framework • MapReduce, integration with Hadoop • Geospatial queries, GridFS, full-text search

Working with Data $ mongo MongoDB shell version: 3.0.6 connecting
to: test > db.events.insert({name: "PNWPHP", tags: ["php", "Seattle", "conference"]}) WriteResult({ "nInserted" : 1 }) > db.events.findOne() { "_id" : ObjectId("55f1e7a7816b31e25abd2055"), "name" : "PNWPHP", "tags" : [ "php", "Seattle", "conference" ] } > db.events.update( ... {name: "PNWPHP"}, ... {$set: { name: "WurstCon"}, $push: { tags: "sausage" }}) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Working with Data > db.records.aggregate([ ... { $match: { status:
"A" }}, ... { $group: { _id: "$cust_id", total: { $sum: "$amount" }}} ... ])

MongoDB and PHP • Legacy driver (PHP 5.3+) • http://pecl.php.net/package/mongo
• http://php.net/manual/en/book.mongo.php • http://github.com/mongofill/mongofill • Next-gen driver (PHP 5.4+, PHP 7, HHVM) • http://pecl.php.net/package/mongodb • http://php.net/manual/en/set.mongodb.php

Coming Soon to a PHP Runtime Near You

Doctrine MongoDB ODM http://doctrine-project.org/ • Object document-mapper for MongoDB •
Data mapper pattern completely separates domain models from persistence logic • Model classes, query builders, relationships • Interoperability with Doctrine ORM for SQL • Now stable after five years of beta!

Questions?

Cassandra in 10 minutes Robert McFrazier @rmcfrazier special thanks to
Bulat Shakirzyanov for the DataStax reference material

Cassandra in 10 minutes Why Cassandra? Linear scaling • Need
more throughput • Need more storage capacity Just Add More Nodes !!

Cassandra in 10 minutes

Cassandra in 10 minutes Why Cassandra? Highly available no SPoF
• No master or slave nodes • All nodes are the same… …not really (seed vs. non-seed nodes)

Cassandra in 10 minutes Why Cassandra? Able to adjust data
consistency • Fast-read/slow-write vs. slow-read/fast-write • Eventual consistency

Cassandra in 10 minutes Why Cassandra? Rack and data center
aware • Tune replicated data down to the racks in a data center

Cassandra in 10 minutes Why Cassandra? Support options • Large
and active open source community • Project is backed by commercial company

Cassandra in 10 minutes Why Cassandra? Solr and Hadoop out
of the box (DataStax Enterprise Analytics and Search)

Cassandra in 10 minutes Why Cassandra? Column Family • Schemaless
• TTL on the column • Rows do not have to have data in all columns • Great for time series data

Cassandra in 10 minutes Problem Gather phone switch data from
VoIP callers, and determine where the calls originated. Phone switch created Ids for each VoIP user, and those Ids were recycled every 15 minutes. User base of 20,000 and growing

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
1 Insert Picture Here NoSQL and JSON with MySQL Dave Stokes MySQL Community Manager [email protected] @stoker Slideshare.net/davidmstokes Insert Picture Here

2 Safe Harbor The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decision. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

3 MySQL  Most popular database on the web  Ubiquitous  16+ million instances  Feeds 80% of Hadoop installs  20 Years Old

4 Relational Data • Based on relational calculus, set theory • Been heavily used for decades • Many vendors • Goal: Store data efficiently

5 Relational Data • ACID (from Wikipedia https://en.wikipedia.org/wiki/ACID) • Atomicity – equires that each transaction be "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged • Consistency – ensures that any transaction will bring the database from one valid state to another. • Isolation – the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially • Durability – means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.

6 So why NoSQL?!? • A NoSQL (often interpreted as Not only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. (https://en.wikipedia.org/wiki/NoSQL) • Motivations for this approach include simplicity of design, presumed better "horizontal" scaling to clusters of machine, which is a problem for relational databases, and presumed finer control over availability.

7 CAP Theorem CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: • Consistency (all nodes see the same data at the same time) • Availability (a guarantee that every request receives a response about whether it succeeded or failed) • Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures) (https://en.wikipedia.org/wiki/CAP_theorem)

8 Want BOTH! What to do?!?!?! • Polyglot Databases – Use best storage approach for the data • Oracle, Postgresql, MySQL, etc. all adopting some NoSQL features • NoSQL trying to adopt SQL • Vendors not dumb! • Better for consumers

9 Access SQL and NoSQL One set of disks Simultaneous access MySQL InnoDB tables and NDB tables Use SQL and/or Key/Value pair 2,000,000,000 writes a minute with MySQL Cluster At the same time!

10

11

12 Benefits Raw performance for simple lookups. Direct access to the InnoDB storage engine avoids the parsing and planning overhead of SQL. Running memcached in the same process space as the MySQL server avoids the network overhead of passing requests back and forth. Data is stored in a MySQL database to protect against crashes, outages, and corruption. The transfer between memory and disk is handled automatically, simplifying application logic. Data can be unstructured or structured, depending on the type of application. You can make an all-new table for the data, or map the NoSQL-style processing to one or more existing tables.

13 Benefits continued You can still access the underlying table through SQL, for reporting, analysis, ad hoc queries, bulk loading, set operations such as union and intersection, and other operations well suited to the expressiveness and flexibility of SQL. You can ensure high availability of the NoSQL data by using this feature on a master server in combination with MySQL replication. The integration of memcached with MySQL provides a painless way to make the in-memory data persistent, so you can use it for more significant kinds of data. You can put more add, incr, and similar write operations into your application, without worrying that the data could disappear at any moment. You can stop and start the memcached server without losing updates made to the cached data. To guard against unexpected outages, you can take advantage of InnoDB crash recovery, replication, and backup procedures. The way InnoDB does fast primary key lookups is a natural fit for memcached single-item queries. The direct, low-level database access path used by the memcached plugin is much more efficient for key- value lookups than equivalent SQL queries.

14 More Benefits Continued The serialization features of memcached, which can turn complex data structures, binary files, or even code blocks into storeable strings, offer a simple way to get such objects into a database. Because you can access the underlying data through SQL, you can produce reports, search or update across multiple keys, and call functions such as AVG() and MAX() on the memcached data. All of these operations are expensive or complicated with the standalone memcached. You do not need to manually load data into memcached at startup. As particular keys are requested by an application, the values are retrieved from the database automatically, and cached in memory using the InnoDB buffer pool. Because memcached consumes relatively little CPU, and its memory footprint is easy to control, it can run comfortably alongside a MySQL instance on the same system. Because data consistency is enforced through the usual mechanism as with regular InnoDB tables, you do not have to worry about stale memcached data or fallback logic to query the database in the case of a missing key

15 Installation mysql> install plugin daemon_memcached soname "libmemcached.so";

16 Here is an example using telnet to send memcached commands and receive results through the ASCII protocol: • telnet 127.0.0.1 11211 • set a11 10 0 9 • 123456789 • STORED • get a11 • VALUE a11 0 9 • 123456789 • END • quit Set memory location 'a11' to hold 9 characters '123456789' – 10 & 0 are TTL and flags

17 Can it be used with MySQL Replication? Because the InnoDB memcached daemon plugin supports the MySQL binary log, any updates made on a master server through the memcached interface can be replicated for backup, balancing intensive read workloads, and high availability. All memcached commands are supported for binlogging. You do not need to set up the InnoDB memcached plugin on the slave servers. In this configuration, the primary advantage is increased write throughput on the master. The speed of the replication mechanism is not affected

19 Bypass MySQL daemon! • 4.2 billions reads/min • 1.2 billion updates/min • 2 billion writes a min • MySQL Cluster

20 Can you have the best of both worlds????

22 Native JSON Data Type mysql> CREATE TABLE employees (data JSON); Query OK, 0 rows affected (0,01 sec) mysql> INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}'); Query OK, 1 row affected (0,00 sec) mysql> INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}'); Query OK, 1 row affected (0,00 sec) mysql> select * from employees; +---------------------------+ | data | +---------------------------+ | {"id": 1, "name": "Jane"} | | {"id": 2, "name": "Joe"} | +---------------------------+ 2 rows in set (0,00 sec)

23 JSON Details • UTF8MB4 Character Set • Optimized for READ intensive workloads • Parse & Validation on Insert Only • Dictionary • Sorted object keys • Fast access to array cell by index keys • Support all JSON Data types plus date, time, timestamp

25 JSON Functions • Manipulating JSON documents: • jsn_array() • jsn_object() • jsn_insert() • jsn_remove() • jsn_set() • jsn_replace() • jsn_append() • jsn_merge() • jsn_extract() • JSON Query: • JSN_SEARCH() • JSN_CONTAINS() • JSN_CONTAINS_PATH() • JSN_VALID() • JSN_TYPE() • JSN_KEYS() • JSN_LENGTH() • JSN_DEPTH() • JSN_UNQUOTE() • JSN_QUOTE()

26 Quick Example mysql> desc colors; +--------------+----------+------+-----+---------+-------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+----------+------+-----+---------+-------------------+ | popular_name | char(10) | YES | | NULL | | | hue | json | YES | | NULL | | +--------------+----------+------+-----+---------+-------------------+ 2 rows in set (0.00 sec) INSERT INTO `colors` VALUES ('red','{\"value\": \"f00\"}'), ('green','{\"value\": \"0f0\"}'),('blue','{\"value\": \"00f\"}'),('cyan','{\"value\": \"0ff\"}'),('magenta','{\"value\": \"f0f\"}'),('yellow','{\"value\": \"ff0\"}'), ('black','{\"value\": \"000\"}');

27 Using jsn_extract mysql> SELECT jsn_extract(hue, '$.value') FROM colors WHERE jsn_extract(hue, '$.value')="f0f"; +-----------------------------+ | jsn_extract(hue, '$.value') | +-----------------------------+ | "f0f" | +-----------------------------+ 1 row in set (0.00 sec)

28 Generated Columns • Can be • Stored – sales_tax = sales_amount * tax_rate • Virtual – calculated on fly • Used to build indexes on JSON columns for speed • Used to use a default character set collation for case insensative string searches

29 JSON or Not to JSON • Up to you • Yes • More flexible, use with data that is hard to model • Support custom fields • Easier denormalization • No painful schema changes, easier prototyping • Mix and match

30 For More Information • Mysql.com • Labs.mysql.com • Planet.mysql.com • Dave Stokes [email protected] @stoker slideshare.net/davidmstokes

NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

More Decks by Jeremy Mikola

Other Decks in Programming

Featured

Transcript