Upgrade to Pro — share decks privately, control downloads, hide ads and more …

NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

Jeremy Mikola
September 11, 2015

NoSQL Lightning Talks (MongoDB, Cassandra, MySQL)

Presented September 11, 2015 at Pacific Northwest PHP: https://joind.in/talk/view/14921

Jeremy Mikola

September 11, 2015
Tweet

More Decks by Jeremy Mikola

Other Decks in Programming

Transcript

  1. NoSQL Showdown
    Lightning Talks Edition
    Dave Stokes
    @stoker
    Robert McFrazier
    @rmcfrazier
    Jeremy Mikola
    @jmikola

    View Slide

  2. View Slide

  3. Some History
    Ben Stopford, Thoughts on Big Data
    http://www.benstopford.com/2012/06/30/thoughts-on-big-data-technologies-part-1

    View Slide

  4. What is NoSQL?
    Key/Value Graph
    Key/Value
    Key/Value Document
    BigTable

    View Slide

  5. Key/Value Stores

    Maps arbitrary keys to values

    No knowledge of the value's format

    Completely schema-less

    Implementations

    Eventually consistent, hierarchal, ordered, in-RAM

    Operations

    Get, set, and delete values by key

    View Slide

  6. BigTable

    Sparse, distributed data storage

    Multi-dimensional, sorted map

    Indexed by row/column keys and timestamp

    Data processing

    MapReduce

    Bloom filters

    View Slide

  7. https://twitter.com/rlazarus/status/640191284469178368

    View Slide

  8. Graph Stores

    Nodes are connected by edges

    Index-free adjacency

    Annotate nodes and edges with properties

    Operations

    Create nodes and edges, assign properties

    Lookup nodes and edges by indexable properties

    Query by algorithmic graph traversals

    View Slide

  9. Document Stores

    Documents have a unique ID and some fields

    Organized by collections, tags, metadata, etc.

    Formats such as XML, JSON, BSON

    Structure may vary by document (schema-less)

    Operations

    Query by namespace, ID, or field values

    Insert new documents or update existing fields

    View Slide

  10. View Slide

  11. MongoDB's Philosophy

    Document data models are great

    Non-relational model allows horizontal scaling

    Maintain functionality whenever possible

    Minimize the learning curve

    Easy to setup and deploy anywhere

    JavaScript and JSON are ubiquitous

    Automate sharding and replication

    View Slide

  12. MongoDB Under the Hood

    Server in C++, supports various architectures

    Idiomatic drivers for over a dozen languages

    Data format and wire protocol use BSON

    Multiple storage engines for different use cases

    MMAPv1, WiredTiger, RocksDB

    B-tree, geospatial, TTL, and text indexes

    View Slide

  13. Replica Sets

    Primary, secondaries, and arbiters

    Optionally direct read queries to a secondary

    Automatic failover
    mongod
    Primary
    Application mongod
    Secondary
    mongod
    Arbiter

    View Slide

  14. Sharding
    Application
    mongod
    Primary
    mongod
    Secondary
    mongod
    Secondary
    mongod
    Primary
    mongod
    Secondary
    mongod
    Secondary
    mongod
    Primary
    mongod
    Secondary
    mongod
    Secondary
    mongos mongos
    mongod
    Config
    2
    mongod
    Config
    3
    mongod
    Config
    1

    View Slide

  15. Sharding is the tool for scaling a system.
    Replication is the tool for data safety, high
    availability, and disaster recovery.

    View Slide

  16. It's not just about servers.

    View Slide

  17. Scaling Development

    Data format analogous to our domain model

    Embedded documents

    Arrays (of scalars, documents, or nested arrays)

    Schema agility for ever-changing requirements

    Useful features

    Aggregation framework

    MapReduce, integration with Hadoop

    Geospatial queries, GridFS, full-text search

    View Slide

  18. Working with Data
    $ mongo
    MongoDB shell version: 3.0.6
    connecting to: test
    > db.events.insert({name: "PNWPHP", tags: ["php", "Seattle", "conference"]})
    WriteResult({ "nInserted" : 1 })
    > db.events.findOne()
    {
    "_id" : ObjectId("55f1e7a7816b31e25abd2055"),
    "name" : "PNWPHP",
    "tags" : [ "php", "Seattle", "conference" ]
    }
    > db.events.update(
    ... {name: "PNWPHP"},
    ... {$set: { name: "WurstCon"}, $push: { tags: "sausage" }})
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

    View Slide

  19. Working with Data
    > db.records.aggregate([
    ... { $match: { status: "A" }},
    ... { $group: { _id: "$cust_id", total: { $sum: "$amount" }}}
    ... ])

    View Slide

  20. MongoDB and PHP

    Legacy driver (PHP 5.3+)

    http://pecl.php.net/package/mongo

    http://php.net/manual/en/book.mongo.php

    http://github.com/mongofill/mongofill

    Next-gen driver (PHP 5.4+, PHP 7, HHVM)

    http://pecl.php.net/package/mongodb

    http://php.net/manual/en/set.mongodb.php

    View Slide

  21. Coming Soon to a PHP Runtime Near You

    View Slide

  22. Doctrine MongoDB ODM
    http://doctrine-project.org/

    Object document-mapper for MongoDB

    Data mapper pattern completely separates
    domain models from persistence logic

    Model classes, query builders, relationships

    Interoperability with Doctrine ORM for SQL

    Now stable after five years of beta!

    View Slide

  23. Questions?

    View Slide

  24. Cassandra in 10 minutes
    Robert McFrazier
    @rmcfrazier
    special thanks to Bulat Shakirzyanov for the DataStax reference material

    View Slide

  25. Cassandra in 10 minutes
    Why Cassandra?
    Linear scaling
    • Need more throughput
    • Need more storage capacity
    Just Add More Nodes !!

    View Slide

  26. Cassandra in 10 minutes

    View Slide

  27. Cassandra in 10 minutes
    Why Cassandra?
    Highly available no SPoF
    • No master or slave nodes
    • All nodes are the same…
    …not really (seed vs. non-seed nodes)

    View Slide

  28. Cassandra in 10 minutes
    Why Cassandra?
    Able to adjust data consistency
    • Fast-read/slow-write vs. slow-read/fast-write
    • Eventual consistency

    View Slide

  29. Cassandra in 10 minutes
    Why Cassandra?
    Rack and data center aware
    • Tune replicated data down to the racks in a data
    center

    View Slide

  30. Cassandra in 10 minutes
    Why Cassandra?
    Support options
    • Large and active open source community
    • Project is backed by commercial company

    View Slide

  31. Cassandra in 10 minutes
    Why Cassandra?
    Solr and Hadoop out of the box
    (DataStax Enterprise Analytics and Search)

    View Slide

  32. Cassandra in 10 minutes
    Why Cassandra?
    Column Family
    • Schemaless
    • TTL on the column
    • Rows do not have to have data in all columns
    • Great for time series data

    View Slide

  33. Cassandra in 10 minutes
    Problem
    Gather phone switch data from VoIP callers, and
    determine where the calls originated.
    Phone switch created Ids for each VoIP user, and those
    Ids were recycled every 15 minutes.
    User base of 20,000 and growing

    View Slide

  34. Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
    1
    Insert Picture Here
    NoSQL and JSON
    with MySQL
    Dave Stokes
    MySQL Community Manager
    [email protected]
    @stoker
    Slideshare.net/davidmstokes
    Insert Picture Here

    View Slide

  35. Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
    2
    Safe Harbor
    The following is intended to outline our general product direction. It
    is intended for information purposes only, and may not be
    incorporated into any contract. It is not a commitment to deliver any
    material, code, or functionality, and should not be relied upon in
    making purchasing decision. The development, release, and timing
    of any features or functionality described for Oracle’s products
    remains at the sole discretion of Oracle.

    View Slide

  36. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    3
    MySQL
     Most popular database on the web
     Ubiquitous
     16+ million instances
     Feeds 80% of Hadoop installs
     20 Years Old

    View Slide

  37. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    4
    Relational Data

    Based on relational calculus, set theory

    Been heavily used for decades

    Many vendors

    Goal: Store data efficiently

    View Slide

  38. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    5
    Relational Data

    ACID (from Wikipedia https://en.wikipedia.org/wiki/ACID)

    Atomicity – equires that each transaction be "all or nothing": if one part of the
    transaction fails, the entire transaction fails, and the database state is left unchanged

    Consistency – ensures that any transaction will bring the database from one valid
    state to another.

    Isolation – the concurrent execution of transactions results in a system state that
    would be obtained if transactions were executed serially

    Durability – means that once a transaction has been committed, it will remain so,
    even in the event of power loss, crashes, or errors.

    View Slide

  39. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    6
    So why NoSQL?!?

    A NoSQL (often interpreted as Not only SQL) database provides
    a mechanism for storage and retrieval of data that is modeled in
    means other than the tabular relations used in relational
    databases. (https://en.wikipedia.org/wiki/NoSQL)

    Motivations for this approach include simplicity of design,
    presumed better "horizontal" scaling to clusters of machine,
    which is a problem for relational databases, and presumed finer
    control over availability.

    View Slide

  40. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    7
    CAP Theorem
    CAP theorem states that it is impossible for a distributed computer
    system to simultaneously provide all three of the following guarantees:

    Consistency (all nodes see the same data at the same time)

    Availability (a guarantee that every request receives a response about
    whether it succeeded or failed)

    Partition tolerance (the system continues to operate despite arbitrary
    partitioning due to network failures)
    (https://en.wikipedia.org/wiki/CAP_theorem)

    View Slide

  41. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    8
    Want BOTH! What to do?!?!?!

    Polyglot Databases – Use best storage approach for the data

    Oracle, Postgresql, MySQL, etc. all adopting some NoSQL features

    NoSQL trying to adopt SQL

    Vendors not dumb!

    Better for consumers

    View Slide

  42. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    9
    Access SQL
    and NoSQL
    One set of disks
    Simultaneous access
    MySQL InnoDB tables and NDB tables
    Use SQL and/or Key/Value pair
    2,000,000,000 writes a minute with MySQL Cluster
    At the same time!

    View Slide

  43. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    10

    View Slide

  44. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    11

    View Slide

  45. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    12
    Benefits
    Raw performance for simple lookups. Direct access to the InnoDB storage engine avoids the
    parsing and planning overhead of SQL. Running memcached in the same process space as
    the MySQL server avoids the network overhead of passing requests back and forth.
    Data is stored in a MySQL database to protect against crashes, outages, and corruption.
    The transfer between memory and disk is handled automatically, simplifying application logic.
    Data can be unstructured or structured, depending on the type of application. You can make
    an all-new table for the data, or map the NoSQL-style processing to one or more existing
    tables.

    View Slide

  46. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    13
    Benefits continued
    You can still access the underlying table through SQL, for reporting, analysis, ad hoc queries, bulk
    loading, set operations such as union and intersection, and other operations well suited to the
    expressiveness and flexibility of SQL.
    You can ensure high availability of the NoSQL data by using this feature on a master server in
    combination with MySQL replication.
    The integration of memcached with MySQL provides a painless way to make the in-memory data
    persistent, so you can use it for more significant kinds of data. You can put more add, incr, and similar
    write operations into your application, without worrying that the data could disappear at any moment.
    You can stop and start the memcached server without losing updates made to the cached data. To
    guard against unexpected outages, you can take advantage of InnoDB crash recovery, replication, and
    backup procedures.
    The way InnoDB does fast primary key lookups is a natural fit for memcached single-item queries. The
    direct, low-level database access path used by the memcached plugin is much more efficient for key-
    value lookups than equivalent SQL queries.

    View Slide

  47. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    14
    More Benefits Continued
    The serialization features of memcached, which can turn complex data structures, binary files, or even
    code blocks into storeable strings, offer a simple way to get such objects into a database.
    Because you can access the underlying data through SQL, you can produce reports, search or update
    across multiple keys, and call functions such as AVG() and MAX() on the memcached data. All of
    these operations are expensive or complicated with the standalone memcached.
    You do not need to manually load data into memcached at startup. As particular keys are requested
    by an application, the values are retrieved from the database automatically, and cached in memory
    using the InnoDB buffer pool.
    Because memcached consumes relatively little CPU, and its memory footprint is easy to control, it can
    run comfortably alongside a MySQL instance on the same system.
    Because data consistency is enforced through the usual mechanism as with regular InnoDB tables,
    you do not have to worry about stale memcached data or fallback logic to query the database in the
    case of a missing key

    View Slide

  48. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    15
    Installation
    mysql> install plugin daemon_memcached soname "libmemcached.so";

    View Slide

  49. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    16
    Here is an example using telnet to send memcached commands and receive
    results through the ASCII protocol:

    telnet 127.0.0.1 11211

    set a11 10 0 9

    123456789

    STORED

    get a11

    VALUE a11 0 9

    123456789

    END

    quit
    Set memory location 'a11'
    to hold 9 characters
    '123456789' – 10 & 0 are
    TTL and flags

    View Slide

  50. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    17
    Can it be used with MySQL Replication?
    Because the InnoDB memcached daemon plugin supports the
    MySQL binary log, any updates made on a master server through the
    memcached interface can be replicated for backup, balancing
    intensive read workloads, and high availability. All memcached
    commands are supported for binlogging.
    You do not need to set up the InnoDB memcached plugin on the
    slave servers. In this configuration, the primary advantage is
    increased write throughput on the master. The speed of the
    replication mechanism is not affected

    View Slide

  51. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    19
    Bypass MySQL daemon!

    4.2 billions reads/min

    1.2 billion updates/min

    2 billion writes a min

    MySQL Cluster

    View Slide

  52. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    20
    Can you have the best of both worlds????

    View Slide

  53. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    22
    Native JSON Data Type
    mysql> CREATE TABLE employees (data JSON);
    Query OK, 0 rows affected (0,01 sec)
    mysql> INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}');
    Query OK, 1 row affected (0,00 sec)
    mysql> INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}');
    Query OK, 1 row affected (0,00 sec)
    mysql> select * from employees;
    +---------------------------+
    | data |
    +---------------------------+
    | {"id": 1, "name": "Jane"} |
    | {"id": 2, "name": "Joe"} |
    +---------------------------+
    2 rows in set (0,00 sec)

    View Slide

  54. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    23
    JSON Details

    UTF8MB4 Character Set

    Optimized for READ intensive workloads

    Parse & Validation on Insert Only

    Dictionary

    Sorted object keys

    Fast access to array cell by index keys

    Support all JSON Data types plus date, time, timestamp

    View Slide

  55. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    25
    JSON Functions

    Manipulating JSON documents:

    jsn_array()

    jsn_object()

    jsn_insert()

    jsn_remove()

    jsn_set()

    jsn_replace()

    jsn_append()

    jsn_merge()

    jsn_extract()

    JSON Query:

    JSN_SEARCH()

    JSN_CONTAINS()

    JSN_CONTAINS_PATH()

    JSN_VALID()

    JSN_TYPE()

    JSN_KEYS()

    JSN_LENGTH()

    JSN_DEPTH()

    JSN_UNQUOTE()

    JSN_QUOTE()

    View Slide

  56. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    26
    Quick Example
    mysql> desc colors;
    +--------------+----------+------+-----+---------+-------------------+
    | Field | Type | Null | Key | Default | Extra |
    +--------------+----------+------+-----+---------+-------------------+
    | popular_name | char(10) | YES | | NULL | |
    | hue | json | YES | | NULL | |
    +--------------+----------+------+-----+---------+-------------------+
    2 rows in set (0.00 sec)
    INSERT INTO `colors` VALUES ('red','{\"value\": \"f00\"}'),
    ('green','{\"value\": \"0f0\"}'),('blue','{\"value\": \"00f\"}'),('cyan','{\"value\":
    \"0ff\"}'),('magenta','{\"value\": \"f0f\"}'),('yellow','{\"value\": \"ff0\"}'),
    ('black','{\"value\": \"000\"}');

    View Slide

  57. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    27
    Using jsn_extract
    mysql> SELECT jsn_extract(hue, '$.value') FROM colors WHERE
    jsn_extract(hue, '$.value')="f0f";
    +-----------------------------+
    | jsn_extract(hue, '$.value') |
    +-----------------------------+
    | "f0f" |
    +-----------------------------+
    1 row in set (0.00 sec)

    View Slide

  58. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    28
    Generated Columns

    Can be

    Stored – sales_tax = sales_amount * tax_rate

    Virtual – calculated on fly

    Used to build indexes on JSON columns for speed

    Used to use a default character set collation for case insensative
    string searches

    View Slide

  59. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    29
    JSON or Not to JSON

    Up to you

    Yes

    More flexible, use with data that is hard to model

    Support custom fields

    Easier denormalization

    No painful schema changes, easier prototyping

    Mix and match

    View Slide

  60. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
    30
    For More Information

    Mysql.com

    Labs.mysql.com

    Planet.mysql.com

    Dave Stokes
    [email protected]
    @stoker
    slideshare.net/davidmstokes

    View Slide