knowledge of the value's format • Completely schema-less • Implementations • Eventually consistent, hierarchal, ordered, in-RAM • Operations • Get, set, and delete values by key
fields • Organized by collections, tags, metadata, etc. • Formats such as XML, JSON, BSON • Structure may vary by document (schema-less) • Operations • Query by namespace, ID, or field values • Insert new documents or update existing fields
model allows horizontal scaling • Maintain functionality whenever possible • Minimize the learning curve • Easy to setup and deploy anywhere • JavaScript and JSON are ubiquitous • Automate sharding and replication
architectures • Idiomatic drivers for over a dozen languages • Data format and wire protocol use BSON • Multiple storage engines for different use cases • MMAPv1, WiredTiger, RocksDB • B-tree, geospatial, TTL, and text indexes
Data mapper pattern completely separates domain models from persistence logic • Model classes, query builders, relationships • Interoperability with Doctrine ORM for SQL • Now stable after five years of beta!
VoIP callers, and determine where the calls originated. Phone switch created Ids for each VoIP user, and those Ids were recycled every 15 minutes. User base of 20,000 and growing
1 Insert Picture Here NoSQL and JSON with MySQL Dave Stokes MySQL Community Manager [email protected] @stoker Slideshare.net/davidmstokes Insert Picture Here
2 Safe Harbor The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decision. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
5 Relational Data • ACID (from Wikipedia https://en.wikipedia.org/wiki/ACID) • Atomicity – equires that each transaction be "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged • Consistency – ensures that any transaction will bring the database from one valid state to another. • Isolation – the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially • Durability – means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
6 So why NoSQL?!? • A NoSQL (often interpreted as Not only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. (https://en.wikipedia.org/wiki/NoSQL) • Motivations for this approach include simplicity of design, presumed better "horizontal" scaling to clusters of machine, which is a problem for relational databases, and presumed finer control over availability.
7 CAP Theorem CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: • Consistency (all nodes see the same data at the same time) • Availability (a guarantee that every request receives a response about whether it succeeded or failed) • Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures) (https://en.wikipedia.org/wiki/CAP_theorem)
8 Want BOTH! What to do?!?!?! • Polyglot Databases – Use best storage approach for the data • Oracle, Postgresql, MySQL, etc. all adopting some NoSQL features • NoSQL trying to adopt SQL • Vendors not dumb! • Better for consumers
9 Access SQL and NoSQL One set of disks Simultaneous access MySQL InnoDB tables and NDB tables Use SQL and/or Key/Value pair 2,000,000,000 writes a minute with MySQL Cluster At the same time!
12 Benefits Raw performance for simple lookups. Direct access to the InnoDB storage engine avoids the parsing and planning overhead of SQL. Running memcached in the same process space as the MySQL server avoids the network overhead of passing requests back and forth. Data is stored in a MySQL database to protect against crashes, outages, and corruption. The transfer between memory and disk is handled automatically, simplifying application logic. Data can be unstructured or structured, depending on the type of application. You can make an all-new table for the data, or map the NoSQL-style processing to one or more existing tables.
13 Benefits continued You can still access the underlying table through SQL, for reporting, analysis, ad hoc queries, bulk loading, set operations such as union and intersection, and other operations well suited to the expressiveness and flexibility of SQL. You can ensure high availability of the NoSQL data by using this feature on a master server in combination with MySQL replication. The integration of memcached with MySQL provides a painless way to make the in-memory data persistent, so you can use it for more significant kinds of data. You can put more add, incr, and similar write operations into your application, without worrying that the data could disappear at any moment. You can stop and start the memcached server without losing updates made to the cached data. To guard against unexpected outages, you can take advantage of InnoDB crash recovery, replication, and backup procedures. The way InnoDB does fast primary key lookups is a natural fit for memcached single-item queries. The direct, low-level database access path used by the memcached plugin is much more efficient for key- value lookups than equivalent SQL queries.
14 More Benefits Continued The serialization features of memcached, which can turn complex data structures, binary files, or even code blocks into storeable strings, offer a simple way to get such objects into a database. Because you can access the underlying data through SQL, you can produce reports, search or update across multiple keys, and call functions such as AVG() and MAX() on the memcached data. All of these operations are expensive or complicated with the standalone memcached. You do not need to manually load data into memcached at startup. As particular keys are requested by an application, the values are retrieved from the database automatically, and cached in memory using the InnoDB buffer pool. Because memcached consumes relatively little CPU, and its memory footprint is easy to control, it can run comfortably alongside a MySQL instance on the same system. Because data consistency is enforced through the usual mechanism as with regular InnoDB tables, you do not have to worry about stale memcached data or fallback logic to query the database in the case of a missing key
16 Here is an example using telnet to send memcached commands and receive results through the ASCII protocol: • telnet 127.0.0.1 11211 • set a11 10 0 9 • 123456789 • STORED • get a11 • VALUE a11 0 9 • 123456789 • END • quit Set memory location 'a11' to hold 9 characters '123456789' – 10 & 0 are TTL and flags
17 Can it be used with MySQL Replication? Because the InnoDB memcached daemon plugin supports the MySQL binary log, any updates made on a master server through the memcached interface can be replicated for backup, balancing intensive read workloads, and high availability. All memcached commands are supported for binlogging. You do not need to set up the InnoDB memcached plugin on the slave servers. In this configuration, the primary advantage is increased write throughput on the master. The speed of the replication mechanism is not affected
23 JSON Details • UTF8MB4 Character Set • Optimized for READ intensive workloads • Parse & Validation on Insert Only • Dictionary • Sorted object keys • Fast access to array cell by index keys • Support all JSON Data types plus date, time, timestamp
28 Generated Columns • Can be • Stored – sales_tax = sales_amount * tax_rate • Virtual – calculated on fly • Used to build indexes on JSON columns for speed • Used to use a default character set collation for case insensative string searches
29 JSON or Not to JSON • Up to you • Yes • More flexible, use with data that is hard to model • Support custom fields • Easier denormalization • No painful schema changes, easier prototyping • Mix and match