Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Migrating from Relational: Data Modeling and Access

Migrating from Relational: Data Modeling and Access

Would you like to know how CenterEdge Software consolidated their thirty SQL Server nodes into 5-node Couchbase clusters? Would you like to know the best practices for moving from Oracle, MySQL to Couchbase? Many applications and companies are migrating from relational databases to NoSQL. This talk will discuss the tipping points to consider the move. We'll then share the best practices and lessons learned at CenterEdge Software while moving from SQL Server to Couchbase. In addition, we'll discuss the best practices for moving from Oracle, MySQL to Couchbase.

Clarence J M Tauro (Couchbase)

November 08, 2016
Tweet

More Decks by Clarence J M Tauro (Couchbase)

Other Decks in Education

Transcript

  1. ©2016 Couchbase Inc. Migrating from relational data modeling and access

    Brant Burnett, Lead Developer, CenterEdge Clarence Tauro, Sr Trainer, Couchbase Marco Greco, Sr Engineer, N1QL R&D, Couchbase 1
  2. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Agenda •  Practical considerations

    for data and application migration •  Modeling in Couchbase •  Real life experience: Centeredge 2
  3. ©2016 Couchbase Inc. ©2016 Couchbase Inc. In this section • 

    Nomenclature •  Type and data model mapping •  Migrating data •  Business logic •  Monitoring 4
  4. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Nomenclature 5 Oracle Couchbase

    Database Bucket Table Bucket Row Document Column Field
  5. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Type Mapping 6 Oracle

    (PL/SQL Synonyms) Couchbase Number, Binary_real, Binary_integer Smallint, Int, Dec, Decimal, Float, … Number Char, Nchar, Varchar2, Nvarchar2 Character, String String Boolean Boolean Date, Timestamp Handled via String Interval (year to month, day to fracPon) Some support via _millis() funcPons
  6. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Modelling 7 CustomerID Name

    DOB CBC2016 Jane Smith 1990-01-30 CustomerID Type Cardnum Expiry CBC2016 visa 5827… 2019-03 CBC2016 master 6274… 2018-12 CustomerID ConnId Name CBC2016 XYZ987 Joe Smith CBC2016 SKR007 Sam Smith CustomerID item amt CBC2016 mac 2823.52 CBC2016 ipad2 623.52 CustomerID ConnId Name CBC2016 XYZ987 Joe Smith CBC2016 SKR007 Sam Smith Contacts Customer Billing ConnecPons Purchases { "Name" : "Jane Smith", "DOB" : "1990-01-30", "Billing" : [ { "type" : "visa", "cardnum" : "5827-2842-2847-3909", "expiry" : "2019-03" }, { "type" : "master", "cardnum" : "6274-2842-2847-3909", "expiry" : "2019-03" } ], "ConnecPons" : [ { "CustId" : "XYZ987", "Name" : "Joe Smith" }, { "CustId" : "PQR823", "Name" : "Dylan Smith" } { "CustId" : "PQR823", "Name" : "Dylan Smith" } ], "Purchases" : [ { "id":12, item: "mac", "amt": 2823.52 } { "id":19, item: "ipad2", "amt": 623.52 } ] } DocumentKey: CBC2016
  7. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Migration •  Generalized process

    •  Commercial tools •  Talend •  Informatica •  Open source •  Couchbase java importer •  Oracle2couchbase •  SQSL •  Importing from files 8
  8. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Migration Process •  High

    level process to migrate data from RDBMS to Couchbase using N1QL •  For each table •  Determine primary key columns •  Describe table •  For each row •  Generate document key from primary key columns •  Generate document from projection list description, column values •  INSERT INTO <bucket> (key, value) ($1, $2) •  Use key and document as placeholder values 9
  9. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Talend •  Talend connector

    for Couchbase -- Talend 5.3 or later •  http://developer.couchbase.com/documentation/server/4.5/ connectors/talend/talend.html •  Ingesting unstructured data •  Couchbase view support •  Seamless integration with Couchbase •  tCouchbaseInput •  Incoming data transformed into JSON documents and stored in Couchbase. •  User defines the data fields to be transformed into JSON attributes •  tCouchbaseOutput: uses the schema mapping to transform JSON documents into target data formats •  ODBC/JDBC drivers (provided by Simba and CData) 10
  10. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Informatica 11 •  Informatica

    Power Center •  Needs ODBC driver •  Informatica Cloud •  Needs JDBC driver •  ODBC/JDBC drivers (provided by Simba and CData) •  ETL & Data Integration •  Load data from any Relational system into Couchbase •  Export Couchbase data into RDBMS •  Seamlessly integrate Couchbase into rest of the Data fabric
  11. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Couchbase java importer • 

    Blog post by Laurent Doguin detailing journey from process to code •  Java based, but principle applies to other languages •  Geared to Postgres but principle applies to other engines •  Blog: http://blog.couchbase.com/2016/january/moving-sql-database-content-to-couchbase •  Source code: https://github.com/ldoguin/couchbase-java-importer 12
  12. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Oracle2couchbase •  Another blog

    post / opensource tool •  By Manuel Hurtado •  Java based •  Migrates from Oracle •  Blog: http://blog.couchbase.com/2016/february/moving-data-from-oracle-to-couchbase •  Source and binary: https://github.com/mahurtado/oracle2couchbase 13
  13. ©2016 Couchbase Inc. ©2016 Couchbase Inc. SQSL •  Client side

    SQL like scripting language developed by yours truly two decades ago •  Several nifty features like •  Expansion •  Data driven operation •  On the fly aggregation and redirection •  User defined routines •  Have recently written data source for Couchbase and json library •  Source: http://www.sqsl.org •  Example: let fromconn="sample"; connect to fromconn source db2cli; connect to "couchbase://192.168.1.104:8091" source cb; select * from db2inst1.dept connection fromconn insert into default (key, value) values($1, $2) using json:key("::", columns), json:row2doc(displaylabels, columns); 14
  14. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Import / Export utilities

    •  Upcoming version includes cbimport & cbexport •  File based utilities •  Need to export RDBDMS data to file first •  Load directly into data store bypassing N1QL 15
  15. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Business logic •  DDL

    •  Views •  Triggers •  Procedures •  Sequences •  Joins •  Transactions 16
  16. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Language comparison 17 Query

    Features SQL on RDBMS N1QL DML §  SELECT, INSERT, UPDATE, DELETE, MERGE §  SELECT, INSERT, UPDATE, DELETE, MERGE DDL §  CREATE [INDEX, PROCEDURE TABLE, TYPE, VIEW…] §  ALTER [TABLE, TYPE, …] §  DROP [INDEX, PROCEDURE TABLE, TYPE, VIEW…] §  CREATE [PRIMARY] INDEX §  DROP [PRIMARY] INDEX Query OperaDons §  Select, Join, Project, Subqueries §  Strict Schema §  Strict Type checking §  Select, Join, Project, Subqueries ü  Nest & Unnest ü  Look Ma! No Type Mismatch Errors! §  JSON keys act as columns Schema §  Predetermined Columns ü  Fully addressable JSON ü  Flexible document structure Data Types §  SQL Data types §  Conversion FuncPons §  JSON Data types §  Conversion FuncPons Query Processing §  INPUT: Sets of Tuples §  OUPUT: Set of Tuples §  INPUT: Sets of JSON §  OUTPUT: Set of JSON
  17. ©2016 Couchbase Inc. ©2016 Couchbase Inc. DDL •  Only Create

    Index / Drop Index exist in N1QL •  Everything else should be removed from the application •  Temporary tables •  Materialize results •  Store in memory, or •  Insert materialized document in a keyspace using a designated “type”: field and a UUID() as key •  DROP <temporary table> becomes DELETE FROM keyspace WHERE type=… and ID=<UUID> 18
  18. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Views •  Access underlying

    keyspaces instead •  Something akin to views can be obtained with •  View indexes •  Functional indexes •  CREATE INDEX … WHERE clauses 19
  19. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Statement blocks •  Handled

    by the application: •  Triggers •  Procedures 20
  20. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Sequences •  The eventual

    persistence engine handles atomic increments •  Special documents can be created with a counter and accessed atomically •  Can specify a delta on creation •  Must be done from SDK •  In python: •  N1QL does not •  Use UUID() instead 21
  21. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Joins •  Two types

    of joins •  Look up •  Index •  Joins use the document key •  Joining side can be an expression •  Joined side is document key •  Full expression joins not supported 22
  22. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Transactions •  In two

    words: No need •  Document modification is atomic •  Consistency can be specified at the REST call level or SDK •  REST example •  Add scan_consistency=[not_bounded|at_plus|request_plus|statement_plus] to REST parameters •  C SDK example •  Use lcb_n1p_setconsistency (…, [LCB_N1P_CONSISTENCY_NONE, LCB_N1P_CONSISTENCY_RYOW, LCB_N1P_CONSISTENCY_REQUEST, LCB_N1P_CONSISTENCY_STATEMENT]) when setting up request args 23
  23. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Monitoring •  Oracle • 

    ALTER [system|session] SET timed_statistics=true turns on timed statistics collection. •  V$SESSTAT, V$SYSSTAT, V$STATNAME dynamic performance views report timed statistics. •  EXPLAIN PLAN explains a statement. •  MySQL •  SET profiling=1 turns on profiling •  SHOW PROFILES displays available query profiles •  SHOW PROFILE displays the profile for a specific query •  EXPLAIN <statement> produces query plan •  Couchbase •  system:completed_requests virtual keyspace lists completed long running queries with timings and statistics •  system:active_requests virtual keyspace lists active queries with timings and statistics •  EXPLAIN <statement> explains request plan as a json document 24
  24. ©2016 Couchbase Inc. ©2016 Couchbase Inc. What is Data Modeling?

    26 •  A data model is a conceptual representation of the data structures that are required by a database •  The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects.
  25. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Conceptual Data Modeling • 

    Define entities, attributes and their relationships •  Entities: Main objects that are targets of your apps operates on •  Attributes: properties that your applications keep track of for the entity •  Relationships: definition connections to other entities - 1-1, 1-many, many-many Airline Airport Landmark Route Passenger Flight
  26. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Physical Data Model • 

    Phase II - Map entities, attributes and their relationships to containers provided by the underlying database solution RelaDonal Databases Couchbase Server Databases Buckets Tables Documents with type designator apribute OR Compound Keys Rows Items (Key-Value or Key-Document) Columns AMributes Index Index
  27. ©2016 Couchbase Inc. ©2016 Couchbase Inc. [email protected] { “name”:…, “flights”:[

    {“_id”:“route_1000”, “flight”:…,}, {“_id”:”route_6421”, “flight”:…,} …], … } Physical Data Modeling route_1000 { “id”:”1000”, “airline”: “AF”, “sourceairport”:”TLV”, “desPnaPonairport”:”MRW”, …} airport_TLV { “id”:”126701”, “airportname”: “TLV”, “geo”:{ “lat”:…,“long”:…}, …} Flights
  28. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Data Modeling Approaches 30

    NoSQL Relaxed NormalizaPon schema implied by structure fields may be empty, duplicate, or missing RelaDonal Required NormalizaPon schema enforced by db same fields in all records •  Minimize data inconsistencies (one item = one locaPon) •  Reduced update cost (no duplicated data) •  Preserve storage resources •  OpPmized to planned/actual access paperns •  Flexibly with soxware architecture •  Supports clustered architecture •  Reduced server overhead
  29. ©2016 Couchbase Inc. ©2016 Couchbase Inc. JSON Design Choices 31

    •  Couchbase Server neither enforces nor validates for any particular document structure •  Choices that impact JSON document design: –  Single Root Attributes –  Objects vs. Arrays –  Array Element Types –  Timestamp Formats –  Empty and Null Property Values –  JSON Schema
  30. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Root Attributes vs. Embedded

    Attributes 32 •  The choice of having a single root attribute or the “type” attribute embedded.
  31. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Objects vs. Arrays 33

    •  The choice of having an object type, or an array type
  32. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Array Element Types Array

    of strings Array of objects 34 •  Array elements can be simple types, objects or arrays:
  33. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Timestamp Formats Array of

    Pme components String (ISO 8601) Number (Unix style) (Epoch) •  Working and dealing with timestamps has been challenging ever since •  When storing timestamps, you have at least 3 options: 16
  34. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Empty and Null Property

    Values 36 •  Keep in mind that JSON supports optional properties •  If a property has a null value, consider dropping it from the JSON, unless there's a good reason not to •  N1QL makes it easy to test for missing or null property values •  Be sure your application code handles the case where a property value is missing SELECT * FROM couchmusic1 WHERE userprofile.address IS NULL; SELECT * FROM couchmusic1 WHERE userprofile.gender IS MISSING;
  35. ©2016 Couchbase Inc. ©2016 Couchbase Inc. JSON Schema 37 • 

    Couchbase Server pays absolutely no attention to the shape of your JSON documents so long as they are well-formed •  There are times when it is useful to validate that a JSON document conforms to some expected shape •  JSON Schema is a JSON-based format for defining the structure of JSON data •  There are implementations for most popular programming languages •  Learn more here: http://json-schema.org
  36. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Data Nesting (aka Denormalization)

    38 •  Relational database design promotes separating data using normalization, which doesn’t scale •  For NoSQL systems, we often avoid normalization so that we can scale •  Nesting allows related objects to be organized into a hierarchical tree structure where you can have multiple levels of grouping •  Rule of thumb is to nest no more than 3 levels deep unless there is a very good reason to do so •  You will often want to include a timestamp in the nested data
  37. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Example of Data Nesting

    •  Playlist with owner attribute containing username of corresponding userprofile 39 Document Key: copilotmarks61569
  38. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Example of Data Nesting

    •  Playlist with owner attribute containing a subset of the corresponding userprofile 40 * Note the inclusion of the updated apribute
  39. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Choices with JSON Key

    Design 41 •  A key formed of attributes that exist in the real world: –  Phone numbers –  Usernames –  Social security numbers –  Account numbers –  SKU, UPC or QR codes –  Device IDs
  40. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Surrogate Keys 42 • 

    We often use surrogate keys when no obvious natural key exist •  They are not derived from application data •  They can be generated values –  3305311F4A0FAAFEABD001D324906748B18FB24A (SHA-1) –  003C6F65-641A-4CGA-8E5E-41C947086CAE (UUID) •  They can be sequential numbers (often implemented using the Counter feature of Couchbase Server) –  456789, 456790, 456791, …
  41. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Making Tough Choices 43

    •  We must also make trade-offs in data modeling: –  Document size –  Atomicity –  Complexity –  Speed
  42. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Embed vs. Refer 44

    •  All of the previous trade-offs are usually rolled into a single decision – whether to embed or refer •  When to embed? •  When to refer?
  43. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Want to know more

    on Data Modeling? 45 •  Session tomorrow – “Agile Document Models and Data Structures” at 1:00PM
  44. ©2016 Couchbase Inc. ©2016 Couchbase Inc. 46 Brant Burnett Software

    Development Team Lead Couchbase Community Expert
  45. ©2016 Couchbase Inc. ©2016 Couchbase Inc. About CenterEdge Software ü 

    Point of Sale ü  Admissions & Ticketing ü  Party, Group & Event Bookings ü  Online Sales & Party Reservations ü  Time Clock & Labor Management ü  & More! 47
  46. ©2016 Couchbase Inc. ©2016 Couchbase Inc. About CenterEdge Software • 

    Celebrating 12 Year Anniversary •  Team of 50 in Roxboro, NC •  Sister company is Palace Pointe, a 100k sq. ft. Entertainment Venue for which we were developed as an in-house system •  Over 600 facilties using our platform across the US and abroad •  FEC’s, Waterparks, Trampoline Parks, Amusement Parks, Skating Rinks, Bowling Centers, Zoos & Museums 48
  47. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Why Couchbase For CenterEdge’s

    Newest Cloud Platform? •  More scalable and performant than traditional SQL in the cloud •  Previous online store system uses 19 SQL servers, each hosting 30 stores •  As each store is only on a single server, it doesn’t handle spikes in load efficiently •  Servers can’t be scaled vertically without downtime for all 30 stores on that server •  Schema-less JSON increases flexibility as your system evolves, leaving schema enforcement in your data access layer •  Schema changes to large tables can result in downtime as data structure is updated across all records •  We were already using Couchbase for our shopping carts as well as a SQL caching layer, with great success. Now we can simplify the architecture with a single data layer. 49
  48. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Couchbase Cloud Data Flow

    Architecture 50 Data Data Data Index Query Web Servers Remote Application Servers
  49. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Enforcing Schema •  Since

    Couchbase doesn’t enforce schema like SQL, your data access layer should do so instead •  At CenterEdge, each document type is only updated by a single service •  Within that service, schema is enforced by serializing data from consistent POCOs •  Schema changes can be supported using customized JSON converters during deserialization •  IS MISSING is a good way to recognize the difference in attributes that weren’t stored because the document was saved using the old schema •  Where possible, try to predict possible schema needs in advance 51
  50. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Pay Attention To Document

    Modeling Up Front •  Watch out for documents that get too large •  Might hit 20MB document size limit •  High serialization/deserialization/networking performance penalties •  Document contention as too many actions attempt to modify the document simultaneously •  Watch out for data spread across too many related documents •  Lack of atomic transactions across multiple writes can result in partial updates •  Can add latency if documents must be read in a chained manner (i.e. each document contains the key to the next document) •  Be sure to include document keys, or a way to construct them, where you may want to use N1QL JOIN or NEST operations •  Should the document key be stored inside the document, too? •  Increases data size, as the key is in the document and in the metadata •  Requires that the data layer maintain consistency •  Can make queries easier since you don’t need to use META() function to get the key 52
  51. ©2016 Couchbase Inc. ©2016 Couchbase Inc. View Indexes vs. Global

    Secondary Indexes •  Be sure to analyze what type of index is best for each workload •  Views are great where pre-aggregating numbers is useful, such as reports, graphs, etc •  GSI is usually the best option for more generic queries, especially if when you’re just trying to collect a set of documents •  Views don’t scale as cleanly, they can’t be scaled independently via Multi Dimensional Scaling •  Views live on the data nodes, so they only scale as you add more data nodes •  At CenterEdge, our new platform started on Couchbase Server 3.0, before Global Secondary Indexes were an option •  We used views and lookup documents for most of our indexing needs •  We have run into problems with too many views per bucket causing performance bottlenecks •  We’re currently transitioning many of these views into Global Secondary Indexes 53
  52. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Efficient Indexing Is Especially

    Important For Couchbase 54 •  Primary key scans in SQL have always been inefficient •  Every record in the table would be read and checked for a match to the WHERE predicate •  For small tables, the performance penalty was negligible, and would usually go unnoticed •  Primary key scans in Couchbase are usually much worse •  In our experience with production-scale data, almost invariably results in queries timing out •  Every record in the bucket is being read and checked for a match to the WHERE predicate •  Can easily result in reading and parsing millions of JSON documents •  Will also bust the in-memory cache on the data nodes if there is more data in the bucket than allocated memory •  Design every query to be supported by a Global Secondary Index •  Helps even if the index isn’t an exact match •  A good design can vastly reduce the number of documents scanned, making it more like a SQL primary key scan
  53. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Efficient Indexing Is Especially

    Important For Couchbase 55 /* Use predicate to only index documents of a certain type */ CREATE INDEX `airport_sourceairport` ON `travel-sample` (`sourceairport`) WHERE `type` = 'airport' /* To index the same attribute across multiple document types, include type attribute first */ CREATE INDEX `def_type_id` ON `travel-sample` (`type`, `id`) /* A good practice is to create a fallback in case other indexes aren't used */ CREATE INDEX `def_type` ON `travel-sample` (`type`) If you’re using the “type” attribute as the logical equivalent of a table in SQL, most indexes will include this attribute.
  54. ©2016 Couchbase Inc. ©2016 Couchbase Inc. How To Store and

    Index Date/Times •  Date/Times are usually stored as ISO 8601 strings in JSON •  Use STR_TO_MILLIS(x) in indexes and queries to work with ISO 8601 strings 56 /* STR_TO_MILLIS converts an ISO8601 string to a Unix numeric representation */ /* It also handles the time zone specifier */ SELECT `Extent1`.* FROM `beer-sample` as `Extent1` WHERE (`type` = 'beer') AND (STR_TO_MILLIS(`Extent1`.`updated`) <= STR_TO_MILLIS("2010-01-01T00:00:00Z")) /* STR_TO_MILLIS must also be used in the index, or the index cannot be used */ CREATE INDEX `beer_updated` ON `beer-sample` (STR_TO_MILLIS(`updated`)) WHERE `type` = 'beer'
  55. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Index Performance During Mutations

    57 Airline SQL Table Airport SQL Table travel-sample Bucket Airline Indexes Airport Indexes Bucket Indexes Remember that GSI indexes are similar to SQL indexes, but not the same
  56. ©2016 Couchbase Inc. ©2016 Couchbase Inc. Training! •  Don’t just

    assume you can switch to any NoSQL platform without some training •  Performance profile is different, and the penalties can appear in different places •  Developers who know the pitfalls in advance can save you a lot of refactoring headaches later •  N1QL does help reduce the learning curve significantly •  For .Net development shops, look at Linq2Couchbase to make it even easier! •  The operations department needs training, too! 58
  57. ©2016 Couchbase Inc. 60 Share your opinion on Couchbase 1. 

    Go here: http://gtnr.it/2eRxYWn 2.  Create a profile 3.  Provide feedback (~15 minutes)