Slide 1

Slide 1 text

©2016 Couchbase Inc. Agile Document Models & Data Structures 1

Slide 2

Slide 2 text

Speaking Your Language
•  Topics for today:
   –  Data structures: tie into native language collection interfaces
   –  Sub-document: lower-level access with focused power
   –  Data modeling with Couchbase
•  Session: “Picking the right API for the right job”
•  SDK Goal: complex data access made easy
•  More than just a document storage/retrieval system
•  Tight SDK integration is key
•  Consistent, transparent developer experience across languages

Slide 3

Slide 3 text

Data Structures API

Slide 4

Slide 4 text

Couchbase SDK Data Structures API
•  Target SDK release along with 4.6
•  Builds on awesomeness of sub-document API
•  Simplified access without touching whole document
•  Make JSON data types transparent
•  Native integration of Map, List, Set, Queues…
•  Java Collections Framework
•  .NET System.Collections
•  Python, Node.js, Go

Slide 5

Slide 5 text

Typical Document Data Access
[Diagram labels: JSON Doc, CB, JSON Object, SDK, Collections Framework, App]

Slide 6

Slide 6 text

Simplified Data Structure Access
[Diagram labels: JSON Doc, CB, Collections Framework, SDK DS, App]
   “user1”: {“name”:... , “address”:.. , “favs”: [...]},
   “user2”: {“name” , “address” ..., “favs”: [...]},
   for (String f : favs) {}

Slide 7

Slide 7 text

Targeted Collection Updates
[Diagram labels: Item From Collection, App, Sub-doc Update, CB]
   MapAdd(“user1”, “favs”, “newfav”)
   “user1”: {“name”:... , “address”:.. , “favs”: [...]},
   “user2”: {“name” , “address” ..., “favs”: [...]},

Slide 8

Slide 8 text

The Four Data Structures…
•  Lists
   –  Append, prepend, insert
   –  Size/count
   –  JSON type: JSON Array: [ … , ... ]
   –  JSON example: [ 1, 2, “abc” ]
•  Maps
   –  Add/remove by key
   –  Size/count
   –  JSON type: JSON Object: { “key”: “value” }
   –  JSON example: { “name”: “value” }
•  Sets
   –  Specialized add/remove
   –  Unique values
   –  Size/count
   –  JSON type: JSON Array: [ … , ... ]
   –  JSON example: [ 1, 3, 6, 8 ]
•  Queue
   –  First in, first out
   –  Pop: retrieve/remove
   –  Size/count
   –  JSON type: JSON Array: [ … , ... ]
   –  JSON example: [ “task1”, “task2”, “task3” ] remove 1... [ “task2”, “task3”, “task4” ]
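The mapping from the four structures to JSON types can be sketched in plain Python. This is an in-memory illustration only, not the Couchbase SDK: the `bucket` dictionary and the helper names (`list_append`, `map_add`, `set_add`, `queue_push`, `queue_pop`) are hypothetical stand-ins chosen for this example.

```python
import json

# Hypothetical in-memory stand-in for a bucket: document key -> JSON value.
bucket = {}

def list_append(key, value):
    """List: append to a JSON array stored under `key`."""
    bucket.setdefault(key, []).append(value)

def map_add(key, map_key, value):
    """Map: add or replace one entry of a JSON object stored under `key`."""
    bucket.setdefault(key, {})[map_key] = value

def set_add(key, value):
    """Set: a JSON array that rejects values it already holds."""
    items = bucket.setdefault(key, [])
    if value in items:
        return False
    items.append(value)
    return True

def queue_push(key, value):
    """Queue: new items join the back of a JSON array."""
    bucket.setdefault(key, []).append(value)

def queue_pop(key):
    """Queue: retrieve and remove the oldest item (first in, first out)."""
    return bucket[key].pop(0)

for item in (1, 2, "abc"):
    list_append("mylist", item)
map_add("mymap", "name", "value")
for item in (1, 3, 6, 8, 3):          # the second 3 is ignored
    set_add("myset", item)
for task in ("task1", "task2", "task3"):
    queue_push("tasks", task)
first = queue_pop("tasks")            # removes the oldest task
queue_push("tasks", "task4")

print(json.dumps(bucket, indent=2))
```

Lists, sets, and queues all serialize to a JSON array; only the rules applied on write and removal differ.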

Slide 9

Slide 9 text

Consistent Access Across Languages
Functions
•  Lists: ListGet, ListPush, ListShift, ListDelete, ListSet, ListSize
•  Maps: MapGet, MapRemove, MapSize, MapSet
•  Sets: SetAdd, SetExists, SetSize, SetRemove
•  Queue: QueuePush, QueuePop, QueueSize, QueueRemove
Example:
   > namesList = bucket.ListGet("key")
   > print namesList
   ['name1', 'name2', 'name3']
•  Idiomatic -vs- functional
•  Java Collections Framework
•  .NET System.Collections
•  As well as functional approach
* Experimental features alert: functions may be added to or removed from this list; feedback welcome!

Slide 10

Slide 10 text

Consistent Access Across Languages
Collections Approach
•  Lists
      List namesList = new CouchbaseArrayList("key", bucket);
      for (String name : namesList) { … }
•  Maps
      var namesDict = new CouchbaseDictionary(_bucket, "key");
      namesDict.Add("newkey1", new Poco { Name = "poco1" });
•  Sets
      var namesSet = new CouchbaseSet(_bucket, "pocos");
      namesSet.Add(new Poco { Key = "poco1", Name = "Poco-pica" });
      namesSet.Remove(new Poco { Key = "poco1", Name = "Poco-pica" });
      foreach (var poco in namesSet) { … }
•  Queue
      var namesQueue = new CouchbaseQueue(_bucket, key);
      namesQueue.Enqueue(new Poco { Name = "pcoco1" });
      var item = namesQueue.Peek();
•  Support for advanced capabilities of collection frameworks

Slide 11

Slide 11 text

Sub-Document API

Slide 12

Slide 12 text

Sub-Document API
“The sub-document API enables you to access parts of JSON documents (sub-documents) efficiently without requiring the transfer of the entire document over the network. This improves performance and brings better efficiency to the network IO path, especially when working with large JSON documents.”
•  First released in 4.5, supported across SDKs
•  Efficient document lookup, insert & update
•  Powerful lower-level control, focusing on particular elements
•  Keep work on server
•  Two methods available: lookup and mutate/change

Slide 13

Slide 13 text

Digging Below Data Structures
Data Structures API  ->  Sub-Document API
•  MapGet(key, mapkey)  ->  LookupIn(key).get(mapkey)
•  MapRemove(key, mapkey)  ->  MutateIn(key).remove(mapkey)
•  MapSet(key, mapkey, value, createMap)  ->  MutateIn(key).(mapkey, value, create_doc=createMap)

Slide 14

Slide 14 text

Sub-Document API
Operations
•  LookupIn: LookupIn(key, operation(path))
   –  Get
   –  Exists
   –  Execute
•  MutateIn: MutateIn(key, operation(path, value))
   –  Counter
   –  Insert
   –  Remove
   –  Replace
   –  Upsert
   –  Execute
   –  arrayAddunique
   –  arrayAppend
   –  arrayInsert
   –  arrayPrepend
Chaining Operations
   MutateIn(key, operation(path, value),
            operation(path, value), operation(path, value))
Returns SubdocResult, Spec), results=[(0, u'subvalue1'), (0, None)]>

Slide 15

Slide 15 text

Sample Sub-Document Lookup
LookupIn(key, operation(path))
   LookupIn('copilotmark')
      .get('phones.number')
      .execute();
   LookupIn('copilotmark')
      .exists('phones')
      .get('phones.number')
      .get('gender')
      .execute();
Returns: SubdocResult

Slide 16

Slide 16 text

Sample Sub-Document Change
MutateIn(key, path, value)
   MutateIn('copilotmark')
      .replace('phones.number', '212-787-2212')
      .upsert('nickname', 'Freddie')
      .execute()
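To make the path-based idea concrete, here is a minimal in-memory sketch of lookup and mutate operations on a nested Python dictionary. It is not the Couchbase SDK and involves no server: the helper names (`lookup_get`, `lookup_exists`, `mutate_replace`, `mutate_upsert`) and the starting profile values are assumptions invented for illustration; only the paths and the new values come from the slides.

```python
def _walk(doc, path):
    """Return (parent, last_key) for a dotted path such as 'phones.number'."""
    parts = path.split(".")
    parent = doc
    for part in parts[:-1]:
        parent = parent[part]
    return parent, parts[-1]

def lookup_get(doc, path):
    """Read only the value at `path`, not the whole document."""
    parent, last = _walk(doc, path)
    return parent[last]

def lookup_exists(doc, path):
    """Report whether `path` is present in the document."""
    try:
        lookup_get(doc, path)
        return True
    except (KeyError, TypeError):
        return False

def mutate_replace(doc, path, value):
    """Replace a value that must already exist at `path`."""
    parent, last = _walk(doc, path)
    if last not in parent:
        raise KeyError(path)
    parent[last] = value

def mutate_upsert(doc, path, value):
    """Set the value at `path`, creating the final key if needed."""
    parent, last = _walk(doc, path)
    parent[last] = value

# Invented starting document for the 'copilotmark' example.
profile = {"phones": {"number": "212-555-0100"}, "gender": "unknown"}

has_phones = lookup_exists(profile, "phones")
mutate_replace(profile, "phones.number", "212-787-2212")
mutate_upsert(profile, "nickname", "Freddie")
print(has_phones, lookup_get(profile, "phones.number"), profile["nickname"])
```

In the real API the path is resolved on the server, so only the addressed fragment crosses the network; the sketch mirrors the semantics, not the transport.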

Slide 17

Slide 17 text

Data Modeling for Couchbase Server

Slide 18

Slide 18 text

What is Data Modeling?
•  A data model is a conceptual representation of the data structures that are required by a database
•  The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects.

Slide 19

Slide 19 text

Data Modeling Approaches
Relational: Required Normalization
•  Schema enforced by db
•  Same fields in all records
•  Minimize data inconsistencies (one item = one location)
•  Reduced update cost (no duplicated data)
•  Preserve storage resources
NoSQL: Relaxed Normalization
•  Schema implied by structure
•  Fields may be empty, duplicate, or missing
•  Optimized to planned/actual access patterns
•  Flexibility with software architecture
•  Supports clustered architecture
•  Reduced server overhead

Slide 20

Slide 20 text

Modeling Couchbase Documents
•  Couchbase Server is a document database
•  Data is stored in JSON documents, not in tables
•  Relational databases rely on an explicit pre-defined schema to describe the structure of data
•  JSON documents are self-describing

Slide 21

Slide 21 text

What and Why JSON?
•  What is JSON?
   –  Lightweight data interchange format
   –  Based on JavaScript
   –  Programming language independent
   –  Field names must be unique
•  Why JSON?
   –  Schema flexibility
   –  Less verbose
   –  Can represent Objects and Arrays (including nested documents)
There is NO IMPEDANCE MISMATCH between a JSON Document and a Java Object

Slide 22

Slide 22 text

JSON Design Choices
•  Couchbase Server neither enforces nor validates any particular document structure
•  Choices that impact JSON document design:
   –  Single Root Attributes
   –  Objects vs. Arrays
   –  Array Element Types
   –  Timestamp Formats
   –  Property Names
   –  Empty and Null Property Values
   –  JSON Schema

Slide 23

Slide 23 text

Root Attributes vs. Embedded Attributes
•  The choice between a single root attribute and an embedded “type” attribute

Slide 24

Slide 24 text

Root Attributes vs. Embedded Attributes
•  Accessing the document with a root attribute
   SELECT track.* FROM couchmusic

Slide 25

Slide 25 text

Root Attributes vs. Embedded Attributes
•  Accessing the document with the “type” attribute
   SELECT * FROM couchmusic WHERE type = 'track'
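As a rough in-memory analogue of the two access styles (not N1QL itself), the sketch below filters a few documents in Python. The document contents and variable names are invented for illustration; only the two document shapes reflect the slides.

```python
# Invented sample documents; only the two shapes matter.
docs = [
    {"track": {"title": "Song A", "artist": "Artist 1"}},         # single root attribute
    {"type": "track", "title": "Song B", "artist": "Artist 2"},   # embedded "type" attribute
    {"type": "playlist", "name": "Morning mix"},
]

# Root attribute style: take the object stored under "track"
# (compare SELECT track.* FROM couchmusic).
tracks_by_root = [doc["track"] for doc in docs if "track" in doc]

# Type attribute style: filter on the "type" field
# (compare SELECT * FROM couchmusic WHERE type = 'track').
tracks_by_type = [doc for doc in docs if doc.get("type") == "track"]

print(tracks_by_root)
print(tracks_by_type)
```

With a root attribute the document kind is implied by its outer key; with a "type" attribute it is an ordinary field that queries can filter or index on.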

Slide 26

Slide 26 text

Objects vs. Arrays
•  The choice between an object type and an array type

Slide 27

Slide 27 text

Objects vs. Arrays
•  What would the object look like?
   class UserProfile {
      Phone phones;
   }
   class Phone {
      String cell;
      String landline;
   }

Slide 28

Slide 28 text

Objects vs. Arrays
•  What would the object look like?
   class UserProfile {
      List phones;
   }
   class Phone {
      String number;
      String type;
   }

Slide 29

Slide 29 text

Array Element Types
•  Array elements can be simple types, objects or arrays:
   –  Array of strings
   –  Array of objects

Slide 30

Slide 30 text

Array Element Types
•  Array elements can be simple types, objects or arrays:
Array of strings
   class Playlist {
      List tracks;
   }
   ...
   String trackId = tracks.get(1);
   JsonDocument trackDocument = bucket.get(trackId)
Multiple get() calls to retrieve the document. Worth it?

Slide 31

Slide 31 text

Array Element Types
•  Array elements can be simple types, objects or arrays:
   class Playlist {
      List tracks;
   }
   ...
   myPlaylist.getTracks()
      .get(1).getArtistName();
Limited Denormalization: commonly needed data (e.g., title) in local object, detail available in referenced foreign document

Slide 32

Slide 32 text

Timestamp Formats
•  Working with timestamps has always been challenging
•  When storing timestamps, you have at least 3 options:
   –  Array of time components
   –  String (ISO 8601)
   –  Number (Unix style) (Epoch)
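The three options can be produced from a single moment with Python's standard library. This is a small sketch; the variable names are arbitrary and the sample instant is borrowed from the view examples later in the deck, assumed here to be UTC.

```python
from datetime import datetime, timezone

# One moment in time, assumed to be UTC for this illustration.
moment = datetime(2014, 11, 29, 18, 49, 36, tzinfo=timezone.utc)

# Option 1: array of time components (useful as a view key for grouping).
as_array = [moment.year, moment.month, moment.day,
            moment.hour, moment.minute, moment.second]

# Option 2: ISO 8601 string (human readable).
as_iso = moment.isoformat()

# Option 3: Unix epoch seconds (a plain number that sorts naturally).
as_epoch = int(moment.timestamp())

print(as_array)
print(as_iso)
print(as_epoch)
```

All three carry the same information; the choice mostly affects how easily the value can be sorted, grouped, or read by a person.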

Slide 33

Slide 33 text

Observed Practices with Timestamp Formats
•  Storing as Epoch helps you sort documents easily
   –  For example, to sort documents in the order of their “last update” time:
      SELECT * FROM couchmusic WHERE type = 'track' ORDER BY updates DESC
•  Storing the date as an array helps with grouping

Slide 34

Slide 34 text

Taking Advantage of Storing Date as an Array
•  Group options can be specified to control the execution of the view
•  The group and group_level options are only useful when a Reduce function has been defined in the corresponding View
•  The group_level option, used when the key is an Array, determines how many elements of the key are used when aggregating the results.

Slide 35

Slide 35 text

Example of View group_level = 1
•  For the data below with Reduce function defined as _sum and group_level = 1

   Key                      Value
   [2014,11,29,18,49,36]    3
   [2014,12,03,20,11,26]    5
   [2014,12,03,23,37,21]    2
   [2014,12,06,10,12,19]    8
   [2014,12,09,05,01,26]    3
   [2014,12,18,01,04,30]    11
   [2014,12,26,18,34,44]    4
   [2015,01,03,16,48,32]    7
   [2015,01,03,20,20,06]    5
   [2015,01,15,08,17,28]    8

Execute Reduce

   Key       Value
   [2014]    36
   [2015]    20

Slide 36

Slide 36 text

Example of View group_level = 2
•  For the data below with Reduce function defined as _sum and group_level = 2

   Key                      Value
   [2014,11,29,18,49,36]    3
   [2014,12,03,20,11,26]    5
   [2014,12,03,23,37,21]    2
   [2014,12,06,10,12,19]    8
   [2014,12,09,05,01,26]    3
   [2014,12,18,01,04,30]    11
   [2014,12,26,18,34,44]    4
   [2015,01,03,16,48,32]    7
   [2015,01,03,20,20,06]    5
   [2015,01,15,08,17,28]    8

Execute Reduce

   Key          Value
   [2014,11]    3
   [2014,12]    33
   [2015,01]    20

Slide 37

Slide 37 text

Example of View group_level = 3
•  For the data below with Reduce function defined as _sum and group_level = 3

   Key                      Value
   [2014,11,29,18,49,36]    3
   [2014,12,03,20,11,26]    5
   [2014,12,03,23,37,21]    2
   [2014,12,06,10,12,19]    8
   [2014,12,09,05,01,26]    3
   [2014,12,18,01,04,30]    11
   [2014,12,26,18,34,44]    4
   [2015,01,03,16,48,32]    7
   [2015,01,03,20,20,06]    5
   [2015,01,15,08,17,28]    8

Execute Reduce

   Key             Value
   [2014,11,29]    3
   [2014,12,03]    7
   [2014,12,06]    8
   [2014,12,09]    3
   [2014,12,18]    11
   [2014,12,26]    4
   [2015,01,03]    12
   [2015,01,15]    8
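The effect of group_level with a _sum reduce can be reproduced in a few lines of Python. This is an in-memory sketch of the aggregation logic only, not how the view engine is implemented; the function name `reduce_sum` is an assumption, while the rows are the ten key/value pairs shown on the slides.

```python
# The ten view rows from the slides: (array key, value).
rows = [
    ([2014, 11, 29, 18, 49, 36], 3),
    ([2014, 12, 3, 20, 11, 26], 5),
    ([2014, 12, 3, 23, 37, 21], 2),
    ([2014, 12, 6, 10, 12, 19], 8),
    ([2014, 12, 9, 5, 1, 26], 3),
    ([2014, 12, 18, 1, 4, 30], 11),
    ([2014, 12, 26, 18, 34, 44], 4),
    ([2015, 1, 3, 16, 48, 32], 7),
    ([2015, 1, 3, 20, 20, 6], 5),
    ([2015, 1, 15, 8, 17, 28], 8),
]

def reduce_sum(rows, group_level):
    """Sum values, grouping on the first `group_level` elements of each key."""
    totals = {}
    for key, value in rows:
        group = tuple(key[:group_level])
        totals[group] = totals.get(group, 0) + value
    return [(list(group), total) for group, total in sorted(totals.items())]

for level in (1, 2, 3):
    print("group_level =", level)
    for key, total in reduce_sum(rows, level):
        print("  ", key, total)
```

Level 1 groups by year, level 2 by year and month, and level 3 by year, month, and day, which is why storing the date as an array makes this kind of roll-up straightforward.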

Slide 38

Slide 38 text

Empty and Null Property Values
•  Keep in mind that JSON supports optional properties
•  If a property has a null value, consider dropping it from the JSON, unless there's a good reason not to
•  N1QL makes it easy to test for missing or null property values
•  Be sure your application code handles the case where a property value is missing
   SELECT * FROM couchmusic1 WHERE userprofile.address IS NULL;
   SELECT * FROM couchmusic1 WHERE userprofile.gender IS MISSING;

Slide 39

Slide 39 text

Empty, Null and Missing Property Values
•  { countryCode: "UK", currencyCode: "GBP", region: "Europe" }
   WHERE region IS NOT MISSING, IS NOT NULL, IS VALUED
•  { countryCode: "UK", currencyCode: "GBP", region: "" }
   WHERE region IS NOT MISSING, IS NOT NULL, IS NOT VALUED
•  { countryCode: "UK", currencyCode: "GBP" }
   WHERE region IS MISSING
•  { countryCode: "UK", currencyCode: "GBP", region: null }
   WHERE region IS NULL
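Application code has to tell the same four cases apart once a document has been deserialized. The following Python sketch shows one way to do that for a dictionary; it is an analogue of the distinctions above, not N1QL semantics, and the function name `describe` and its labels are assumptions made for this example.

```python
_ABSENT = object()  # sentinel: distinguishes "no such key" from a stored null

def describe(doc, field):
    """Classify a property as missing, null, empty string, or non-empty value."""
    value = doc.get(field, _ABSENT)
    if value is _ABSENT:
        return "missing"
    if value is None:
        return "null"
    if value == "":
        return "empty string"
    return "non-empty value"

samples = [
    {"countryCode": "UK", "currencyCode": "GBP", "region": "Europe"},
    {"countryCode": "UK", "currencyCode": "GBP", "region": ""},
    {"countryCode": "UK", "currencyCode": "GBP"},
    {"countryCode": "UK", "currencyCode": "GBP", "region": None},
]

for doc in samples:
    print(describe(doc, "region"))
```

A plain `doc.get("region")` would return `None` for both the missing and the null case, which is why the sentinel is used.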

Slide 40

Slide 40 text

JSON Schema
•  Couchbase Server pays absolutely no attention to the shape of your JSON documents so long as they are well-formed
•  There are times when it is useful to validate that a JSON document conforms to some expected shape
•  JSON Schema is a JSON-based format for defining the structure of JSON data
•  There are implementations for most popular programming languages
•  Learn more here: http://json-schema.org

Slide 41

Slide 41 text

Example of JSON Schema

Slide 42

Slide 42 text

Example of JSON Schema – Type Specification
Available type specifications include:
•  array
•  boolean
•  integer
•  number
•  object
•  string
•  enum

Slide 43

Slide 43 text

Example of JSON Schema – Type Specific Validation
Type specific validations include:
•  minimum
•  maximum
•  minLength
•  maxLength
•  format
•  pattern

Slide 44

Slide 44 text

Example of JSON Schema – Required Properties
Required properties can be specified for each object

Slide 45

Slide 45 text

Example of JSON Schema – Additional Properties
Additional properties can be disabled
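To show how these keywords fit together, here is a small schema written as a Python dictionary plus a deliberately tiny checker. The checker is a hand-written sketch that understands only `type`, `minimum`, `minLength`, `maxLength`, `required`, and `additionalProperties` on a flat object; it is not a JSON Schema implementation, and a real project would use one of the existing libraries. The schema itself and the name `check` are assumptions, with field names borrowed from the country example earlier in the deck.

```python
schema = {
    "type": "object",
    "properties": {
        "countryCode": {"type": "string", "minLength": 2, "maxLength": 2},
        "currencyCode": {"type": "string", "minLength": 3, "maxLength": 3},
        "region": {"type": "string"},
    },
    "required": ["countryCode", "currencyCode"],
    "additionalProperties": False,
}

_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}

def check(doc, schema):
    """Return a list of problems; an empty list means the document passed."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in doc:
            errors.append("missing required property: " + name)
    for name, value in doc.items():
        rule = props.get(name)
        if rule is None:
            if schema.get("additionalProperties", True) is False:
                errors.append("unexpected property: " + name)
            continue
        # bool is a subclass of int in Python, so treat it separately.
        is_bool = isinstance(value, bool)
        if is_bool != (rule["type"] == "boolean") or not isinstance(value, _TYPES[rule["type"]]):
            errors.append(name + ": expected " + rule["type"])
            continue
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(name + ": below minimum")
        if "minLength" in rule and len(value) < rule["minLength"]:
            errors.append(name + ": too short")
        if "maxLength" in rule and len(value) > rule["maxLength"]:
            errors.append(name + ": too long")
    return errors

print(check({"countryCode": "UK", "currencyCode": "GBP", "region": "Europe"}, schema))
print(check({"countryCode": "UK", "population": 1}, schema))
```

Because Couchbase Server does not validate document shape, a check like this would run in the application before a document is written.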

Slide 46

Slide 46 text

Data Nesting (aka Denormalization)
•  As you know, relational database design promotes separating data using normalization, which doesn’t scale
•  For NoSQL systems, we often avoid normalization so that we can scale
•  Nesting allows related objects to be organized into a hierarchical tree structure where you can have multiple levels of grouping
•  Rule of thumb is to nest no more than 3 levels deep unless there is a very good reason to do so
•  You will often want to include a timestamp in the nested data

Slide 47

Slide 47 text

Example #1 of Data Nesting
•  Playlist with owner attribute containing username of corresponding userprofile
Document Key: copilotmarks61569

Slide 48

Slide 48 text

Example #1 of Data Nesting
•  Playlist with owner attribute containing a subset of the corresponding userprofile
* Note the inclusion of the updated attribute

Slide 49

Slide 49 text

Example #2 of Data Nesting
•  Playlist with tracks attribute containing an array of track IDs

Slide 50

Slide 50 text

Example #2 of Data Nesting
•  Playlist with tracks attribute containing an array of track objects
* Note the inclusion of the updated attribute
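The difference between the two playlist shapes shows up in the number of reads needed to list track titles. The sketch below uses a plain dictionary as a hypothetical bucket and counts lookups; every key, field, and value in it is invented for illustration.

```python
# Hypothetical in-memory bucket: document key -> document. All contents are invented.
bucket = {
    "track::1": {"title": "Song A", "artist": "Artist 1", "length": 215},
    "track::2": {"title": "Song B", "artist": "Artist 2", "length": 187},
    # Variant with an array of track IDs (references).
    "playlist::refer": {"name": "Morning mix", "tracks": ["track::1", "track::2"]},
    # Variant with an array of track objects (nested subset plus a timestamp).
    "playlist::embed": {
        "name": "Morning mix",
        "tracks": [
            {"id": "track::1", "title": "Song A", "updated": "2016-01-15T08:17:28Z"},
            {"id": "track::2", "title": "Song B", "updated": "2016-01-15T08:17:28Z"},
        ],
    },
}

reads = {"count": 0}

def get(key):
    """Fetch one document and count the read."""
    reads["count"] += 1
    return bucket[key]

def titles_by_reference():
    """One read for the playlist plus one read per referenced track."""
    playlist = get("playlist::refer")
    return [get(track_id)["title"] for track_id in playlist["tracks"]]

def titles_embedded():
    """One read in total: the titles travel with the playlist."""
    playlist = get("playlist::embed")
    return [track["title"] for track in playlist["tracks"]]

reads["count"] = 0
print(titles_by_reference(), "reads:", reads["count"])
reads["count"] = 0
print(titles_embedded(), "reads:", reads["count"])
```

The embedded copy can go stale if the track is renamed, which is what the nested `updated` timestamp helps detect.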

Slide 51

Slide 51 text

Key Design

Slide 52

Slide 52 text

Choices with JSON Key Design
•  A natural key is formed of attributes that exist in the real world:
   –  Phone numbers
   –  Usernames
   –  Social security numbers
   –  Account numbers
   –  SKU, UPC or QR codes
   –  Device IDs
•  Often the first choice for document keys
•  Be careful when working with any personally identifiable information (PII), sensitive personal information (SPI) or protected health information (PHI)

Slide 53

Slide 53 text

Surrogate Keys
•  We often use surrogate keys when no obvious natural key exists
•  They are not derived from application data
•  They can be generated values
   –  3305311F4A0FAAFEABD001D324906748B18FB24A (SHA-1)
   –  003C6F65-641A-4C9A-8E5E-41C947086CAE (UUID)
•  They can be sequential numbers (often implemented using the Counter feature of Couchbase Server)
   –  456789, 456790, 456791, …
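Each kind of surrogate key can be generated with Python's standard library, as sketched below. The function names are assumptions, and the local counter is only a stand-in: in a cluster the sequence would come from a server-side counter so that all clients share it.

```python
import hashlib
import itertools
import os
import uuid

def uuid_key():
    """Random UUID, e.g. for playlist or session documents."""
    return str(uuid.uuid4())

def sha1_key():
    """SHA-1 digest of random bytes, so it is not derived from application data."""
    return hashlib.sha1(os.urandom(32)).hexdigest().upper()

# Local stand-in for a shared counter; the start value mirrors the slide.
_sequence = itertools.count(456789)

def sequential_key():
    """Next number in the sequence, returned as a string key."""
    return str(next(_sequence))

print(uuid_key())
print(sha1_key())
print(sequential_key())
```

Random keys need no coordination between clients, while sequential keys are short and ordered but require a single shared source of numbers.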

Slide 54

Slide 54 text

Key Value Patterns
•  Common practice for users of Couchbase Server is to follow patterns for formatting key values by using symbols such as single or double colons
•  DocType::ID
   –  userprofile::fredsmith79
   –  playlist::003c6f65-641a-4c9a-8e5e-41c947086cae
•  AppName::DocType::ID
   –  couchmusic::userprofile::fredsmith79
•  Enables Multi-Tenancy
   –  pizza::user::101
   –  pizza::user::102
   –  burger::user::101
   –  burger::user::102
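A pair of small helpers keeps key formatting consistent across an application. This is a sketch; the names `make_key` and `split_key` are assumptions, and the double-colon separator follows the convention described above.

```python
SEPARATOR = "::"

def make_key(*parts):
    """Join key parts such as (app, doc type, id) with the '::' separator."""
    for part in parts:
        if not part or SEPARATOR in part:
            raise ValueError("invalid key part: %r" % (part,))
    return SEPARATOR.join(parts)

def split_key(key):
    """Split a formatted key back into its parts."""
    return key.split(SEPARATOR)

print(make_key("userprofile", "fredsmith79"))
print(make_key("couchmusic", "userprofile", "fredsmith79"))
print(split_key("pizza::user::101"))
```

Rejecting parts that already contain the separator keeps `split_key` unambiguous.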

Slide 55

Slide 55 text

Lookup Key Pattern
•  The purpose of the Lookup Key Pattern is to allow multiple ways to reach the same data, essentially a secondary index
•  For example, we want to look up a Userprofile by their email address instead of their ID
•  To accomplish this, we create another small document that refers to the Userprofile document we are interested in
•  Implementing this pattern is straightforward: just create an additional document containing a single property that stores the key to the primary document
•  With the introduction of N1QL, this pattern will be less commonly used

Slide 56

Slide 56 text

Lookup Key Pattern
[Diagram labels: userprofile::copilotmarks61569, [email protected], JSON, [email protected]]
•  Lookup document can be JsonDocument or StringDocument
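The two-step read behind this pattern can be sketched with a dictionary standing in for the bucket. The `email::` key prefix, the example address, the profile fields, and the function name are all assumptions invented for illustration; only the userprofile key comes from the slide.

```python
# Hypothetical in-memory bucket: document key -> document.
bucket = {
    # Primary document.
    "userprofile::copilotmarks61569": {
        "username": "copilotmarks61569",
        "email": "mark@example.com",
    },
    # Lookup document: keyed by email, stores only the primary document's key.
    "email::mark@example.com": "userprofile::copilotmarks61569",
}

def get_profile_by_email(email):
    """First read resolves the email to a key; second read fetches the profile."""
    primary_key = bucket.get("email::" + email)
    if primary_key is None:
        return None
    return bucket.get(primary_key)

print(get_profile_by_email("mark@example.com"))
print(get_profile_by_email("nobody@example.com"))
```

Both reads are key lookups, so the pattern trades one extra small document per alternate identifier for avoiding a query.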

Slide 57

Slide 57 text

Trade-offs in Data Modeling

Slide 58

Slide 58 text

Making Tough Choices
•  We must also make trade-offs in data modeling:
   –  Document size
   –  Atomicity
   –  Complexity
   –  Speed

Slide 59

Slide 59 text

Document Size
•  Couchbase Server supports documents up to 20 MB
•  Larger documents take more disk space, more time to transfer across the network and more time to serialize/deserialize
•  If you are dealing with documents that are potentially large (greater than 1 MB), you must test thoroughly to find out if speed of access is adequate as you scale. If not, you will need to break up the document into smaller ones.
•  You may need to limit the number of dependent child objects you embed

Slide 60

Slide 60 text

Atomicity
•  Atomicity in Couchbase Server is at the document level
•  Couchbase Server does not support transactions
•  They can be simulated if you are willing to write and maintain additional code to implement them (generally not recommended)
•  If you absolutely need changes to be atomic, they will have to be part of the same document
•  The maximum document size for Couchbase Server may limit how much data you can store in a single document

Slide 61

Slide 61 text

Complexity
•  Complexity affects every area of software systems including data modeling
•  The complexity of queries (N1QL)
•  The complexity of code for updating multiple copies of the same data

Slide 62

Slide 62 text

Speed
•  As it relates to data modeling, speed of access is critical
•  When using N1QL to access data, keep in mind that query by document key is fastest and query by secondary index is usually much slower
•  If implementing an interactive use case, you will want to avoid using JOINs
•  You can use data duplication to improve the speed of accessing related data and thus trade improved speed for greater complexity and larger document size
•  Keep in mind that Couchbase Views can be used when up-to-the-second accuracy is not required

Slide 63

Slide 63 text

Remember
•  SDK get() (get by key) is faster than N1QL with MOI, which is faster than N1QL with GSI
•  Model your document key so that, if possible, your document can be retrieved by key rather than by a N1QL query

Slide 64

Slide 64 text

Embed vs. Refer
•  All of the previous trade-offs are usually rolled into a single decision: whether to embed or refer
•  When to embed:
   –  Reads greatly outnumber writes
   –  You're comfortable with the slim risk of inconsistent data across the multiple copies
   –  You're optimizing for speed of access
•  When to refer:
   –  Consistency of the data is a priority
   –  You want to ensure your cache is used efficiently
   –  The embedded version would be too large or complex

Slide 65

Slide 65 text

Next Steps
•  Flexible data access is key to solutions using document stores
•  Join us for discussion on the forums or talk with our experts here
•  https://forums.couchbase.com
•  https://developer.couchbase.com/server

Slide 66

Slide 66 text

Get Trained on Couchbase
http://training.couchbase.com
http://training.couchbase.com/online
•  CS300: Couchbase NoSQL Server Administration
•  CD220: Developing Couchbase NoSQL Applications
•  CD210: Couchbase NoSQL Data Modeling, Querying, and Tuning Using N1QL
•  CD257: Developing Couchbase Mobile NoSQL Applications

Slide 67

Slide 67 text

Tyler Mitchell, Senior Product Manager, SDK
[email protected] @1tylermitchell
Clarence J M Tauro, Ph.D., Senior Instructor
[email protected] @javapsyche

Slide 68

Slide 68 text

Thank You!