Slide 1

Slide 1 text

Billion Records from SQL to Cassandra, lessons learned
DuyHai Doan, Brice Dutheil

Slide 2

Slide 2 text

#CassandraSummit @doanduyhai @BriceDutheil
Who are we ?
Brice Dutheil: Mockito, Java Track Lead @ Devoxx France, Independent contractor @ Libon (Orange-Vallée)
DuyHai Doan: Achilles, Cassandra Technical Advocate, Former Java Developer @ Libon

Slide 3

Slide 3 text

Agenda
•  Libon context
•  Migration strategy
•  Business code migration
•  Data Modeling
•  Take Away

Slide 4

Slide 4 text

Libon Context

Slide 5

Slide 5 text

What is Libon ?
•  Messaging app
•  VOIP (out)
•  Custom voicemail & greetings
•  SMS/chat/file transfer
•  Contacts matching

Slide 6

Slide 6 text

Contact Matching
[Diagram labels: Libon User]

Slide 7

Slide 7 text

Contact Matching
[Diagram labels: Libon User, Friend]

Slide 8

Slide 8 text

Contact Matching
[Diagram labels: Libon User, Friend, Contact matching]

Slide 9

Slide 9 text

Contact Matching
[Diagram labels: Libon User, Friend, Accept link]

Slide 10

Slide 10 text

Project Context
•  Application grew over the years

Slide 11

Slide 11 text

Project Context
•  Application grew over the years
•  Already using Cassandra to handle events
•  messaging / file sharing / SMS / notifications
•  Cassandra R/W latencies ≈ 0.4 ms
•  server response time under 10 ms

Slide 12

Slide 12 text

Project Context
•  About contacts …

Slide 13

Slide 13 text

Project Context
•  About contacts …
•  stored as relational model in RDBMS (Oracle)

Slide 14

Slide 14 text

Project Context
•  About contacts …
•  stored as relational model in RDBMS (Oracle)
•  1 user ≈ 300 contacts

Slide 15

Slide 15 text

Project Context
•  About contacts …
•  stored as relational model in RDBMS (Oracle)
•  1 user ≈ 300 contacts
•  with millions of users → billions of contacts to handle

Slide 16

Slide 16 text

Project Context
•  About contacts …
•  stored as relational model in RDBMS (Oracle)
•  1 user ≈ 300 contacts
•  with millions of users → billions of contacts to handle
•  query latency unpredictable

Slide 17

Slide 17 text


Slide 18

Slide 18 text

Fixing the problem
•  Tune the RDBMS

Slide 19

Slide 19 text

Fixing the problem
•  Tune the RDBMS
•  indices

Slide 20

Slide 20 text

Fixing the problem
•  Tune the RDBMS
•  indices
•  partitioning

Slide 21

Slide 21 text

Fixing the problem
•  Tune the RDBMS
•  indices
•  partitioning
•  fewer joins, simplified relational model

Slide 22

Slide 22 text

Fixing the problem
•  Tune the RDBMS
•  indices
•  partitioning
•  fewer joins, simplified relational model
•  hardware capacity increased

Slide 23

Slide 23 text

Fixing the problem
•  Tune the RDBMS
•  indices
•  partitioning
•  fewer joins, simplified relational model
•  hardware capacity increased
That worked

Slide 24

Slide 24 text

Fixing the problem
•  Tune the RDBMS
•  indices
•  partitioning
•  fewer joins, simplified relational model
•  hardware capacity increased
That worked but …

Slide 25

Slide 25 text

[Diagram labels: Back-end application, RDBMS, Cassandra]

Slide 26

Slide 26 text

Next Challenges
•  High Availability (DB failure, site failure …)

Slide 27

Slide 27 text

Next Challenges
•  High Availability (DB failure, site failure …)
•  Predictable performance at scale

Slide 28

Slide 28 text

Next Challenges
•  High Availability (DB failure, site failure …)
•  Predictable performance at scale
•  Going to multi data-centers

Slide 29

Slide 29 text

Going for Cassandra
•  Denormalize (if possible …)

Slide 30

Slide 30 text

Going for Cassandra
•  Denormalize (if possible …)
•  Know your business → know your queries

Slide 31

Slide 31 text

Going for Cassandra
•  Denormalize (if possible …)
•  Know your business → know your queries
•  Linear scaling out

Slide 32

Slide 32 text

Going for Cassandra
•  Denormalize (if possible …)
•  Know your business → know your queries
•  Linear scaling out
•  Consistent performance

Slide 33

Slide 33 text

Data Migration Strategy

Slide 34

Slide 34 text

Objectives
•  No downtime

Slide 35

Slide 35 text

Objectives
•  No downtime
•  No concurrency corner-cases

Slide 36

Slide 36 text

Objectives
•  No downtime
•  No concurrency corner-cases
•  Safe rollback possible

Slide 37

Slide 37 text

Objectives
•  No downtime
•  No concurrency corner-cases
•  Safe rollback possible
•  Replay-ability & resume-ability

Slide 38

Slide 38 text

Strategy
•  3 phases

Slide 39

Slide 39 text

Strategy
•  3 phases
•  Write contacts to both data stores

Slide 40

Slide 40 text

Strategy
•  3 phases
•  Write contacts to both data stores
•  Old contacts migration

Slide 41

Slide 41 text

Strategy
•  3 phases
•  Write contacts to both data stores
•  Old contacts migration
•  Switch to Cassandra …
•  … and deprecate SQL

Slide 42

Slide 42 text

Migration Phase 1
[Diagram: Back end server, SQL nodes (SQL SQL SQL), Cassandra nodes (C* C* C* C* C*); labels "Write", "contactUUID", "contactId (long) + contactUUID"]
contactId | … | contactUUID
129363 | … | 123e4567-e89b-12d3…
834849 | … |

Slide 43

Slide 43 text

Migration Phase 1
[Diagram: Back end server, SQL nodes (SQL SQL SQL), Cassandra nodes (C* C* C* C* C*); label "Read"]

Slide 44

Slide 44 text

Migration Phase 2
•  On live production, migrate old contacts
Old contacts created before phase 1
[Diagram: SQL nodes (SQL SQL SQL), Cassandra nodes (C* C* C* C* C*)]
For each batch of users
SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL

Slide 45

Slide 45 text

Migration Phase 2
•  On live production, migrate old contacts
Old contacts created before phase 1
[Diagram: SQL nodes (SQL SQL SQL), Cassandra nodes (C* C* C* C* C*)]
For each batch of users
SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL
Logged batches of
INSERT INTO contacts(..) VALUES(…) USING TIMESTAMP now() - 1 week

Slide 46

Slide 46 text

Migration Phase 2
USING TIMESTAMP now() - 1 week
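
The "one week in the past" write timestamp can be sketched as below. This is only an illustration, assuming CQL write timestamps are expressed in microseconds since the Unix epoch; the class name, method names and the column list in the INSERT are hypothetical and not taken from the Libon code base.

```java
import java.time.Duration;
import java.time.Instant;

final class PastTimestamp {
    /** Returns the instant one week before 'now' as microseconds since the Unix epoch. */
    static long microsOneWeekBefore(Instant now) {
        Instant past = now.minus(Duration.ofDays(7));
        return past.getEpochSecond() * 1_000_000L + past.getNano() / 1_000L;
    }

    /** Builds a migration INSERT; the table and column names are illustrative only. */
    static String migrationInsert(long writeTimeMicros) {
        return "INSERT INTO contacts(user_id, contact_id, name) VALUES(?, ?, ?) "
             + "USING TIMESTAMP " + writeTimeMicros;
    }
}
```

A migration job would call `PastTimestamp.microsOneWeekBefore(Instant.now())` once per batch and reuse the value for every INSERT in that batch.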

Slide 47

Slide 47 text

Migration Phase 2
•  During data migration …

Slide 48

Slide 48 text

Migration Phase 2
•  During data migration …
•  … concurrent writes from the migration batch …

Slide 49

Slide 49 text

Migration Phase 2
•  During data migration …
•  … concurrent writes from the migration batch …
•  … and updates from production for the same contact

Slide 50

Slide 50 text

Migration Phase 2
contact_uuid | name (now - 1 week) | … | name (now) | …
             | Johny | … | Johnny | …
Insert from batch (to the past): Johny
Update from production: Johnny

Slide 51

Slide 51 text

Migration Phase 2
contact_uuid | name (now - 1 week) | … | name (now) | …
             | Johny | … | Johnny | …
Future reads pick the most up-to-date value
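
Why the production update wins can be illustrated with a toy last-write-wins reconciliation. This is a simplified model, not Cassandra's implementation: the `Cell` class is hypothetical, and tie-breaking between equal timestamps is deliberately ignored.

```java
final class LastWriteWins {
    /** A column value together with its write timestamp (microseconds since epoch). */
    static final class Cell {
        final String value;
        final long writeTimeMicros;

        Cell(String value, long writeTimeMicros) {
            this.value = value;
            this.writeTimeMicros = writeTimeMicros;
        }
    }

    /** Keeps the cell carrying the highest write timestamp, whatever the arrival order. */
    static Cell reconcile(Cell a, Cell b) {
        return a.writeTimeMicros >= b.writeTimeMicros ? a : b;
    }
}
```

With the batch insert stamped one week in the past and the production update stamped "now", `reconcile` returns the production value even if the batch insert physically arrives later.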

Slide 52

Slide 52 text

Migration Phase 2
"Write to the Past… to save the Future"
Libon – 2014/10/08

Slide 53

Slide 53 text

Migration Phase 3
[Diagram: Back end server, SQL nodes (SQL SQL SQL), Cassandra nodes (C* C* C* C* C*); labels "Write", "❌"]

Slide 54

Slide 54 text

Business Code Refactoring

Slide 55

Slide 55 text

Code Inventory
•  Written for RDBMS

Slide 56

Slide 56 text

Code Inventory
•  Written for RDBMS
•  Lots of joins (no surprise)

Slide 57

Slide 57 text

Code Inventory
•  Written for RDBMS
•  Lots of joins (no surprise)
•  Designed around transactions

Slide 58

Slide 58 text

Code Inventory
•  Written for RDBMS
•  Lots of joins (no surprise)
•  Designed around transactions
•  Spring @Transactional everywhere

Slide 59

Slide 59 text

Code Inventory cont.
•  Entities go through Services & Repositories
[Diagram labels: Repositories, Services, ContactEntity]

Slide 60

Slide 60 text

Code Inventory cont.
•  Hibernate is auto-magic

Slide 61

Slide 61 text

Code Inventory cont.
•  Hibernate is auto-magic
•  lazy loading
•  1st level cache
•  N+1 select
[Diagram labels: Repositories, Services, ContactEntity]

Slide 62

Slide 62 text

Which options ?
•  Throw away existing code …
•  … and re-design from scratch for Cassandra

Slide 63

Slide 63 text

Which options ?
•  Throw away existing code …
•  … and re-design from scratch for Cassandra
No way !

Slide 64

Slide 64 text

Code Quality
•  Existing business code has…
•  … ≈ 3500 unit tests

Slide 65

Slide 65 text

Code Quality
•  Existing business code has…
•  … ≈ 3500 unit tests
•  and ≈ 600+ integration tests

Slide 66

Slide 66 text

Code Quality
•  We are TDD aficionados …

Slide 67

Slide 67 text

Code Quality
•  We are TDD aficionados …
•  … and we love our code coverage

Slide 68

Slide 68 text

Code Quality
"The code coverage is one of your most valuable technical asset"
Libon – since beginning

Slide 69

Slide 69 text

Refactoring Strategy
[Diagram: Services (ContactMatchingService, ContactService, ContactSync), Repositories, ContactEntity; cardinalities n, 1, n, n]

Slide 70

Slide 70 text

Refactoring Strategy
[Diagram: Services (ContactMatchingService, ContactService, ContactSync), Repositories, ContactEntity, Proxy, ContactNoSQLEntity; cardinalities n, 1, n, n]

Slide 71

Slide 71 text

Refactoring Strategy
[Diagram: Services (ContactMatchingService, ContactService, ContactSync), Repositories, ContactEntity, Proxy, ContactNoSQLEntity, Denorm1, Denorm2 … DenormN; cardinalities n, 1, n, n]

Slide 72

Slide 72 text

Refactoring Strategy
•  Use CQRS
•  ContactReadRepository
•  ContactWriteRepository
•  ContactUpdateRepository
•  ContactDeleteRepository

Slide 73

Slide 73 text

Refactoring Strategy
•  ContactReadRepository
•  direct sequential read
•  no joins
•  1 read ≈ 1 SELECT

Slide 74

Slide 74 text

Refactoring Strategy
•  ContactWriteRepository
•  write to all denormalized tables
•  using CQL logged batches
•  use TTLs
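
The write path above (one INSERT per denormalized table, wrapped in a logged batch, with a TTL) could look roughly like this. It is a sketch under assumptions: the shared column list is hypothetical (each real table has its own columns), and in CQL a plain `BEGIN BATCH … APPLY BATCH` is a logged batch.

```java
import java.util.List;

final class LoggedBatchBuilder {
    /** Wraps one INSERT per denormalized table in a single logged batch. */
    static String build(List<String> tables, int ttlSeconds) {
        StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
        for (String table : tables) {
            // Column list is illustrative; real tables differ in their columns.
            cql.append("  INSERT INTO ").append(table)
               .append("(user_id, contact_id, name) VALUES(?, ?, ?) USING TTL ")
               .append(ttlSeconds).append(";\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }
}
```

The logged batch guarantees that all the denormalized writes are eventually applied together, which is what keeps the copies of a contact in sync.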

Slide 75

Slide 75 text

Refactoring Strategy
•  ContactUpdateRepository
•  read-before-write most of the time
•  rare updates → acceptable perf penalty

Slide 76

Slide 76 text

Refactoring Strategy
•  ContactDeleteRepository
•  delete
•  update contact modification date

Slide 77

Slide 77 text

Outcome
•  5 months of work for 2 people

Slide 78

Slide 78 text

Outcome
•  5 months of work for 2 people
•  Many iterations to fix bugs (thanks to integration tests)

Slide 79

Slide 79 text

Outcome
•  5 months of work for 2 people
•  Many iterations to fix bugs (thanks to integration tests)
•  Lots of performance benchmarks using Gatling

Slide 80

Slide 80 text

Gatling Output

Slide 81

Slide 81 text

Outcome
•  5 months of work for 2 people
•  Many iterations to fix bugs (thanks to integration tests)
•  Lots of performance benchmarks using Gatling
→ data model & code validation

Slide 82

Slide 82 text

Outcome
•  5 months of work for 2 people
•  Many iterations to fix bugs (thanks to integration tests)
•  Lots of performance benchmarks using Gatling
→ data model & code validation
•  … we are almost there for production

Slide 83

Slide 83 text

Data Model

Slide 84

Slide 84 text

Denormalization, the good
•  Support fast reads
•  1 read ≈ 1 SELECT
•  Worth it because mostly reads, few updates

Slide 85

Slide 85 text

Denormalization, the bad
•  Updating mutable data can be a nightmare
•  Data model bound by existing client-facing API
•  Update paths very error-prone without tests

Slide 86

Slide 86 text

Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user

Slide 87

Slide 87 text

Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
user_id is always a component of the partition key

Slide 88

Slide 88 text

Scalable design
[Diagram labels: nodes n1 to n8; A to H; user_id1 to user_id5]

Slide 89

Slide 89 text

Scalable design
[Diagram labels: nodes n1 to n8; A to H; user_id1 to user_id5]

Slide 90

Slide 90 text

Bloom filters in action
•  For some tables, partition key = (user_id, contact_id)
→ fast look-up, leverages Bloom filters
→ touches 1 SSTable most of the time
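
The reason a Bloom filter helps here: a filter over partition keys can answer "definitely not in this SSTable", so a look-up by (user_id, contact_id) skips most files. The toy filter below only illustrates that property; it is not Cassandra's implementation, and the hashing scheme and the "user_id:contact_id" key format are arbitrary choices made for the example.

```java
import java.util.BitSet;

final class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    ToyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Derives the i-th probe position from the key (simple double hashing). */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means "certainly absent"; true means "possibly present". */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A key that was added is always reported as possibly present; false positives can happen, false negatives cannot.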

Slide 91

Slide 91 text

Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
[Annotations: Wide partition, Bucketed]

Slide 92

Slide 92 text

A "queue" story
•  contacts_by_modification_date
•  queue-like pattern

Slide 93

Slide 93 text

A "queue" story
•  contacts_by_modification_date
•  queue-like pattern
→ buckets to the rescue
[Partitions, one per user and month]
user_id:2014-12: date35, date12, …, date47
user_id:2014-11: date11, date12, …, date34
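
Deriving the monthly bucket shown above ("user_id:2014-12") might be done as follows. This is a sketch: the use of UTC and the string form of the key are assumptions made for the example, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

final class MonthBucket {
    /** Builds the "user_id:yyyy-MM" bucket a modification date falls into (UTC assumed). */
    static String partitionKey(long userId, Instant modificationDate) {
        YearMonth month = YearMonth.from(modificationDate.atZone(ZoneOffset.UTC));
        return userId + ":" + month;
    }
}
```

Because each bucket covers a single month, the partition for a user stops growing once the month is over, which keeps the queue-like table from turning into one unbounded wide row.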

Slide 94

Slide 94 text

Data model summary
•  7 tables for denormalization

Slide 95

Slide 95 text

Data model summary
•  7 tables for denormalization
•  Normalize some tables because they are rarely accessed

Slide 96

Slide 96 text

Data model summary
•  7 tables for denormalization
•  Normalize some tables because they are rarely accessed
•  Read-before-write in most update scenarios

Slide 97

Slide 97 text

Notes on contact_id
•  In SQL, auto-generated long using sequence
•  In Cassandra, auto-generated timeuuid

Slide 98

Slide 98 text

Notes on contact_id
•  How to store both types ?

Slide 99

Slide 99 text

Notes on contact_id
•  How to store both types ?
•  As text ? → easy solution …

Slide 100

Slide 100 text

Notes on contact_id
•  How to store both types ?
•  As text ? → easy solution …
•  … but waste of space !
•  because encoded as UTF-8 or ASCII in Cassandra

Slide 101

Slide 101 text

Notes on contact_id
•  Long → 8 bytes
•  Long as text (UTF-8: 1 byte per digit) → "digits count" bytes

Slide 102

Slide 102 text

Notes on contact_id
•  UUID → 16 bytes
•  32 hex chars + 4 hyphens = 36 chars
•  UUID as text (UTF-8: 1 byte per char) → 36 bytes
•  Bytes overhead = 36 – 16 = 20 bytes

Slide 103

Slide 103 text

Notes on contact_id
•  20 bytes wasted per contact uuid

Slide 104

Slide 104 text

Notes on contact_id
•  20 bytes wasted per contact uuid
•  × 7 denormalizations = 140 bytes per contact uuid

Slide 105

Slide 105 text

Notes on contact_id
•  20 bytes wasted per contact uuid
•  × 7 denormalizations = 140 bytes per contact uuid
•  × 10⁹ contacts = 140 GB wasted
not even counting replication factor …
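
The arithmetic above can be checked with a few lines of code. The constants come from the slides (16-byte binary UUID, 36-character text form, 7 denormalized tables, 10⁹ contacts); the class and method names are hypothetical.

```java
final class TextIdOverhead {
    static final int UUID_BINARY_BYTES = 16;
    static final int UUID_TEXT_BYTES = 36; // 32 hex chars + 4 hyphens, 1 byte each in UTF-8

    /** Bytes wasted by storing UUIDs as text, before replication is taken into account. */
    static long wastedBytes(int denormalizations, long contacts) {
        long perCopy = UUID_TEXT_BYTES - UUID_BINARY_BYTES; // 20 bytes
        return perCopy * denormalizations * contacts;
    }
}
```

`TextIdOverhead.wastedBytes(7, 1_000_000_000L)` evaluates to 140,000,000,000 bytes, i.e. the 140 GB quoted on the slide.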

Slide 106

Slide 106 text

Notes on contact_id
•  → just save contact id as byte[ ]

Slide 107

Slide 107 text

Notes on contact_id
•  → just save contact id as byte[ ]
•  Achilles @TypeTransformer for automatic conversion (see later)

Slide 108

Slide 108 text

Notes on contact_id
•  → just save contact id as byte[ ]
•  Achilles @TypeTransformer for automatic conversion (see later)
•  Use blobAsBigInt( ) or blobAsUUID( ) to view data
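
One plausible way to turn both id types into a byte[ ] is shown below. This is an assumption-laden sketch, not the Libon codec: it uses big-endian encoding via `java.nio.ByteBuffer` (8 bytes for a long, 16 bytes for a UUID), and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class ContactIdBytes {
    /** Legacy SQL id: 8 bytes, big-endian. */
    static byte[] fromLong(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    /** New Cassandra id: 16 bytes, most significant bits first. */
    static byte[] fromUuid(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    static UUID toUuid(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    /** The two encodings differ in length, so the size tells them apart. */
    static boolean isLegacyId(byte[] bytes) {
        return bytes.length == 8;
    }
}
```

Because the two encodings have different lengths, a reader can tell a legacy id from a new one without any extra marker byte.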

Slide 109

Slide 109 text

Achilles
•  Advanced "object mapper"
•  Fluent API
•  Tons of features
•  TDD friendly

Slide 110

Slide 110 text

Achilles
•  Dirty checking, why is it important ?

Slide 111

Slide 111 text

Achilles
•  Dirty checking, why is it important ?
•  1 contact ≈ 8 mutable fields

Slide 112

Slide 112 text

Achilles
•  Dirty checking, why is it important ?
•  1 contact ≈ 8 mutable fields
•  × 7 denormalizations = 56 update combinations …

Slide 113

Slide 113 text

Achilles
•  Dirty checking, why is it important ?
•  1 contact ≈ 8 mutable fields
•  × 7 denormalizations = 56 update combinations …
•  and not even counting multiple-field updates …

Slide 114

Slide 114 text

Achilles
•  Are you going to manually generate 56+ prepared statements for all possible updates ?

Slide 115

Slide 115 text

Achilles
•  Are you going to manually generate 56+ prepared statements for all possible updates ?
•  Or just use dynamic plain string statements and get some perf penalty ?

Slide 116

Slide 116 text

Achilles
•  Dirty check in action
//No read-before-write
ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId);
proxy.setFirstName(…);
proxy.setLastName(…); //type-safe updates
proxy.setAddress(…);
manager.update(proxy);

Slide 117

Slide 117 text

Achilles
[Diagram labels: Empty Entity, DirtyMap, Proxy, Setters interception, PrimaryKey]

Slide 118

Slide 118 text

Achilles
•  Dynamic statements generation
UPDATE contacts SET firstname=?, lastname=?, address=? WHERE contact_id=?
prepared statements are cached, of course
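
The idea behind dirty checking (record which setters were called, then generate an UPDATE touching only those columns) can be sketched in a few lines. This is a simplified illustration and not how Achilles is implemented internally; `DirtyTracker` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DirtyTracker {
    // Insertion order is kept so the generated statement is deterministic.
    private final Map<String, Object> dirty = new LinkedHashMap<>();

    /** Called by an intercepted setter: remembers the column and its new value. */
    void set(String column, Object value) {
        dirty.put(column, value);
    }

    /** Generates an UPDATE that only mentions the columns actually modified. */
    String toUpdateStatement(String table, String keyColumn) {
        if (dirty.isEmpty()) {
            throw new IllegalStateException("nothing to update");
        }
        StringJoiner assignments = new StringJoiner(", ");
        for (String column : dirty.keySet()) {
            assignments.add(column + "=?");
        }
        return "UPDATE " + table + " SET " + assignments + " WHERE " + keyColumn + "=?";
    }
}
```

Since the statement text depends only on the set of dirty columns, it can serve as the cache key for the corresponding prepared statement.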

Slide 119

Slide 119 text

Achilles
•  Insert strategy, what is it ?

Slide 120

Slide 120 text

Achilles
•  Simple INSERT prepared statement
INSERT INTO contacts(contact_id,name,age,address,gender,avatar,…) VALUES(?, ?, ?, ? … ?);

Slide 121

Slide 121 text

Achilles
•  Runtime values binding
•  some columns are optional
preparedStatement.bind(49374, "John DOE", 33, null, null, …, null);

Slide 122

Slide 122 text

Achilles
Wait … are you saying inserting null in CQL???

Slide 123

Slide 123 text

Achilles
Inserting null ⇔ creating tombstones

Slide 124

Slide 124 text

Achilles
Inserting null ⇔ creating tombstones
× 7 denormalizations

Slide 125

Slide 125 text

Achilles
Inserting null ⇔ creating tombstones
× 7 denormalizations
× billions of contacts created
not even counting replication factor …

Slide 126

Slide 126 text

Achilles
•  Simple annotation
@Entity(table = "contacts_by_id")
@Strategy(insert = InsertStrategy.NOT_NULL_FIELDS)
public class ContactById { }

Slide 127

Slide 127 text

Achilles
•  Runtime dynamic INSERT statement
INSERT INTO contacts(contact_id, name, age, address) VALUES(:contact_id, :name, :age, :address);
prepared statements are cached, of course
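
The effect of a "not null fields" insert strategy can be imitated as follows: only columns with a non-null value end up in the statement, so no null is bound and no tombstone is written. This is a stand-alone sketch, not Achilles code; `NotNullInsert` is a hypothetical helper.

```java
import java.util.Map;
import java.util.StringJoiner;

final class NotNullInsert {
    /** Builds an INSERT with named markers for the non-null fields only. */
    static String build(String table, Map<String, Object> fields) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner markers = new StringJoiner(", ");
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            if (field.getValue() != null) { // skip nulls: they would become tombstones
                columns.add(field.getKey());
                markers.add(":" + field.getKey());
            }
        }
        return "INSERT INTO " + table + "(" + columns + ") VALUES(" + markers + ")";
    }
}
```

Passing an insertion-ordered map (for example a `LinkedHashMap`) keeps the column order stable, which matters if the statement text is used as a cache key.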

Slide 128

Slide 128 text

Achilles
•  Remember the contactId ↔ byte[ ] conversion ?
BYOC → Bring Your Own Codec
@PartitionKey
@Column(name = "contact_id")
@TypeTransformer(valueCodecClass = ContactIdToBytes.class)
private ContactId contactId;

Slide 129

Slide 129 text

Achilles
public interface Codec<FROM, TO> {
    Class<FROM> sourceType();
    Class<TO> targetType();
    TO encode(FROM fromJava);
    FROM decode(TO fromCassandra);
}

Slide 130

Slide 130 text

Achilles
•  Dynamic logging in action
2014-12-01 14:25:20,554 Bound statement : [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]
2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]

Slide 131

Slide 131 text

Achilles
•  Dynamic logging
•  runtime activation
•  no need to recompile/re-deploy
•  saved us hours of debugging
•  TRACE log level → query tracing

Slide 132

Slide 132 text

Take Away

Slide 133

Slide 133 text

Conditions for success
•  Data modeling is crucial

Slide 134

Slide 134 text

Conditions for success
•  Data modeling is crucial
•  Double-run strategy & timestamp trick FTW

Slide 135

Slide 135 text

Conditions for success
•  Data modeling is crucial
•  Double-run strategy & timestamp trick FTW
•  Data type conversion can be tricky

Slide 136

Slide 136 text

Conditions for success
•  Data modeling is crucial
•  Double-run strategy & timestamp trick FTW
•  Data type conversion can be tricky
•  Benchmark !

Slide 137

Slide 137 text

Conditions for success
•  Data modeling is crucial
•  Double-run strategy & timestamp trick FTW
•  Data type conversion can be tricky
•  Benchmark !
•  Mindset shifts for the team

Slide 138

Slide 138 text

Thank You