Troubleshooting Apache Cassandra

by Aaron Morton at Cassandra Summit Tokyo 2017

CassandraCommunityJP

October 13, 2017

Transcript

  1. AARON MORTON THE LAST PICKLE TROUBLESHOOTING APACHE CASSANDRA: OBSERVATION AND CONTROL

  2. AARON MORTON @AARONMORTON CEO, THE LAST PICKLE APACHE CASSANDRA COMMITTER, PMC MEMBER

  3. LET'S TALK ABOUT PROBLEMS

  4. OPTIMISING OR TROUBLESHOOTING

  5. OPTIMISE THROUGHPUT. MORE REQUESTS PER SECOND.

  6. OPTIMISE LATENCY. FASTER REQUESTS.

  7. OPTIMISE COST. LOWER COST PER REQUEST.

  8. TROUBLESHOOT AVAILABILITY. REQUESTS NOT STARTING OR NOT COMPLETING.

  9. TROUBLESHOOT CONSISTENCY. REQUESTS NOT RETURNING EXPECTED DATA.

  10. HOW DO YOU SOLVE PROBLEMS?

  11. ANTI METHOD: STREETLIGHT

  12. ANTI METHOD: RANDOM CHANGE

  13. ANTI METHOD: BLAME SOMEONE ELSE

  14. OODA LOOP: OBSERVE, ORIENTATE, DECIDE, ACT.

  16. OBSERVE

  17. “WHAT CAN I SEE?”

  18. A QUICK WORD ABOUT TIME.

  20. GENERALLY CARE ABOUT MILLISECONDS AND MICROSECONDS.

  22. FOR HUMANS TIME FLOWS IN ONE DIRECTION…

  23. … FOR COMPUTERS IT STOPS, JUMPS BACK, JUMPS FORWARD, AND IS DIFFERENT ON EVERY MACHINE.
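
One practical consequence of the point about clocks: when timing something from a script, the wall clock can be stepped backwards or forwards underneath you, so durations are usually safer measured on a monotonic clock. A minimal Python sketch follows; the `timed` helper is hypothetical and not from the talk, and monotonic readings are only meaningful within one process, so they cannot be compared across machines.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) measured on a monotonic clock."""
    start = time.monotonic()  # cannot go backwards if the wall clock is adjusted
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed

# Usage: time a stand-in for a request and report it in the units that matter.
result, elapsed = timed(sum, range(1_000_000))
print(f"took {elapsed * 1e3:.3f} ms ({elapsed * 1e6:.0f} us)")
```
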
  24. (BACK TO) “WHAT CAN I SEE?”

  25. LOGS ARE VERY IMPORTANT, TAKE GOOD CARE OF THEM.

  26. OBSERVING WITH NODETOOL CFSTATS

  27. KEYSPACE: TEST_CLUSTER
        READ COUNT: 294681
        READ LATENCY: 0.11929593017534215 MS.
        WRITE COUNT: 0
        WRITE LATENCY: NAN MS.
        PENDING FLUSHES: 0
          TABLE: MY_TABLE
          SSTABLE COUNT: 2
          SPACE USED (LIVE): 1669941
          SPACE USED (TOTAL): 1669941
          SPACE USED BY SNAPSHOTS (TOTAL): 0
          OFF HEAP MEMORY USED (TOTAL): 57384
          SSTABLE COMPRESSION RATIO: 0.40934065577372514
          NUMBER OF KEYS (ESTIMATE): 39683
          MEMTABLE CELL COUNT: 0
          MEMTABLE DATA SIZE: 0
          MEMTABLE OFF HEAP MEMORY USED: 0
          MEMTABLE SWITCH COUNT: 0
          LOCAL READ COUNT: 294681
          LOCAL READ LATENCY: 0.114 MS
          LOCAL WRITE COUNT: 0
          LOCAL WRITE LATENCY: NAN MS
          PENDING FLUSHES: 0
          BLOOM FILTER FALSE POSITIVES: 1143
          BLOOM FILTER FALSE RATIO: 0.00467
          BLOOM FILTER SPACE USED: 51896
          BLOOM FILTER OFF HEAP MEMORY USED: 51880
          INDEX SUMMARY OFF HEAP MEMORY USED: 5184
          COMPRESSION METADATA OFF HEAP MEMORY USED: 320
          COMPACTED PARTITION MINIMUM BYTES: 18
          COMPACTED PARTITION MAXIMUM BYTES: 72
          COMPACTED PARTITION MEAN BYTES: 68
          AVERAGE LIVE CELLS PER SLICE (LAST FIVE MINUTES): 1.0
          MAXIMUM LIVE CELLS PER SLICE (LAST FIVE MINUTES): 1
          AVERAGE TOMBSTONES PER SLICE (LAST FIVE MINUTES): 1.0
          MAXIMUM TOMBSTONES PER SLICE (LAST FIVE MINUTES): 1
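
Output like this is easier to compare over time once it is in a data structure. The sketch below is one hedged way to do that in Python: it treats every `name: value` line as a key/value pair and lower-cases the names, so it does not depend on the capitalisation the tool prints. The `parse_cfstats` and `as_float` helpers and the trimmed `SAMPLE` text are illustrative assumptions, not part of the talk, and the parser only handles a single keyspace and table.

```python
def parse_cfstats(text):
    """Parse 'name: value' lines from cfstats-style output into a dict.

    Assumes one keyspace and one table; if a name repeats, the later
    (table-level) value overwrites the earlier one.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a 'name: value' line
        stats[name.strip().lower()] = value.strip()
    return stats

def as_float(value):
    """Convert values such as '0.114 ms' or '1143' to float (NaN stays NaN)."""
    return float(value.split()[0])

# Trimmed sample in the shape of the slide above (layout is an assumption).
SAMPLE = """\
Keyspace: test_cluster
    Read Count: 294681
    Read Latency: 0.11929593017534215 ms.
        Table: my_table
        SSTable count: 2
        Bloom filter false positives: 1143
        Average live cells per slice (last five minutes): 1.0
        Average tombstones per slice (last five minutes): 1.0
"""

stats = parse_cfstats(SAMPLE)
live = as_float(stats["average live cells per slice (last five minutes)"])
tombstones = as_float(stats["average tombstones per slice (last five minutes)"])
print("tombstones read per live cell:", tombstones / live)
```

In this sample the read path scans one tombstone for every live cell, which is the kind of ratio worth tracking rather than reading off by eye.
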
  28. NODETOOL CFHISTOGRAMS

  29. TEST_CLUSTER/MY_TABLE HISTOGRAMS
        PERCENTILE  SSTABLES  WRITE LATENCY  READ LATENCY  PARTITION SIZE  CELL COUNT
                              (MICROS)       (MICROS)      (BYTES)
        50%         1.00      0.00           105.78        72              2
        75%         1.00      0.00           126.93        72              2
        95%         1.00      0.00           152.32        72              2
        98%         1.00      0.00           182.79        72              2
        99%         1.00      0.00           219.34        72              2
        MIN         1.00      0.00           9.89          18              0
        MAX         1.00      0.00           10090.81      72              2

  30. NODETOOL TPSTATS

  31. POOL NAME                  ACTIVE  PENDING  COMPLETED  BLOCKED  ALL TIME BLOCKED
      MUTATIONSTAGE                   0        0      10462        0                 0
      READSTAGE                       0        0     339046        0                 0
      REQUESTRESPONSESTAGE            0        0     313949        0                 0
      READREPAIRSTAGE                 0        0      48074        0                 0
      COUNTERMUTATIONSTAGE            0        0          0        0                 0
      HINTEDHANDOFF                   0        0         30        0                 0
      MISCSTAGE                       0        0          0        0                 0
      COMPACTIONEXECUTOR              0        0       7544        0                 0
      MEMTABLERECLAIMMEMORY           0        0         27        0                 0
      PENDINGRANGECALCULATOR          0        0         18        0                 0
      GOSSIPSTAGE                     0        0      47521        0                 0
      MIGRATIONSTAGE                  0        0          0        0                 0
      MEMTABLEPOSTFLUSH               0        0        306        0                 0
      VALIDATIONEXECUTOR              0        0          0        0                 0
      SAMPLER                         0        0          0        0                 0
      MEMTABLEFLUSHWRITER             0        0         27        0                 0
      INTERNALRESPONSESTAGE           0        0          0        0                 0
      ANTIENTROPYSTAGE                0        0          0        0                 0
      CACHECLEANUPEXECUTOR            0        0          0        0                 0
      NATIVE-TRANSPORT-REQUESTS       0        0      28108        0                 0
      RPC-THREAD                    204      204          0        0                 0

      MESSAGE TYPE      DROPPED
      READ                    0
      RANGE_SLICE             0
      _TRACE                  0
      MUTATION                0
      COUNTER_MUTATION        0
      REQUEST_RESPONSE        0
      PAGED_RANGE             0
      READ_REPAIR             0
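
With this many pools, the rows worth a second look are usually the ones with pending or blocked tasks and any message type with a non-zero dropped count. The following Python sketch filters for those. It assumes the two-section layout shown on the slide (a pool table, then a dropped-message table); `parse_tpstats`, `needs_attention`, and the shortened `SAMPLE` are hypothetical names introduced here for illustration.

```python
def parse_tpstats(text):
    """Split tpstats-style output into (pools, dropped) dictionaries."""
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        first = fields[0].lower()
        if first == "pool":        # header row of the thread pool table
            section = "pools"
        elif first == "message":   # header row of the dropped message table
            section = "dropped"
        elif section == "pools" and len(fields) >= 6:
            # The last five fields are the counters; the rest is the pool name.
            counts = [int(f) for f in fields[-5:]]
            keys = ("active", "pending", "completed", "blocked", "all_time_blocked")
            pools[" ".join(fields[:-5])] = dict(zip(keys, counts))
        elif section == "dropped" and len(fields) >= 2:
            dropped[fields[0]] = int(fields[1])
    return pools, dropped

def needs_attention(pools, dropped):
    """Return pool names with queued or blocked work, and dropped message types."""
    busy = [name for name, p in pools.items()
            if p["pending"] or p["blocked"] or p["all_time_blocked"]]
    lossy = [name for name, count in dropped.items() if count]
    return busy, lossy

# Shortened sample using values from the slide above.
SAMPLE = """\
Pool Name                    Active   Pending   Completed   Blocked  All time blocked
MutationStage                     0         0       10462         0                 0
ReadStage                         0         0      339046         0                 0
RPC-Thread                      204       204           0         0                 0

Message type           Dropped
READ                         0
MUTATION                     0
"""

pools, dropped = parse_tpstats(SAMPLE)
print(needs_attention(pools, dropped))
```
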
  32. OBSERVING WITH IOSTAT

  33. AVG-CPU:  %USER  %NICE  %SYSTEM  %IOWAIT  %STEAL  %IDLE
                 4.34   0.78     1.35     3.86    0.19  89.47
      DEVICE:  RRQM/S  WRQM/S      R/S   W/S    RKB/S   WKB/S  AVGRQ-SZ  AVGQU-SZ  AWAIT  R_AWAIT  W_AWAIT  SVCTM  %UTIL
      XVDG       0.07    0.27  1269.19  7.05  6172.58  749.07     10.85      0.21   0.16     0.15     3.30   0.24  30.17

  34. OBSERVING WITH NETSTAT

  35. ACTIVE INTERNET CONNECTIONS
      PROTO  RECV-Q  SEND-Q  LOCAL ADDRESS      FOREIGN ADDRESS         (STATE)
      TCP4        0       0  172.20.10.2.54779  138.213.186.35.B.HTTPS  ESTABLISHED
      TCP4        0       0  172.20.10.2.54778  192.168.1.104.SUN-SR-H  SYN_SENT
      TCP4        0      31  172.20.10.2.54718  XX-FBCDN-SHV-01-.HTTPS  CLOSING

  37. METRICS ARE BETTER!

  38. METRICS OVERVIEW WIKI.APACHE.ORG/CASSANDRA/METRICS

  40. CLUSTER THROUGHPUT

  41. .O.A.C.M.CLIENTREQUEST.
        WRITE.LATENCY.1MINUTERATE
        READ.LATENCY.1MINUTERATE

  42. LOCAL TABLE THROUGHPUT

  43. .O.A.C.M.COLUMNFAMILY.
        KEYSPACE.TABLE.WRITELATENCY.1MINUTERATE
        KEYSPACE.TABLE.READLATENCY.1MINUTERATE

  44. CLUSTER LATENCY

  45. .O.A.C.M.CLIENTREQUEST.
        WRITE.LATENCY.95PERCENTILE
        READ.LATENCY.95PERCENTILE

  46. LOCAL TABLE LATENCY

  47. .O.A.C.M.COLUMNFAMILY.
        KEYSPACE.TABLE.WRITELATENCY.95PERCENTILE
        KEYSPACE.TABLE.READLATENCY.95PERCENTILE

  48. READ PATH

  49. .O.A.C.M.COLUMNFAMILY.KEYSPACE.TABLE.
        LIVESCANNEDHISTOGRAM.95PERCENTILE
        TOMBSTONESCANNEDHISTOGRAM.95PERCENTILE
        SSTABLESPERREADHISTOGRAM.95PERCENTILE
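
The metric names on the slides above follow a regular pattern, so they can be generated rather than typed. The Python sketch below builds them, reading `.O.A.C.M.` as the `org.apache.cassandra.metrics` package prefix. The slide text is upper-cased, so the mixed-case spellings used here (`ClientRequest`, `1MinuteRate`, `95percentile`) are assumptions; the exact suffixes depend on the metrics reporter in use and should be checked against what your reporter emits. The helper names and the `my_keyspace.my_table` arguments are hypothetical.

```python
# Assumed expansion of the ".o.a.c.m." abbreviation used on the slides.
PREFIX = "org.apache.cassandra.metrics"

def client_request_metric(request_type, stat):
    """Coordinator-level (cluster) metric, e.g. Read latency at the 95th percentile."""
    return f"{PREFIX}.ClientRequest.{request_type}.Latency.{stat}"

def table_metric(keyspace, table, name, stat):
    """Node-local, per-table metric such as ReadLatency or TombstoneScannedHistogram."""
    return f"{PREFIX}.ColumnFamily.{keyspace}.{table}.{name}.{stat}"

# The metrics grouped on the slides; stat spellings are assumptions.
throughput = [client_request_metric(t, "1MinuteRate") for t in ("Write", "Read")]
latency = [client_request_metric(t, "95percentile") for t in ("Write", "Read")]
read_path = [
    table_metric("my_keyspace", "my_table", name, "95percentile")
    for name in ("LiveScannedHistogram",
                 "TombstoneScannedHistogram",
                 "SSTablesPerReadHistogram")
]

for metric in throughput + latency + read_path:
    print(metric)
```

Note that the throughput figures come from the rate of the latency timers: the same timer yields both requests per second and latency percentiles.
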

  50. YAY: THERE IS A LOT OF METRICS. BOO: THERE IS A LOT OF METRICS.

  51. TLP DASHBOARDS FOR DATADOG AND GRAFANA.

  52. TLP - OVERVIEW

  53. TLP - WRITE PATH

  54. TLP - READ PATH

  55. TLP - SSTABLE MANAGEMENT

  56. SEE DATADOG AND TLP WEBSITE OVER THE NEXT WEEK TO GET ACCESS.

  57. ORIENT

  58. “BECAUSE OF ‘X’ I THINK ‘Y’ IS HAPPENING”

  60. A QUICK WORD ABOUT WHAT CASSANDRA DOES.

  61. SEND BYTES FROM THE CLIENT…

  62. …TO THE CO-ORDINATOR…

  63. … COPY BYTES, SEND THEM TO REPLICAS…

  64. … WRITE BYTES FROM MEMORY TO DISK.

  65. “COPY BYTES FROM MEMORY ON ONE MACHINE TO ANOTHER AND WRITE TO DISK.”

  66. AND THE REVERSE FOR READS.

  67. [DIAGRAM: NODE 1 AND NODE 2, EACH LABELLED CLIENTS, API, DYNAMO, DATABASE, DISK]

  68. (BACK TO) “BECAUSE OF ‘X’ I THINK ‘Y’ IS HAPPENING”

  69. REVIEW SUB SYSTEMS…

  70. WHAT ARE THE CLIENTS SEEING?

  71. WHAT IS THE CLUSTER LEVEL THROUGHPUT, LATENCY, ERRORS ETC?

  72. WHAT VIEW OF THE CLUSTER DO THE NODES HAVE?

  73. HOW IS THE STORAGE ENGINE PERFORMING?

  74. WHAT IS THE OS AND HARDWARE TELLING CASSANDRA?

  75. HOW IS THE NETWORK BEHAVING?

  76. WHAT DECISIONS IS THE CO-ORDINATOR TAKING FOR WRITE & READ RESULT?

  77. (AND ALWAYS) IS THE JVM BEHAVING?

  78. IS COMPACTION KEEPING UP?

  79. IS REPAIR RUNNING?

  80. IS HINTED HANDOFF OR READ REPAIR RUNNING?

  81. WE CAN ASK LOTS OF QUESTIONS, BUT WE NEED SOME STRUCTURE.

  82. WORK FROM THE CLIENT TO THE DISK.

  83. BUILD A MODEL OF WHAT EACH SYSTEM IS DOING BASED ON EVIDENCE…

  84. …USE EVIDENCE TO EXPLAIN WHY, DO NOT GUESS.

  85. DECIDE

  86. “BECAUSE ‘Y’ IS HAPPENING WE SHOULD…”

  87. A QUICK WORD ABOUT HOW CASSANDRA DOES WHAT IT DOES.

  88. CASSANDRA MUST FOLLOW THE CODE.

  89. OBSERVING CONFIGURATION, METRICS, LOGS, AND TOOLING ALLOWS YOU TO UNDERSTAND WHAT CODE IS DOING…

  90. AND PREDICT WHAT WILL HAPPEN WHEN THEY CHANGE.

  91. THE BEST GUIDE TO UNDERSTANDING WHAT IS HAPPENING IS THE CODE.

  93. (BACK TO) “BECAUSE ‘Y’ IS HAPPENING WE SHOULD…”

  94. RETURN TO OBSERVATION OR TAKE ACTION ‘Z’?

  95. MAKE MORE OBSERVATIONS WHEN UNDERSTANDING IS UNCLEAR.

  96. TAKE ACTIONS TO VALIDATE MENTAL MODEL OR FIX PROBLEM.

  97. ACT

  98. “BECAUSE OF ‘X’ I THINK ‘Y’ IS HAPPENING AND SO WILL DO ‘Z’.”

  99. TEST ASSUMPTIONS BEFORE MAKING CHANGES.

  100. DOCUMENT STATE OF CLUSTER BEFORE STARTING.

  101. RECORD WHEN THE CHANGES WERE MADE.

  102. TEST PREDICTIONS AFTER ACTIONS. LOOP IF NEEDED.
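
The ACT steps above amount to keeping a small, structured record per change: the observation ‘X’, the hypothesis ‘Y’, the action ‘Z’, the state before, the time of the change, and a prediction that can be checked afterwards. A minimal Python sketch of such a record follows. The `ChangeRecord` class, its field names, and every value in the usage example are hypothetical illustrations, not data or tooling from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ChangeRecord:
    observation: str          # 'X': what was seen
    hypothesis: str           # 'Y': what we think is happening
    action: str               # 'Z': what we will do about it
    prediction: str           # what should be true afterwards if 'Y' is right
    state_before: dict        # documented before starting
    applied_at: Optional[datetime] = None
    state_after: dict = field(default_factory=dict)

    def mark_applied(self):
        """Record when the change was made (UTC, to avoid per-machine time zones)."""
        self.applied_at = datetime.now(timezone.utc)

    def prediction_held(self, check):
        """Evaluate the prediction with a caller-supplied check(before, after)."""
        return bool(check(self.state_before, self.state_after))

# Usage with made-up numbers.
record = ChangeRecord(
    observation="read p95 latency is rising",
    hypothesis="reads touch too many SSTables",
    action="change the compaction settings for the table",
    prediction="SSTables per read p95 drops",
    state_before={"sstables_per_read_p95": 8},
)
record.mark_applied()
record.state_after = {"sstables_per_read_p95": 3}
held = record.prediction_held(
    lambda before, after: after["sstables_per_read_p95"] < before["sstables_per_read_p95"]
)
print("prediction held:", held, "applied at:", record.applied_at.isoformat())
```

If the check fails, the record still has value: it documents which hypothesis was ruled out before the next pass through the loop.
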

  103. LOOP

  105. CHECK PREDICTIONS MADE WHEN DECIDING ON THE ACTION.

  106. RETURN TO OBSERVATIONS AND RESTART LOOP

  107. MY OODA LOOP…

  108. 1. START WITH GUESS, THEN BISECT THE PROBLEM.

  109. 2. BUILD A MENTAL MODEL THAT HAS TESTABLE PREDICTIONS BASED ON OBSERVATIONS.

  110. 3. LOOK FOR EVIDENCE OF PREDICTIONS.

  111. 4. MAKE CHANGES BASED ON EVIDENCE (OR LACK OF).

  112. 5. GOTO 1.

  113. QUESTIONS?