Modern Radiology for Distributed Systems

Presented at RICON 2012: http://basho.com/community/ricon2012/

Each of us operates distributed systems. Some of us operate traditional infrastructure with database, web, and load-balancing tiers. Others require infrastructure that is more bespoke and may incorporate non-traditional storage solutions (such as Riak). Regardless of where each of us falls on this spectrum, the network closely describes the behavior of our applications. Furthermore, it is the only place we can look to understand emergent behavior of applications working together in concert. In this talk, we take a radiological view of network-derived imagery and discuss what it can tell us about our systems as a whole.

Dietrich Featherston

October 11, 2012

Transcript

  1. non-invasive monitoring

    measures taken to describe the state of a system with minimal changes to the system being monitored Thursday, October 11, 12
  2. preventative care

    measures taken to prevent diseases or injuries rather than curing them or treating their symptoms
  3. Information emitted about nodes in the network: n

    Information emitted about edges in the network: n² (plotted against network size)
  4. We analyze cell structure because we can’t envision the whole organism

    We react to disease and injury because we lack preventative care
  5. We lack preventative care for applications because our non-invasive monitoring techniques are growing less and less meaningful
  6. dimensions (11): epoch seconds, epoch minutes, epoch hours, node id, source ip, source port, dest ip, dest port, interface, country, network/asn

    measurements (8): egress packets, egress octets, ingress packets, ingress octets, retransmits, errors, app-rtt, handshake-rtt
  7. Case Study #2

    Symptoms:
    - Latent Riak handoff
    - Cluster throughput bottoming out
  8. var put: HttpPut = null
     try {
       // ... put data
     } catch {
       case e: Exception => // ... handle exception
     } finally {
       if (put != null) {
         put.abort()
       }
     }
  10. abort

    public void abort()
    Description copied from interface: HttpUriRequest
    Aborts execution of the request.
    Source: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/methods/HttpRequestBase.html#abort()

    THANKS
  11. public void abort() {
          ClientConnectionRequest localRequest;
          ConnectionReleaseTrigger localTrigger;

          this.abortLock.lock();
          try {
              if (this.aborted) {
                  return;
              }
              this.aborted = true;

              localRequest = connRequest;
              localTrigger = releaseTrigger;
          } finally {
              this.abortLock.unlock();
          }

          // Trigger the callbacks outside of the lock, to prevent
          // deadlocks in the scenario where the callbacks have
          // their own locks that may be used while calling
          // setReleaseTrigger or setConnectionRequest.
          if (localRequest != null) {
              localRequest.abortRequest();
          }
          if (localTrigger != null) {
              try {
                  localTrigger.abortConnection();
              } catch (IOException ex) {
                  // ignore
              }
          }
      }
  14. 1895: Wilhelm Röntgen discovers X-rays. First medical use of X-rays in human imaging takes place one month later

    1905: First English text on chest radiography
    1920: Society of Radiographers formed
  15. Recognition of radiology as a formal medical discipline was a cultural problem, not a technology problem

    http://www.bshr.org.uk/page13.html
  16. If you want to talk to me about the query language used to ask questions of the network data we collect at Boundary, talk to me after or hit me up on Twitter.

    @d2fn
    github.com/dietrichf
  17. Find 45 minutes of total traffic seen on meters 1, 2, 226, & 301 starting 18 hours ago
      broken down by peer ip
      retain top 10 by the ratio of retransmits to packets

      get volume_1s_meter_ip [
        meter in {1, 2, 226, 301};
        epochMillis from -18h for 45m;
      ]
      categorize
        sum(ingress) as ingress,
        sum(egress) as egress,
        sum(ingressPackets + egressPackets) as packets,
        sum(retransmits) as retransmits,
        mean(appRttUsec/1000) as appRttMs
      by epochMillis, ip
      retain top 10 per epochMillis on retransmits/packets
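
Slide 3 contrasts information emitted about nodes, which grows as n, with information emitted about edges, which grows as n². A minimal sketch of that arithmetic follows. It assumes every node may talk to every other node and counts each direction separately, which gives n(n - 1) ordered pairs; the slide only states the growth rates, so this counting rule and the names `NetworkGrowth`, `nodeSeries`, and `edgeSeries` are illustrative assumptions.

```java
// Illustrative only: compares per-node and per-edge series counts
// as a fully connected network grows.
class NetworkGrowth {
    // One stream of information per node.
    static long nodeSeries(long n) {
        return n;
    }

    // One stream per ordered pair of distinct nodes: n * (n - 1),
    // which grows on the order of n squared.
    static long edgeSeries(long n) {
        return n * (n - 1);
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 100, 1000}) {
            System.out.println(n + " nodes -> " + nodeSeries(n)
                + " node series, " + edgeSeries(n) + " edge series");
        }
    }
}
```

Under these assumptions, a tenfold increase in nodes produces roughly a hundredfold increase in edges to describe.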
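
Slides 8 to 11 show a client that calls `put.abort()` in a `finally` block on every request, and the HttpClient source in which `abort()` ends by calling `abortConnection()`. In HttpClient 4.x, `ConnectionReleaseTrigger.abortConnection()` is documented as a hard release that shuts the connection down without keep-alive. One plausible reading of the case study is that every request therefore forced a new connection. The toy model below illustrates that effect in plain Java. It does not use HttpClient, and `ToyPool`, `acquire`, `release`, `abort`, and `connectionsOpened` are hypothetical names invented for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model, not HttpClient: a pool that reuses released connections
// and must open a new one after every abort.
class ToyPool {
    private final Deque<Integer> idle = new ArrayDeque<>();
    private int opened = 0;

    // Hand out an idle connection if one exists, otherwise open a new one.
    int acquire() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        opened++;
        return opened; // the counter doubles as a connection id
    }

    // Graceful release: the connection stays open and can be reused.
    void release(int conn) {
        idle.push(conn);
    }

    // Hard release: the connection is shut down and never reused.
    void abort(int conn) {
        // intentionally not returned to the idle queue
    }

    int opened() {
        return opened;
    }

    // Run sequential requests under one of the two policies and
    // report how many connections had to be opened.
    static int connectionsOpened(int requests, boolean abortAlways) {
        ToyPool pool = new ToyPool();
        for (int i = 0; i < requests; i++) {
            int conn = pool.acquire();
            if (abortAlways) {
                pool.abort(conn);
            } else {
                pool.release(conn);
            }
        }
        return pool.opened();
    }

    public static void main(String[] args) {
        System.out.println("abort in finally:   " + connectionsOpened(1000, true));
        System.out.println("release on success: " + connectionsOpened(1000, false));
    }
}
```

In this model, aborting unconditionally opens one connection per request, while releasing reuses a single connection. On the wire that difference would appear as a steady stream of new handshakes, which is the kind of behavior network-level measurements can expose without touching the application.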
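
The query on slide 17 groups samples by `epochMillis` and `ip`, sums the counters, and retains the top 10 peers per timestamp by `retransmits/packets`. The sketch below mimics the grouping and retain steps over in-memory rows. It is not Boundary's implementation: `RetainTopSketch`, `Sample`, `Agg`, and `topByRetransmitRatio` are hypothetical names, only the packet and retransmit columns are modelled, and the meter and time-range filters are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the slide 17 query; not Boundary's code.
class RetainTopSketch {
    // One sample for a peer ip at a timestamp.
    static final class Sample {
        final long epochMillis;
        final String ip;
        final long packets;      // ingressPackets + egressPackets
        final long retransmits;

        Sample(long epochMillis, String ip, long packets, long retransmits) {
            this.epochMillis = epochMillis;
            this.ip = ip;
            this.packets = packets;
            this.retransmits = retransmits;
        }
    }

    // Running sums for one (epochMillis, ip) group.
    static final class Agg {
        final String ip;
        long packets;
        long retransmits;

        Agg(String ip) {
            this.ip = ip;
        }

        double ratio() {
            return packets == 0 ? 0.0 : (double) retransmits / packets;
        }
    }

    // "categorize ... by epochMillis, ip", then
    // "retain top k per epochMillis on retransmits/packets".
    static Map<Long, List<Agg>> topByRetransmitRatio(List<Sample> samples, int k) {
        Map<Long, Map<String, Agg>> groups = new TreeMap<>();
        for (Sample s : samples) {
            Agg a = groups
                .computeIfAbsent(s.epochMillis, t -> new HashMap<>())
                .computeIfAbsent(s.ip, Agg::new);
            a.packets += s.packets;
            a.retransmits += s.retransmits;
        }
        Map<Long, List<Agg>> result = new TreeMap<>();
        for (Map.Entry<Long, Map<String, Agg>> e : groups.entrySet()) {
            List<Agg> ranked = new ArrayList<>(e.getValue().values());
            // Highest retransmit ratio first.
            ranked.sort((x, y) -> Double.compare(y.ratio(), x.ratio()));
            int keep = Math.min(k, ranked.size());
            result.put(e.getKey(), new ArrayList<>(ranked.subList(0, keep)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
            new Sample(1000L, "10.0.0.1", 100, 1),
            new Sample(1000L, "10.0.0.2", 100, 20),
            new Sample(1000L, "10.0.0.2", 100, 10),
            new Sample(2000L, "10.0.0.1", 50, 5));
        for (Map.Entry<Long, List<Agg>> e : topByRetransmitRatio(samples, 1).entrySet()) {
            Agg top = e.getValue().get(0);
            System.out.println(e.getKey() + " " + top.ip + " ratio=" + top.ratio());
        }
    }
}
```

Ranking by the ratio rather than by raw retransmit counts keeps low-volume peers with a high loss rate from being hidden behind busy but healthy ones.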