Map & Territory: A story of visibility

Pierre-Yves Ritschard

April 19, 2013

Transcript

  1. Map & Territory
    a story of visibility

  2. Pierre-Yves
    @pyr
    https://github.com/pyr

  3. https://exoscale.ch

  4. Visibility

  5. How do we work?

  7. How do we
    improve?

  8. Avoid Shortcuts!

  10. We want lower
    defect rates

  11. We want to make
    informed decisions

  12. Design
    Build
    Live

  13. Visibility

  14. Extracting meaningful
    state data from
    heterogeneous event
    sources, over time

  15. Meaningful
    (relates to business value)

  16. State Data
    (structured payload)
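
A minimal sketch of what a "structured payload" could look like, assuming a small fixed set of fields. The `Event` class and its field names (`host`, `service`, `metric`, `state`, `timestamp`, `tags`) are illustrative assumptions, not something the slides specify.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Event:
    """Hypothetical event shape: every producer emits the same fields."""
    host: str
    service: str
    metric: float
    state: str = "ok"
    timestamp: float = field(default_factory=time.time)
    tags: list = field(default_factory=list)


def to_payload(event: Event) -> str:
    """Serialize an event to JSON so any consumer can parse it."""
    return json.dumps(asdict(event), sort_keys=True)


# Example values are made up for illustration.
event = Event(
    host="web01",
    service="http.latency_ms",
    metric=42.0,
    timestamp=1366329600.0,
    tags=["frontend"],
)
payload = to_payload(event)
```

Keeping the payload structured, rather than free text, is what lets the later aggregation and correlation steps work on fields instead of parsing strings.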

  17. Heterogeneous
    (everyone is involved)

  18. Over time
    (tracking)

  19. How does it help
    my system's
    lifecycle?

  20. Map
    =/=
    Territory

  21. Break out of our
    mental model

  22. "I'll push this
    minor change, it
    cannot do any
    harm"

  24. "I'll just add this
    static route"

  26. Better lifecycle
    Informed decisions
    Better maps

  27. Systems are
    (increasingly)
    complex

  28. Web Infrastructure
    circa '00
    (2 servers)

  29. Visibility Circa '00

  30. Web Infrastructure
    circa '12
    (27 nodes)

  32. Visibility Circa '12

  33. Q: how is business
    doing today?
    A:

  34. Q: how is business
    doing today?
    A: based on these
    key metrics we're
    looking good

  35. Figure out those
    key metrics

  36. We need
    appropriate tooling

  37. Events across:
    system,
    components,
    software

  38. The event stream
    approach

  39. Plenty of small
    producers
    Few big consumers

  40. Production:
    Anything that
    happens or
    moves (logs
    too!):
    Normalize &
    Stream
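
One way to picture "Normalize & Stream" is a sketch where log lines and numeric samples are both converted to the same dictionary shape and yielded one at a time. The log format, the field names, and the functions `normalize_log`, `normalize_sample`, and `stream` are all assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed log format: "<host> <service> <LEVEL> <free text>".
LOG_PATTERN = re.compile(
    r"^(?P<host>\S+) (?P<service>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)


def normalize_log(line: str) -> dict:
    """Turn a raw log line into the common event shape."""
    match = LOG_PATTERN.match(line)
    if match is None:
        # Keep unparsed lines in the stream instead of dropping them.
        return {"host": "unknown", "service": "unparsed",
                "state": "unknown", "metric": None, "description": line}
    state = "critical" if match["level"] in ("ERROR", "FATAL") else "ok"
    return {"host": match["host"], "service": match["service"],
            "state": state, "metric": None,
            "description": match["message"]}


def normalize_sample(host: str, service: str, value: float) -> dict:
    """Turn a numeric sample into the same event shape as a log line."""
    return {"host": host, "service": service, "state": "ok",
            "metric": float(value), "description": ""}


def stream(log_lines: Iterable[str],
           samples: Iterable[Tuple[str, str, float]]) -> Iterator[dict]:
    """Yield normalized events one by one from heterogeneous sources."""
    for line in log_lines:
        yield normalize_log(line)
    for host, service, value in samples:
        yield normalize_sample(host, service, value)


events = list(stream(
    ["web01 nginx ERROR upstream timed out"],
    [("db01", "disk.used_pct", 71.5)],
))
```

Because both sources end up in one shape, a consumer never needs to know whether an event started life as a log line or a gauge.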

  41. Consumption:
    Aggregate
    Correlate
    Decide

  42. Aggregation
    compute compound
    metrics (ratios, sums)
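
As a rough sketch of compound metrics, the sums and ratios mentioned above could be computed from a list of events like this. The event shape, the service names `http.requests` and `http.errors`, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def sum_by_service(events: Iterable[dict]) -> Dict[str, float]:
    """Sum the metric of every event, grouped by service name."""
    totals: Dict[str, float] = {}
    for event in events:
        service = event["service"]
        totals[service] = totals.get(service, 0.0) + event["metric"]
    return totals


def ratio(totals: Dict[str, float],
          numerator: str, denominator: str) -> Optional[float]:
    """Return totals[numerator] / totals[denominator], or None if undefined."""
    denom = totals.get(denominator, 0.0)
    if denom == 0:
        return None
    return totals.get(numerator, 0.0) / denom


# Made-up sample data from two hosts.
sample_events = [
    {"host": "web01", "service": "http.requests", "metric": 120.0},
    {"host": "web02", "service": "http.requests", "metric": 80.0},
    {"host": "web01", "service": "http.errors", "metric": 3.0},
    {"host": "web02", "service": "http.errors", "metric": 1.0},
]
totals = sum_by_service(sample_events)
error_ratio = ratio(totals, "http.errors", "http.requests")
```

The error ratio across all hosts is usually closer to business value than any single host's counter, which is the point of aggregating on the consumer side.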

  43. Correlation

  44. Decision
    track, alert, ignore,
    scale
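
The four outcomes named above can be sketched as a small decision function. The inputs (`error_ratio`, `cpu_load`) and the thresholds are arbitrary assumptions chosen for illustration; real values depend on the key metrics of each business.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    TRACK = "track"
    ALERT = "alert"
    IGNORE = "ignore"
    SCALE = "scale"


# Hypothetical thresholds, not taken from the talk.
ERROR_RATIO_ALERT = 0.05
CPU_LOAD_SCALE = 0.80


def decide(error_ratio: Optional[float], cpu_load: float) -> Action:
    """Map aggregated metrics to one of the four decisions."""
    if error_ratio is None:
        # No traffic means no ratio: nothing to act on.
        return Action.IGNORE
    if error_ratio >= ERROR_RATIO_ALERT:
        return Action.ALERT
    if cpu_load >= CPU_LOAD_SCALE:
        return Action.SCALE
    # Healthy and not saturated: keep recording the trend.
    return Action.TRACK
```

For example, `decide(0.02, 0.35)` returns `Action.TRACK`, while `decide(0.08, 0.35)` returns `Action.ALERT`.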

  45. Implementing
    on-premises, SaaS, or in
    between?

  46. SaaS
    Loggly, Papertrail,
    Librato, Datadog, ...

  47. On-Premises
    collectd, Logstash,
    Graphite, StatsD,
    Riemann

  48. The path to visibility:
    Find key metrics
    Find the right tools
    Rely on an event stream
    Involve everyone
    Challenge your mental model
    Hopefully, improve quality and lower
    defect rates in the process!

  49. Questions?
