by David O'Neil

Transcript

  1. Is it managers?
     o They paid for it
     o They want to know the money they spent is OK
     o What are they expecting?
  2. DevOps?
     o Did anything major break?
     o Did it last long?
     o Was it ‘serious’, for whatever value of serious people pick…
     o Did anybody notice?
  3. Your Developer Community?
     o What do you tell them you do?
     o Did they actually notice, i.e. was anybody using it?
     o Did you tell them?
     o If an API breaks in a Sandbox and nobody notices, is it still broken?
  4. Customers?
     o They’re a pain, right?
     o Did they notice?
     o Whose ear do they have?
     o Jane Doe Inc being down for an hour might not get the same reaction as Apple Inc.
  5. Answer: All of them, sadly
     o In other words, who you talk to and what you say changes by audience
     o And they all have different goals/desires
     o One-size monitoring satisfies nobody
  6. A Tale of 2 Audiences
     The API goes down for an hour due to a software update gone wrong.

     o API is down for an hour
     o Pass rate for the week is within SLA; it only affected a small set of users
     o No problem! Next!

     o Developers email complaints
     o Nobody in the dev community has data
     o The perf board shows all green (within SLA)
     o Why is nobody listening to us????
  7. “But we’re meeting the SLA…”
     o This API was within SLA… except for Tuesdays…
     o On Tuesday it sucked, but it didn’t affect the SLA
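The arithmetic behind slides 6 and 7 can be sketched as follows. This is an illustration only: the 99% target, the one-check-per-minute schedule, and the failure counts are all assumed numbers, not figures from the talk. It shows how an hour of failures on one day disappears inside a weekly pass rate.

```python
# All numbers below are hypothetical, chosen only to illustrate the effect.
SLA_TARGET = 0.99        # assumed SLA: 99% of checks must pass
CHECKS_PER_DAY = 1440    # assumed: one synthetic check per minute

# Failed checks per day; Tuesday has a one-hour outage (60 failed checks).
failures = {"Mon": 0, "Tue": 60, "Wed": 0, "Thu": 0, "Fri": 0, "Sat": 0, "Sun": 0}


def pass_rate(failed: int, total: int) -> float:
    """Fraction of checks that passed."""
    return (total - failed) / total


# The number the management dashboard reports: one figure for the whole week.
weekly = pass_rate(sum(failures.values()), CHECKS_PER_DAY * len(failures))

# The number the affected developers experienced: one figure per day.
daily = {day: pass_rate(failed, CHECKS_PER_DAY) for day, failed in failures.items()}
breaching_days = [day for day, rate in daily.items() if rate < SLA_TARGET]

print(f"weekly pass rate: {weekly:.4f} (SLA met: {weekly >= SLA_TARGET})")
print(f"days below target: {breaching_days}")
```

With these assumed inputs the week passes while Tuesday alone does not, which is the gap between the two audiences: both are reading correct numbers, just at different granularity.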
  8. “But! But! Our Server is fine!!!!”
     Client complains of crappy performance…
     o “Your service is too slow…”
     o “We’ll get right on that”
     o “Looks fine on our end, it must be them”
  9. Server looked fine, customer still right
     o SLA: server response inside 100 ms
     o However… actual round-trip time, including network and a transpacific ‘hop’: 1,600 ms
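A minimal sketch of the slide 9 comparison, assuming a hypothetical `Sample` record that carries both the in-server time and the client-measured round trip. The 100 ms SLA and the 1,600 ms round trip come from the slide; the 90 ms server time and the 500 ms user budget are invented for illustration.

```python
from dataclasses import dataclass

SERVER_SLA_MS = 100.0    # from the slide: server response inside 100 ms
USER_BUDGET_MS = 500.0   # hypothetical: what an end user would tolerate


@dataclass
class Sample:
    server_ms: float       # measured inside the server (what APM reports)
    round_trip_ms: float   # measured at the client, network included

    @property
    def outside_server_ms(self) -> float:
        """Time spent outside the server: network, hops, client stack."""
        return self.round_trip_ms - self.server_ms


def verdict(sample: Sample) -> dict:
    """Evaluate the same request from the server's and the user's viewpoint."""
    return {
        "server_within_sla": sample.server_ms <= SERVER_SLA_MS,
        "user_within_budget": sample.round_trip_ms <= USER_BUDGET_MS,
    }


# 90 ms server time is an assumed value that satisfies the SLA.
sample = Sample(server_ms=90.0, round_trip_ms=1600.0)
result = verdict(sample)
print(result, f"time outside the server: {sample.outside_server_ms:.0f} ms")
```

Both verdicts are true at once: the server meets its SLA and the customer still waits well over a second, because almost all of the delay sits outside what in-server monitoring can see.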
  10. “Your capacity sucks!”
      o You complain to your infrastructure vendor that their capacity sucks
      o So they run some stress and load tests and pass
      o But you’re still getting slow throughput…
      o Turns out it’s a config issue that only emerges when you’re using it, and it’s related to something outside the API stack
  11. On not getting caught out…
      o Monitor stuff from the perspective of your end users and report it honestly
      o Don’t hide behind APM data: the customer is probably wrong, but that doesn’t help; they might not be able to reach your stuff
      o Watch out for misleading averages or in-server numbers
      o Provide the right data for the right audiences
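The warning about misleading averages can be illustrated with a small sketch. The latency values are made up, and the nearest-rank percentile helper is a simple stand-in rather than any particular monitoring tool's method.

```python
import statistics

# Hypothetical latencies in ms: 95 fast requests and 5 very slow ones.
latencies_ms = [80] * 95 + [2000] * 5


def percentile(values: list, p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, median={median_ms} ms, p99={p99_ms} ms")
```

Here the mean (176 ms) and the median (80 ms) both look acceptable, while the 99th percentile shows that some users waited two seconds. Which of these numbers belongs on which dashboard depends on the audience reading it.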