Connecting the Dots... Tracing of Events within ING with Elasticsearch

Elastic Co
October 29, 2015

Coming from a world of isolated departments and applications, ING has now embraced the omnichannel approach. Teams work closely together but lack the visibility they need across systems. This is how ING solves the problem of tracing events in a complex software environment.

Stephane Rouault and Christiaan Douma | Elastic{ON} Tour Amsterdam | October 29, 2015

Transcript

  1. Connecting the dots… Tracing of events within ING
    Christiaan Douma - Dev Engineer, Stephane Rouault - Dev Engineer
    Amsterdam • 29th October 2015
  2. Agenda: ING, Why, How, Elasticsearch, Numbers, Future

  3. ING: It’s who we are

  4. World map: Over 40 countries, 52,000+ employees
    Market leaders Benelux • Growth markets • Commercial Banking • Challengers
  5. European map
    Market leaders Benelux • Growth markets • Commercial Banking • Challengers
    • Full-service bank
    • Very strong European base
    • Ranked 7th
    • Largest bank of NL
    • 150+ DevOps teams
  6. A little bit about us: Stephane Rouault, Christiaan Douma
    • Engineer • Software Developer • 7 years of experience in IT • @ING since: 01-03-2011
    • Engineer • Jack of all trades • 11 years of experience in IT • @ING since: 01-01-2011
  7. Agenda: ING, Why, How, Elasticsearch, Numbers, Future

  8. The Why: A little bit of context

  9. Happy Flow: A simple illustration of a transaction
    [Diagram: a Request travels from Frontend through Middleware to Backend, and a Reply travels back]
  10. Error occurs: But what if something goes wrong?
    [Diagram: Frontend, Middleware, Backend; the Request ends in "ERROR: Some error at …"]
    • What happened?
    • Where did it happen?
    • What has been affected?
  11. What usually happens next...
    [Diagram: Frontend, Middleware and Backend, each with its own Log]
    DevOps teams searching their logs...
  12. Issues
    • Finding the owner of the problem
    • Access to logs is restricted to system owners
    • Difficult to link events across systems
    • Time-consuming process
  13. Complex ING Landscape: Frontend, Middleware, Backend

  14. Agenda: ING, Why, How, Elasticsearch, Numbers, Future

  15. The How: Solving some issues

  16. Unique Correlation Identifier
    [Diagram: a Request carrying a UUID passes through Frontend, Middleware and Backend]
    Link events across systems, extra context info in the request
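The correlation identifier on this slide can be illustrated with a minimal Python sketch. It is not ING's implementation: the header name `X-Correlation-ID` and the three functions standing in for Frontend, Middleware and Backend are assumptions made for illustration. The point it shows is that only the first system generates a UUID and every later hop reuses it.

```python
import uuid

# Hypothetical header name; the talk does not say how the UUID is transported.
CORRELATION_HEADER = "X-Correlation-ID"


def ensure_correlation_id(headers: dict) -> dict:
    """Return a copy of the headers that carries a correlation ID.

    An existing ID is reused so that every hop logs the same value;
    a new UUID is generated only when none is present yet.
    """
    out = dict(headers)
    if not out.get(CORRELATION_HEADER):
        out[CORRELATION_HEADER] = str(uuid.uuid4())
    return out


def backend(headers: dict) -> str:
    # Last hop: report which correlation ID it received.
    return ensure_correlation_id(headers)[CORRELATION_HEADER]


def middleware(headers: dict) -> str:
    # Middle hop: pass the incoming ID on unchanged.
    return backend(ensure_correlation_id(headers))


def frontend() -> tuple:
    # First hop: no ID exists yet, so one is created here.
    headers = ensure_correlation_id({})
    return headers[CORRELATION_HEADER], middleware(headers)
```

Calling `frontend()` returns the ID created at the front and the ID seen at the back; in this sketch the two are always equal, which is what makes linking events across systems possible.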
  17. Using Elasticsearch to combine everything
    [Diagram: the Logs of Frontend, Middleware and Backend all feed into Elasticsearch]
  18. Event logging in Elasticsearch: Tracing
    [Diagram: a Request carrying a UUID passes through Frontend, Middleware and Backend; each logs an Event (UUID, Context, Duration, Status) to Elasticsearch]
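The event fields named on this slide (UUID, Context, Duration, Status) can be sketched as a small Python data model. This is only an illustration under assumptions: the `component` and `sequence` fields, the millisecond unit and the `"OK"`/`"ERROR"` status values are hypothetical, and an in-memory list stands in for Elasticsearch rather than calling any real client API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TraceEvent:
    uuid: str         # correlation ID shared by every hop of one request
    component: str    # emitting system (assumed field)
    context: str      # free-form context info
    duration_ms: int  # duration; the unit is an assumption
    status: str       # "OK" or "ERROR" (assumed values)
    sequence: int     # position in the chain (assumed ordering field)


def group_by_uuid(events: List[TraceEvent]) -> Dict[str, List[TraceEvent]]:
    """Rebuild one ordered chain of events per correlation ID."""
    chains: Dict[str, List[TraceEvent]] = {}
    for event in events:
        chains.setdefault(event.uuid, []).append(event)
    for chain in chains.values():
        chain.sort(key=lambda e: e.sequence)
    return chains


def first_failure(chain: List[TraceEvent]) -> Optional[TraceEvent]:
    """Return the earliest event in a chain whose status is not OK, if any."""
    return next((e for e in chain if e.status != "OK"), None)


# Events arrive out of order and interleaved, as they would from separate systems.
events = [
    TraceEvent("req-1", "Backend", "lookup account", 40, "ERROR", 3),
    TraceEvent("req-2", "Frontend", "login", 5, "OK", 1),
    TraceEvent("req-1", "Frontend", "show balance", 12, "OK", 1),
    TraceEvent("req-1", "Middleware", "route request", 8, "OK", 2),
]
chains = group_by_uuid(events)
```

With the chain for `req-1` rebuilt, `first_failure(chains["req-1"])` points at the Backend event, which answers the "where did it happen?" question from the earlier slide.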
  19. Tracing

  20. Tracing: Component Name, Error Description, Event Info

  21. Tracing in Kibana

  22. Agenda: ING, Why, How, Elasticsearch, Numbers, Future

  23. Elasticsearch: The reason why

  24. How did Elasticsearch help us?
    • Free text search
    • Shared service within ING
    • Data lake for operational events
    • Scalability
    • No read/write interference
    • Flexible data model
    • Highly available
  25. How is Elasticsearch set up?

  26. Agenda: ING, Why, How, Elasticsearch, Numbers, Future

  27. The Numbers: Who doesn’t love numbers?

  28. Some statistics about Tracing
    • Peak number of events per second: ~650
    • Avg number of events per second: ~300
    • Peak number of events per day: ~24 million
    • Avg number of events per day: ~11 million
    • Growth of the number of events: ~40% more than 3 months ago
    • Growth of the number of operations: ~12% more than 3 months ago
    • Longest chain of events: >30
    • Number of DevOps teams using Tracing: ~24
    • Number of architecture domains in Tracing: ~5
  29. Some statistics about Elasticsearch
    • Peak number of events per second: ~2500
    • Peak number of events per day: ~120 million
    • Index per day: takes up to 100-130 GB
    • Retaining 30 days of data (1x replicated): ~4.4 TB
    • Number of Elasticsearch queries: 30 tps
    • Avg response time (for most queries): 1-2 seconds
    • Number of Kibana dashboards: ~740 and growing fast
    • Number of shards: 5
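The retention figures on this slide can be related to each other with some simple arithmetic, sketched below in Python. The sketch rests on assumptions the slide does not confirm: that "1x replicated" means one replica next to each primary (two copies in total), that the ~4.4 TB figure counts both copies, and that 1 TB is taken as 1000 GB.

```python
def retained_storage_tb(daily_index_gb: float, days: int, replicas: int) -> float:
    """Storage in TB for `days` daily indices, counting primaries plus replicas."""
    copies = 1 + replicas
    return daily_index_gb * days * copies / 1000


def implied_daily_gb(total_tb: float, days: int, replicas: int) -> float:
    """Average primary index size per day implied by a total retained volume."""
    copies = 1 + replicas
    return total_tb * 1000 / (days * copies)


# Average daily index size implied by ~4.4 TB over 30 days with one replica.
average_gb = implied_daily_gb(4.4, days=30, replicas=1)

# Upper bound if every daily index were at the top of the quoted 100-130 GB range.
worst_case_tb = retained_storage_tb(130, days=30, replicas=1)
```

Under these assumptions the implied average is roughly 73 GB per day, below the quoted range, which fits a reading of 100-130 GB as the size of the busiest days rather than of a typical one.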
  30. Agenda: ING, Why, How, Elasticsearch, Numbers, Future

  31. The Future: Where we want to go

  32. How it works now (architecture)

  33. How it will work (architecture roadmap)

  34. Roadmap
    • Multi-source event correlation; combining tracing with:
        • System logs
        • Alert logs
        • System metrics
        • Incident logs
        • Deployment logs
        • Etc.
    • Hooking Elasticsearch up to a graph database for real-time graphical insight
    • Forecasting of (business) usage
    • Business chain alerting with Watcher or another tool
  35. How it will work (architecture roadmap)

  36. Q&A: Time to interface