Using logs to create a solid data infrastructure

Slides from my talk at Craft Conference, Budapest, Hungary on 24 April 2015. http://martin.kleppmann.com/2015/04/24/logs-for-data-infrastructure-at-craft.html

Abstract:

How does your database store data on disk reliably? It uses a log.
How does one database replica synchronise with another replica? It uses a log.
How does a distributed algorithm like Raft achieve consensus? It uses a log.
How does activity data get recorded in a system like Apache Kafka? It uses a log.
How will the data infrastructure of your application remain robust at scale? Guess what…

Logs are everywhere. I’m not talking about plain-text log files (such as syslog or log4j) – I mean an append-only, totally ordered sequence of records. It’s a very simple structure, but it’s also a bit strange at first if you’re used to normal databases. However, once you learn to think in terms of logs, many problems of making large-scale data systems reliable, scalable and maintainable suddenly become much more tractable.
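To make the definition above concrete, here is a minimal in-memory sketch of an append-only, totally ordered sequence of records. The class name `AppendOnlyLog` and its methods are hypothetical and chosen only for illustration. A real log (a database write-ahead log, a Kafka partition) would also persist records to disk, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory log: records are only appended, never modified."""

    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset.

        The offset is the record's position in the total order.
        """
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, Any]]:
        """Yield (offset, record) pairs in log order, starting at `offset`."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


log = AppendOnlyLog()
first = log.append({"user": "alice", "action": "signup"})
second = log.append({"user": "alice", "action": "login"})

# Any reader that starts at the same offset sees the same records
# in the same order.
replayed = list(log.read_from(first))
```

Because the only write operation is `append`, every record gets exactly one offset and the order of records never changes, which is what lets several independent readers agree on what happened and in which order.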

Drawing from the experience of building scalable systems at LinkedIn and other startups, this talk will explore why logs are such a fine idea: making it easier to maintain search indexes and caches, making your applications more scalable and more robust in the face of failures, and opening up your data for richer analysis, while avoiding race conditions, inconsistencies and other ugly problems.
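As a rough illustration of the idea of maintaining caches and search indexes from a log, the sketch below derives both from the same ordered list of writes. The record layout (key, text) and the function names `build_cache` and `build_search_index` are assumptions made for this example, not something taken from the talk. The log here is a plain Python list standing in for a real log system.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# The log: an ordered sequence of (key, text) writes.
# The documents are made up for illustration.
log: List[Tuple[str, str]] = [
    ("doc1", "logs are everywhere"),
    ("doc2", "databases use logs"),
    ("doc1", "logs are simple"),  # a later write to the same key
]


def build_cache(records: List[Tuple[str, str]]) -> Dict[str, str]:
    """Apply writes in log order; the last write for each key wins."""
    cache: Dict[str, str] = {}
    for key, text in records:
        cache[key] = text
    return cache


def build_search_index(records: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each word to the keys whose latest text contains it."""
    latest = build_cache(records)
    index: Dict[str, Set[str]] = defaultdict(set)
    for key, text in latest.items():
        for word in text.split():
            index[word].add(key)
    return dict(index)


cache = build_cache(log)
index = build_search_index(log)
```

Both derived views read the same writes in the same order, so they cannot disagree about which write to `doc1` came last. That is the sense in which a log helps avoid race conditions between, say, a cache and an index that would otherwise be updated by separate, unordered writes.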

Martin Kleppmann

April 22, 2015

Transcript

  33. 216.58.210.78 - - [27/Feb/2015:17:55:11 +0000] "GET /css/typography.css HTTP/1.1" 200 3377 "http://martin.kleppmann.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"

  78. References (fun stuff to read)

    1.  Jay Kreps: “I ♥︎ Logs.” O’Reilly Media, September 2014. http://shop.oreilly.com/product/0636920034339.do

    2.  Martin Kleppmann: “Designing data-intensive applications.” O’Reilly Media, to appear in 2015. http://dataintensive.net

    3.  Martin Kleppmann: “Turning the database inside-out with Apache Samza.” 4 March 2015. http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/

    4.  Pat Helland: “Immutability Changes Everything,” at 7th Biennial Conference on Innovative Data Systems Research (CIDR), January 2015. http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf

    5.  Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,” at ACM Symposium on Cloud Computing (SoCC), October 2012. http://www.socc2012.org/s18-das.pdf

    6.  Mahesh Balakrishnan, Dahlia Malkhi, Ted Wobber, et al.: “Tango: Distributed Data Structures over a Shared Log,” at 24th ACM Symposium on Operating Systems Principles (SOSP), pages 325–340, November 2013. http://research.microsoft.com/pubs/199947/Tango.pdf

    7.  C Mohan, Don Haderle, Bruce G Lindsay, Hamid Pirahesh, and Peter Schwarz: “ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging,” ACM Transactions on Database Systems (TODS), volume 17, number 1, pages 94–162, March 1992.

    8.  Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil: “The Log-Structured Merge-Tree (LSM-Tree),” Acta Informatica, volume 33, number 4, pages 351–385, June 1996. http://www.cs.umb.edu/~poneil/lsmtree.pdf

    9.  Heidi Howard, Malte Schwarzkopf, Anil Madhavapeddy, and Jon Crowcroft: “Raft Refloated: Do We Have Consensus?,” ACM SIGOPS Operating Systems Review, volume 49, number 1, pages 12–21, January 2015. http://www.cl.cam.ac.uk/~ms705/pub/papers/2015-osr-raft.pdf
