Pro Yearly is on sale from $80 to $50! »

Keep your sanity with compositional tracing

Keep your sanity with compositional tracing

Logging is not enough for building distributed systems that you can investigate in case of production issues (which will happen). As an alternative, we can use tracing.

We’ll learn why logging is not the adequate tool for finding issues in distributed systems and how tracing solves its problems.

Later, we’ll see how it can be implemented, starting from the simplest possible solution, progressing through a clean, compositional solution that doesn’t clutter the API with details of tracing, and works with HTTP requests as well as asynchronous message handling.

08f642741fba006656cb86fb61c160b3?s=128

Jakub Kozłowski

March 12, 2020
Tweet

Transcript

  1. Keep your sanity with compositional tracing Photo by Esther Wilhelmsson

    on Unsplash Jakub Kozłowski |> Typelevel Summit |> New York City |> 12.03.2020 speakerdeck.com/kubukoz
  2. Standard ways of inspecting live systems Inspecting distributed systems Implementation

    hackery! In this talk
  3. type Result = Int val processPayment: String => Result =

    { ... } val result = processPayment(paymentId) Scenario: process payment with an ID
  4. logger.info(s"Started processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing

    payment with result $result") <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] %message%n%xException</pattern> Let's log things!
  5. logger.info(s"Started processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing

    payment with result $result") 2020-03-07 19:21:39.817 [INFO] Started processing payment 3 2020-03-07 19:21:39.817 [INFO] Started processing payment 2 2020-03-07 19:21:39.817 [INFO] Started processing payment 1 2020-03-07 19:21:39.820 [INFO] Finished processing payment with result 10 2020-03-07 19:21:39.820 [INFO] Finished processing payment with result 85 2020-03-07 19:21:39.820 [INFO] Finished processing payment with result 46 <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] %message%n%xException</pattern> Let's log things!
  6. logger.info(s"Started processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing

    payment with result $result") 2020-03-07 19:21:39.817 [INFO] Started processing payment 3 2020-03-07 19:21:39.817 [INFO] Started processing payment 2 2020-03-07 19:21:39.817 [INFO] Started processing payment 1 2020-03-07 19:21:39.820 [INFO] Finished processing payment with result 10 2020-03-07 19:21:39.820 [INFO] Finished processing payment with result 85 2020-03-07 19:21:39.820 [INFO] Finished processing payment with result 46 Which result came from which paymentId? <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] %message%n%xException</pattern> Let's log things!
  7. <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] %message%n%xException</pattern> <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] [%thread] %message%n%xException</pattern> logger.info(s"Started

    processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing payment with result $result") Let's add the executing thread!
  8. <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] %message%n%xException</pattern> <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] [%thread] %message%n%xException</pattern> logger.info(s"Started

    processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing payment with result $result") 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-17] Started processing payment 2 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-19] Started processing payment 3 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-15] Started processing payment 1 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-15] Finished processing payment with result 81 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-19] Finished processing payment with result 52 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-17] Finished processing payment with result 60 Let's add the executing thread!
  9. <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] %message%n%xException</pattern> <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] [%thread] %message%n%xException</pattern> logger.info(s"Started

    processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing payment with result $result") 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-17] Started processing payment 2 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-19] Started processing payment 3 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-15] Started processing payment 1 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-15] Finished processing payment with result 81 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-19] Finished processing payment with result 52 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-17] Finished processing payment with result 60 Let's add the executing thread!
  10. None
  11. None
  12. 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-17] Started processing payment 2 2020-03-07 19:26:38.490

    [INFO] [scala-execution-context-global-19] Started processing payment 3 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-15] Started processing payment 1 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-15] Finished processing payment with result 81 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-19] Finished processing payment with result 52 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-17] Finished processing payment with result 60 ... ... ... 2020-03-07 19:27:38.493 [INFO] [scala-execution-context-global-19] Oh, one more thing Is this still the same request? When is the request over?
  13. 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-17] Started processing payment 2 2020-03-07 19:26:38.490

    [INFO] [scala-execution-context-global-19] Started processing payment 3 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-15] Started processing payment 1 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-15] Finished processing payment 3 with result 81 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-19] Finished processing payment 2 with result 52 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-17] Finished processing payment 1 with result 60 So... logging payment ID in each message?
  14. 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-17] Started processing payment 2 2020-03-07 19:26:38.490

    [INFO] [scala-execution-context-global-19] Started processing payment 3 2020-03-07 19:26:38.490 [INFO] [scala-execution-context-global-15] Started processing payment 1 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-15] Finished processing payment 3 with result 81 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-19] Finished processing payment 2 with result 52 2020-03-07 19:26:38.493 [INFO] [scala-execution-context-global-17] Finished processing payment 1 with result 60 So... logging payment ID in each message? Have fun grepping!
  15. package org.slf4j //rough Scala translation object MDC { def put(key:

    String, value: String): Unit def putCloseable(key: String, value: String): Closeable def remove(key: String): Unit def getCopyOfContextMap(): Map[String, String] def clear(): Unit //... } Mapped Diagnostic Context (MDC)
  16. val processPayment: String => Result = { ... } val

    requestId = UUID.randomUUID().toString() Using.resource(MDC.putCloseable("RequestId", requestId)) { _ => logger.info(s"Started processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing payment with result $result") } <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] [%mdc{RequestId:--}] %message%n%xException</pattern> Logging with MDC
  17. val processPayment: String => Result = { ... } val

    requestId = UUID.randomUUID().toString() Using.resource(MDC.putCloseable("RequestId", requestId)) { _ => logger.info(s"Started processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing payment with result $result") } <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] [%mdc{RequestId:--}] %message%n%xException</pattern> 2020-03-07 20:24:26.236 [INFO] [9a7ef5de-eac2-413a-903b-8516ff956f8c] Started processing payment 3 2020-03-07 20:24:26.236 [INFO] [29e707e5-5186-4e31-9d6b-9f73e96d4198] Started processing payment 1 2020-03-07 20:24:26.236 [INFO] [c2db42d8-8a1b-4e9d-b826-bf7fd680058a] Started processing payment 2 2020-03-07 20:24:26.238 [INFO] [29e707e5-5186-4e31-9d6b-9f73e96d4198] Finished processing payment with result 25 2020-03-07 20:24:26.238 [INFO] [c2db42d8-8a1b-4e9d-b826-bf7fd680058a] Finished processing payment with result 76 2020-03-07 20:24:26.238 [INFO] [9a7ef5de-eac2-413a-903b-8516ff956f8c] Finished processing payment with result 88 Logging with MDC
  18. val processPayment: String => Result = { ... } val

    requestId = UUID.randomUUID().toString() Using.resource(MDC.putCloseable("RequestId", requestId)) { _ => logger.info(s"Started processing payment $paymentId") val result = processPayment(paymentId) logger.info(s"Finished processing payment with result $result") } <pattern>%date{YYYY-MM-dd HH:mm:ss.SSS} [%level] [%mdc{RequestId:--}] %message%n%xException</pattern> 2020-03-07 20:24:26.236 [INFO] [9a7ef5de-eac2-413a-903b-8516ff956f8c] Started processing payment 3 2020-03-07 20:24:26.236 [INFO] [29e707e5-5186-4e31-9d6b-9f73e96d4198] Started processing payment 1 2020-03-07 20:24:26.236 [INFO] [c2db42d8-8a1b-4e9d-b826-bf7fd680058a] Started processing payment 2 2020-03-07 20:24:26.238 [INFO] [29e707e5-5186-4e31-9d6b-9f73e96d4198] Finished processing payment with result 25 2020-03-07 20:24:26.238 [INFO] [c2db42d8-8a1b-4e9d-b826-bf7fd680058a] Finished processing payment with result 76 2020-03-07 20:24:26.238 [INFO] [9a7ef5de-eac2-413a-903b-8516ff956f8c] Finished processing payment with result 88 last log with request id - end of request Logging with MDC
  19. val processPayment: String => Future[Result] = { ... } Using.resource(MDC.putCloseable("RequestId",

    requestId)) { _ => logger.info(s"Started processing payment $paymentId") processPayment(paymentId).map { result => logger.info(s"Finished processing payment with result $result") } } What if the thread changes?
  20. val processPayment: String => Future[Result] = { ... } Using.resource(MDC.putCloseable("RequestId",

    requestId)) { _ => logger.info(s"Started processing payment $paymentId") processPayment(paymentId).map { result => logger.info(s"Finished processing payment with result $result") } } 2020-03-07 20:44:10.781 [INFO] [96a4e310-1f2d-4670-80c8-eb28c0561e4a] Started processing payment 2 2020-03-07 20:44:10.781 [INFO] [1afb3aa6-c7ec-4c56-b9a8-c5ccf2c13cf3] Started processing payment 1 2020-03-07 20:44:10.781 [INFO] [d857fe1d-48cf-4f58-b369-8705b31f9b8d] Started processing payment 3 2020-03-07 20:44:10.790 [INFO] [-] Finished processing payment with result 46 2020-03-07 20:44:10.790 [INFO] [-] Finished processing payment with result 89 2020-03-07 20:44:10.790 [INFO] [-] Finished processing payment with result 73 What happened? What if the thread changes?
  21. Async programming on the JVM doThing { result => doOtherThing(result)

    { secondResult => doMoreThings(result, secondResult) } }
  22. Async programming in Scala doThing.flatMap { result => doOtherThing(result).flatMap {

    secondResult => doMoreThings(result, secondResult) } }
  23. Async programming in Scala for { result <- doThing secondResult

    <- doOtherThing(result) out <- doMoreThings(result, secondResult) } yield out
  24. What happens in map / flatMap ? Future .successful(42) .map(f)

    .map(g)
  25. What happens in map / flatMap ? Future .successful(42) .map(f)(ec1)

    .map(g)(ec2)
  26. What happens in map / flatMap ? Future .successful(42) .map(f)(ec1)

    .map(g)(ec2) ec1.execute { () => val x = f(42) ec2.execute { val y = g(x) ... } }
  27. Where is context lost? ec1.execute { () => val x

    = f(42) ... }
  28. Where is context lost? ec1.execute { () => val x

    = f(42) ... }
  29. Propagating context between submission to EC and execution def runWithContext[A](ctx:

    Map[String, String])(f: => A): A = { val oldContext = getContext() setContext(ctx) try f finally setContext(oldContext) }
  30. Propagating context between submission to EC and execution // Simplified

    definitions type Runnable = () => Unit type ExecutionContext = Runnable => Unit
  31. Propagating context between submission to EC and execution // Simplified

    definitions type Runnable = () => Unit type ExecutionContext = Runnable => Unit object mdcWithContext { def getContext(): Map[String, String] = MDC.getCopyOfContextMap().asScala.toMap def setContext(ctx: Map[String, String]): Unit = MDC.setContextMap(ctx.asJava) }
  32. Propagating context between submission to EC and execution // Simplified

    definitions type Runnable = () => Unit type ExecutionContext = Runnable => Unit object mdcWithContext { def getContext(): Map[String, String] = MDC.getCopyOfContextMap().asScala.toMap def setContext(ctx: Map[String, String]): Unit = MDC.setContextMap(ctx.asJava) } def mdcAware(ec: ExecutionContext): ExecutionContext = runnable => { val rootContext = mdcWithContext.getContext() ec(() => mdcWithContext.runWithContext(rootContext)(runnable())) }
  33. Mapping on an instrumented ExecutionContext val rootContext = getContext() ec1.execute

    { () => val storedContext1 = getContext() setContext(rootContext) try { val x = f(42) //... } finally { setContext(storedContext1) } } ec1.execute { () => val x = f(42) ... }
  34. Mapping on an instrumented ExecutionContext ec1.execute { () => val

    x = f(42) ... } val rootContext = getContext() ec1.execute { () => val storedContext1 = getContext() setContext(rootContext) try { val x = f(42) //... } finally { setContext(storedContext1) } }
  35. Mapping on an instrumented ExecutionContext val rootContext = getContext() ec1.execute

    { () => val storedContext1 = getContext() setContext(rootContext) try { val x = f(42) //... } finally { setContext(storedContext1) } }
  36. implicit val ec = mdcAware(ExecutionContext.global) Keeping MDC on task submission

    to ExecutionContext
  37. implicit val ec = mdcAware(ExecutionContext.global) 2020-03-07 20:47:23.779 [INFO] [85c52920-4271-40f0-a150-d34bf4f8b5ec] Started

    processing payment 2 2020-03-07 20:47:23.779 [INFO] [78e9ccbc-98cb-440a-848c-d72f917857dd] Started processing payment 1 2020-03-07 20:47:23.779 [INFO] [d57bf741-51fc-45e6-a6ca-58f6488c9d05] Started processing payment 3 2020-03-07 20:47:23.785 [INFO] [d57bf741-51fc-45e6-a6ca-58f6488c9d05] Finished processing payment with result 96 2020-03-07 20:47:23.787 [INFO] [78e9ccbc-98cb-440a-848c-d72f917857dd] Finished processing payment with result 77 2020-03-07 20:47:23.785 [INFO] [85c52920-4271-40f0-a150-d34bf4f8b5ec] Finished processing payment with result 55 Keeping MDC on task submission to ExecutionContext
  38. Sure! implicit val contextShift: ContextShift[IO] = IO.contextShift(ec) Does this work

    with IO?
  39. Sure! implicit val contextShift: ContextShift[IO] = IO.contextShift(ec) Does this work

    with IO?
  40. val processPayment: String => IO[Result] = { ... } def

    executeRequest(paymentId: String) = { val requestId = ju.UUID.randomUUID().toString() Using.resource(MDC.putCloseable("RequestId", requestId)) { _ => logger.info(s"Started processing payment $paymentId") processPayment(paymentId).map { result => logger.info(s"Finished processing payment with result $result") } } } Let's try it
  41. val processPayment: String => IO[Result] = { ... } def

    executeRequest(paymentId: String) = { val requestId = ju.UUID.randomUUID().toString() Using.resource(MDC.putCloseable("RequestId", requestId)) { _ => logger.info(s"Started processing payment $paymentId") processPayment(paymentId).map { result => logger.info(s"Finished processing payment with result $result") } } } 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 2 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 4 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 3 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 1 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 52 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 19 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 93 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 97 Let's try it
  42. val processPayment: String => IO[Result] = { ... } def

    executeRequest(paymentId: String) = { val requestId = ju.UUID.randomUUID().toString() Using.resource(MDC.putCloseable("RequestId", requestId)) { _ => logger.info(s"Started processing payment $paymentId") processPayment(paymentId).map { result => logger.info(s"Finished processing payment with result $result") } } } 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 2 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 4 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 3 2020-03-08 16:20:47.088 [INFO] [-] Started processing payment 1 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 52 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 19 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 93 2020-03-08 16:20:47.216 [INFO] [-] Finished processing payment with result 97 Let's try it Lazy evaluation!
  43. Little lesson in hackery def getContext(): Map[String, String] def setContext(ctx:

    Map[String, String]): Unit def runWithContext[A]( ctx: Map[String, String] )(f: => A): A = { val rootContext = getContext() setContext(ctx) try f finally setContext(rootContext) }
  44. val getContext: IO[Map[String, String]] def setContext(ctx: Map[String, String]): IO[Unit] def

    withContext[A]( ctx: Map[String, String] )(f: IO[A]): IO[A] = getContext.flatMap { rootContext => setContext(ctx) .bracket(_ => f)( _ => setContext(rootContext) ) } Impure Little lesson in hackery def getContext(): Map[String, String] def setContext(ctx: Map[String, String]): Unit def runWithContext[A]( ctx: Map[String, String] )(f: => A): A = { val rootContext = getContext() setContext(ctx) try f finally setContext(rootContext) } Pure
  45. val getContext: IO[Map[String, String]] = IO(MDC.getCopyOfContextMap().asScala.toMap) def setContext(ctx: Map[String, String]):

    IO[Unit] = ???
  46. val getContext: IO[Map[String, String]] = IO(MDC.getCopyOfContextMap().asScala.toMap) def setContext(ctx: Map[String, String]):

    IO[Unit] = IO(MDC.setContextMap(ctx.asJava)) That'll work... for MDC Hack #1 - just set the context
  47. Why does this work? withContext(newContext)( fa = launchMissiles ) def

    setContext(ctx: Map[String, String]): IO[Unit] = IO(MDC.setContextMap(ctx.asJava))
  48. Why does this work? withContext(newContext)( fa = IO.shift(mdcAware(ec)) *> launchMissiles

    ) def setContext(ctx: Map[String, String]): IO[Unit] = IO(MDC.setContextMap(ctx.asJava))
  49. def setContext(ctx: Map[String, String]): IO[Unit] = IO.async[Unit] { cb =>

    runWithContext(ctx) { cb(Right(())) } } Hack #2 - use runWithContext and async
  50. def setContext(ctx: Map[String, String]): IO[Unit] = IO.async[Unit] { cb =>

    runWithContext(ctx) { cb(Right(())) } } This will work* for APIs that don't directly provide setContext Hack #2 - use runWithContext and async * - with one more change we'll see in a minute
  51. Why does THAT work? - IO.async 101 object IO {

    def async[A](k: (Either[Throwable, A] => Unit) => Unit): IO[A] } How do Fibers Work? A Peek Under the Hood https://youtu.be/x5_MmZVLiSM
  52. Why does THAT work? - IO.async 101 object IO {

    def async[A](k: (Either[Throwable, A] => Unit) => Unit): IO[A] } cb How do Fibers Work? A Peek Under the Hood https://youtu.be/x5_MmZVLiSM
  53. Why does THAT work? - IO.async 101 object IO {

    def async[A](k: (Either[Throwable, A] => Unit) => Unit): IO[A] } cb val sleep5s = IO.async[Int] { cb: (Either[Throwable, Int] => Unit) => scheduler.schedule(cb(Right(42)), 5.seconds) } val prog = sleep5s.flatMap(i => IO(sendMissiles(count = i))) How do Fibers Work? A Peek Under the Hood https://youtu.be/x5_MmZVLiSM
  54. Why does THAT work? - IO.async 101 val sleep5s =

    IO.async[Int] { cb: (Either[Throwable, Int] => Unit) => scheduler.schedule(cb(Right(42)), 5.seconds) } val prog = sleep5s.flatMap(i => IO(sendMissiles(count = i))) Mental model How do Fibers Work? A Peek Under the Hood https://youtu.be/x5_MmZVLiSM
  55. Why does THAT work? - IO.async 101 val sleep5s =

    IO.async[Int] { cb: (Either[Throwable, Int] => Unit) => scheduler.schedule(cb(Right(42)), 5.seconds) } val prog = sleep5s.flatMap(i => IO(sendMissiles(count = i))) val cb = { case Right(i: Int) => sendMissiles(count = i) case Left(e) => throw e } Mental model How do Fibers Work? A Peek Under the Hood https://youtu.be/x5_MmZVLiSM
  56. Why does THAT work? - IO.async 101 val sleep5s =

    IO.async[Int] { cb: (Either[Throwable, Int] => Unit) => scheduler.schedule(cb(Right(42)), 5.seconds) } val prog = sleep5s.flatMap(i => IO(sendMissiles(count = i))) val cb = { case Right(i: Int) => sendMissiles(count = i) case Left(e) => throw e } Mental model val prog = scheduler.schedule(sendMissiles(42), ...) => How do Fibers Work? A Peek Under the Hood https://youtu.be/x5_MmZVLiSM
  57. Why does this work? withContext(newContext)( fa = launchMissiles ) *>

    remainder def setContext(ctx: Map[String, String]): IO[Unit] = IO.async[Unit] { cb => runWithContext(ctx) { cb(Right(())) } }
  58. Why does this work? withContext(newContext)( fa = IO.shift(mdcAware(ec)) *> launchMissiles

    ) *> remainder def setContext(ctx: Map[String, String]): IO[Unit] = IO.async[Unit] { cb => runWithContext(ctx) { cb(Right(())) } }
  59. But wait, there's more! def setContext(ctx: Map[String, String]): IO[Unit] =

    IO.shift *> IO.async[Unit] { cb => runWithContext(ctx) { cb(Right(())) } }
  60. (Cats) IO trampolines callbacks for stack safety def showAsync(tag: String):

    IO[Unit] = IO.async[Unit] { cb => println(s"before $tag") cb(Right(())) println(s"after $tag") } showAsync("cb1") *> showAsync("cb2") *> IO(println("foo"))
  61. (Cats) IO trampolines callbacks for stack safety def showAsync(tag: String):

    IO[Unit] = IO.async[Unit] { cb => println(s"before $tag") cb(Right(())) println(s"after $tag") } showAsync("cb1") *> showAsync("cb2") *> IO(println("foo")) before cb1 before cb2 after cb2 foo after cb1
  62. Shifting before async makes sure running cb evaluates the continuation

    immediately def showAsync(tag: String): IO[Unit] = IO.shift *> IO.async[Unit] { cb => println(s"before $tag") cb(Right(())) println(s"after $tag") } showAsync("cb1") *> showAsync("cb2") *> IO(println("foo")) before cb1 after cb1 before cb2 foo after cb2
  63. Which one should I use? The idea is still pretty

    fresh, but right now:
  64. Which one should I use? The idea is still pretty

    fresh, but right now: val getContext: IO[Map[String, String]] = IO(MDC.getCopyOfContextMap().asScala.toMap) def setContext(ctx: Map[String, String]): IO[Unit] = IO(MDC.setContextMap(ctx.asJava)) def withContext[A]( ctx: Map[String, String] )(f: IO[A]): IO[A] = getContext.flatMap { rootContext => setContext(ctx) .bracket(_ => f)( _ => setContext(rootContext) ) } Hack #1 (for simplicity)
  65. None
  66. None
  67. Are we done? Let's wait till 3:00 am

  68. Are we done? Let's wait till 3:00 am

  69. Are we done? Let's wait till 3:00 am

  70. Who called this application? Are we done? Let's wait till

    3:00 am
  71. Who called this application? What applications did it call? Which

    URLs? Are we done? Let's wait till 3:00 am
  72. Who called this application? What applications did it call? Which

    URLs? What were the response times? Are we done? Let's wait till 3:00 am
  73. Who called this application? What applications did it call? Which

    URLs? What were the response times? How long did the whole call take? Are we done? Let's wait till 3:00 am
  74. Who called this application? What applications did it call? Which

    URLs? What were the response times? How long did the whole call take? Were there any (caught) errors? Are we done? Let's wait till 3:00 am
  75. Logs are linear 19:43:59.023 [INFO] Started processing payment 1 19:43:59.026

    [INFO] Calling database 19:43:59.137 [INFO] Got result from database 19:43:59.154 [INFO] Calling remote service 19:43:59.155 [INFO] Reading file 19:43:59.208 [INFO] Called remote service 19:43:59.360 [INFO] Read file def executeRequest(paymentId: String) = for { _ <- logger.info(s"Started processing payment $paymentId") _ <- logger.info("Calling database") _ <- callDatabase _ <- logger.info("Got result from database") result <- ( logger.info("Calling remote service") *> callRemoteService <* logger.info("Called remote service"), logger.info("Reading file") *> readFile <* logger.info("Read file") ).parMapN(combineResults) } yield result time Have fun subtracting timestamps!
  76. What we need def executeRequest(paymentId: String) = span("process") { for

    { _ <- logger.info(s"Processing payment $paymentId") _ <- span("db")(callDatabase) result <- ( span("remote")(callRemoteService), span("file")(readFile) ).parMapN(combineResults) } yield result }
  77. What we need def executeRequest(paymentId: String) = span("process") { for

    { _ <- logger.info(s"Processing payment $paymentId") _ <- span("db")(callDatabase) result <- ( span("remote")(callRemoteService), span("file")(readFile) ).parMapN(combineResults) } yield result } - immediately see order of execution - concurrency becomes obvious - inspect details on demand (expand span)
  78. "...a form of structured logging that assumes duration and hierarchy"

    Source: https://tersesystems.com/blog/2020/03/10/a-taxonomy-of-logging/ Distributed tracing
  79. "...a form of structured logging that assumes duration and hierarchy"

    Source: https://tersesystems.com/blog/2020/03/10/a-taxonomy-of-logging/ Distributed tracing Trace
  80. "...a form of structured logging that assumes duration and hierarchy"

    Source: https://tersesystems.com/blog/2020/03/10/a-taxonomy-of-logging/ Distributed tracing Trace Span
  81. Distributed tracing solutions

  82. Case study - log before - async call to database

    (simulated by sleep) - async call to remote service - log afterwards new BusinessLogic[F] { def execute(args: Args): F[Result] = for { _ <- Logger[F].info(show"Executing request $args") _ <- Timer[F].sleep(100.millis) _ <- client.successful(POST(uri"http://localhost:8080/execute")) _ <- Logger[F].info(show"Executed request $args") } yield Result(show"${args.message} finished") } Reference implementation (no tracing) Check out the source code for details (repo linked in final slide)
  83. Create our own model of a span for simplicity Keep

    track via parameter passing Provide to logger/client which will set info as MDC Convert to zipkin model and report manually after each span Reporting traces - direct, no-magic approach
  84. Simple span model final case class Span( name: String, traceId:

    String, spanId: String, parentSpanId: Option[String], values: Map[String, String] )
  85. Reporting traces - direct, no-magic approach def databaseCall(rootSpan: Span) =

    newSpan( "db-call", rootSpan, Map("db.query" -> "select * from users where id = ?") ) { span => logger.info(span.toMap)("Running db call") *> IO.sleep(100.millis) }
  86. Reporting traces - direct, no-magic approach def databaseCall(rootSpan: Span) =

    newSpan( "db-call", rootSpan, Map("db.query" -> "select * from users where id = ?") ) { span => logger.info(span.toMap)("Running db call") *> IO.sleep(100.millis) } def clientCall(rootSpan: Span) = newSpan("remote-call", rootSpan) { child => client.successful( POST( uri"http://localhost:8080/execute", child.toTraceHeaders.toList: _* ) ) }
  87. Reporting traces - direct, no-magic approach def databaseCall(rootSpan: Span) =

    newSpan( "db-call", rootSpan, Map("db.query" -> "select * from users where id = ?") ) { span => logger.info(span.toMap)("Running db call") *> IO.sleep(100.millis) } def clientCall(rootSpan: Span) = newSpan("remote-call", rootSpan) { child => client.successful( POST( uri"http://localhost:8080/execute", child.toTraceHeaders.toList: _* ) ) } def execute(args: Args, span: Span): IO[Result] = for { _ <- logger.info(span.toMap)(show"Executing request $args") _ <- databaseCall(span) _ <- clientCall(span) _ <- logger.info(span.toMap)(show"Executed request $args") } yield Result(show"${args.message} finished")
  88. What we'll get in UI (applies to all our approaches

    here)
  89. What we'll get in UI (applies to all our approaches

    here)
  90. What we'll get in UI (applies to all our approaches

    here)
  91. Distributed tracing - Kamon, same threadlocal magic as MDC val

    databaseCall = childSpan( "db-call", Map("db.query" -> "select * from users where id = ?") ) { logger.info("Running db call") *> IO.sleep(100.millis) } val clientCall = childSpan("remote-call") { //grab context before, set it again afterwards Tracing[IO].keepSpanAround { client.successful(POST(uri"http://localhost:8080/execute")) } } def execute(args: Args): IO[Result] = for { _ <- logger.info(show"Executing request $args") _ <- databaseCall _ <- clientCall _ <- logger.info(show"Executed request $args") } yield Result(show"${args.message} finished")
  92. Comparison Direct tracing (parameter passing) - Lots of boilerplate -

    Viral (influences APIs on all levels) - Extra work to integrate with MDC - Reporting done manually + Zero magic + No loss of context Kamon tracing (instrumentation) - Lots of magic (AOP, thread locals) - Complex and fragile instrumentation for async IO - Each interface for async needs special instrumentation + No changes in APIs + No boilerplate + Lots of integrations + Handles reporting for you + Context passed to MDC
  93. trait BusinessLogic[F[_]] { def execute(args: Args, span: Span): F[Result] }

    trait AnotherProgram[F[_]] { def summary(result: Result, span: Span): F[Unit] } trait BusinessLogic[F[_]] { def execute(args: Args): F[Result] } trait AnotherProgram[F[_]] { def summary(result: Result): F[Unit] } => Who's the winner? Both solutions have shortcomings. Ideally... No boilerplate No magic No virality Lots of integrations MDC support Automatic reporting
  94. @finalAlg trait BusinessLogic[F[_]] { def execute(args: Args): F[Result] } @finalAlg

    trait AnotherProgram[F[_]] { def summary(result: Result): F[Unit] } def program[F[_]: FlatMap: BusinessLogic: AnotherProgram: Trace](args: Args): F[Unit] = for { result <- Trace[F].span("execute")(BusinessLogic[F].execute(args)) _ <- Trace[F].span("summary")(AnotherProgram[F].summary(result)) } yield () type M[A] = ??? def execute(args: Args): M[A] = program[M](args) What would it look like?
  95. What's the F? def program[F[_]: FlatMap: BusinessLogic: AnotherProgram: Trace](args: Args):

    F[Unit] = for { result <- Trace[F].span("execute")(BusinessLogic[F].execute(args)) _ <- Trace[F].span("summary")(AnotherProgram[F].summary(result)) } yield () def execute(args: Args) = program[?????](args)
  96. final case class MyMagicMonadForContext[F[_], Context, A](impl: Context => F[A]) Let's

    make a new monad
  97. final case class MyMagicMonadForContext[F[_], Context, A](impl: Context => F[A]) {

    def flatMap[B]( f: A => MyMagicMonadForContext[F, Context, B] )( implicit F: Monad[F] ): MyMagicMonadForContext[F, Context, B] = MyMagicMonadForContext { ctx => impl(ctx).flatMap(f(_).impl(ctx)) } def local(f: Context => Context): MyMagicMonadForContext[F, Context, A] = MyMagicMonadForContext(f andThen impl) def scope(ctx: Context): MyMagicMonadForContext[F, Context, A] = local(_ => ctx) } Let's make a new monad
  98. We made a new monad!

  99. We made a new monad!

  100. type AuthenticatedT [F[_], A] = MyMagicMonadForContext[F, User, A] type Transactional

    [F[_], A] = MyMagicMonadForContext[F, Connection, A] type Traced [F[_], A] = MyMagicMonadForContext[F, Span, A] type WithTenant [F[_], A] = MyMagicMonadForContext[F, ClientId, A] We made a new monad!
  101. type AuthenticatedT [F[_], A] = MyMagicMonadForContext[F, User, A] type Transactional

    [F[_], A] = MyMagicMonadForContext[F, Connection, A] type Traced [F[_], A] = MyMagicMonadForContext[F, Span, A] type WithTenant [F[_], A] = MyMagicMonadForContext[F, ClientId, A] type ZIO[-R, +E, +A] = MyMagicMonadForContext[EitherT[IO, E, *], R, A] semantically ;) We made a new monad!
  102. type AuthenticatedT [F[_], A] = MyMagicMonadForContext[F, User, A] type Transactional

    [F[_], A] = MyMagicMonadForContext[F, Connection, A] type Traced [F[_], A] = MyMagicMonadForContext[F, Span, A] type WithTenant [F[_], A] = MyMagicMonadForContext[F, ClientId, A] type ZIO[-R, +E, +A] = MyMagicMonadForContext[EitherT[IO, E, *], R, A] semantically ;) https://youtu.be/7xSfLPD6tiQ?t=707 cats We made a new monad!
  103. Hint: Did we make a new monad?

  104. final case class Kleisli[F[_], -A, B](run: A => F[B]) Hint:

    Did we make a new monad?
  105. final case class Kleisli[F[_], -A, B](run: A => F[B]) Hint:

    type ReaderT[F[_], -A, B] = Kleisli[F, A, B] val ReaderT = Kleisli Did we make a new monad?
  106. Why is ReaderT the perfect fit - automatic passing of

    context - get current context for free with Kleisli.ask: Kleisli[F, A, A] - change context in a region easily with local
  107. However... - we still need integrations (wrapping clients in ones

    that take headers from context in Kleisli) - we need an escape hatch for when we really need MDC - we need a cool library that provides a nice API on top of Kleisli...
  108. However... - we still need integrations (wrapping clients in ones

    that take headers from context in Kleisli) - we need an escape hatch for when we really need MDC - we need a cool library that provides a nice API on top of Kleisli...
  109. Natchez API trait Span[F[_]] { def put(fields: (String, TraceValue)*): F[Unit]

    def kernel: F[Kernel] def span(name: String): Resource[F, Span[F]] } trait Trace[F[_]] { def put(fields: (String, TraceValue)*): F[Unit] def kernel: F[Kernel] def span[A](name: String)(k: F[A]): F[A] } object Trace { implicit def kleisliInstance[F[_]: Bracket[*[_], Throwable]]: Trace[Kleisli[F, Span[F], *]] = ... }
  110. Quick teaser of usage def execute(args: Args): F[Result] = for

    { _ <- Logger[F].info(show"Executing request $args") _ <- Trace[F].span("db")(Timer[F].sleep(100.millis)) _ <- client.successful(POST(uri"http://localhost:8080/execute" _ <- Logger[F].info(show"Executed request $args") } yield Result(show"${args.message} finished")
  111. Quick teaser of usage type Traced[A] = Kleisli[IO, Span[IO], A]

    val businessLogic = BusinessLogic.instance[Traced] def exec(msg: String): IO[Result] = Zipkin .entryPoint[IO]("client", blocker) .use { _.root(msg).use { businessLogic.execute(Args(msg)).run } } def execute(args: Args): F[Result] = for { _ <- Logger[F].info(show"Executing request $args") _ <- Trace[F].span("db")(Timer[F].sleep(100.millis)) _ <- client.successful(POST(uri"http://localhost:8080/execute" _ <- Logger[F].info(show"Executed request $args") } yield Result(show"${args.message} finished")
  112. Quick teaser of usage type Traced[A] = Kleisli[IO, Span[IO], A]

    val businessLogic = BusinessLogic.instance[Traced] def exec(msg: String): IO[Result] = Zipkin .entryPoint[IO]("client", blocker) .use { _.root(msg).use { businessLogic.execute(Args(msg)).run } } def execute(args: Args): F[Result] = for { _ <- Logger[F].info(show"Executing request $args") _ <- Trace[F].span("db")(Timer[F].sleep(100.millis)) _ <- client.successful(POST(uri"http://localhost:8080/execute" _ <- Logger[F].info(show"Executed request $args") } yield Result(show"${args.message} finished") def kleisliTracedClient[F[_]: Bracket[*[_], Throwable]: Defer]( underlying: Client[F] ): Client[Kleisli[F, Span[F], *]] = ...
  113. Quick teaser of usage type Traced[A] = Kleisli[IO, Span[IO], A]

    val businessLogic = BusinessLogic.instance[Traced] def exec(msg: String): IO[Result] = businessLogic.execute(Args(msg)).run def execute(args: Args): F[Result] = for { _ <- Logger[F].info(show"Executing request $args") _ <- Trace[F].span("db")(Timer[F].sleep(100.millis)) _ <- client.successful(POST(uri"http://localhost:8080/execute" _ <- Logger[F].info(show"Executed request $args") } yield Result(show"${args.message} finished") def kleisliTracedClient[F[_]: Bracket[*[_], Throwable]: Defer]( underlying: Client[F] ): Client[Kleisli[F, Span[F], *]] = ... Zipkin .entryPoint[IO]("client", blocker) .use { _.root(msg).use { } } This should eventually be delegated to e.g. http4s server wrapper
  114. Integrations? As of now Soon In the meantime... Using Kamon

    for reporting?
  115. Does it work? Open the examples of this talk and

    see for yourself (including temporary custom zipkin integration)
  116. Thank you blog.kubukoz.com @kubukoz Slides: speakerdeck.com/kubukoz Code: git.io/Jv6mN Find me

    on YouTube! (https://youtube.com/channel/UCBSRCuGz9laxVv0rAnn2O9Q)