Discovering Drugs with Kafka Streams

Ben Mabey
October 01, 2019

Recursion Pharmaceuticals is turning drug discovery into a data science problem. This entails producing and processing petabytes of microscopy images from carefully designed biological experiments. In early 2017, data production in our laboratory scaled to the point where the existing naive batch processing system could no longer process the data reliably. The batch approach also introduced unwanted lag between image capture and analysis results, since an entire experiment, potentially 8TB+, would not begin processing until all of its images were available. This was particularly troublesome for our laboratory, which wanted real-time quality control metrics on the images. These problems motivated us to replace the batch processing system with a streaming approach.

The original data pipeline was implemented as microservices with no central orchestrator; it relied instead on implicit flow between the services. The lack of visibility and robustness made the pipeline difficult and costly to operate. We wanted to address these concerns without rewriting the existing microservices. By building on top of Kafka Streams we created a flexible, highly available, and robust pipeline that leveraged our existing microservices and gave us a clear migration path.

This presentation will walk you through our thought process and explain the tradeoffs between using Kafka Streams and Spark for our specific use case. We’ll dive into the details of the workflow system we created on top of Kafka Streams to orchestrate these microservices. We’ve been operating this system since mid-2017, and the additional scale and robustness have played a key role in enabling Recursion to succeed in its mission of discovering new treatments for various diseases. The messages flowing over our Kafka Streams have already led to clinical trials in humans and will hopefully translate into meaningful impact on patients’ lives one day.

Transcript

  1. Ben Mabey
    VP of Engineering
    @bmabey
    Discovering Drugs
    with Kafka Streams
    Scott Nielsen
    Director of Data Engineering
    KAFKA SUMMIT SF 2019

  2. Penn Teller

  3. Penn Teller
    B
    Scott

  4. Decoding Biology
    to Radically Improve Lives

  5. © 2017 Recursion Pharmaceuticals
    1000s of untreated
    genetic diseases
    Photo of our wall?

  6. Moore’s Law
    [Chart: Transistor Area (% of 1970 values), log scale from 0.00001 to 1000, 1971-2015]

  7. Moore’s Law
    [Chart: Transistor Area (% of 1970 values), log scale from 0.00001 to 1000, 1971-2015]
    [Chart: R&D Spend / Drug (% of 2007 values), log scale from 1 to 100, 1971-2010]

  8. Moore’s Law
    [Chart: Transistor Area (% of 1970 values), log scale from 0.00001 to 1000, 1971-2015]
    Eroom’s Law
    [Chart: R&D Spend / Drug (% of 2007 values), log scale from 1 to 100, 1971-2010]

  9. Number of Drugs Approved in US (1993-2016)
    [Chart: yearly counts, y-axis from 0 to 60]

  10. How can we fix this?

  11. RecursionPharma.com

  12. Over 7 million per week

  13. hoechst (DNA)

  14. concanavalin A (ER)

  15. mitotracker (mitochondria)

  16. WGA (golgi apparatus, cell membrane)

  17. SYTO 14 (RNA, nucleoli)

  18. phalloidin (actin fibers)

  19. combined

  20. How do these pretty
    pictures help?

  21. Healthy child
    Child with rare
    genetic disease
    (Cornelia de Lange
    Syndrome)

  22. Healthy child Healthy cells
    Child with rare
    genetic disease
    (Cornelia de Lange
    Syndrome)
    Genetic disease
    model cells
    (Cornelia de Lange
    Syndrome)

  23. Healthy Disease

  24. Healthy Disease Disease + Drug?

  25. Public Dataset: http://rxrx.ai
    Nature Article
    Machine learning brings cell imaging promises into focus
    https://tinyurl.com/ml-cells

    Learn more…

  26. How is this data
    produced?

  27. 308 wells/plate

  28. 4 sites/well
    308 wells/plate

  29. 6 channels (images)/site
    7,392 images per plate
    4 sites/well
    308 wells/plate

  30. 6 channels (images)/site
    7,392 images per plate
    4 sites/well
    308 wells/plate
    ~69GB per plate
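
The per-plate figures on this slide follow from simple multiplication. The sketch below reproduces the 7,392 images per plate and derives a rough average image size from the ~69GB figure. The names `PlateMath`, `imagesPerPlate`, and `avgImageMegabytes` are hypothetical, and decimal units (1 GB = 1000 MB) are an assumption the slides do not state.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
final class PlateMath {
    // wells/plate * sites/well * channels/site = images/plate
    static int imagesPerPlate(int wells, int sitesPerWell, int channelsPerSite) {
        return wells * sitesPerWell * channelsPerSite;
    }

    // Average image size in MB, assuming decimal units (1 GB = 1000 MB).
    static double avgImageMegabytes(double plateGigabytes, int images) {
        return plateGigabytes * 1000.0 / images;
    }

    public static void main(String[] args) {
        int images = imagesPerPlate(308, 4, 6);
        System.out.println(images + " images per plate"); // prints "7392 images per plate"
        System.out.println(avgImageMegabytes(69.0, images) + " MB per image on average");
    }
}
```

Under these assumptions each image averages a little over 9 MB.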

  31. Experiment A Experiment B Experiment C
    Experiment D

  32. Our “Series A” System

  33. On-Premise

  35. Stream images to S3
    On-Premise

  36. Generate thumbnails
    Image metrics
    Stream images to S3
    On-Premise

  38. Generate thumbnails
    Image metrics
    Fire and forget
    Stream images to S3
    On-Premise

  39. Generate thumbnails
    Image metrics
    Fire and forget
    Experiment A
    Stream images to S3
    On-Premise

  40. Generate thumbnails
    Image metrics
    Fire and forget
    Experiment A
    Stream images to S3
    Extract Features
    On-Premise
    Process experiments
    in batch

  41. Generate thumbnails
    Image metrics
    Fire and forget
    Stream images to S3
    Extract Features
    On-Premise
    Process experiments
    in batch

  42. Generate thumbnails
    Image metrics
    Fire and forget
    Stream images to S3
    Extract Features
    metrics, models,
    reports, etc
    On-Premise
    Process experiments
    in batch

  45. Traditional, low throughput, biology

  46. Traditional, low throughput, biology
    ~6-12 plates per week, ~400-800GB

  47. High-throughput
    experiments
    Robots
    photo

  51. 100
    6.9TB

  52. 100
    6.9TB
    300
    20TB

  53. 100
    6.9TB
    300
    20TB
    Kafka Streams solution
    was launched

  54. 100
    6.9TB
    300
    20TB
    700
    48TB
    1,300
    90TB
    1,700
    118TB
    1,900
    132 TB
    Kafka Streams solution
    was launched
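
Reading each pair of numbers on this chart as a plate count and the resulting data volume is an assumption, since the transcript does not carry the axis labels. Under that reading, the volumes line up with the ~69GB per plate quoted earlier, as this rough check shows. `VolumeCheck` and `terabytes` are hypothetical names, and decimal terabytes are assumed.

```java
// Rough consistency check; names are hypothetical and units are assumed decimal.
final class VolumeCheck {
    static final double GB_PER_PLATE = 69.0; // "~69GB per plate" from the earlier slide

    // Estimated volume in TB for a given number of plates.
    static double terabytes(int plates) {
        return plates * GB_PER_PLATE / 1000.0;
    }

    public static void main(String[] args) {
        int[] plates = {100, 300, 700, 1300, 1700, 1900};
        double[] slideTb = {6.9, 20, 48, 90, 118, 132}; // values shown on the slide
        for (int i = 0; i < plates.length; i++) {
            System.out.printf("%d plates: ~%.1f TB estimated, %.1f TB on slide%n",
                    plates[i], terabytes(plates[i]), slideTb[i]);
        }
    }
}
```

Every estimate lands within about 4% of the slide's figure, which is consistent with the per-plate size being an approximation.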

  57. 100
    6.9TB
    300
    20TB
    700
    48TB
    1,300
    90TB
    1,700
    118TB
    1,900
    132 TB

  58. 100
    6.9TB
    300
    20TB
    700
    48TB
    1,300
    90TB
    1,700
    118TB
    1,900
    132 TB
    280 TB
    Today

  59. So what was wrong with the
    original system?

  60. Generate thumbnails
    Image metrics
    Extract Features
    metrics, models,
    reports, etc
    On-Premise
    Process experiments
    in batch

  61. Experiment A Experiment B Experiment C
    Experiment D
    Plates are not imaged in order

  63. Migration Goals

  64. Migration Goals
    Move orchestration and processing to cloud.

  66. Migration Goals
    Move orchestration and processing to cloud.
    Faster feedback and less bursty workloads.

  68. Migration Goals
    Move orchestration and processing to cloud.
    Faster feedback and less bursty workloads.
    Preserve existing micro-services logic.

  69. Migration Goals
    Move orchestration and processing to cloud.
    Faster feedback and less bursty workloads.
    Preserve existing micro-services logic.
    Make cheaper.

  70. Let’s take a look at the
    logical pipeline that we
    needed to implement…

  71. Images / channel level

  72. Images / channel level
    image level metrics

  73. Images / channel level
    site (all channels/images)
    thumbnails
    image level metrics

  74. Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    image level metrics

  76. Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    image level metrics
    site metrics

  77. well level features
    Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    image level metrics
    site metrics

  78. well level features
    Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    image level metrics
    site metrics
    metrics

  79. well level features
    Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    image level metrics
    site metrics
    metrics
    plate level features metrics

  80. well level features
    Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    experiment features
    image level metrics
    site metrics
    metrics
    plate level features metrics
    Experiment A

  81. well level features
    Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    experiment features
    image level metrics
    site metrics
    metrics
    plate level features metrics
    metrics, models,
    reports, etc
    Experiment A

  85. Kafka Streams
    was just released…

  87. dagger
    workflow library
    written on top of Kafka Streams
    that orchestrates microservices

  88. dagger
    workflow library
    written on top of Kafka Streams
    that orchestrates microservices
    Dagger, ya know, because
    it is all about the workflows
    represented as directed
    acyclic graphs, i.e. DAGs.

  90. New workflow system in 2017?

  91. New workflow system in 2017?
    Not Invented Here syndrome?

  92. Core logic in library
    is ~2800 LOC
    New workflow system in 2017?
    Not Invented Here syndrome?

  93. Core logic in library
    is ~2800 LOC
    All of our DAGs,
    including schema, task,
    and workflow definition
    ~1700 LOC
    New workflow system in 2017?
    Not Invented Here syndrome?

  95. well level features
    Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    experiment features
    image level metrics
    site metrics
    metrics
    plate level features metrics
    metrics, models,
    reports, etc

  96. Let’s look at a small workflow
    using Kafka Streams initially…

  97. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream

  98. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
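    import java.time.Duration;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.SessionWindows;

    final KTable experimentMetadata = builder.table(
        EXPERIMENT_METADATA_TOPIC);
    final KStream images = builder.stream(
        CHANNEL_IMAGES_TOPIC);
    final KStream sites = images
        // Re-key each channel image by the site it belongs to.
        .groupBy((exp, channel) -> channel.site())
        // Records for a site separated by less than the gap share one session.
        .windowedBy(SessionWindows.with(Duration.ofHours(SESSION_WINDOW_HOURS)))
        .aggregate(
            () -> new AggState(),
            (site, channel, agg) -> agg.observe(channel.site(), channel.channel),
            (site, agg_a, agg_b) -> agg_a.merge(agg_b))
        // Compare the observed channels with the count the experiment expects.
        .join(experimentMetadata,
            (agg, expMeta) -> agg.markCompleted(expMeta.numChannels))
        // Keep a site only once all of its channels have been observed.
        .filter((site, agg) -> agg.isComplete())
        .mapValues(agg -> agg.site())
        // The aggregation yields a KTable, so convert it back to a stream.
        .toStream();
    sites.to(SITE_IMAGES_TOPIC);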
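
The `AggState` class used in the aggregation above is not shown on the slides. A minimal sketch of what such a state object could look like follows, assuming it tracks which channels of a site have been observed and counts the site as complete once the expected number has arrived. The method names mirror the calls on the slide, but every field, parameter type, and behavior here is an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical aggregation state; the talk does not show the real AggState.
final class AggState {
    private String site;                                   // set on first observation
    private final Set<Integer> channels = new TreeSet<>(); // distinct channels seen so far
    private boolean complete;

    // Record one channel image for a site. Using a set means a channel
    // delivered twice is only counted once.
    AggState observe(String site, int channel) {
        this.site = site;
        channels.add(channel);
        return this;
    }

    // Combine two partial states, as a session merger would when two sessions join.
    AggState merge(AggState other) {
        if (site == null) site = other.site;
        channels.addAll(other.channels);
        return this;
    }

    // Mark the state complete once every expected channel has been observed.
    AggState markCompleted(int numChannels) {
        complete = channels.size() >= numChannels;
        return this;
    }

    boolean isComplete() { return complete; }

    String site() { return site; }

    public static void main(String[] args) {
        AggState a = new AggState().observe("site-1", 1).observe("site-1", 2);
        AggState b = new AggState();
        for (int c = 3; c <= 6; c++) b.observe("site-1", c);
        System.out.println(a.markCompleted(6).isComplete());          // prints false
        System.out.println(a.merge(b).markCompleted(6).isComplete()); // prints true
    }
}
```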

  104. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    Kafka Streams App External Service
    task input topic

  106. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    Kafka Streams App External Service
    task input topic
    task output topic
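
The diagram suggests a request/response pattern: the Kafka Streams app publishes a job to a task input topic, an external service consumes it, and the result comes back on a task output topic. The sketch below imitates that round trip with in-memory queues standing in for Kafka topics. It is not dagger's implementation, and every name in it is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// In-memory stand-in for the task input/output topic round trip (hypothetical).
final class ExternalTaskSketch {
    // A keyed message, loosely analogous to a Kafka record.
    static final class Message {
        final String key;
        final String value;
        Message(String key, String value) { this.key = key; this.value = value; }
    }

    final Queue<Message> taskInput = new ArrayDeque<>();  // stands in for the task input topic
    final Queue<Message> taskOutput = new ArrayDeque<>(); // stands in for the task output topic
    final Map<String, String> pending = new HashMap<>();  // jobs submitted but not yet answered

    // Workflow side: publish a job and remember it until a result arrives.
    void submit(String key, String payload) {
        pending.put(key, payload);
        taskInput.add(new Message(key, payload));
    }

    // External service side: consume queued jobs and publish results under the same key.
    void runService(Function<String, String> task) {
        Message job;
        while ((job = taskInput.poll()) != null) {
            taskOutput.add(new Message(job.key, task.apply(job.value)));
        }
    }

    // Workflow side: collect results and clear the matching pending entries.
    Map<String, String> collect() {
        Map<String, String> results = new HashMap<>();
        Message done;
        while ((done = taskOutput.poll()) != null) {
            if (pending.remove(done.key) != null) results.put(done.key, done.value);
        }
        return results;
    }

    public static void main(String[] args) {
        ExternalTaskSketch flow = new ExternalTaskSketch();
        flow.submit("site-1", "images-for-site-1");
        flow.runService(payload -> "features(" + payload + ")");
        System.out.println(flow.collect()); // prints {site-1=features(images-for-site-1)}
    }
}
```

Keying the result by the same key as the job is what lets the workflow side match an answer to the request that produced it.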

  107. How would you do the same
    workflow in dagger?

  108. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream

  109. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    Input topics & tables

  110. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    Input topics & tables
    Stream operations

  111. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    Input topics & tables
    Stream operations
    Tasks

  112. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    Input topics & tables
    Stream operations
    Tasks
    Output topics

  113. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    {"name": "extract-site-level-features",
     "graph":
      {"images-channel":
        {"type": "topic-stream", "topic-name": "images_channels"},
       "experiment-metadata":
        {"type": "topic-table", "topic-name": "experiment_metadata"},
       "images-site":
        {"type": "stream-operation",
         "key-schema": "long", "value-schema": "job_site_level",
         "inputs": ["images-channel", "experiment-metadata"],
         "function": "aggregations/images-site-grouping"},
       "features-site":
        {"type": "external-task",
         "stream": "images-site",
         "task-name": "extract-features"},
       "features-output":
        {"type": "publish",
         "topic-name": "extracted_features",
         "stream": "features-site"}}}
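
One thing a declarative graph like this makes possible is ordering the nodes so that each is built after its inputs. The sketch below is an assumption about how such a definition could be processed, not dagger's code: it hard-codes the five nodes above, treats each node's `inputs` or `stream` entry as a dependency edge, and sorts the nodes topologically. `DagOrder`, `topoSort`, and `exampleGraph` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that orders workflow nodes so inputs come before consumers.
final class DagOrder {
    // Depth-first topological sort; throws if the graph contains a cycle.
    static List<String> topoSort(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> visiting = new LinkedHashSet<>();
        for (String node : deps.keySet()) visit(node, deps, done, visiting, order);
        return order;
    }

    private static void visit(String node, Map<String, List<String>> deps,
                              Set<String> done, Set<String> visiting, List<String> order) {
        if (done.contains(node)) return;
        if (!visiting.add(node)) throw new IllegalStateException("cycle at " + node);
        for (String dep : deps.getOrDefault(node, List.of())) {
            visit(dep, deps, done, visiting, order);
        }
        visiting.remove(node);
        done.add(node);
        order.add(node);
    }

    // Node name -> names of the nodes it reads from, copied from the definition above.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("features-output", List.of("features-site"));
        deps.put("features-site", List.of("images-site"));
        deps.put("images-site", List.of("images-channel", "experiment-metadata"));
        deps.put("images-channel", List.of());
        deps.put("experiment-metadata", List.of());
        return deps;
    }

    public static void main(String[] args) {
        // prints [images-channel, experiment-metadata, images-site, features-site, features-output]
        System.out.println(topoSort(exampleGraph()));
    }
}
```

The cycle check matters because a workflow definition is only a valid DAG if no node ends up depending on itself.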

  119. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream
    {"name": "extract-site-level-features",
     "graph":
      {"images-channel":
        {"type": "topic-stream", "topic-name": "images_channels"},
       "experiment-metadata":
        {"type": "topic-table", "topic-name": "experiment_metadata"},
       "images-site":
        {"type": "stream-operation",
         "key-schema": "long", "value-schema": "job_site_level",
         "inputs": ["images-channel", "experiment-metadata"],
         "function": "aggregations/images-site-grouping"},
       "features-site":
        {"type": "external-task",
         "stream": "images-site",
         "task-name": "extract-features"},
       "features-output":
        {"type": "publish",
         "topic-name": "extracted_features",
         "stream": "features-site"}}}
    Specify function to be used

  125. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    {:name "extract-site-level-features",
     :graph
     {:images-channel
      {:type :topic-stream, :topic-name "images_channels"},
      :experiment-metadata
      {:type :topic-table, :topic-name "experiment_metadata"},
      :images-site
      {:type :stream-operation,
       :key-schema :long, :value-schema "job_site_level",
       :inputs [:images-channel, :experiment-metadata],
       :function (fn [images-channel experiment-metadata] …)},
      :features-site
      {:type :external-task,
       :task-name "extract-features",
       :stream :images-site},
      :features-output
      {:type :publish,
       :stream :features-site,
       :topic-name "extracted_features"}}}
    images_site stream

  127. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    {:name "extract-site-level-features",
     :graph
     {:images-channel
      {:type :topic-stream, :topic-name "images_channels"},
      :experiment-metadata
      {:type :topic-table, :topic-name "experiment_metadata"},
      :images-site
      {:type :stream-operation,
       :key-schema :long, :value-schema "job_site_level",
       :inputs [:images-channel, :experiment-metadata],
       :function (fn [images-channel experiment-metadata] …)},
      :features-site
      {:type :external-task,
       :task-name "extract-features",
       :stream :images-site},
      :features-output
      {:type :publish,
       :stream :features-site,
       :topic-name "extracted_features"}}}
    Inline function directly
    images_site stream
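The two notations differ in how a node names its function: the JSON form refers to it by a string ("aggregations/images-site-grouping"), while the EDN form inlines the function itself. A minimal Python sketch of accepting both, assuming a lookup table of named functions; `REGISTRY`, `register`, and `resolve_function` are hypothetical names for illustration, not Dagger's API, and the grouping body is a placeholder because the slides do not show it.

```python
# Hypothetical registry of named functions; the key mirrors the "function"
# value used in the JSON workflow description.
REGISTRY = {}


def register(name):
    """Decorator that files a function under a string name."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


@register("aggregations/images-site-grouping")
def images_site_grouping(images_channel, experiment_metadata):
    # Placeholder body: the real grouping logic is not shown in the slides.
    return (images_channel, experiment_metadata)


def resolve_function(spec_value):
    """Return a callable for a node's function entry.

    Accepts either an inline callable (the EDN style) or a registered
    name (the JSON style).
    """
    if callable(spec_value):
        return spec_value
    if spec_value not in REGISTRY:
        raise KeyError(f"no function registered under {spec_value!r}")
    return REGISTRY[spec_value]


# By name, as in the JSON description.
by_name = resolve_function("aggregations/images-site-grouping")

# Inline, as in the EDN description.
inline = resolve_function(lambda images_channel, experiment_metadata: images_channel)
```

Inlining keeps the workflow and its logic in one place; naming keeps the description serializable as plain data.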

  128. extract site features
    images_channel topic
    experiment_metadata topic table
    extracted_features topic
    images_site stream

  129. ( )
    Dagger is a compiler

  130. Dagger is a compiler
    ( ) -> Kafka Streams Topology
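One way to picture the compile step: the workflow description is a graph of named nodes, and a compiler has to visit each node only after the nodes it reads from. The Python sketch below orders the "extract-site-level-features" graph that way. It is an illustration under stated assumptions, not Dagger's implementation: `upstream` and `compile_plan` are hypothetical names, and the output is a plain list of steps rather than a real Kafka Streams Topology.

```python
from graphlib import TopologicalSorter

# The graph from the "extract-site-level-features" description, as a Python dict.
GRAPH = {
    "images-channel": {"type": "topic-stream", "topic-name": "images_channels"},
    "experiment-metadata": {"type": "topic-table", "topic-name": "experiment_metadata"},
    "images-site": {
        "type": "stream-operation",
        "key-schema": "long",
        "value-schema": "job_site_level",
        "inputs": ["images-channel", "experiment-metadata"],
        "function": "aggregations/images-site-grouping",
    },
    "features-site": {
        "type": "external-task",
        "stream": "images-site",
        "task-name": "extract-features",
    },
    "features-output": {
        "type": "publish",
        "topic-name": "extracted_features",
        "stream": "features-site",
    },
}


def upstream(node):
    """Names of the nodes this node reads from ("inputs" list or single "stream")."""
    refs = list(node.get("inputs", []))
    if "stream" in node:
        refs.append(node["stream"])
    return refs


def compile_plan(graph):
    """Return (name, type, upstream names) steps in dependency order."""
    deps = {}
    for name, node in graph.items():
        refs = upstream(node)
        missing = [ref for ref in refs if ref not in graph]
        if missing:
            raise ValueError(f"{name} references undefined nodes: {missing}")
        deps[name] = refs
    # TopologicalSorter takes node -> predecessors and yields predecessors first;
    # it raises graphlib.CycleError if the description contains a cycle.
    order = TopologicalSorter(deps).static_order()
    return [(name, graph[name]["type"], deps[name]) for name in order]


for step in compile_plan(GRAPH):
    print(step)
```

A real compiler would turn each step into the matching Kafka Streams construct (source stream, table, operation, sink) instead of printing it.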

  132. What would the entire pipeline
    look like in dagger?

  133. well level features
    Images / channel level
    site (all channels/images)
    thumbnails
    site level features
    experiment features
    image level metrics
    site metrics
    metrics
    plate level features metrics
    metrics, models,
    reports, etc

  135. Our pipeline application that uses Dagger

  136. What does the whole system
    look like now?

  137. Generate thumbnails
    Image metrics
    Extract Features
    metrics, models,
    reports, etc
    On-Premise
    Process experiments
    in batch

  138. On-Premise

  139. On-Premise
    Publish Image Events

  141. On-Premise


    Publish Image Events
    Uploader

  142. On-Premise


    Publish Image Events
    Uploader
    dagger is used here too!

  146. On-Premise



    Autoscaled Workers
    Publish Image Events
    Uploader

  147. On-Premise



    Microservices
    Publishers & Consumers
    Autoscaled Workers
    Publish Image Events
    Uploader

  148. On-Premise



    BigQuery
    SQL
    Transform & Load

    Microservices
    Publishers & Consumers
    Autoscaled Workers
    Publish Image Events
    Uploader

  149. Migration Goals

  150. Migration Goals
    Move orchestration and processing to cloud.

  151. Migration Goals
    Move orchestration and processing to cloud.
    Faster feedback and less bursty workloads.


  152. Migration Goals
    Move orchestration and processing to cloud.
    Faster feedback and less bursty workloads.
    Preserve existing microservices logic.



  153. Migration Goals
    Move orchestration and processing to cloud.
    Faster feedback and less bursty workloads.
    Preserve existing microservices logic.
    Make cheaper.




  154. Migration Goals
    Move orchestration and processing to cloud.
    Faster feedback and less bursty workloads.
    Preserve existing microservices logic.
    Make cheaper.
    EC2 and Lambda -> Google Cloud preemptibles.

  155. Big data, small metadata…

  157. Lessons learned…

  158. Early Adopter Tax

  159. Missed out on mature
    workflow monitoring

  162. On-Premise



    Transform & Load

    Uploader
    Easy deployment!

  163. Kafka Streams App External Service
    task input topic
    task output topic
    Durable Log FTW
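Why the durable log matters here: the streams app and the external service only meet through the task input and task output topics, so work published while the service is down is still waiting when it returns. A minimal Python sketch of that hand-off, assuming in-memory append-only lists in place of real Kafka topics; `Log`, `ExternalService`, and `extract_features` are hypothetical stand-ins, and no Kafka client is involved.

```python
class Log:
    """Append-only record list standing in for a topic (assumption: no real broker)."""

    def __init__(self):
        self.records = []

    def append(self, key, value):
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        return list(enumerate(self.records))[offset:]


class ExternalService:
    """Consumes tasks from one log and publishes results to another."""

    def __init__(self, task_input, task_output, handler, committed=0):
        self.task_input = task_input
        self.task_output = task_output
        self.handler = handler
        self.committed = committed  # next offset to process

    def poll(self, max_records=None):
        batch = self.task_input.read_from(self.committed)
        if max_records is not None:
            batch = batch[:max_records]
        for offset, (key, value) in batch:
            self.task_output.append(key, self.handler(value))
            self.committed = offset + 1
        return len(batch)


def extract_features(task):
    # Placeholder for the real feature extraction microservice.
    return "features(" + task["images"] + ")"


task_input, task_output = Log(), Log()
for site in ("site-1", "site-2", "site-3"):
    task_input.append(site, {"images": site + ".tiff"})

service = ExternalService(task_input, task_output, extract_features)
service.poll(max_records=1)  # handles site-1, then the service "goes down"

task_input.append("site-4", {"images": "site-4.tiff"})  # tasks keep arriving

# A replacement instance resumes from the committed offset; nothing is lost.
restarted = ExternalService(
    task_input, task_output, extract_features, committed=service.committed
)
restarted.poll()
print(task_output.records)
```

In this sketch the committed offset lives on the Python object; a real consumer would rely on the broker to track it.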

  164. Thank you!

  166. Come help us decode biology!
    @RecursionPharma @bmabey
