
Morpheus


Morpheus is a Lumiata-built internal framework for streaming and batch data processing, which streamlines complex data processing across the organization. This talk describes how Morpheus is built and what goes on under the hood.

Sushant Hiray

January 04, 2016



Transcript

  1. Morpheus | Table of Contents

    1 Data Flow
    2 Definition
    3 Type System
    4 Patient Graph
    5 Raw data to Patient Graph
    6 Datastore
    7 Function Library
        Batch
        Streaming
        API
    8 Control, Logging, Alerting and Monitoring
    9 Performance

    Lumiata 3 / 28
  3. Morpheus | Data Flow

    Data Flow: Raw Data to Products

    Goal is to type, structure and combine data for efficient distributed computation across large data sets to enable Lumiata’s Products.

    [Diagram. Stages: Raw Data, Typed Data, datastore, Function, Actions, Products. Formats shown: CCLF, CSV, FHIR, JSON. Connector labels: Source, Source, Input, Output, Sink.]
  5. Morpheus | Definition

    Morpheus: Definition

    It sources your data from HDFS, Kafka, the local filesystem and HTTP.
    It organizes your data by adding a strong type system to it. The bottom of the hierarchical type system is a flat key-value map, which develops into a List of Maps, followed by the Patient Graph.
    It stores your data with version control, auditing and logging.
    It offers built-in and extensible Reader functions. These functions understand and organize data from different user-specified input data types. Examples: CSV, HL7, CCLF, JSON.
    It applies functions to data in batch and streaming. Any user-specified function which uses the type system can be integrated.
    It can sink data to Kafka, Cassandra and Elasticsearch.
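The idea of built-in plus extensible Reader functions can be sketched as a small registry keyed by input format. This is only an illustration in plain Python: the `READERS` registry, the `reader` decorator and the `read_csv` / `read_json` names are hypothetical and are not taken from Morpheus itself.

```python
import csv
import io
import json

# Hypothetical registry: input format name -> reader function.
READERS = {}


def reader(fmt):
    """Register a reader for one input format (illustrative, not Morpheus's API)."""
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register


@reader("csv")
def read_csv(text):
    # Each CSV row becomes one flat key-value map.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@reader("json")
def read_json(text):
    # A single JSON object is wrapped so the result is always a list of maps.
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


csv_records = READERS["csv"]("patient_id,code\np1,E11\np2,I10\n")
json_records = READERS["json"]('{"patient_id": "p3", "code": "J45"}')
print(csv_records)
print(json_records)
```

A user-specified format would be added by decorating one more function; the sample rows are made-up values.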
  7. Morpheus | Type System

    Type System

    Type              Description                  Example
    Key-Value Pairs   A one-to-one map             {"a": "b", "c": "e"}
    List of KV Pairs  A list of one-to-one maps    [{"a": "b", "c": "e"}, {"h": "i", "j": "k"}]
    Patient Graph     A property graph with edges  shown later
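The two lower levels of the type hierarchy can be expressed as simple structural checks. The sketch below uses plain Python values and assumes that "flat" means no nested maps or lists; the `is_kv` and `is_kv_list` names are hypothetical.

```python
def is_kv(value):
    """True if value is a flat key-value map: string keys, no nested containers."""
    return isinstance(value, dict) and all(
        isinstance(k, str) and not isinstance(v, (dict, list))
        for k, v in value.items()
    )


def is_kv_list(value):
    """True if value is a list whose items are all flat key-value maps."""
    return isinstance(value, list) and all(is_kv(item) for item in value)


# The examples from the slide.
kv = {"a": "b", "c": "e"}
kv_list = [{"a": "b", "c": "e"}, {"h": "i", "j": "k"}]

print(is_kv(kv), is_kv_list(kv_list))
```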
  9. Morpheus | Patient Graph

    Patient Graph Definition

    A graph-like data structure to store the data of a given patient.
    Nodes contain clinical or context information.
    Edges are made using the context of node pairs as input to reasoning provided by the medical graph or raw data mining.
    This design is analogous to a data structure which would contain event + context (node) information (FHIR) plus reasoning (edges) from the medical graph.
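A property graph of this kind can be sketched with a few dataclasses. This is an assumption-laden illustration, not the Morpheus data model: the `Node`, `Edge` and `PatientGraph` classes, their fields, and the sample clinical values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict  # clinical or context information


@dataclass
class Edge:
    source: str
    target: str
    label: str  # stands in for the reasoning that links two nodes


@dataclass
class PatientGraph:
    patient_id: str
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label):
        # Edges may only connect nodes that already exist in this graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.append(Edge(source, target, label))

    def neighbors(self, node_id):
        return [e.target for e in self.edges if e.source == node_id]


graph = PatientGraph("p1")
graph.add_node("dx1", kind="diagnosis", code="E11", date="2015-03-01")
graph.add_node("rx1", kind="medication", name="metformin", date="2015-03-02")
graph.add_edge("dx1", "rx1", "treated_by")
print(graph.neighbors("dx1"))
```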
  11. Morpheus | Raw data to Patient Graph

    Raw data to Patient Graph

    Data is received by our secured SFTP service.
    It is copied in real time to our distributed filesystem by our Slurper.
    From here the journey of the data begins.
    Before any functions or actionable insights are calculated using it, all data is converted to a Patient Graph.
  13. Morpheus | Datastore

    Datastore: Versioning, Availability of all Types

    All data is stored; nothing is ever lost.
    You can always roll back to a previous version of the data.
    Data is constantly audited, and the audit trail is available through the ELK Stack.
    An Alerting System announces errors via Slack or email.
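One way to get "nothing is ever lost" together with rollback is an append-only history per key, where a rollback writes the old value again as a new version. The in-memory sketch below only illustrates that idea; the `VersionedStore` class and its methods are hypothetical and say nothing about how the Cassandra-backed datastore is actually laid out.

```python
class VersionedStore:
    """Append-only store: every write adds a version, nothing is overwritten."""

    def __init__(self):
        self._versions = {}  # key -> list of values, oldest first
        self.audit_log = []  # (action, key, version) tuples

    def put(self, key, value):
        history = self._versions.setdefault(key, [])
        history.append(value)
        self.audit_log.append(("put", key, len(history)))
        return len(history)  # 1-based version number of the new entry

    def get(self, key, version=None):
        history = self._versions[key]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise IndexError("no such version")
        return history[version - 1]

    def rollback(self, key, version):
        # Re-append the old value so the later versions stay in the history.
        old = self.get(key, version)
        self.audit_log.append(("rollback", key, version))
        return self.put(key, old)


store = VersionedStore()
store.put("p1", {"code": "E11"})
store.put("p1", {"code": "I10"})
new_version = store.rollback("p1", 1)
print(new_version, store.get("p1"))
```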
  15. Morpheus | Function Library

    Function Libraries

    Type              Function Library
    Key-Value Pairs   Select, Rename and Transform
    List of KV Pairs  Sort, Joins and Filters
    Patient Graph     Aggregator, Calculators, Chronic Analysis
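The functions listed for the two lower types are easy to picture on plain Python maps and lists. The sketch below is illustrative only: the function names and signatures (`select`, `rename`, `transform`, `sort_by`, `filter_by`) are assumptions, not the library's real interface, and joins and the Patient Graph functions are left out.

```python
def select(record, keys):
    """Keep only the requested keys of a key-value map."""
    return {k: record[k] for k in keys if k in record}


def rename(record, mapping):
    """Rename keys according to mapping; other keys are unchanged."""
    return {mapping.get(k, k): v for k, v in record.items()}


def transform(record, key, fn):
    """Apply fn to the value stored under key."""
    return {k: (fn(v) if k == key else v) for k, v in record.items()}


def sort_by(records, key):
    """Sort a list of key-value maps by one key."""
    return sorted(records, key=lambda r: r[key])


def filter_by(records, predicate):
    """Keep the key-value maps for which predicate is true."""
    return [r for r in records if predicate(r)]


rows = [{"id": "2", "code": "e11"}, {"id": "1", "code": "i10"}]
cleaned = [transform(rename(r, {"code": "dx"}), "dx", str.upper) for r in rows]
print(sort_by(cleaned, "id"))
```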
  16. Morpheus | Function Library

    Function Processing Modes

    Each function from a library can be run in two modes, batch or streaming.
    A function can process typed data:
        in batch, which is referred to as a Job
        in streaming, which is referred to as a Trouper
        via the API, which is referred to as an Actor
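The point that one function can run in either mode can be shown with two small runners around the same function. This is a single-process sketch in plain Python, not Spark or Spark Streaming; `run_batch`, `run_streaming` and `normalize_code` are hypothetical names.

```python
def run_batch(fn, records):
    """Job-style run: apply fn to a complete data set and return a snapshot."""
    return [fn(r) for r in records]


def run_streaming(fn, source):
    """Trouper-style run: apply fn lazily to each record as it arrives."""
    for record in source:
        yield fn(record)


def normalize_code(record):
    # The function itself does not know which mode it runs in.
    return {**record, "code": record["code"].upper()}


records = [{"id": "1", "code": "e11"}, {"id": "2", "code": "i10"}]

snapshot = run_batch(normalize_code, records)
stream = run_streaming(normalize_code, iter(records))
first = next(stream)  # results become available one record at a time
print(snapshot, first)
```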
  17. Morpheus | Function Library | Batch

    Job: Batch Processing

    A Job is a distributed execution across machines and CPU cores (it uses a Mesos/Spark cluster for compute).
    It applies the function to the input data type and writes the output either to our datastore (Cassandra) or to other databases like Elasticsearch.
    It is used for functions on patient graphs whose output we need in snapshots.
  18. Morpheus | Function Library | Streaming

    Trouper: Stream Processing

    A Trouper is a distributed execution across machines and CPU cores that runs in real time and in an always-alive fashion (it uses Spark Streaming for compute and Kafka for message passing).
    It is used for functions on patient graphs whose output we keep updated in real time.
  19. Morpheus | Function Library | API

    Actor: API

    An Actor is a lightweight process built on top of the Akka actor system.
    It is an alternative entry point to trigger jobs in batch mode or streaming mode.
    Essentially, the actor system helps to schedule multiple jobs over the cluster.
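The actor idea, a lightweight process that receives job requests as messages and handles them one at a time, can be sketched with a mailbox queue and a worker thread. This is plain Python and not Akka; the `Actor` class, the message shape and the `schedule` handler are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel message that tells the actor to shut down


class Actor:
    """Minimal actor: a mailbox plus one thread that handles messages in order."""

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Senders never call the handler directly; they only enqueue messages.
        self._mailbox.put(message)

    def stop(self):
        # Queue the sentinel behind pending messages and wait for the thread.
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self.results.append(self._handler(message))


def schedule(message):
    # Stand-in for triggering a Job (batch) or a Trouper (streaming).
    return f"scheduled {message['mode']} run of {message['function']}"


actor = Actor(schedule)
actor.send({"mode": "batch", "function": "aggregate"})
actor.send({"mode": "streaming", "function": "aggregate"})
actor.stop()
print(actor.results)
```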
  21. Morpheus | Control, Logging, Alerting and Monitoring

    Control, Logging, Alerting and Monitoring

    All data is stored; nothing is ever lost.
    You can always roll back to a previous version of the data.
    Data is constantly audited, and the audit trail is available through the ELK Stack.
    An Alerting System announces errors via Slack or email.