Morpheus is a Lumiata-built, internal streaming and batch data processing framework that streamlines complex data processing across the organization. This talk describes how Morpheus is built and what goes on under the hood.
1. Data Flow
2. Definition
3. Type System
4. Patient Graph
5. Raw data to Patient Graph
6. Datastore
7. Function Library: Batch, Streaming, API
8. Control, Logging, Alerting and Monitoring
9. Performance
The goal is to type, structure and combine data for efficient distributed computation across large data sets, enabling Lumiata's products.
[Figure: data flow diagram. Labels: Raw Data (CCLF, CSV, FHIR, JSON), Source, Typed Data, datastore, Function (Input, Output), Sink, Actions, Products.]
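The flow from raw data to products can be pictured as a chain of stages: a source yields raw records, a typing step turns them into typed data, a function transforms that data, and a sink writes the result. The sketch below is only an illustration of that shape under stated assumptions; `run_pipeline` and the stage type names are hypothetical and are not Morpheus's real interfaces, and an in-memory list stands in for the datastore or sink.

```python
from typing import Callable, Iterable

# Hypothetical stage signatures; the talk does not show Morpheus's real interfaces.
Record = dict[str, str]
Source = Callable[[], Iterable[str]]      # yields raw lines (e.g. CSV rows)
Typer = Callable[[str], Record]           # turns a raw line into typed data
Function = Callable[[Record], Record]     # user-specified function on typed data
Sink = Callable[[Record], None]           # writes output somewhere


def run_pipeline(source: Source, typer: Typer, fn: Function, sink: Sink) -> int:
    """Push every raw record through typing, the function and the sink.

    Returns the number of records written.
    """
    count = 0
    for raw in source():
        sink(fn(typer(raw)))
        count += 1
    return count


# Toy usage: an in-memory source, and a list acting as the sink.
written: list[Record] = []
n = run_pipeline(
    source=lambda: ["p1,E11.9", "p2,I10"],
    typer=lambda line: dict(zip(["patient_id", "code"], line.split(","))),
    fn=lambda rec: {**rec, "code_system": "ICD-10"},
    sink=written.append,
)
```

Keeping each stage a plain callable makes it easy to swap a file source for a Kafka consumer, or a list sink for a database writer, without touching the function in the middle.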
Morpheus reads data from HDFS, Kafka, the local filesystem and HTTP. It organizes your data by adding a strong type system to it: the bottom of the hierarchical type system is a flat key-value map, which builds up to a list of maps and then to the Patient Graph. It stores your data with version control, auditing and logging. It offers built-in, extensible Reader functions, which understand and organize data from different user-specified input formats such as CSV, HL7, CCLF and JSON. It applies functions to data in both batch and streaming modes, and any user-specified function that uses the type system can be integrated. It can sink data to Kafka, Cassandra and Elasticsearch.
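One way to make Reader functions both built-in and extensible is a registry keyed by input format, where each reader turns raw text into a list of flat key-value maps. This is a minimal sketch under that assumption, not Morpheus's actual interface: `READERS`, `reader` and `read` are hypothetical names, only CSV and JSON-lines readers are shown, and HL7 or CCLF readers would be registered the same way.

```python
import csv
import io
import json
from typing import Callable

KVPairs = dict[str, str]
ListOfKVPairs = list[KVPairs]

# Hypothetical registry of Reader functions, keyed by input format name.
READERS: dict[str, Callable[[str], ListOfKVPairs]] = {}


def reader(fmt: str):
    """Decorator that registers a Reader function for the given format."""
    def register(fn: Callable[[str], ListOfKVPairs]) -> Callable[[str], ListOfKVPairs]:
        READERS[fmt] = fn
        return fn
    return register


@reader("csv")
def read_csv(text: str) -> ListOfKVPairs:
    # Each CSV row becomes one flat key-value map keyed by the header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@reader("json")
def read_json_lines(text: str) -> ListOfKVPairs:
    # One JSON object per line; values are coerced to strings to keep maps flat.
    return [{k: str(v) for k, v in json.loads(line).items()}
            for line in text.splitlines() if line.strip()]


def read(fmt: str, text: str) -> ListOfKVPairs:
    """Dispatch to the registered reader; raises KeyError for unknown formats."""
    return READERS[fmt](text)
```

A user-specified format plugs in by decorating a new function with `@reader("...")`; nothing else in the pipeline has to change.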
KV Pairs: a one-to-one map, e.g. {"a": "b", "c": "e"}
List of KV Pairs: a list of one-to-one maps, e.g. [{"a": "b", "c": "e"}, {"h": "i", "j": "k"}]
Patient Graph: a property graph with edges, shown later
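The three levels of the type hierarchy can be written down as types. The sketch below assumes string-valued maps and uses a hypothetical `PatientGraph` layout (node id to properties, plus labeled edges between existing nodes); the real Morpheus representation is not shown in the talk.

```python
from dataclasses import dataclass, field

KVPairs = dict[str, str]          # level 1: a flat key-value map
ListOfKVPairs = list[KVPairs]     # level 2: a list of such maps


@dataclass
class PatientGraph:
    """Level 3: a property graph for a single patient (hypothetical layout)."""
    patient_id: str
    nodes: dict[str, KVPairs] = field(default_factory=dict)              # node id -> properties
    edges: list[tuple[str, str, KVPairs]] = field(default_factory=list)  # (src, dst, properties)

    def add_node(self, node_id: str, props: KVPairs) -> None:
        self.nodes[node_id] = props

    def add_edge(self, src: str, dst: str, props: KVPairs) -> None:
        # Edges may only connect nodes that already exist in this graph.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((src, dst, props))


# The examples from the slide, typed.
kv: KVPairs = {"a": "b", "c": "e"}
kv_list: ListOfKVPairs = [{"a": "b", "c": "e"}, {"h": "i", "j": "k"}]
```

Each level reuses the one below it: graph nodes and edge properties are themselves KV pairs.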
The Patient Graph is a property-graph structure that stores the data of a given patient. Nodes contain clinical or context information. Edges are created by feeding the context of node pairs into reasoning provided by the medical graph or by raw data mining. The design is analogous to a data structure that combines event and context information (nodes), as in FHIR, with reasoning from the medical graph (edges).
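Edge creation can be sketched as running a reasoner over every ordered pair of nodes and keeping the pairs it labels. This is an illustration only: `build_edges`, `toy_reasoner` and the one-entry `TREATS` table are hypothetical stand-ins for the medical graph or raw data mining, which the talk does not describe in detail.

```python
from itertools import permutations
from typing import Callable, Optional

Node = dict[str, str]   # node properties: clinical or context information
# A reasoner inspects an ordered (source, target) node pair and returns an edge label or None.
Reasoner = Callable[[Node, Node], Optional[str]]


def build_edges(nodes: dict[str, Node], reasoner: Reasoner) -> list[tuple[str, str, str]]:
    """Run the reasoner on every ordered node pair and keep the pairs it labels."""
    edges = []
    for (src, a), (dst, b) in permutations(sorted(nodes.items()), 2):
        label = reasoner(a, b)
        if label is not None:
            edges.append((src, dst, label))
    return edges


# Hypothetical rule standing in for medical-graph reasoning:
# link a medication node to a diagnosis node it is known to treat.
TREATS = {("metformin", "E11.9")}


def toy_reasoner(a: Node, b: Node) -> Optional[str]:
    if (a.get("drug"), b.get("code")) in TREATS:
        return "treats"
    return None
```

Checking all pairs is quadratic in the number of nodes, which is acceptable because each graph holds a single patient's data.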
Data is received by our secured SFTP service and copied in real time to our distributed filesystem by our Slurper. From there the data's journey begins: before any functions or actionable insights are computed from it, all data is converted to the Patient Graph.
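The conversion step can be pictured as grouping flat records by patient and turning each record into a node of that patient's graph, with edges added later by the reasoning step. In this sketch `to_patient_graphs` is hypothetical, and the `patient_id` column name is an assumption; real feeds such as CCLF or FHIR identify patients differently.

```python
from collections import defaultdict

KVPairs = dict[str, str]


def to_patient_graphs(records: list[KVPairs], id_field: str = "patient_id") -> dict[str, dict]:
    """Group flat records by patient and turn each record into a graph node.

    Returns {patient_id: {"nodes": {node_id: properties}, "edges": []}}.
    Edges start empty; a separate reasoning step would add them.
    """
    graphs: dict[str, dict] = defaultdict(lambda: {"nodes": {}, "edges": []})
    for i, rec in enumerate(records):
        pid = rec[id_field]
        props = {k: v for k, v in rec.items() if k != id_field}
        graphs[pid]["nodes"][f"{pid}/{i}"] = props   # node id derived from record position
    return dict(graphs)
```

Grouping by patient first means every downstream function can treat one patient's graph as an independent unit of work, which is what makes the distributed execution described later straightforward.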
All data is stored and nothing is ever lost: you can always roll back to a previous version of the data. Data is audited continuously through the ELK Stack, and an alerting system announces errors via Slack or email.
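"Nothing is ever lost, and you can roll back" is the behaviour of an append-only versioned store. The in-memory `VersionedStore` below is a hypothetical illustration of that idea, not the Cassandra-backed datastore itself: writes only append, and a rollback re-writes an old version as the newest one so that history is preserved.

```python
import copy
from typing import Optional


class VersionedStore:
    """Append-only, in-memory stand-in for a versioned datastore (hypothetical)."""

    def __init__(self) -> None:
        self._versions: dict[str, list[dict]] = {}   # key -> every value ever written

    def put(self, key: str, value: dict) -> int:
        """Store a new version and return its version number (0-based)."""
        history = self._versions.setdefault(key, [])
        history.append(copy.deepcopy(value))   # never overwrite: old versions stay
        return len(history) - 1

    def get(self, key: str, version: Optional[int] = None) -> dict:
        """Return the latest version, or a specific one if requested."""
        history = self._versions[key]
        return copy.deepcopy(history[-1 if version is None else version])

    def rollback(self, key: str, version: int) -> int:
        """Roll back by re-writing an old version as the newest one."""
        return self.put(key, self.get(key, version))
```

Because rollback is itself a write, the rollback is auditable in the same way as any other change.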
Functions in the library can be run in two modes, batch or streaming, and can also be triggered through an API. A function can process typed data in batch, which is referred to as a Job; in streaming, which is referred to as a Trouper; or through the API, which is referred to as an Actor.
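The point of the two modes is that the same function is written once and only the execution changes. The sketch below illustrates this with plain Python: `run_batch` consumes a whole collection and returns a complete result, as a Job would, while `run_streaming` emits each output as its input arrives, as a Trouper would. All names here, including `count_codes` and the simplified graph dictionaries, are hypothetical.

```python
from typing import Callable, Iterable, Iterator, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def run_batch(fn: Callable[[In], Out], data: Iterable[In]) -> list[Out]:
    """Batch mode: consume the whole input and return a complete result."""
    return [fn(x) for x in data]


def run_streaming(fn: Callable[[In], Out], stream: Iterable[In]) -> Iterator[Out]:
    """Streaming mode: emit each output as soon as its input arrives."""
    for x in stream:
        yield fn(x)


def count_codes(graph: dict) -> tuple[str, int]:
    # Example user function on a (simplified) patient graph.
    return graph["patient_id"], len(graph["codes"])
```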
A Job is a distributed execution across machines and CPU cores, using a Mesos/Spark cluster for compute. It applies the function to data of the input type and writes the output to our datastore (Cassandra) or to other databases such as Elasticsearch. Jobs are used for functions on patient graphs whose output we need as snapshots.
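A Job's shape is a parallel map over all patient graphs followed by a single snapshot write. In the sketch below a `ThreadPoolExecutor` stands in for the Mesos/Spark cluster and a plain dictionary stands in for Cassandra or Elasticsearch; both substitutions, and the `run_job` name and snapshot id scheme, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_job(fn: Callable[[dict], dict], graphs: list[dict],
            snapshots: dict, workers: int = 4) -> str:
    """Apply fn to every patient graph in parallel and store the results as one snapshot.

    A thread pool stands in for the cluster, and `snapshots` stands in for the
    output database. Returns the id of the snapshot that was written.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, graphs))   # map preserves input order
    snapshot_id = f"snapshot-{len(snapshots)}"
    # The whole result is written at once, so readers see a consistent snapshot.
    snapshots[snapshot_id] = {r["patient_id"]: r for r in results}
    return snapshot_id
```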
A Trouper is a distributed, always-alive execution across machines and CPU cores that runs in real time, using Spark Streaming for compute and Kafka for message passing. Troupers are used for functions on patient graphs whose output we keep updated in real time.
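The always-alive pattern can be sketched as a long-running consumer that applies the function to every incoming message and keeps the latest output per patient. Here a `queue.Queue` stands in for a Kafka topic and a single background thread stands in for Spark Streaming; the `Trouper` class below is a hypothetical illustration, and the `None` sentinel exists only so the example can shut down cleanly.

```python
import queue
import threading
from typing import Callable


class Trouper:
    """Always-alive worker: consumes messages and keeps each patient's output current."""

    def __init__(self, fn: Callable[[dict], dict]) -> None:
        self.fn = fn
        self.inbox = queue.Queue()             # stand-in for a Kafka topic
        self.latest: dict[str, dict] = {}      # patient_id -> most recent output
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            msg = self.inbox.get()             # blocks until a message arrives
            if msg is None:                    # sentinel used here to stop the loop
                break
            self.latest[msg["patient_id"]] = self.fn(msg)

    def send(self, msg: dict) -> None:
        self.inbox.put(msg)

    def stop(self) -> None:
        self.inbox.put(None)
        self._thread.join()
```

Unlike a Job, there is no snapshot boundary: each new message overwrites that patient's previous output as soon as it is processed.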
An Actor is a lightweight process built on top of the Akka actor system. It is an alternative entry point for triggering jobs in batch or streaming mode; essentially, the actor system helps schedule multiple jobs over the cluster.
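The core of the actor model is a private mailbox processed one message at a time. The `SchedulerActor` below is a Python stand-in written for illustration, not Akka's API: messages are hypothetical `(mode, job_name)` pairs, the handlers are placeholders for submitting a Job or a Trouper, and the method name `tell` only mirrors the spirit of Akka's fire-and-forget send.

```python
import queue
import threading
from typing import Callable


class SchedulerActor:
    """Minimal actor: a private mailbox processed one message at a time."""

    def __init__(self, handlers: dict[str, Callable[[str], str]]) -> None:
        self._handlers = handlers              # mode -> function that launches a job
        self._mailbox = queue.Queue()
        self.log: list[str] = []               # record of what was scheduled, in order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def tell(self, mode: str, job_name: str) -> None:
        """Fire-and-forget send: the caller does not wait for the job to run."""
        self._mailbox.put((mode, job_name))

    def _run(self) -> None:
        while True:
            msg = self._mailbox.get()
            if msg is None:                    # sentinel used here to stop the actor
                break
            mode, job_name = msg
            handler = self._handlers.get(mode)
            self.log.append(handler(job_name) if handler else f"unknown mode: {mode}")

    def shutdown(self) -> None:
        self._mailbox.put(None)
        self._worker.join()
```

Because the mailbox is drained sequentially, many callers can request jobs concurrently without the scheduler's own state needing locks.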