Slide 1

Slide 1 text

SOA & Big data
Arnon Rotem-Gal-Oz

Slide 2

Slide 2 text

Sept 2012 – iOS 6 launched with a new Maps application

Slide 3

Slide 3 text

But something went terribly wrong…
http://theamazingios6maps.tumblr.com/

Slide 4

Slide 4 text

• It isn't just about getting all the data there
• Algorithms are cool, but we need humans in the loop
• Hire the right people
• Test! Test! Test!
http://theamazingios6maps.tumblr.com/

Slide 5

Slide 5 text

It isn't just one pile of data
http://theamazingios6maps.tumblr.com/

Slide 6

Slide 6 text

Integrating Big data & SOA
Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Data Refinery
Ofer Berger - http://www.haifacity.com/allsites/allpic/a/A1738/A1738Pic3326.jpg

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

ETL integration · DB integration · File-based integration · Online integration
(diagram: Department, Server, DB)

Slide 11

Slide 11 text

Object soup
(diagram: dozens of ungrouped three-letter objects such as ASB, BLT, HDL, DLY, KFC)

Slide 12

Slide 12 text

Services
(diagram: the same objects grouped into services: Invoices, Customer, Promotions, Orders)

Slide 13

Slide 13 text

SOA components: Service, Endpoint, Messages, Contracts, Policy, Service consumer
Relations: Describes, Exposes, Sends/receives, Binds to, Implements, Governed by, Adheres to, Understands, Serves
Key: Component, Relation
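The components on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the deck's model: it assumes a contract is just the set of message types a service understands and a policy is a single maximum-message-size rule. `Contract`, `Policy`, `Service`, `Endpoint` and the `GetOrder` message are hypothetical names; the service consumer is simply the calling code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Contract:
    # Describes the service: here, only the message types it understands.
    accepted_types: frozenset


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy: messages larger than this are rejected.
    max_payload_bytes: int


@dataclass
class Service:
    contract: Contract
    policy: Policy
    handlers: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)


@dataclass
class Endpoint:
    # Exposes the service; receives messages and sends replies on its behalf.
    service: Service

    def send(self, msg_type: str, payload: dict) -> dict:
        if msg_type not in self.service.contract.accepted_types:
            raise ValueError(f"{msg_type!r} is not part of the contract")
        if len(repr(payload).encode()) > self.service.policy.max_payload_bytes:
            raise ValueError("message violates policy")
        return self.service.handlers[msg_type](payload)


orders = Service(
    contract=Contract(frozenset({"GetOrder"})),
    policy=Policy(max_payload_bytes=1024),
    handlers={"GetOrder": lambda p: {"order_id": p["order_id"], "status": "shipped"}},
)
endpoint = Endpoint(orders)
print(endpoint.send("GetOrder", {"order_id": 7}))  # {'order_id': 7, 'status': 'shipped'}
```

The consumer only depends on the endpoint and the contract, never on the handler functions behind them, which is the boundary the Services slide draws around the object soup.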

Slide 14

Slide 14 text

Interactions · Customer · Categories · Agents

Slide 15

Slide 15 text

Integrating Big data & SOA
Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/

Slide 16

Slide 16 text

Saga
Roles: Initiator, Service (Participator), Coordinator*, Service consumer
Interactions: Create context; Register; Perform activity; Compensate; Prepare / commit / undo; Activities and replies
Key: SOA component, Pattern component, Protocol, Relation, Concern/attribute
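A minimal Python sketch of the Saga idea: the coordinator runs each participant's activity in order and, if one fails, asks the participants that already completed to compensate, most recent first. It is a simplification under stated assumptions: everything runs in-process, the prepare/commit/undo protocol is collapsed into perform/compensate, and the `Step` type and the booking steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    perform: Callable[[dict], None]     # the participant's activity
    compensate: Callable[[dict], None]  # semantically undoes that activity


class SagaCoordinator:
    def __init__(self, steps: List[Step]):
        self.steps = steps

    def run(self, context: dict) -> bool:
        done: List[Step] = []
        for step in self.steps:
            try:
                step.perform(context)
            except Exception:
                # Compensate every completed step, in reverse order.
                for completed in reversed(done):
                    completed.compensate(context)
                return False
            done.append(step)
        return True


log: List[str] = []


def charge_card(ctx: dict) -> None:
    raise RuntimeError("payment declined")


saga = SagaCoordinator([
    Step("reserve-seat", lambda c: log.append("reserve"), lambda c: log.append("release")),
    Step("book-hotel", lambda c: log.append("book"), lambda c: log.append("cancel")),
    Step("charge-card", charge_card, lambda c: log.append("refund")),
])
print(saga.run({}), log)  # False ['reserve', 'book', 'cancel', 'release']
```

A production saga would also persist the context and the list of completed steps so that compensation can still happen after a coordinator crash; this sketch keeps both in memory.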

Slide 17

Slide 17 text

Hadoop Cluster
Components: NIM, Interaction Recordings, ETL, HCatalog
HBase: Customer, Interactions, Data Management, Categories
HDFS: Raw, Resolved Interactions

Slide 18

Slide 18 text

So, what's the problem?

Slide 19

Slide 19 text

 &  Big  data     can’t  move  

Slide 20

Slide 20 text

Performance of joins in distributed systems sucks!
Node 1: customers A-H, interactions 0-99
Node 2: customers I-M, interactions 100-199
Node 3: customers N-Z, interactions 200-299
{"Interaction": {
    "id": "5",
    "participants": {
        "customer": [
            {"surname": "McDonalds", "name": "Old"}
        ]
    }
}}
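To make the join cost concrete, here is a small Python sketch that partitions customers by surname range and interactions by ID range, as on the slide, and counts how many customer lookups a naive join has to send to another node. The routing functions and sample records are illustrative assumptions.

```python
from typing import Dict, List


def customer_node(surname: str) -> int:
    # Customers partitioned by surname: A-H -> 1, I-M -> 2, N-Z -> 3.
    first = surname[0].upper()
    if first <= "H":
        return 1
    if first <= "M":
        return 2
    return 3


def interaction_node(interaction_id: int) -> int:
    # Interactions partitioned by id: 0-99 -> 1, 100-199 -> 2, 200-299 -> 3.
    return interaction_id // 100 + 1


def remote_lookups_for_join(interactions: List[Dict]) -> int:
    """Count customer lookups that must leave the node holding the interaction."""
    remote = 0
    for inter in interactions:
        home = interaction_node(inter["id"])
        for surname in inter["customer_surnames"]:
            if customer_node(surname) != home:
                remote += 1
    return remote


interactions = [
    {"id": 5, "customer_surnames": ["McDonalds"]},        # node 1, customer on node 2
    {"id": 150, "customer_surnames": ["Ibsen", "Zola"]},  # node 2, customers on 2 and 3
    {"id": 250, "customer_surnames": ["Adams"]},          # node 3, customer on node 1
]
print(remote_lookups_for_join(interactions))  # 3
```

With the denormalized document on the slide, where the customer's name is embedded in interaction 5, reading that interaction needs no remote lookup at all; the price is duplicated customer data that has to be kept in sync.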

Slide 21

Slide 21 text

Cookie cutter scalability

Slide 22

Slide 22 text

Cell architecture
Node 1 · Node 2 · Node 3 · Node N

Slide 23

Slide 23 text

Cell Architecture
BUS
Services: Categories, Customers, Interactions, Reference Data, ORCA, …
Storage: HBase, HDFS
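A minimal sketch of cookie-cutter cells, assuming each cell is an identical, self-contained copy of the services and their storage, and that requests are routed to a cell by hashing a partition key (here a customer ID). The hash-based routing, the `Cell` and `CellRouter` classes and the choice of key are assumptions, not details from the slide.

```python
import hashlib
from typing import Dict, List


class Cell:
    """One self-contained copy of the stack (services plus local storage)."""

    def __init__(self, name: str):
        self.name = name
        self.store: Dict[str, List[str]] = {}  # stands in for the cell's own HBase/HDFS

    def record_interaction(self, customer_id: str, interaction: str) -> None:
        self.store.setdefault(customer_id, []).append(interaction)


class CellRouter:
    def __init__(self, cells: List[Cell]):
        self.cells = cells

    def cell_for(self, customer_id: str) -> Cell:
        # A stable hash (unlike Python's randomized str hash), so the same
        # customer always lands on the same cell across processes.
        digest = hashlib.sha256(customer_id.encode()).digest()
        return self.cells[int.from_bytes(digest[:8], "big") % len(self.cells)]


router = CellRouter([Cell(f"cell-{i}") for i in range(1, 4)])
for interaction in ("call-1", "chat-2", "email-3"):
    router.cell_for("customer-42").record_interaction("customer-42", interaction)

home = router.cell_for("customer-42")
print(home.name, home.store["customer-42"])
```

Because all of one customer's data lives in a single cell, joins stay local and capacity grows by adding cells. With plain modulo routing, adding a cell also moves existing keys; consistent hashing or a lookup table would reduce that movement.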

Slide 24

Slide 24 text

Orchestration
Components: Workflow engine, Workflow instance, Endpoint, Service
Responsibilities: Initiate business process, Route request, Host workflows, Manage workflows, Monitor workflows, Schedule, Manage process, Invoke services
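A minimal in-process sketch of the orchestration roles, assuming a workflow is just an ordered list of service calls: the engine hosts workflow definitions, a request arriving at the endpoint starts a workflow instance, the instance invokes the services in turn, and the engine keeps each instance's status for monitoring. `WorkflowEngine`, the workflow name and both service functions are hypothetical, and scheduling is left out.

```python
import itertools
from typing import Callable, Dict, List

ServiceCall = Callable[[dict], dict]


class WorkflowEngine:
    def __init__(self) -> None:
        self.definitions: Dict[str, List[ServiceCall]] = {}  # hosted workflows
        self.status: Dict[int, str] = {}                     # instance id -> state
        self._ids = itertools.count(1)

    def host(self, name: str, steps: List[ServiceCall]) -> None:
        self.definitions[name] = steps

    def start(self, name: str, request: dict) -> dict:
        """Endpoint entry point: route the request to a new workflow instance."""
        instance_id = next(self._ids)
        self.status[instance_id] = "running"
        data = dict(request)
        try:
            for service in self.definitions[name]:
                data = service(data)  # the instance invokes each service in turn
        except Exception:
            self.status[instance_id] = "failed"
            raise
        self.status[instance_id] = "completed"
        return data


def resolve_customer(d: dict) -> dict:
    return {**d, "customer_id": "C-" + d["phone"][-4:]}


def categorize(d: dict) -> dict:
    return {**d, "category": "billing" if "invoice" in d["text"] else "other"}


engine = WorkflowEngine()
engine.host("new-interaction", [resolve_customer, categorize])
result = engine.start("new-interaction", {"phone": "555-0199", "text": "invoice question"})
print(result["customer_id"], result["category"], engine.status)  # C-0199 billing {1: 'completed'}
```

The services never call each other here; only the workflow instance knows the order, which is what distinguishes orchestration from services reacting to each other's events.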

Slide 25

Slide 25 text

Map Reduce processing pipeline (Hadoop Map/Reduce)
Map pipeline (Segment Row → InteractionID, Segment Row)
• Retrieve segment data - create segment document (Interaction)
• Resolve Customer IDs (Customer), using a local cache of Customers
• Categorize Segment (Categorization)
• Write Categories Results (Categorization)
• Update Segment document (Interaction)
• Write Interaction (Interaction)
Reduce pipeline (Interaction & Segments)
• Update Interaction document (Interaction)
• Categorize Interaction (Categorization)
• Write Categories Results (Categorization)
• Write Interaction (Interaction)
• Prepare data mart export (Datamart)
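The pipeline can be simulated in plain Python to show where each step sits: the map phase resolves customer IDs from a local cache, categorizes each segment and emits `(InteractionID, segment)` pairs; the reduce phase groups segments per interaction and categorizes the interaction as a whole. This is a sketch under assumptions: the keyword categorizer, the record fields and the in-memory shuffle stand in for the real Categorization and Customer services and for Hadoop, and the writes to HBase and the data mart are omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Local cache of customer IDs, loaded once per mapper (assumed phone -> customer ID).
CUSTOMER_CACHE = {"555-0100": "C1", "555-0101": "C2"}


def categorize(text: str) -> List[str]:
    # Hypothetical keyword-based categorizer.
    keywords = {"refund": "billing", "cancel": "churn-risk"}
    return sorted({cat for word, cat in keywords.items() if word in text})


def map_segment(row: Dict) -> Tuple[str, Dict]:
    segment = {
        "text": row["text"],
        "customer": CUSTOMER_CACHE.get(row["phone"], "unknown"),  # resolve customer ID
        "categories": categorize(row["text"]),                     # categorize segment
    }
    return row["interaction_id"], segment


def reduce_interaction(interaction_id: str, segments: List[Dict]) -> Dict:
    full_text = " ".join(s["text"] for s in segments)
    return {
        "id": interaction_id,
        "customers": sorted({s["customer"] for s in segments}),
        "segments": segments,
        "categories": categorize(full_text),  # categorize the whole interaction
    }


def run(rows: Iterable[Dict]) -> List[Dict]:
    grouped: Dict[str, List[Dict]] = defaultdict(list)
    for key, value in map(map_segment, rows):  # map, then shuffle by InteractionID
        grouped[key].append(value)
    return [reduce_interaction(k, v) for k, v in sorted(grouped.items())]


rows = [
    {"interaction_id": "i1", "phone": "555-0100", "text": "I want a refund"},
    {"interaction_id": "i1", "phone": "555-0100", "text": "or I will cancel"},
    {"interaction_id": "i2", "phone": "555-0101", "text": "thanks"},
]
for doc in run(rows):
    print(doc["id"], doc["customers"], doc["categories"])
# i1 ['C1'] ['billing', 'churn-risk']
# i2 ['C2'] []
```

Note that "churn-risk" only appears at the interaction level: neither segment alone mentions a refund and a cancellation, which is why the reduce side categorizes again.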

Slide 26

Slide 26 text

Map Reduce processing pipeline
Map pipeline (Segment Row → InteractionID, Segment Row)
• Retrieve segment data - create segment document (Interaction)
• Resolve Customer IDs (Customer), using a local cache of Customers
• Categorize Segment (Categorization)
• Write Categories Results (Categorization)
• Update Segment document (Interaction)
• Write Interaction (Interaction)

Slide 27

Slide 27 text

Data Facets

Slide 28

Slide 28 text

Categories: Key-value store, Document, Columnar, Graph, Indexing, Relational, NewSQL, Analytics/MPP, In-memory data grid, Caching, Distributed file systems
Products: HBase, Hypertable, Accumulo, Cassandra, MongoDB, CouchDB, RavenDB, Neo4j, Pregel, Hama, Apache Solr, Attivio, IndexTank, Amazon RDS, ScaleBase, VoltDB, HP Vertica, EMC Greenplum, IBM Netezza, Microsoft PDW, Aster Data, ParAccel, SAP HANA, Oracle Exadata, Memcached, Redis, GigaSpaces, GridGain, Oracle Coherence, WebSphere eXtreme Scale, Hadoop, GlusterFS

Slide 29

Slide 29 text

Data is multi-tiered
• Real-time (in memory): 1-7 days, detailed
• Cube (MOLAP): 6-? months, aggregated
• Datamart(s) (RDBMS): 6-12 months detailed, 1-3 years aggregated
• Data warehouse (Hadoop/HBase): 20 years, detailed and aggregated

Slide 30

Slide 30 text

Data is multi-tiered
• Real-time: 1-7 days, detailed
• Datamart(s) (columnar): 6-12 months, detailed
• Data warehouse (Hadoop/HBase): 20 years, detailed and aggregated
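One way to read the tiers is as a routing rule: send each query to the fastest tier that still holds data old enough to answer it. A minimal Python sketch, assuming the upper retention bounds shown on the slide, 30-day months and 365-day years (so the datamart keeps 12 × 30 = 360 days and the warehouse 20 × 365 = 7,300 days); `TIERS` and `pick_tier` are hypothetical names.

```python
from typing import List, Tuple

# (tier, maximum age of data kept, in days), fastest tier first.
# Assumed retention: upper bounds from the slide, 30-day months, 365-day years.
TIERS: List[Tuple[str, int]] = [
    ("real-time", 7),                  # 1-7 days, detailed
    ("datamart (columnar)", 12 * 30),  # 6-12 months, detailed
    ("data warehouse", 20 * 365),      # 20 years, detailed and aggregated
]


def pick_tier(oldest_day_needed: int) -> str:
    """Return the fastest tier whose retention covers the query's time range."""
    for tier, retention_days in TIERS:
        if oldest_day_needed <= retention_days:
            return tier
    raise ValueError("data older than any tier keeps")


for days in (3, 90, 1000):
    print(days, pick_tier(days))
# 3 real-time
# 90 datamart (columnar)
# 1000 data warehouse
```

The earlier version of this slide also has a MOLAP cube for aggregated data, but its retention is left open ("6-? months"), so it is not modeled here.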

Slide 31

Slide 31 text

SOA leaves us with a lot of isolated data

Slide 32

Slide 32 text

Aggregated Reporting
Service steps: Ingest, Clean, Join, Transform, Transpose, Load, Produce reports
Data backend endpoint: pull data (subscribed/pulled data)
Landing area: raw data
ODS/DM: SQL endpoints
Report endpoint: request in, report out
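The ingest, clean, join, transform/transpose and produce-reports steps can be sketched as a small in-memory pipeline. This is a minimal illustration, assuming two pulled sources (interactions and customers) keyed by customer ID; the field names and the region-by-channel report shape are hypothetical, and the "transpose" step is shown as a simple pivot.

```python
from collections import defaultdict
from typing import Dict, List


def ingest() -> Dict[str, List[Dict]]:
    # Stand-in for data pulled from (or pushed by) the services' data endpoints.
    return {
        "interactions": [
            {"customer_id": "C1", "channel": "phone", "minutes": "12"},
            {"customer_id": "C2", "channel": "chat", "minutes": "5"},
            {"customer_id": "C1", "channel": "chat", "minutes": None},  # dirty row
        ],
        "customers": [
            {"customer_id": "C1", "region": "EMEA"},
            {"customer_id": "C2", "region": "APAC"},
        ],
    }


def clean(rows: List[Dict]) -> List[Dict]:
    # Drop incomplete rows and normalize types.
    return [{**r, "minutes": int(r["minutes"])} for r in rows if r["minutes"] is not None]


def join(interactions: List[Dict], customers: List[Dict]) -> List[Dict]:
    by_id = {c["customer_id"]: c for c in customers}
    return [{**i, "region": by_id[i["customer_id"]]["region"]} for i in interactions]


def transform_and_transpose(rows: List[Dict]) -> Dict[str, Dict[str, int]]:
    # Sum minutes and pivot: one row per region, one column per channel.
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for r in rows:
        table[r["region"]][r["channel"]] = table[r["region"]].get(r["channel"], 0) + r["minutes"]
    return dict(table)


def produce_report(table: Dict[str, Dict[str, int]]) -> List[str]:
    return [f"{region}: {cols}" for region, cols in sorted(table.items())]


raw = ingest()
report = produce_report(
    transform_and_transpose(join(clean(raw["interactions"]), raw["customers"]))
)
print(report)  # ["APAC: {'chat': 5}", "EMEA: {'phone': 12}"]
```

Keeping these steps in a dedicated reporting service, rather than letting each business service query the others, is what lets the isolated SOA data be combined without breaking service boundaries.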

Slide 33

Slide 33 text

Landing (raw data) · Load service · Transformation service · DW/ODS (views) · Report service
(diagram steps 1-5)

Slide 34

Slide 34 text

Raw data (HDFS) · ETL (map/reduce + ETL) · HBase · Aggregation map/reduce · Data mart · Report tool · Drill-through REST API
Labels: Details, Aggregates (diagram steps 1-10)
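One plausible reading of this diagram is drill-through reporting: an aggregation job pre-computes totals for the data mart and report tool, and when a user drills into one aggregate the matching detail rows are fetched by key from the detailed store. A minimal sketch under that assumption: plain dictionaries stand in for HBase and the data mart, `aggregate` and `drill_through` are hypothetical names, and no HTTP layer is shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Detailed records, keyed the way an HBase row key might be: (day, category) -> rows.
details: Dict[Tuple[str, str], List[Dict]] = defaultdict(list)


def ingest(record: Dict) -> None:
    details[(record["day"], record["category"])].append(record)


def aggregate() -> Dict[Tuple[str, str], int]:
    # What the aggregation map/reduce would load into the data mart.
    return {key: len(rows) for key, rows in details.items()}


def drill_through(day: str, category: str) -> List[Dict]:
    # What a drill-through call would return for one aggregate cell.
    # dict.get does not insert missing keys into the defaultdict.
    return details.get((day, category), [])


for i, cat in enumerate(["billing", "billing", "churn-risk"]):
    ingest({"id": i, "day": "2012-10-01", "category": cat})

print(aggregate())  # {('2012-10-01', 'billing'): 2, ('2012-10-01', 'churn-risk'): 1}
print([r["id"] for r in drill_through("2012-10-01", "billing")])  # [0, 1]
```

The report tool only ever scans the small aggregate table; the large detailed store is touched by key lookups alone, which suits a store like HBase.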

Slide 35

Slide 35 text

Take-aways
SOA & Big data are better together

Slide 36

Slide 36 text

Arnon Rotem-Gal-Oz
[email protected]
http://www.nice.com
http://arnon.me/soa-patterns
[email protected]
http://arnon.me
@arnonrgo