SOA & Big data

Some notes on combining an SOA approach with big data

Arnon Rotem-Gal-Oz

October 17, 2012

Transcript

  1. • It isn’t just about getting all the data there • Algorithms are cool, but we need humans in the loop • Hire the right people • Test! Test! Test! http://theamazingios6maps.tumblr.com/
  2. Integrating Big data & SOA. Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/
  3. Data Refinery. Ofer Berger - http://www.haifacity.com/allsites/allpic/a/A1738/A1738Pic3326.jpg
  4. Object soup: ASB BLT HDL AFT TGI FRY DRW SWG QYD DLY BST WIU ASB ZIS XOI CUI RMO DLY XPS KYF KFC WHR JIA GEX FQA VUH HCO WKD ECP SKD MFP WCP DKE AJT
  5. Services (Invoices, Customer, Promotions, Orders): ASB BLT HDL AFT TGI FRY DRW SWG QYD DLY BST WIU ASB ZIS XOI CUI RMO DLY XPS KYF KFC WHR JIA GEX FQA VUH HCO WKD ECP SKD MFP WCP DKE AJT
  6. [Diagram] Key: Component, Relation. Components: Service, Endpoint, Messages, Contracts, Policy, Service consumer. Relations: Describes, Exposes, Sends/receives, Binds to, Implements, Governed by, Adheres to, Understands, Serves.
  7. Integrating Big data & SOA. Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/
  8. Saga [diagram]. Key: SOA component, Pattern component, Concern/attribute, Protocol, Relation. Labels: Coordinator*, Initiator, Service consumer, Participator, Service, Create context, Register, Registration, Prepare/commit/undo, Perform activity, Compensate, Activities and replies.
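
The Saga labels above (register, perform activity, compensate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Participant` and `SagaCoordinator` names, the in-process callables standing in for service calls, and the choice to compensate completed activities in reverse order. It is not taken from the deck.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Participant:
    """One service taking part in the saga (hypothetical structure)."""
    name: str
    perform: Callable[[], None]      # the activity itself
    compensate: Callable[[], None]   # the undo for that activity


@dataclass
class SagaCoordinator:
    """Tracks registered participants and drives perform/compensate."""
    participants: List[Participant] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def register(self, participant: Participant) -> None:
        self.participants.append(participant)

    def run(self) -> bool:
        completed: List[Participant] = []
        for p in self.participants:
            try:
                p.perform()
                self.log.append(f"performed:{p.name}")
                completed.append(p)
            except Exception:
                self.log.append(f"failed:{p.name}")
                # Undo what already succeeded, most recent first.
                for done in reversed(completed):
                    done.compensate()
                    self.log.append(f"compensated:{done.name}")
                return False
        return True


def _fail() -> None:
    raise RuntimeError("activity rejected")


def demo() -> List[str]:
    saga = SagaCoordinator()
    saga.register(Participant("orders", lambda: None, lambda: None))
    saga.register(Participant("promotions", lambda: None, lambda: None))
    saga.register(Participant("invoices", _fail, lambda: None))
    saga.run()
    return saga.log
```

Calling `demo()` returns the coordinator's log, which shows two activities performed, the third failing, and the first two compensated in reverse order.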
  9. Hadoop Cluster; NIM; Interaction Recordings; ETL; Customer (HBase); Raw (HDFS); Interactions (HBase); Data Management (HBase); HCatalog; Resolved Interactions (HDFS); Categories (HBase); HBase; Resolved Interactions (HDFS)
  10. Performance of joins in a distributed system sucks! Node 1: customers A-H, interactions 0-99. Node 2: customers I-M, interactions 100-199. Node 3: customers N-Z, interactions 200-299. {"Interaction": {"id": "5", "participants": {"customer": [{"surname": "McDonalds", "name": "Old"}]}}}
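
A small sketch of why the slide embeds the customer inside the interaction document: with the partitioning shown (customers by surname range, interactions by ID range), a join can touch two nodes, while the embedded document is read from one. The function names and the set-of-nodes cost model are assumptions made for illustration only.

```python
# Partitioning as shown on the slide: customers by surname initial,
# interactions by ID range of 100 per node.
CUSTOMER_RANGES = {1: ("A", "H"), 2: ("I", "M"), 3: ("N", "Z")}


def customer_node(surname: str) -> int:
    """Node holding the customer row for this surname."""
    first = surname[0].upper()
    for node, (low, high) in CUSTOMER_RANGES.items():
        if low <= first <= high:
            return node
    raise ValueError(f"no node for surname {surname!r}")


def interaction_node(interaction_id: int) -> int:
    """Node holding the interaction with this ID (0-299 on the slide)."""
    if not 0 <= interaction_id <= 299:
        raise ValueError("interaction ID outside the ranges on the slide")
    return interaction_id // 100 + 1


def nodes_touched_by_join(interaction_id: int, surname: str) -> set:
    """Normalized layout: read the interaction, then look up the customer."""
    return {interaction_node(interaction_id), customer_node(surname)}


def nodes_touched_denormalized(interaction_id: int) -> set:
    """Customer data is embedded in the interaction document."""
    return {interaction_node(interaction_id)}
```

For the slide's example, interaction 5 lives on node 1 while customer McDonalds lives on node 2, so the join spans both nodes and the embedded document needs only node 1.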
  11. Cell architecture: Node 1, Node 2, Node 3, Node N
  12. Orchestration [diagram]. Labels: Initiate business process; Workflow engine; Endpoint; Workflow instance; Invoke services; Manage process; Route request; Host workflows; Schedule; Service; Endpoint; Service; Manage workflows; Monitor workflows.
  13. Map Reduce processing pipeline (Hadoop Map/Reduce). Map pipeline: Resolve Customer IDs (Customer); Categorize Segment (Categorization); Update Segment document (Interaction); Segment Row; Retrieve segment data - create segment document (Interaction); Write Categories Results (Categorization); Write Interaction (Interaction); Customers Local cache; InteractionID, Segment Row; Map. Reduce pipeline: Prepare data mart Export (Datamart); Update Interaction document (Interaction); Interaction & Segments; Categorize Interaction (Categorization); Write Categories Results (Categorization); Write Interaction (Interaction); Reduce; Write Interaction (Interaction).
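
The pipeline on this slide can be approximated in plain Python: a map step that resolves the customer ID from a local cache and categorizes each segment, keyed by interaction ID, followed by a reduce step that categorizes the whole interaction from its segments. This is a sketch under stated assumptions, not the deck's implementation: the row fields, the keyword-based categorizer, and the cache contents are all hypothetical, and it runs in a single process rather than on Hadoop.

```python
from collections import defaultdict

# Hypothetical local cache used by the map step: phone number -> customer ID.
CUSTOMER_CACHE = {"555-0100": "cust-1"}


def categorize_segment(text: str) -> str:
    """Toy categorizer (assumption): flags segments that mention a refund."""
    return "complaint" if "refund" in text.lower() else "general"


def map_segment(row: dict):
    """Map step: resolve customer ID, categorize, emit (interaction ID, segment)."""
    segment = {
        "customer_id": CUSTOMER_CACHE.get(row["phone"], "unknown"),
        "text": row["text"],
        "category": categorize_segment(row["text"]),
    }
    return row["interaction_id"], segment


def reduce_interaction(interaction_id: int, segments: list) -> dict:
    """Reduce step: build one interaction document from its segments."""
    return {
        "interaction_id": interaction_id,
        "customer_id": segments[0]["customer_id"],
        "categories": sorted({s["category"] for s in segments}),
        "segment_count": len(segments),
    }


def run_pipeline(rows: list) -> list:
    """Group mapped segments by interaction ID, then reduce each group."""
    grouped = defaultdict(list)
    for row in rows:
        key, value = map_segment(row)
        grouped[key].append(value)
    return [reduce_interaction(k, v) for k, v in sorted(grouped.items())]
```

Passing two segment rows that share an interaction ID to `run_pipeline` yields a single interaction document carrying the union of the segment categories.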
  14. Map Reduce processing pipeline. Map pipeline: Resolve Customer IDs (Customer); Categorize Segment (Categorization); Update Segment document (Interaction); Segment Row; Retrieve segment data - create segment document (Interaction); Write Categories Results (Categorization); Write Interaction (Interaction); Customers Local cache; InteractionID, Segment Row; Map.
  15. Categories: In-memory Data grid, Columnar, Graph, Indexing, NewSQL, Caching, Document, Relational, Analytics/MPP, Key-value store, Distributed file systems. Products: HBase, Hypertable, Neo4j, Apache Solr, Attivio, IndexTank, RavenDB, Cassandra, MongoDB, CouchDB, ScaleBase, VoltDB, Amazon RDS, HP Vertica, EMC Greenplum, IBM Netezza, Microsoft PDW, Aster Data, ParAccel, Memcached, GigaSpaces, Redis, GridGain, Oracle Coherence, WebSphere eXtreme Scale, Pregel, Hama, SAP HANA, Oracle Exadata, Accumulo, Hadoop, GlusterFS.
  16. Data is multi-tiered. Datawarehouse (Hadoop/HBase): 20 years, detailed and aggregated. Datamart(s) (RDBMS): 6-12 months detailed, 1-3 years aggregated. Cube (MOLAP): 6-? months aggregated. Real-time (in memory): 1-7 days detailed.
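
One way to read the tiers above is as a routing rule for detailed queries by data age. The sketch below is an interpretation, not something the slide states: the function name is hypothetical, and it assumes the upper bound of each retention range (7 days, 12 months taken as 365 days, 20 years) is the cut-off for that tier.

```python
def tier_for_detailed_query(age_days: int) -> str:
    """Pick the tier that still holds detailed data of the given age.

    Cut-offs are the upper bounds of the slide's ranges (an assumption).
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= 7:
        return "real-time (in memory)"
    if age_days <= 365:
        return "datamart (RDBMS)"
    if age_days <= 20 * 365:
        return "data warehouse (Hadoop/HBase)"
    raise LookupError("older than the longest retention on the slide")
```

The MOLAP cube is left out because the slide lists it as holding aggregated data only.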
  17. Data is multi-tiered. Data warehouse (Hadoop/HBase): 20 years, detailed and aggregated. Real-time: 1-7 days detailed. Datamart(s) (Columnar): 6-12 months detailed.
  18. Aggregated Reporting [diagram]. Labels: Subscribed/pulled data; Pull data; Data backend; Endpoint; Out; Load; Report; Ingest; Clean; Join; Transform; Transpose; Produce reports; Report Endpoint; Request; Raw data; ODS/DM; SQL endpoint; SQL endpoint; Landing area; Service.
  19. [Diagram, steps numbered 1 to 5] Landing; Raw data; DW/ODS; Views; Transformation service; Load service; Report service.
  20. [Diagram, steps numbered 1 to 10] Report tool; Data mart; Raw data (HDFS); Aggregation map/reduce; HBase; ETL (map/reduce + ETL); Drill through REST API; Details; Aggregates.
  21. Arnon Rotem-Gal-Oz [email protected] http://www.nice.com http://arnon.me/soa-patterns [email protected] http://arnon.me @arnonrgo