Where Big Data and Semantic Intersect

Valentin
September 22, 2014

Big Data, Semantics and the interesting opportunities at the intersection

Transcript

  1. Where Big Data and Semantic Intersect. Valentin Zacharias
  2. About this talk
     • Goal: (Hopefully) leave you with a few new ideas on interesting opportunities at the intersection of Big Data and Semantics
     • Structure
       – Big Data & Analytics
       – Semantic Technologies
       – The intersection
       – Three examples / opportunities
  3. About Me
     • 13 years of experience in software-driven innovation
       – semantic technologies researcher in the group of Prof. Studer
       – manager of a research division concerned with all aspects of information-driven decisions at FZI
       – Big Data consultant with codecentric
       – (analytics consultant with Daimler TSS)
  4. Big Data & Analytics
  5. Data Deluge: Moore’s Law, Mobile Web, Cloud Computing, Data Value Chain, Social Web etc. make collecting large datasets ever cheaper
  6. Big Data Technology: lower the cost to build systems that do more complex processing with more data faster
  7. (Big Data) Analytics: build on this data (technologies) to harness patterns in data to maximize business value
  8. Example: GPS, Telemetrics, handhelds, mobile web etc. drive data deluge in postal services but …
  9. … this must be harnessed to realize same day delivery, preventive maintenance, precise arrival estimates … in order to stay competitive
  10. Semantic Technologies
  11. Better information processing through more consideration for the explicit context of processed elements
      [Figure: graph around the ambiguous label "Lebanon". Nodes: Lebanon; Lebanon, Country; Beirut, City; 4 Million; Lebanon, NH; USA; Dartmouth Medical S. Edge labels: is label for (twice), has capital, has population, in country, contains.]
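
The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: the node and edge labels come from the figure, but which edge connects which nodes is an assumed reading of it, and the function names `candidates`, `context_of` and `disambiguate` are hypothetical.

```python
# (subject, predicate, object) triples. The labels are taken from the slide's
# figure; the exact wiring of the edges is an ASSUMPTION.
TRIPLES = [
    ("Lebanon", "is label for", "Lebanon, Country"),
    ("Lebanon", "is label for", "Lebanon, NH"),
    ("Lebanon, Country", "has capital", "Beirut, City"),
    ("Lebanon, Country", "has population", "4 Million"),
    ("Lebanon, NH", "in country", "USA"),
    ("Lebanon, NH", "contains", "Dartmouth Medical S."),
]


def candidates(label):
    """All entities that the given label may refer to."""
    return [o for s, p, o in TRIPLES if s == label and p == "is label for"]


def context_of(entity):
    """The explicit context of an entity: everything it points to directly."""
    return {o for s, p, o in TRIPLES if s == entity}


def disambiguate(label, nearby_terms):
    """Pick the candidate whose context overlaps most with the terms seen nearby."""
    scored = [(len(context_of(c) & set(nearby_terms)), c) for c in candidates(label)]
    return max(scored)[1]


in_usa = disambiguate("Lebanon", ["USA"])
near_beirut = disambiguate("Lebanon", ["Beirut, City"])
```

Under these assumptions, a mention of "Lebanon" next to "USA" resolves to the town in New Hampshire, while a mention next to "Beirut, City" resolves to the country.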
  12. Semantic Technologies by Task
      • Discovering Context: understand the meaning of unstructured data (text, images, …)
      • Moving Context: transfer meaning of data between systems (RDF, RuleML, …)
      • Using Context: inference based on the meaning (inference engines, deductive databases)
  13. Semantic | Big Data | ?
  14. Semantic | Big Data | ? Big Data technologies for Semantics, e.g. using a Hadoop cluster for rule inferencing; not the topic today
  15. Semantic | Big Data | ? Semantics for Big Data, using semantic technologies to make processing large amounts of data simpler
  16. Big Picture of Big Data Systems: Ingest | Stage | Transform | Serve | Cluster Management | DFS / DDBMS | Resource Management | Data Flow Management
  17. e.g. Kafka, Sqoop & Oozie | HBase | MR, Pig & Storm | Hive + Cassandra | Ambari + ZooKeeper | HDFS | YARN | Apache Falcon
  18. (some) semantics-relevant challenges in Big Data systems: Ingest | Stage | Transform | Serve | Integrating data from diverse sources | Understanding unstructured / polystructured data | Languages to specify transformations
  19. Moving Context with JSON-LD: Ingest | Stage | Transform | Serve | Integrating data from diverse sources
  20. Observation: Semantic technologies are successful in tackling the web-scale information integration challenge
  21. Observation: Cutting-edge enterprise software architecture bears a striking resemblance to the web (Source: Martin Fowler). Microservices, REST, ROCA, Polyglot Persistence
  22. However, in the enterprise we need to mark up structured data (not documents)
  23. JSON as in JavaScript Object Notation???
      {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "phoneNumbers": [
          {
            "type": "home",
            "number": "212 555-1234"
          },
          {
            "type": "office",
            "number": "646 555-4567"
          }
        ],
        "children": [],
  24. Through simplicity, the proliferation of JavaScript and a good fit to other data structures, JSON has become the standard for data interchange on the web
  25. JSON-LD allows adding some semantics to JSON
      {
        "@context": {
          "name": "http://xmlns.com/foaf/0.1/name",
          "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
            "@type": "@id"
          },
          "Person": "http://xmlns.com/foaf/0.1/Person"
        },
        "@id": "http://me.markus-lanthaler.com",
        "@type": "Person",
        "name": "Markus Lanthaler",
        "homepage": "http://www.tugraz.at/"
      }
  26. I believe the linked data techniques that worked for web-scale data integration can offer long-term relief for the Enterprise data integration challenge (and that JSON-LD can help in doing this)
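
To make the integration claim concrete, here is a minimal sketch, assuming two hypothetical enterprise systems (called CRM and HR here) that describe the same person with different local field names. Each record carries a JSON-LD style `@context` that maps its local keys to shared FOAF IRIs. The records, the `example.com` identifiers and the helper names `to_iri_keys` and `merge_by_id` are invented for illustration, and the key rewriting is a tiny stand-in for real JSON-LD expansion, which covers far more cases.

```python
# Two hypothetical records about the same person from two systems.
# Local key names differ; the @context maps them to shared IRIs.
CRM_RECORD = {
    "@context": {"fullName": "http://xmlns.com/foaf/0.1/name"},
    "@id": "http://example.com/people/42",
    "fullName": "Jane Doe",
}
HR_RECORD = {
    "@context": {
        "label": "http://xmlns.com/foaf/0.1/name",
        "web": "http://xmlns.com/foaf/0.1/workplaceHomepage",
    },
    "@id": "http://example.com/people/42",
    "label": "Jane Doe",
    "web": "http://example.com/",
}


def to_iri_keys(record):
    """Replace local keys by the IRIs given in the record's own @context.

    Simplified: handles flat records with plain string term definitions only.
    """
    context = record["@context"]
    return {context.get(k, k): v for k, v in record.items() if k != "@context"}


def merge_by_id(records):
    """Merge records that share an @id once their keys are IRIs."""
    merged = {}
    for record in map(to_iri_keys, records):
        merged.setdefault(record["@id"], {}).update(record)
    return merged


merged = merge_by_id([CRM_RECORD, HR_RECORD])
person = merged["http://example.com/people/42"]
```

Because both systems agree on identifiers and vocabulary IRIs rather than on field names, neither has to change its local JSON layout for the merge to work.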
  27. Discovering Semantics in the Data Lake: Ingest | Stage | Transform | Serve | Understanding unstructured / polystructured data
  28. Motivated by the need for agility in data use and the availability of tools to cheaply manage giant amounts of polystructured data, enterprises are moving from a traditional ETL / Data Warehouse architecture …
  29. … to an EL(T) / Data Lake architecture …
  30. However, there is currently a giant gap between the capabilities of companies to directly utilize this heap of polystructured data … (e.g. Elastic Search + Kibana
  31. … or Novetta)
  32. and what has been demonstrated to be possible with such heaps of polystructured data (e.g. Cognitive Computing and IBM's Watson or …
  33. or Probabilistic knowledge fusion and Google Knowledge Vault
      Dong, Xin Luna, K. Murphy, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, Thomas Strohmann, Shaohua Sun, and Wei Zhang. "Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion." (2014).
  34. or deep learning / convolutional neural networks and Image Recognition …
  35. I believe some next generation Big Data leaders will bring Semantics (as in “discovering and using the meaning of heaps of polystructured data”) to many more enterprises
  36. LP for View Definitions: Ingest | Stage | Transform | Serve | Languages to specify transformations (e.g. Cascalog)
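
As a rough illustration of what a rule-style view definition looks like, here is a sketch in plain Python. It assumes "LP" stands for logic programming, which the transcript does not spell out, and it does not use Cascalog (a Clojure library); the relations `PARCEL_AT` and `DEPOT_CITY` and the view `parcels_in_city` are hypothetical, loosely following the postal example from earlier in the talk.

```python
# Base relations (facts), as they might sit in the staging layer.
# All names and values are hypothetical.
PARCEL_AT = [("p1", "depot_a"), ("p2", "depot_b"), ("p3", "depot_a")]
DEPOT_CITY = [("depot_a", "Karlsruhe"), ("depot_b", "Stuttgart")]


def parcels_in_city():
    """View defined by one rule, written Datalog-style as:

        parcel_in_city(P, C) :- parcel_at(P, D), depot_city(D, C).

    The shared variable D expresses the join; no join order or
    intermediate result is spelled out by the view's author.
    """
    return sorted(
        (parcel, city)
        for parcel, depot in PARCEL_AT
        for depot2, city in DEPOT_CITY
        if depot == depot2
    )


view = parcels_in_city()
```

The point of the declarative form is that the view states which facts should hold and leaves the execution plan to the engine.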
  37. Ingest | Stage | Transform | Serve | Linked Enterprise Data (with JSON-LD) | Semantics in the Data Lake | LP for view definitions (e.g. Cascalog) | connect / download slides at www.vzach.de