Where Big Data and Semantic Intersect

Valentin
September 22, 2014

Big Data, Semantics and the interesting opportunities at the intersection

Transcript

  1. Where Big Data and Semantic Intersect. Valentin Zacharias
  2. About this talk
     • Goal: (Hopefully) leave you with a few new ideas on interesting opportunities at the intersection of Big Data and Semantics
     • Structure
       – Big Data & Analytics
       – Semantic Technologies
       – The intersection
       – Three examples / opportunities
  3. About Me
     • 13 years of experience in software-driven innovation
       – semantic technologies researcher in the group of Prof. Studer
       – manager of a research division concerned with all aspects of information-driven decisions at FZI
       – Big Data consultant with codecentric
       – (analytics consultant with Daimler TSS)
  4. Big Data & Analytics
  5. Data Deluge: Moore's Law, Mobile Web, Cloud Computing, Data Value Chain, Social Web etc. make collecting large datasets ever cheaper
  6. Big Data Technology: lower the cost to build systems that do more complex processing with more data faster
  7. (Big Data) Analytics: build on this data (technologies) to harness patterns in data to maximize business value
  8. Example: GPS, telemetrics, handhelds, mobile web etc. drive the data deluge in postal services, but …
  9. … this must be harnessed to realize same-day delivery, preventive maintenance, precise arrival estimates … in order to stay competitive
  10. Semantic Technologies
  11. Better information processing through more consideration for the explicit context of processed elements
      [Diagram: the label "Lebanon" is a label for both "Lebanon, Country" (has capital: Beirut, City; has population: 4 Million) and "Lebanon, NH" (in country: USA; contains: Dartmouth Medical S.)]
  12. Semantic Technologies by Task
      • Discovering Context: understand the meaning of unstructured data (text, images, …)
      • Moving Context: transfer meaning of data between systems (RDF, RuleML, …)
      • Using Context: inference based on the meaning (inference engines, deductive databases)
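
The "Using Context" task can be illustrated with a minimal sketch in Python. It stores a few facts loosely based on the "Lebanon" diagram as (subject, predicate, object) triples, resolves the ambiguous label, and applies one hand-written rule until no new facts appear. The predicate names (`has_label`, `contains`, `in_country`) and the rule itself are assumptions made for illustration; they are not taken from the talk or from any particular inference engine.

```python
# Facts as (subject, predicate, object) triples; names are illustrative assumptions.
TRIPLES = {
    ("Lebanon (country)", "has_label", "Lebanon"),
    ("Lebanon (country)", "has_capital", "Beirut"),
    ("Lebanon, NH", "has_label", "Lebanon"),
    ("Lebanon, NH", "in_country", "USA"),
    ("Lebanon, NH", "contains", "Dartmouth Medical S."),
}


def candidates(label, triples):
    """Return every entity that carries the given label."""
    return {s for (s, p, o) in triples if p == "has_label" and o == label}


def infer_in_country(triples):
    """Apply one rule until a fixpoint is reached:
    contains(y, x) and in_country(y, c)  =>  in_country(x, c)
    """
    facts = set(triples)
    while True:
        new = {
            (x, "in_country", c)
            for (y, p1, x) in facts if p1 == "contains"
            for (y2, p2, c) in facts if p2 == "in_country" and y2 == y
        }
        if new <= facts:  # nothing new was derived
            return facts
        facts |= new


facts = infer_in_country(TRIPLES)
print(sorted(candidates("Lebanon", facts)))
print(("Dartmouth Medical S.", "in_country", "USA") in facts)
```

The label alone is ambiguous (two candidates), while the explicit context around each candidate lets a program derive a fact that was never stated directly.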
  13. Semantic Big Data?
  14. Semantic Big Data? Big Data Technologies for Semantics, e.g. using a Hadoop cluster for rule inferencing – not the topic today
  15. Semantic Big Data? Semantics for Big Data: using semantic technologies to make processing large amounts of data simpler
  16. Big Picture of Big Data Systems: Ingest; Stage; Transform; Serve; Cluster Management; DFS / DDBMS; Resource Management; Data Flow Management
  17. e.g. Kafka, Sqoop & Oozie; HBase; MR, Pig & Storm; Hive + Cassandra; Ambari + ZooKeeper; HDFS; YARN; Apache Falcon
  18. (Some) semantics-relevant challenges in Big Data systems (Ingest, Stage, Transform, Serve)
      • Integrating data from diverse sources
      • Understanding unstructured / polystructured data
      • Languages to specify transformations
  19. Moving Context with JSON-LD (Ingest, Stage, Transform, Serve): integrating data from diverse sources
  20. Observation: Semantic technologies are successful in tackling the web-scale information integration challenge
  21. Observation: Cutting-edge enterprise software architecture bears a striking resemblance to the web. Source: Martin Fowler. Microservices, REST, ROCA, Polyglot Persistence
  22. However, in the enterprise we need to mark up structured data (not documents)
  23. JSON as in JavaScript Object Notation???
      {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "phoneNumbers": [
          {
            "type": "home",
            "number": "212 555-1234"
          },
          {
            "type": "office",
            "number": "646 555-4567"
          }
        ],
        "children": [],
  24. Through its simplicity, the proliferation of JavaScript, and a good fit to other data structures, JSON has become the standard for data interchange on the web
  25. JSON-LD allows adding some semantics to JSON
      {
        "@context": {
          "name": "http://xmlns.com/foaf/0.1/name",
          "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
            "@type": "@id"
          },
          "Person": "http://xmlns.com/foaf/0.1/Person"
        },
        "@id": "http://me.markus-lanthaler.com",
        "@type": "Person",
        "name": "Markus Lanthaler",
        "homepage": "http://www.tugraz.at/"
      }
  26. I believe the linked data techniques that worked for web-scale data integration can offer long-term relief for the enterprise data integration challenge (and that JSON-LD can help in doing this)
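
How a JSON-LD `@context` could help with integration can be sketched in a few lines of Python. Two hypothetical sources (a CRM and an HR system, both invented for this example, as are the `example.com` identifiers) describe the same person with different key names; each record's context maps its local keys to shared FOAF IRIs, after which the records can be merged by `@id`. This is a deliberately simplified term lookup, not the full JSON-LD expansion algorithm, and it only handles flat records with hashable values.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Two hypothetical sources describing the same person with different key names.
crm_record = {
    "@context": {"full_name": FOAF + "name"},
    "@id": "http://example.com/people/42",
    "full_name": "John Smith",
}
hr_record = {
    "@context": {
        "employeeName": FOAF + "name",
        "years": {"@id": FOAF + "age"},
    },
    "@id": "http://example.com/people/42",
    "employeeName": "John Smith",
    "years": 25,
}


def term_to_iri(context, term):
    """Look up a term in the context; keywords and unknown terms pass through unchanged."""
    mapping = context.get(term, term)
    if isinstance(mapping, dict):  # expanded term definition, e.g. {"@id": ...}
        return mapping["@id"]
    return mapping


def to_iri_keys(record):
    """Replace the source-specific keys of a flat record with the IRIs from its context."""
    context = record.get("@context", {})
    return {
        term_to_iri(context, key): value
        for key, value in record.items()
        if key != "@context"
    }


def merge(records):
    """Group property values by entity @id once all keys are expressed as IRIs."""
    merged = {}
    for record in records:
        normalized = to_iri_keys(record)
        entity = merged.setdefault(normalized["@id"], {})
        for key, value in normalized.items():
            if key != "@id":
                entity.setdefault(key, set()).add(value)
    return merged


merged = merge([crm_record, hr_record])
print(merged)
```

Because both sources agree on the IRIs rather than on key names, the merge step needs no source-specific mapping code; a production system would more likely rely on a JSON-LD processor than on this hand-rolled lookup.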
  27. Discovering Semantics in the Data Lake (Ingest, Stage, Transform, Serve): understanding unstructured / polystructured data
  28. Motivated by the need for agility in data use and the availability of tools to cheaply manage giant amounts of polystructured data, enterprises are moving from a traditional ETL / Data Warehouse architecture …
  29. … to an EL(T) / Data Lake architecture …
  30. However, there is currently a giant gap between the capabilities of companies to directly utilize this heap of polystructured data … (e.g. Elasticsearch + Kibana
  31. … or Novetta)
  32. and what has been demonstrated to be possible with such heaps of polystructured data (e.g. Cognitive Computing and IBM's Watson or …
  33. or probabilistic knowledge fusion and Google Knowledge Vault
      Dong, Xin Luna, K. Murphy, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, Thomas Strohmann, Shaohua Sun, and Wei Zhang. "Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion." (2014).
  34. or deep learning / convolutional neural networks and image recognition …
  35. I believe some next-generation Big Data leaders will bring Semantics (as in "discovering and using the meaning of heaps of polystructured data") to many more enterprises
  36. LP for View Definitions (Ingest, Stage, Transform, Serve): languages to specify transformations (e.g. Cascalog)
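
The idea of defining a view with logic programming can be sketched in Python as a conjunctive query over named relations, where strings starting with `?` act as logic variables shared between goals. The relations (`parcel`, `position`) and their rows are hypothetical data loosely inspired by the postal example; the evaluator is a naive nested-loop join written for illustration and does not reproduce Cascalog's syntax or execution model.

```python
def is_var(term):
    """Logic variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, row, binding):
    """Extend binding so that pattern matches row; return None on any conflict."""
    if len(pattern) != len(row):
        return None
    result = dict(binding)
    for term, value in zip(pattern, row):
        if is_var(term):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(out_vars, goals, relations):
    """Evaluate a conjunction of goals; shared variables express the joins."""
    bindings = [{}]
    for name, pattern in goals:
        bindings = [
            extended
            for binding in bindings
            for row in relations[name]
            if (extended := match(pattern, row, binding)) is not None
        ]
    return {tuple(b[v] for v in out_vars) for b in bindings}


# Hypothetical input relations.
relations = {
    "parcel": [("p1", "truck7"), ("p2", "truck9")],
    "position": [("truck7", "Karlsruhe"), ("truck9", "Stuttgart")],
}

# View: which city is each parcel currently in?
view = query(
    ["?parcel", "?city"],
    [("parcel", ("?parcel", "?truck")), ("position", ("?truck", "?city"))],
    relations,
)
print(sorted(view))
```

The view states what is wanted (a parcel and the city of the truck carrying it) and leaves the join strategy to the evaluator, which is the property that makes such languages attractive for specifying transformations.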
  37. Ingest, Stage, Transform, Serve
      • Linked Enterprise Data (with JSON-LD)
      • Semantics in the Data Lake
      • LP for view definitions (e.g. Cascalog)
      connect / download slides at www.vzach.de