Where Big Data and Semantic Intersect

Valentin
September 22, 2014

Big Data, Semantics and the interesting opportunities at the intersection

Transcript

  1. Where Big Data and Semantic Intersect

    Valentin Zacharias

  2. About this talk
    • Goal: (Hopefully) leave you with a few new ideas on interesting opportunities at the intersection of Big Data and Semantics
    • Structure
    – Big Data & Analytics
    – Semantic Technologies
    – The intersection
    – Three examples / opportunities

  3. About Me
    • 13 years of experience in software-driven innovation
    – semantic technologies researcher in the group of Prof. Studer
    – manager of a research division concerned with all aspects of information-driven decisions at FZI
    – Big Data consultant with codecentric
    – (analytics consultant with Daimler TSS)

  4. Big Data & Analytics

  5. Data Deluge: Moore’s Law, Mobile Web, Cloud Computing, Data Value Chain, Social Web etc. make collecting large datasets ever cheaper

  6. Big Data Technology: lowers the cost of building systems that do more complex processing on more data, faster

  7. (Big Data) Analytics: builds on this data (and these technologies) to harness patterns in the data and maximize business value

  8. Example: GPS, Telemetrics, handhelds, mobile web etc. drive the data deluge in postal services, but …

  9. … this must be harnessed to realize same-day delivery, preventive maintenance, precise arrival estimates … in order to stay competitive

  10. Semantic Technologies

  11. Better information processing through more consideration for the explicit context of processed elements
    [Diagram: a graph with the nodes "Lebanon", "Lebanon, Country", "Beirut, City", "4 Million", "USA", "Dartmouth Medical S." and "Lebanon, NH", connected by the edges "is label for" (twice), "has capital", "has population", "in country" and "contains"]

  12. Semantic Technologies by Task
    • Discovering Context: understand the meaning of unstructured data (text, images, …)
    • Moving Context: transfer meaning of data between systems (RDF, RuleML, …)
    • Using Context: inference based on the meaning (inference engines, deductive databases)
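
The tasks "Discovering Context" and "Using Context" can be illustrated with a toy triple store. The following is a minimal sketch in Python, not a description of any particular inference engine. It assumes that the nodes and edges of the Lebanon diagram pair up as written in the comments (the pairing is reconstructed from the labels and may differ from the original figure). The rule "what is contained in a place lies in the same country" and the function names `candidates` and `infer_in_country` are hypothetical.

```python
# Facts as (subject, predicate, object) triples. The pairing of nodes and
# edges is an assumption based on the labels of the Lebanon diagram.
triples = {
    ("Lebanon", "is label for", "Lebanon, Country"),
    ("Lebanon", "is label for", "Lebanon, NH"),
    ("Lebanon, Country", "has capital", "Beirut, City"),
    ("Lebanon, Country", "has population", "4 Million"),
    ("Lebanon, NH", "in country", "USA"),
    ("Lebanon, NH", "contains", "Dartmouth Medical S."),
}


def candidates(label, facts):
    """Discovering context: list the entities an ambiguous label may denote."""
    return sorted(o for s, p, o in facts if s == label and p == "is label for")


def infer_in_country(facts):
    """Using context: if X contains Y and X is in country C, then Y is in C.

    Applies this single (hypothetical) rule until no new fact is derived.
    """
    derived = set(facts)
    while True:
        new = {
            (y, "in country", c)
            for x, p1, y in derived if p1 == "contains"
            for x2, p2, c in derived if p2 == "in country" and x2 == x
        }
        if new <= derived:
            return derived
        derived |= new


print(candidates("Lebanon", triples))
print(infer_in_country(triples) - triples)
```

The first call shows why the bare string "Lebanon" is ambiguous; the second derives a fact that was never stated explicitly, which is the kind of work a deductive database does at scale.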

  13. Semantic   Big Data
    ?

  14. Semantic   Big Data
    ?
    Big Data Technologies for Semantics, e.g. using a Hadoop cluster for rule inferencing – not the topic today

  15. Semantic   Big Data
    ?
    Semantics for Big Data, using semantic technologies to make processing large amounts of data simpler

  16. Big Picture of Big Data Systems
    Ingest
    Stage
    Transform
    Serve
    Cluster Management
    DFS / DDBMS
    Resource Management
    Data Flow Management

  17. e.g.
    Kafka, Sqoop & Oozie
    HBase
    MR, Pig & Storm
    Hive + Cassandra
    Ambari + ZooKeeper
    HDFS
    YARN
    Apache Falcon

  18. (some) semantics-relevant challenges in Big Data systems
    Ingest
    Stage
    Transform
    Serve
    Integrating data from diverse sources
    Understanding unstructured / polystructured data
    Languages to specify transformations

  19. Moving Context with JSON-LD
    Ingest
    Stage
    Transform
    Serve
    Integrating data from diverse sources

  20. Observation: Semantic technologies are successful in tackling the web-scale information integration challenge

  21. Observation: Cutting-edge enterprise software architecture bears a striking resemblance to the web
    Source: Martin Fowler
    Microservices, REST, ROCA, Polyglot Persistence

  22. However, in the enterprise we need to mark up structured data (not documents)

  23. JSON as in JavaScript Object Notation???
    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "phoneNumbers": [
        {
          "type": "home",
          "number": "212 555-1234"
        },
        {
          "type": "office",
          "number": "646 555-4567"
        }
      ],
      "children": []
    }

  24. Through its simplicity, the proliferation of JavaScript and its good fit to other data structures, JSON has become the standard for data interchange on the web

  25. JSON-LD allows adding some semantics to JSON
    {
      "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {
          "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
          "@type": "@id"
        },
        "Person": "http://xmlns.com/foaf/0.1/Person"
      },
      "@id": "http://me.markus-lanthaler.com",
      "@type": "Person",
      "name": "Markus Lanthaler",
      "homepage": "http://www.tugraz.at/"
    }
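
To make the role of `@context` concrete, here is a minimal sketch in Python that replaces the short keys of the document above with the IRIs the context assigns to them. It is a deliberately simplified illustration: the function `expand_terms` is hypothetical, handles only the flat context shapes used in this one example, and is not the full expansion algorithm of the JSON-LD specification (which, among other things, wraps values in arrays and value objects).

```python
import json

doc = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
      "@type": "@id"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "http://me.markus-lanthaler.com",
  "@type": "Person",
  "name": "Markus Lanthaler",
  "homepage": "http://www.tugraz.at/"
}
""")


def expand_terms(document):
    """Replace short terms with the IRIs from @context (simplified sketch)."""
    context = document["@context"]

    def iri_of(term):
        definition = context.get(term, term)
        if isinstance(definition, dict):
            return definition["@id"]
        return definition

    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        if key == "@type":
            expanded[key] = iri_of(value)
        elif key.startswith("@"):
            expanded[key] = value
        else:
            definition = context.get(key)
            # "@type": "@id" marks the value as a link to another resource.
            if isinstance(definition, dict) and definition.get("@type") == "@id":
                value = {"@id": value}
            expanded[iri_of(key)] = value
    return expanded


expanded = expand_terms(doc)
print(json.dumps(expanded, indent=2))
```

After this step the record no longer depends on the local key names `name` and `homepage`: any consumer that knows the FOAF vocabulary can interpret it.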

  26. I believe the linked data techniques that worked for web-scale data integration can offer long-term relief for the Enterprise data integration challenge (and that JSON-LD can help in doing this)
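
As an illustration of how this could look inside an enterprise, the following Python sketch merges records from two systems that use different local field names but map them to shared vocabulary IRIs through a JSON-LD style context. Everything concrete here is assumed for the example: the two systems, the `example.org` identifiers, the field names and the helpers `to_iris` and `merge`. Only flat string-to-IRI contexts are handled.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical record from a CRM system.
crm_record = {
    "@context": {"fullName": FOAF + "name", "site": FOAF + "workplaceHomepage"},
    "@id": "http://example.org/people/42",
    "fullName": "Jane Doe",
    "site": "http://example.org/",
}

# Hypothetical record from an HR system with different local key names.
hr_record = {
    "@context": {"name": FOAF + "name", "mail": FOAF + "mbox"},
    "@id": "http://example.org/people/42",
    "name": "Jane Doe",
    "mail": "mailto:jane@example.org",
}


def to_iris(record):
    """Rename local keys to the IRIs given in the record's own context."""
    context = record["@context"]
    return {context.get(k, k): v for k, v in record.items() if k != "@context"}


def merge(records):
    """Group property values by entity @id, keyed by shared property IRIs."""
    merged = {}
    for record in map(to_iris, records):
        entity = merged.setdefault(record["@id"], {})
        for key, value in record.items():
            if key != "@id":
                entity.setdefault(key, set()).add(value)
    return merged


person = merge([crm_record, hr_record])["http://example.org/people/42"]
for prop in sorted(person):
    print(prop, sorted(person[prop]))
```

Neither system had to change its field names; agreeing on identifiers and vocabulary IRIs is enough for the two `name` values to land in the same property.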

  27. Discovering Semantics in the Data Lake
    Ingest
    Stage
    Transform
    Serve
    Understanding unstructured / polystructured data

  28. Motivated by the need for agility in data use and the availability of tools to cheaply manage giant amounts of polystructured data, enterprises are moving from a traditional ETL-Data Warehouse architecture …

  29. … to an EL(T) / Data Lake architecture
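
The difference between the two architectures can be sketched in a few lines of Python. This is an assumption-laden toy, not a description of any product: the event format, the field names `temp_c` and `note`, and all function names are made up for illustration. In the ETL variant the schema is applied before loading, so anything that does not fit is lost; in the EL(T) variant raw records are kept and each question applies its own transformation at read time.

```python
import json

# Hypothetical raw events arriving from different sources.
raw_events = [
    '{"id": 1, "temp_c": "21.5", "site": "A"}',
    '{"id": 2, "site": "B", "note": "sensor offline"}',
]


def etl_load(lines):
    """ETL: transform first, then load only rows that fit the fixed schema."""
    table = []
    for line in lines:
        record = json.loads(line)
        if "temp_c" in record:
            table.append((record["id"], float(record["temp_c"])))
    return table


def elt_load(lines):
    """EL(T): load the raw records unchanged into the 'lake'."""
    return list(lines)


def temperatures(lake):
    """A transformation applied when reading from the lake."""
    for line in lake:
        record = json.loads(line)
        if "temp_c" in record:
            yield record["id"], float(record["temp_c"])


def notes(lake):
    """A later question nobody planned for when the data was ingested."""
    for line in lake:
        record = json.loads(line)
        if "note" in record:
            yield record["id"], record["note"]


warehouse = etl_load(raw_events)
lake = elt_load(raw_events)
print(warehouse)
print(list(temperatures(lake)))
print(list(notes(lake)))
```

Both variants can answer the temperature question, but only the lake still holds the free-text note, which is the agility argument in miniature.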

  30. However, there is currently a giant gap between the capabilities of companies to directly utilize this heap of polystructured data … (e.g. Elasticsearch + Kibana

  31. … or Novetta)

  32. and what has been demonstrated to be possible with such heaps of polystructured data (e.g. Cognitive Computing and IBM's Watson or …

  33. or Probabilistic knowledge fusion and Google Knowledge Vault
    Dong, Xin Luna, K. Murphy, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, Thomas Strohmann, Shaohua Sun, and Wei Zhang. "Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion." (2014).

  34. or deep learning / convolutional neural networks and Image Recognition …

  35. I believe some next-generation Big Data leaders will bring Semantics (as in “discovering and using the meaning of heaps of polystructured data”) to many more enterprises

  36. LP for View Definitions
    Ingest
    Stage
    Transform
    Serve
    Languages to specify transformations (e.g. Cascalog)
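
Assuming "LP" stands for logic programming, the idea of defining a view declaratively can be sketched as a small conjunctive query evaluator in Python. This is not Cascalog syntax (Cascalog is a Clojure library) and not how it executes queries; the relations, the `?variable` convention and the functions `match` and `query` are hypothetical. The view below reads as: parcel_location(?parcel, ?city) holds if parcel(?parcel, ?truck) and position(?truck, ?city) both hold.

```python
def is_var(term):
    """Variables are strings starting with '?', everything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, bindings):
    """Unify one pattern with one fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None  # variable already bound to a different value
        elif p != f:
            return None  # constant does not match
    return result


def query(outputs, clauses, relations):
    """Evaluate a conjunctive query: every clause must hold simultaneously."""
    solutions = [{}]
    for name, *pattern in clauses:
        solutions = [
            extended
            for bindings in solutions
            for fact in relations[name]
            if (extended := match(pattern, fact, bindings)) is not None
        ]
    return sorted({tuple(b[v] for v in outputs) for b in solutions})


# Hypothetical base relations in a postal-service setting.
relations = {
    "parcel": [("p1", "truck7"), ("p2", "truck9")],
    "position": [("truck7", "Karlsruhe"), ("truck9", "Stuttgart")],
}

parcel_location = query(
    ["?parcel", "?city"],
    [("parcel", "?parcel", "?truck"), ("position", "?truck", "?city")],
    relations,
)
print(parcel_location)
```

The view states what is wanted (a join on the shared variable `?truck`) and leaves the evaluation strategy to the engine, which is what makes such languages attractive for specifying transformations.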

  37. Ingest
    Stage
    Transform
    Serve
    Linked Enterprise Data (with JSON-LD)
    Semantics in the Data Lake
    LP for view definitions (e.g. Cascalog)
    connect / download slides at www.vzach.de