Slide 1

Slide 1 text

Where Big Data and Semantic Intersect
Valentin Zacharias

Slide 2

Slide 2 text

About this talk
• Goal: (Hopefully) leave you with a few new ideas on interesting opportunities at the intersection of Big Data and Semantics
• Structure
  – Big Data & Analytics
  – Semantic Technologies
  – The intersection
  – Three examples / opportunities

Slide 3

Slide 3 text

About Me
• 13 years of experience in software-driven innovation
  – semantic technologies researcher in the group of Prof. Studer
  – manager of a research division concerned with all aspects of information-driven decisions at FZI
  – Big Data consultant with codecentric
  – (analytics consultant with Daimler TSS)

Slide 4

Slide 4 text

Big Data & Analytics

Slide 5

Slide 5 text

Data Deluge: Moore’s Law, Mobile Web, Cloud Computing, Data Value Chain, Social Web etc. make collecting large datasets ever cheaper

Slide 6

Slide 6 text

Big Data Technology: lowers the cost of building systems that do more complex processing with more data, faster

Slide 7

Slide 7 text

(Big Data) Analytics: build on this data (and these technologies) to harness patterns in data to maximize business value

Slide 8

Slide 8 text

Example: GPS, telemetrics, handhelds, mobile web etc. drive the data deluge in postal services, but …

Slide 9

Slide 9 text

… this must be harnessed to realize same-day delivery, preventive maintenance, precise arrival estimates … in order to stay competitive

Slide 10

Slide 10 text

Semantic Technologies

Slide 11

Slide 11 text

Better information processing through more consideration for the explicit context of processed elements
[Figure: graph in which the string "Lebanon" is a label for two different entities, "Lebanon, Country" and "Lebanon, NH". Edge labels: is label for, has capital, has population, in country, contains. Other nodes: Beirut (City), 4 Million, USA, Dartmouth Medical S.]
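The idea on this slide can be sketched in a few lines of Python. This is only an illustration: the triples below are my reading of the flattened figure (which edge connects which nodes is an assumption), and the helper names `candidates`, `context` and `disambiguate` are hypothetical. The point is that an ambiguous label becomes resolvable once the context around each candidate entity is explicit.

```python
# Triples (subject, predicate, object). The edge assignments are an
# assumption reconstructed from the figure, not stated in the slide text.
TRIPLES = [
    ("label:Lebanon", "is label for", "Lebanon (Country)"),
    ("label:Lebanon", "is label for", "Lebanon, NH"),
    ("Lebanon (Country)", "has capital", "Beirut (City)"),
    ("Lebanon (Country)", "has population", "4 Million"),
    ("Lebanon, NH", "in country", "USA"),
    ("Lebanon, NH", "contains", "Dartmouth Medical S."),
]


def candidates(label):
    """All entities that the given surface label may refer to."""
    return [o for s, p, o in TRIPLES
            if s == f"label:{label}" and p == "is label for"]


def context(entity):
    """The explicit context of an entity: its outgoing (predicate, object) pairs."""
    return {(p, o) for s, p, o in TRIPLES if s == entity}


def disambiguate(label, clue):
    """Keep only the candidates whose context mentions the clue."""
    return [e for e in candidates(label)
            if any(o == clue for _, o in context(e))]


print(candidates("Lebanon"))
print(disambiguate("Lebanon", "USA"))
```

Without context the label has two candidates; a single clue from the surrounding data ("USA" or "Beirut (City)") is enough to pick one.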

Slide 12

Slide 12 text

Semantic Technologies by Task
• Discovering Context: understand the meaning of unstructured data (text, images, …)
• Moving Context: transfer meaning of data between systems (RDF, RuleML, …)
• Using Context: inference based on the meaning (inference engines, deductive databases)
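As a rough illustration of the "Using Context" task, the sketch below derives new facts from existing ones with a single rule. It is a toy, not how a production inference engine works: the predicate name `located_in`, the facts, and the function `infer_transitive` are all made up for this example, and the only rule supported is transitivity.

```python
def infer_transitive(facts, predicate):
    """Naive forward chaining: apply the rule
    (a P b) and (b P c) => (a P c) until no new fact appears."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(closure):
            for c, p2, d in list(closure):
                if p1 == predicate and p2 == predicate and b == c:
                    derived = (a, predicate, d)
                    if derived not in closure:
                        closure.add(derived)
                        changed = True
    return closure


# Hypothetical facts, chosen only to show the rule firing.
facts = {
    ("Dartmouth Medical S.", "located_in", "Lebanon, NH"),
    ("Lebanon, NH", "located_in", "USA"),
}

derived = infer_transitive(facts, "located_in") - facts
print(derived)
```

The derived fact was never stated explicitly; it follows from the meaning attached to the predicate.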

Slide 13

Slide 13 text

Semantic Big Data?

Slide 14

Slide 14 text

Semantic Big Data?
Big Data Technologies for Semantics, e.g. using a Hadoop cluster for rule inferencing – not the topic today

Slide 15

Slide 15 text

Semantic Big Data?
Semantics for Big Data: using semantic technologies to make processing large amounts of data simpler

Slide 16

Slide 16 text

Big Picture of Big Data Systems
Pipeline stages: Ingest, Stage, Transform, Serve
Supporting layers: Cluster Management, DFS / DDBMS, Resource Management, Data Flow Management

Slide 17

Slide 17 text

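e.g.
Ingest: Kafka, Sqoop & Oozie
Stage: HBase
Transform: MR, Pig & Storm
Serve: Hive + Cassandra
Cluster Management: Ambari + ZooKeeper
DFS / DDBMS: HDFS
Resource Management: YARN
Data Flow Management: Apache Falcon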

Slide 18

Slide 18 text

(some) semantics-relevant challenges in Big Data systems
Ingest, Stage, Transform, Serve
• Integrating data from diverse sources
• Understanding unstructured / polystructured data
• Languages to specify transformations

Slide 19

Slide 19 text

Moving Context with JSON-LD
Ingest, Stage, Transform, Serve
Integrating data from diverse sources

Slide 20

Slide 20 text

Observation: Semantic technologies are successful in tackling the web-scale information integration challenge

Slide 21

Slide 21 text

Observation: Cutting-edge enterprise software architecture bears a striking resemblance to the web
Source: Martin Fowler
Microservices, REST, ROCA, Polyglot Persistence

Slide 22

Slide 22 text

However, in the enterprise we need to mark up structured data (not documents)

Slide 23

Slide 23 text

JSON as in JavaScript Object Notation???
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],

Slide 24

Slide 24 text

Through its simplicity, the proliferation of JavaScript, and a good fit to other data structures, JSON has become the standard for data interchange on the web

Slide 25

Slide 25 text

JSON-LD makes it possible to add some semantics to JSON
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
      "@type": "@id"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "http://me.markus-lanthaler.com",
  "@type": "Person",
  "name": "Markus Lanthaler",
  "homepage": "http://www.tugraz.at/"
}
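To make concrete what the "@context" on this slide buys, here is a minimal Python sketch that replaces the short keys with the IRIs the context maps them to. It is deliberately simplified and is not the JSON-LD expansion algorithm from the specification: the function `expand` is hypothetical, and it only handles a flat document with string values like the one on the slide. For real data a JSON-LD processor (for example the PyLD library) would be the safer choice.

```python
import json

# The document from the slide, unchanged.
doc = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
      "@type": "@id"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "http://me.markus-lanthaler.com",
  "@type": "Person",
  "name": "Markus Lanthaler",
  "homepage": "http://www.tugraz.at/"
}
""")


def expand(document):
    """Simplified expansion: flat documents with string values only."""
    ctx = document["@context"]

    def iri(term):
        # A term maps either directly to an IRI or to a dict with "@id".
        definition = ctx.get(term, term)
        return definition["@id"] if isinstance(definition, dict) else definition

    out = {}
    for key, value in document.items():
        if key == "@context":
            continue
        if key == "@id":
            out["@id"] = value
        elif key == "@type":
            out["@type"] = [iri(value)]
        elif key in ctx:
            definition = ctx[key]
            # "@type": "@id" marks the value as a link, not a literal.
            if isinstance(definition, dict) and definition.get("@type") == "@id":
                out[iri(key)] = [{"@id": value}]
            else:
                out[iri(key)] = [{"@value": value}]
        # Keys without a mapping in the context carry no semantics and are dropped.
    return out


expanded = expand(doc)
print(json.dumps(expanded, indent=2))
```

After expansion, two systems that use different short keys but map them to the same IRIs produce identical property names, which is what makes the data integrable.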

Slide 26

Slide 26 text

I believe the linked data techniques that worked for web-scale data integration can offer long-term relief for the enterprise data integration challenge (and that JSON-LD can help in doing this)

Slide 27

Slide 27 text

Discovering Semantics in the Data Lake
Ingest, Stage, Transform, Serve
Understanding unstructured / polystructured data

Slide 28

Slide 28 text

Motivated by the need for agility in data use and the availability of tools to cheaply manage giant amounts of polystructured data, enterprises are moving from a traditional ETL / Data Warehouse architecture …

Slide 29

Slide 29 text

… to an EL(T) / Data Lake architecture …

Slide 30

Slide 30 text

However, there is currently a giant gap between the capabilities of companies to directly utilize this heap of polystructured data … (e.g. Elasticsearch + Kibana

Slide 31

Slide 31 text

… or Novetta)

Slide 32

Slide 32 text

and what has been demonstrated to be possible with such heaps of polystructured data (e.g. Cognitive Computing and IBM's Watson or …

Slide 33

Slide 33 text

or probabilistic knowledge fusion and Google Knowledge Vault
Dong, Xin Luna, K. Murphy, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, Thomas Strohmann, Shaohua Sun, and Wei Zhang. "Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion." (2014).

Slide 34

Slide 34 text

or deep learning / convolutional neural networks and image recognition …

Slide 35

Slide 35 text

I believe some next-generation Big Data leaders will bring Semantics (as in “discovering and using the meaning of heaps of polystructured data”) to many more enterprises

Slide 36

Slide 36 text

LP for View Definitions
Ingest, Stage, Transform, Serve
Languages to specify transformations (e.g. Cascalog)
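To give a feel for what a logic-programming (LP) view definition looks like, here is a small Python sketch of a Datalog-style conjunctive query. It is not Cascalog (which is a Clojure library running on Hadoop) and does not reproduce its API; the relations `shipment` and `depot`, their contents, and the functions `match` and `query` are all invented for illustration. The view is stated as a list of goals that share variables, and the join falls out of the shared variable rather than being spelled out.

```python
def is_var(term):
    """Variables are strings starting with '?', as in Datalog-style queries."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, row, binding):
    """Try to unify one goal pattern with one row under an existing binding."""
    if len(pattern) != len(row):
        return None
    extended = dict(binding)
    for term, value in zip(pattern, row):
        if is_var(term):
            if term in extended and extended[term] != value:
                return None  # conflicts with an earlier goal
            extended[term] = value
        elif term != value:
            return None  # constant does not match
    return extended


def query(out_vars, goals, relations):
    """Evaluate a conjunction of goals and project onto out_vars."""
    bindings = [{}]
    for name, pattern in goals:
        next_bindings = []
        for binding in bindings:
            for row in relations[name]:
                extended = match(pattern, row, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return sorted({tuple(b[v] for v in out_vars) for b in bindings})


# Hypothetical relations for a postal scenario.
relations = {
    "shipment": [("s1", "d1"), ("s2", "d2"), ("s3", "d1")],
    "depot": [("d1", "Karlsruhe"), ("d2", "Berlin")],
}

# View: every shipment together with the city of its depot.
shipment_city = query(
    ["?s", "?city"],
    [("shipment", ("?s", "?d")), ("depot", ("?d", "?city"))],
    relations,
)
print(shipment_city)
```

Replacing `"?city"` in the second goal with a constant such as `"Berlin"` turns the same definition into a filtered view, with no change to the evaluation code.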

Slide 37

Slide 37 text

Ingest, Stage, Transform, Serve
• Linked Enterprise Data (with JSON-LD)
• Semantics in the Data Lake
• LP for view definitions (e.g. Cascalog)
connect / download slides at www.vzach.de