Slide 1

Slide 1 text

The BIG Opportunity: A peek into Big Data Research

Slide 2

Slide 2 text

Whoami
Sep 20 & 21, 2013, Faculty Summit on Big Data, ©TCRIX
• MS student @ IIIT-H working under Prof. Vasudeva Varma
• Just finishing my thesis in Scheduling
• Love large scale [systems | data | learning]
• Automation freak
• Like to work at the intersection of Data and Systems
• Want to work on interesting things

Slide 3

Slide 3 text

Why bother?
“It’s not who has the best algorithm that wins. It’s who has the most data.” - Banko and Brill, 2001
Source: https://amplab.cs.berkeley.edu/2013/02/07/for-big-data-moores-law-means-better-decisions

Slide 4

Slide 4 text

What is Data?

Slide 5

Slide 5 text

What is Big Data?
Anything that is too big, too fast or too hard for existing tools to handle.

Slide 6

Slide 6 text

What is Big Data?
• Anything that is too big, too fast or too hard for existing tools to handle
  – 92% of the world's data was generated in the past 2 years
  – 1.4 trillion digital transactions per month
  – 30 billion+ pieces of data added to Facebook every month
  – …

Slide 7

Slide 7 text

What is Big Data?
• Anything that is too big, too fast or too hard for existing tools to handle
  – Think Twitter
  – Think ad display on the web
  – Think stocks
  – Think medical equipment

Slide 8

Slide 8 text

What is Big Data?
• Anything that is too big, too fast or too hard for existing tools to handle
  – Jeopardy?
  – Brain simulations?
  – And everything else that we don’t know yet.

Slide 9

Slide 9 text

Why should I care?
• In March 2012, the White House announced a national "Big Data Initiative", committing more than $200 million to big data research projects.
• The European Commission is funding a 2-year-long Big Data Public Private Forum.
• Open Data Initiative by the Government of India.
• Endless enterprise investments.
Big Data is here to stay.

Slide 10

Slide 10 text

The Big Data Tools Ecosystem
Source: http://www.bigdata-startups.com/open-source-tools/

Slide 11

Slide 11 text

Why is it hard / interesting?
• Interdisciplinary, by definition
• Requires thinking beyond your comfort zone
  – Machine Learning
  – Statistics
  – Systems
  – Visualization
  – Signal Processing
  – …

Slide 12

Slide 12 text

Why is it hard / interesting?
• Interdisciplinary, by definition
• Requires thinking beyond your comfort zone
  – Machine Learning
  – Statistics
  – Systems
  – Visualization
  – Signal Processing
  – …

Slide 13

Slide 13 text

Big Data Research @ IIIT-H
• Multiple Research Centers involved
  – Centre for Data Engineering (CDE)
  – Search and Information Extraction Lab (SIEL)
  – Center for Visual Information Technology (CVIT)
  – Speech and Vision Laboratory (SVL)
  – Center for Structural Engineering (CASE)
  – Language Technologies Research Center (LTRC)
• Areas of focus in Big Data
  – Systems
  – Applications

Slide 14

Slide 14 text

Big Data Systems: Data Processing Frameworks
• Improving processing efficiency
  – Hadoop scheduler
  – Hive query optimizations
• Improving human efficiency
  – Automate everything
  – Better visualization techniques
• How to process new kinds of data?
  – Image
  – Video
  – Speech

Slide 15

Slide 15 text

Big Data Systems: Cloud
• Converged infrastructure
  – Utilize full capabilities of infrastructure
  – Integration of private and public resources
• Resource optimization
  – For energy, SLA, performance ...
  – Hot replication of storage
• Security & Privacy
  – Privacy preserving computation
  – Security against theft

Slide 16

Slide 16 text

Big Data Application: Text Analytics
• Entity linking
• Summarization
• Sarcasm detection
• Author profiling
• Sentiment analysis
• Cross language search
• Question answering

Slide 17

Slide 17 text

Big Data Applications in Languages
• How do you model languages?
• Auto generation of resources
• Part of Speech tagging
• Stemming
• Morphological analysis
• Machine translation
• Transfer learning

Slide 18

Slide 18 text

Big Data Applications in Speech
• Can we understand what is being said in real time?
• Speech synthesis
• Emotion detection in speech
• Translate speech from one language to another
• “Ok Google” “ठीक है गूगल”

Slide 19

Slide 19 text

Big Data Applications in Vision
• Image search
• Cancer detection from scans
• 3D construction from 2D
• Perfect group photo?

Slide 20

Slide 20 text

Big Data has a lot to offer
• Education
• Healthcare
• Bioscience
• Energy
• Economics
• Defense
• Environmental Science

Slide 21

Slide 21 text

Big Data Impact: Education
• Intelligent tutors and environments
• Personalized learning: identify a student’s competencies and knowledge over time, and understand interests, goals and characteristics to improve the learning experience.
• Educational data mining: educational data based on an individual’s work and behaviors can be mined to better understand learning achievements, approaches, etc.

Slide 22

Slide 22 text

Big Data Impact: Economics
• Decision support for governments
• Fraud detection
• Measuring the effectiveness of various government initiatives and spending
• Helping policy and administrative decisions
• Finding and correcting operational efficiency issues

Slide 23

Slide 23 text

Big Data Impact: Defense
• Smart sensing, perception and decision support for autonomous systems
• Situational awareness for warfighters
• Communication analytics of all forms to prevent unwanted events

Slide 24

Slide 24 text

Big Data Impact: Energy
• Data analytics to understand building energy consumption
• Grid analytics
• Optimized distribution and generation of electric power
• Self-healing capabilities to anticipate and respond to system disturbances

Slide 25

Slide 25 text

Big Data Impact: Bioscience and Healthcare
• Genomics
• Personalized medicine
• Data-driven drug discovery
• Focus on wellbeing rather than disease
• Healthcare that is preventive, proactive, evidence-based and person-centered
• Treatment personalization
• Evaluating effectiveness of treatments

Slide 26

Slide 26 text

Big Data Impact: Environmental Science
• Causes and effects of climate change
• Land fertility and usage over time
• Discovery of natural resources
• Predictive data analysis for disaster prevention
• Quick response for disaster management

Slide 27

Slide 27 text

Challenges
• Lack of Data
• Nuggets vs Noise
• Talent lag
• Data Governance Policy

Slide 28

Slide 28 text

Take away

Slide 29

Slide 29 text

Thanks
@dharmeshkakadia
[email protected]

Slide 30

Slide 30 text

References
1. Bertino, E. et al. Challenges and Opportunities with Big Data. Community whitepaper.
2. Shah, R. et al. All Smiles: Automatic Photo Enhancement by Facial Expression Analysis. CVMP’12.
3. Halevy, A. et al. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems, 2009.