The BIG Opportunity - A peek into Big Data Research

My talk at the TCRIX Faculty Summit '13 on research opportunities in Big Data and IIIT's journey so far.

Other talks at http://dharmeshkakadia.github.io/talks

dharmeshkakadia

October 01, 2013

Transcript

  1. The BIG Opportunity
    A peek into Big Data Research

  2. Whoami
    Sep  20  &  21,  2013   Faculty  Summit  on  Big  Data  ©TCRIX   2  
    •  MS student @ IIIT-H working under Prof.
    Vasudeva Varma
    •  Just finishing my thesis in Scheduling
    •  Love large scale [systems | data | learning]
    •  Automation freak
    •  Like to work at the intersection of Data and
    System
    •  Want to work on interesting things

  3. Why bother?
    “It’s not who has the best algorithm that wins. It’s who has the most data.”
        - Banko and Brill, 2001
    Source: https://amplab.cs.berkeley.edu/2013/02/07/for-big-data-moores-law-means-better-decisions

  4. What is Data ?

  5. What is Big Data?
    Anything that is too big, too fast, or too hard for existing tools.

  6. What is Big Data?
    •  Anything that is too big, too fast, or too hard for existing tools
    –  92% of the world's data was generated in the past 2 years
    –  1.4 trillion digital transactions per month
    –  30 billion+ pieces of data added to Facebook every month
    –  …

  7. What is Big Data?
    •  Anything that is too big, too fast, or too hard for existing tools
    –  Think Twitter
    –  Think ad display on the web
    –  Think stocks
    –  Think medical equipment

  8. What is Big Data?
    •  Anything that is too big, too fast, or too hard for existing tools
    –  Jeopardy?
    –  Brain simulations?
    –  And everything else that we don’t know yet.

  9. Why should I care ?
    •  In March 2012, the White House announced a
    national "Big Data Initiative", committing more
    than $200 million to big data research projects.
    •  The European Commission is funding a
    2-year-long Big Data Public Private Forum.
    •  Open Data Initiative by Government of India.
    •  Endless enterprise investments.
    Big Data is here to stay

  10. The Big Data Tools Ecosystem
    Source: http://www.bigdata-startups.com/open-source-tools/

  11. Why is it hard interesting ?
    •  Interdisciplinary, by definition
    •  Requires thinking beyond your comfort
    zone
    –  Machine Learning
    –  Statistics
    –  Systems
    –  Visualization
    –  Signal Processing
    –  …

  12. Why is it hard interesting ?
    •  Interdisciplinary, by definition
    •  Requires thinking beyond your comfort
    zone
    –  Machine Learning
    –  Statistics
    –  Systems
    –  Visualization
    –  Signal Processing
    –  …

  13. Big Data Research @ IIIT-H
    •  Multiple Research Centers involved
    –  Centre for Data Engineering (CDE)
    –  Search and Information Extraction Lab (SIEL)
    –  Center for Visual Information Technology (CVIT)
    –  Speech and Vision Laboratory (SVL)
    –  Center for Structural Engineering (CASE)
    –  Language Technologies Research Center (LTRC)
    •  Areas of focus in Big Data
    –  Systems
    –  Applications

  14. Big Data Systems : Data Processing
    frameworks
    •  Improving Processing efficiency
    –  Hadoop scheduler
    –  Hive query optimizations
    •  Improving Human efficiency
    –  Automate everything
    –  Better visualization techniques
    •  How to process new kinds of data ?
    –  Image
    –  Video
    –  Speech

  15. Big Data Systems : Cloud
    •  Converged Infrastructure
    –  Utilize full capabilities of infrastructure
    –  Integration of private and public resources
    •  Resource optimization
    –  For energy, SLA, performance ...
    –  Hot replication of storage
    •  Security & Privacy
    –  Privacy-preserving computation
    –  Security against theft

  16. Big Data Application : Text Analytics
    •  Entity linking
    •  Summarization
    •  Sarcasm detection
    •  Author profiling
    •  Sentiment analysis
    •  Cross language search
    •  Question answering

  17. Big Data Applications in Languages
    •  How do you model languages ?
    •  Auto generation of resources
    •  Part of Speech tagging
    •  Stemming
    •  Morphological analysis
    •  Machine translation
    •  Transfer learning

  18. Big Data Applications in Speech
    •  Can we understand what is being said in real
    time ?
    •  Speech synthesis
    •  Emotion Detection in speech
    •  Translate speech from one language to another
    •  “Ok Google” “ठीक है गूगल”

  19. Big Data Applications in Vision
    •  Image search
    •  Cancer detection from scan
    •  3D construction from 2D
    •  Perfect group photo?

  20. Big Data has a lot to offer
    •  Education
    •  Healthcare
    •  Bioscience
    •  Energy
    •  Economics
    •  Defense
    •  Environmental Science

  21. Big Data Impact : Education
    •  Intelligent Tutors and Environments
    •  Personalized learning: identify students'
    competencies and knowledge over time, and
    understand their interests, goals, and characteristics
    to improve the learning experience.
    •  Educational data mining: data on an
    individual’s work and behaviors can be
    mined to better understand learning
    achievements, approaches, etc.

  22. Big Data Impact : Economics
    •  Decision support for governments
    •  Fraud detection
    •  Effectiveness of various government
    initiatives and spending
    •  Helping policy and administrative
    decisions
    •  Finding and correcting Operational
    efficiency issues

  23. Big Data Impact : Defense
    •  Smart sensing, perception and decision support
    for autonomous systems
    •  Situational awareness for warfighters
    •  Communication analytics of all forms to prevent
    unwanted events

  24. Big Data Impact : Energy
    •  Data analytics to understand building
    energy consumption
    •  Grid analytics
    •  Optimized distribution and generation of
    electric power
    •  Self-healing capabilities to anticipate and
    respond to system disturbances

  25. Big Data Impact : Bioscience and
    Healthcare
    •  Genomics
    •  Personalized Medicine
    •  Data-driven drug discovery
    •  Focus on wellbeing rather than disease
    •  Preventive, proactive, evidence-based,
    and person-centered healthcare
    •  Treatment personalization
    •  Evaluating Effectiveness of treatments

  26. Big Data Impact : Environmental
    Science
    •  Causes and effects of climate change
    •  Land fertility and usage over time
    •  Discovery of natural Resources
    •  Predictive data analysis for disaster
    prevention
    •  Quick response for disaster management

  27. Challenges
    •  Lack of Data
    •  Nuggets vs Noise
    •  Talent lag
    •  Data Governance Policy

  28. Take away

  29. Thanks
    @dharmeshkakadia

  30. References
    1.  Bertino, Elisa et al. Challenges and
    Opportunities with Big Data. Community
    whitepaper.
    2.  Rajvi Shah et al. All Smiles : Automatic
    Photo Enhancement by Facial Expression
    Analysis. CVMP’12.
    3.  Halevy, A et al. The Unreasonable
    Effectiveness of Data. Intelligent Systems,
    IEEE 2009.