My talk at the TCRIX Faculty Summit '13 about research opportunities in Big Data and IIIT's journey so far.
Other talks at http://dharmeshkakadia.github.io/talks
The BIG Opportunity
A peek into Big Data Research
Whoami
Sep 20 & 21, 2013 Faculty Summit on Big Data ©TCRIX
• MS student @ IIIT-H working under Prof. Vasudeva Varma
• Just finishing my thesis in Scheduling
• Love large scale [systems | data | learning]
• Automation freak
• Like to work at the intersection of Data and System
• Want to work on interesting things
Why bother?
“It’s not who has the best algorithm that wins. It’s who has the most data.” - Banko and Brill, 2001
Source: https://amplab.cs.berkeley.edu/2013/02/07/for-big-data-moores-law-means-better-decisions
What is Data?
What is Big Data?
Anything that is too big, too fast, or too hard for existing tools.
What is Big Data?
• Anything that is too big, too fast, or too hard for existing tools
– 92% of the world's data was generated in the past 2 years
– 1.4 trillion digital transactions per month
– 30 billion+ pieces of data added to Facebook every month
– …
What is Big Data?
• Anything that is too big, too fast, or too hard for existing tools
– Think Twitter
– Think ads displayed on the web
– Think stocks
– Think medical equipment
What is Big Data?
• Anything that is too big, too fast, or too hard for existing tools
– Jeopardy?
– Brain simulations?
– And everything else that we don’t know yet.
Why should I care?
• In March 2012, the White House announced a national "Big Data Initiative", committing more than $200 million to big data research projects
• The European Commission is funding a 2-year-long Big Data Public Private Forum
• Open Data Initiative by the Government of India
• Endless enterprise investments
Big Data is here to stay
The Big Data Tools Ecosystem
Source: http://www.bigdata-startups.com/open-source-tools/
Why is it hard interesting?
• Interdisciplinary, by definition
• Requires thinking beyond your comfort zone
– Machine Learning
– Statistics
– Systems
– Visualization
– Signal Processing
– …
Big Data Research @ IIIT-H
• Multiple Research Centers involved
– Centre for Data Engineering (CDE)
– Search and Information Extraction Lab (SIEL)
– Center for Visual Information Technology (CVIT)
– Speech and Vision Laboratory (SVL)
– Center for Structural Engineering (CASE)
– Language Technologies Research Center (LTRC)
• Areas of focus in Big Data
– Systems
– Applications
Big Data Systems: Data Processing Frameworks
• Improving processing efficiency
– Hadoop Scheduler
– Hive query optimizations
• Improving human efficiency
– Automate everything
– Better visualization techniques
• How to process new kinds of data?
– Image
– Video
– Speech
Big Data Systems: Cloud
• Converged Infrastructure
– Utilize full capabilities of infrastructure
– Integration of private and public resources
• Resource optimization
– For energy, SLA, performance ...
– Hot replication of storage
• Security & Privacy
– Privacy preserving computation
– Security against theft
Big Data Application: Text Analytics
• Entity linking
• Summarize
• Sarcasm detection
• Author profiling
• Sentiment analysis
• Cross language search
• Question answering
Big Data Applications in Languages
• How do you model languages?
• Auto generation of resources
• Part of Speech tagging
• Stemming
• Morphological analysis
• Machine translation
• Transfer learning
Big Data Applications in Speech
• Can we understand what is being said in real time?
• Speech synthesis
• Emotion Detection in speech
• Translate speech from one language to another
• “Ok Google” “ठीक है गूगल”
Big Data Applications in Vision
• Image Search
• Cancer Detection from scan
• 3D construction from 2D
• Perfect Group Photo?
Big Data has a lot to offer
• Education
• Healthcare
• Bioscience
• Energy
• Economics
• Defense
• Environmental Science
Big Data Impact: Education
• Intelligent Tutors and Environments
• Personalized Learning: identify student’s competencies and knowledge over time, understand interests, goals and characteristics to improve learning experience.
• Education Data Mining: educational data based on an individual’s work and behaviors can be mined to better understand learning achievements, approaches, etc.
Big Data Impact: Economics
• Decision support for governments
• Fraud detection
• Effectiveness of various government initiatives and spending
• Helping policy and administrative decisions
• Finding and correcting operational efficiency issues
Big Data Impact: Defense
• Smart sensing, perception and decision support for autonomous systems
• Situational awareness in warfighters
• Communication analytics of all forms to prevent unwanted events
Big Data Impact: Energy
• Data analytics to understand building energy consumption
• Grid Analytics
• Optimized distribution and generation of electric power
• Self-healing capabilities to anticipate and respond to system disturbances
Big Data Impact: Bioscience and Healthcare
• Genomics
• Personalized Medicine
• Data-driven drug discovery
• Focus on wellbeing rather than disease
• Healthcare that is preventive, proactive, evidence-based and person-centered
• Treatment personalization
• Evaluating effectiveness of treatments
Big Data Impact: Environmental Science
• Causes and effects of climate change
• Land fertility and usage over time
• Discovery of natural resources
• Predictive data analysis for disaster prevention
• Quick response for disaster management
Challenges
• Lack of Data
• Nuggets vs Noise
• Talent lag
• Data Governance Policy
Take away
Thanks
@dharmeshkakadia [email protected]
References
1. Bertino, Elisa et al. Challenges and Opportunities with Big Data. Community whitepaper.
2. Rajvi Shah et al. All Smiles: Automatic Photo Enhancement by Facial Expression Analysis. CVMP’12.
3. Halevy, A. et al. The Unreasonable Effectiveness of Data. Intelligent Systems, IEEE 2009.