
[panel3] Dr. Chung-Min Chen (陳仲民博士)

MC2013
August 28, 2013

Transcript

  1. Big Data: how it changes the way you treat data
     Chung-Min Chen, Chief Scientist, Information Analysis Research and Services, Applied Communication Sciences
  2. Hope or Hype?
     • Big data will change
       – the way you live
       – the way you work
       – the way you think
     • Big data is … a big bubble?
       – Remember the .com bubble?
     Snowden, Obama, Nate Silver, Netflix, …
  3. (1) More is beautiful (數大是美)
     • Quantity
       – A change in quantity leads to a change in quality
     • N = All
       – Scrutiny leads to discovery
       – Sampling shortfalls: true randomness is hard to achieve, samples lack detail, and they miss targets
     • Passive
       – Passive collection leads to fidelity
       – Sampling + questionnaires → big data + analysis
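
A minimal Python sketch of the "N = All" versus sampling contrast above. The population (100,000 records, 50 of them rare targets), the 1% sample size, and the seed are made-up assumptions for illustration. A full scan finds every target; a simple random sample of 1,000 records is expected to contain only 0.5 targets, so it typically finds zero or one, and the scaled-up estimate moves in steps of 100 rather than landing near 50.

```python
import random

# Hypothetical population: 100,000 records, the first 50 of which are rare "targets".
POPULATION_SIZE = 100_000
RARE_COUNT = 50
SAMPLE_SIZE = 1_000  # a 1% simple random sample


def build_population(size: int, rare: int) -> list[bool]:
    """Return records as booleans; True marks a rare target."""
    return [i < rare for i in range(size)]


def count_all(population: list[bool]) -> int:
    """N = All: scan every record, so no target can be missed."""
    return sum(population)


def estimate_from_sample(population: list[bool], sample_size: int, seed: int) -> tuple[int, float]:
    """Sampling: count targets in a random sample, then scale up to the population."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    hits = sum(sample)
    return hits, hits * len(population) / sample_size


population = build_population(POPULATION_SIZE, RARE_COUNT)
full = count_all(population)
hits, estimate = estimate_from_sample(population, SAMPLE_SIZE, seed=7)
print(f"full scan: {full} targets; sample: {hits} hits, estimate {estimate:.0f}")
```

Changing the seed changes the estimate; only the full scan is stable, and only it can say which records are the targets.
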
  4. (2) Don't sweat the small stuff (不拘小節): embrace inaccuracy
     • Data is intrinsically
       – uncertain
       – inconsistent
       – imperfect
     • Old thinking
       – Impute missing data
       – Reject messy data
     • New thinking
       – Trade accuracy for comprehension
       – Macro vs. micro
       – Probabilistic vs. SQL queries
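
One way to read "probabilistic vs. SQL" above, sketched with Python's built-in sqlite3 module: instead of rejecting low-confidence rows and counting what is left, keep every row and compute an expected count. The `fault` table, its confidence column `p`, the 0.9 cutoff, and the values are hypothetical.

```python
import sqlite3

# Hypothetical uncertain data: each sensor reports a fault with confidence p.
ROWS = [("s1", 0.95), ("s2", 0.60), ("s3", 0.55), ("s4", 0.99), ("s5", 0.30)]


def open_db(rows: list[tuple[str, float]]) -> sqlite3.Connection:
    """Load the rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fault (sensor TEXT, p REAL)")
    conn.executemany("INSERT INTO fault VALUES (?, ?)", rows)
    return conn


def strict_count(conn: sqlite3.Connection, cutoff: float = 0.9) -> int:
    """Old thinking: treat rows below the cutoff as messy, reject them, count the rest."""
    (n,) = conn.execute("SELECT COUNT(*) FROM fault WHERE p >= ?", (cutoff,)).fetchone()
    return n


def expected_count(conn: sqlite3.Connection) -> float:
    """New thinking: keep every row; by linearity of expectation,
    the expected number of faults is the sum of the probabilities."""
    (total,) = conn.execute("SELECT SUM(p) FROM fault").fetchone()
    return total


conn = open_db(ROWS)
print(strict_count(conn), round(expected_count(conn), 2))  # prints: 2 3.39
```

The strict query reports two faults; the probabilistic one reports about 3.4, because the three "messy" rows together still carry 1.45 expected faults.
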
  5. (3) Knowing what, but not why (知其然而不知其所以然)
     • Correlation prevails over causality
       – ER admission
       – Premature newborns
     • Mechanical causality
       – Bayesian networks
     • But be careful not to ignore causality altogether
       – Retail store sales
       – Car defects
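
To see how a strong correlation can appear without any direct causal link, here is a small Python sketch with a hidden common cause. The temperature, ice-cream, and pool-visit figures are invented for illustration; neither series drives the other, yet their Pearson correlation is about 0.99.

```python
from math import sqrt

# Hypothetical daily data: temperature is a hidden common cause of both series.
temperature = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [2 * t + e for t, e in zip(temperature, [1, -1, 0, 2, -2, 1])]
pool_visits = [3 * t - 10 + e for t, e in zip(temperature, [0, 2, -1, 1, 0, -2])]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(ice_cream_sales, pool_visits)
print(f"r = {r:.2f}")  # prints: r = 0.99
```

The correlation is real and good enough for prediction, yet closing the pools would not change ice-cream sales; when a decision means intervening, causality still matters.
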
  6. Don't Trust Your Doctor
     • ER crisis at Cook County Hospital, 1996
       – Flooded with chest-pain patients
     • Who should be admitted (i.e., who is having a real heart attack)?
     • Standard manual procedure
       – Blood pressure, stethoscope, questions, ECG
       – Over 90% of admitted patients were false positives; recall was 83%
     Source: M. Gladwell, “Blink: The Power of Thinking Without Thinking”
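
The two figures on this slide measure different things, which a short Python sketch makes explicit. The confusion counts below are hypothetical, chosen only so that the output lands near the slide's numbers. Note that "false positives among admitted patients" is the complement of precision, FP / (TP + FP), not the false-positive rate FP / (FP + TN).

```python
def admission_metrics(true_pos: int, false_pos: int, false_neg: int) -> dict[str, float]:
    """Score an admission policy from confusion counts.

    true_pos:  admitted and really having a heart attack
    false_pos: admitted but not having a heart attack
    false_neg: sent home while having a heart attack
    """
    admitted = true_pos + false_pos
    real_attacks = true_pos + false_neg
    return {
        "false_positive_share": false_pos / admitted,  # 1 - precision
        "recall": true_pos / real_attacks,  # share of real heart attacks caught
    }


# Hypothetical counts: 100 real heart attacks, 913 admissions.
metrics = admission_metrics(true_pos=83, false_pos=830, false_neg=17)
print({k: round(v, 2) for k, v in metrics.items()})
# prints: {'false_positive_share': 0.91, 'recall': 0.83}
```
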
  7. Decision Tree
     • Three features
       – Unstable angina pain?
       – Fluid in the lungs?
       – Systolic BP < 100?
     • Results
       – False positives < 30% (vs. > 90% by doctors)
       – Recall > 95% (vs. 83% by doctors)
     Goldman L, Cook EF, Brand DA, et al. A computer protocol to predict myocardial infarction in emergency department patients with chest pain. N Engl J Med 1988; 318(13): 797-803.
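
A decision tree over the three yes/no features on this slide can be sketched in Python as nested nodes. The feature keys, the branch order, and the leaf labels ("admit", "observe") are assumptions for illustration, chosen so that any two of the three findings trigger admission; this is not the published protocol of Goldman et al.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # the decision reached at this leaf


@dataclass
class Node:
    feature: str  # name of a yes/no feature to test
    if_yes: "Tree"
    if_no: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, patient: dict[str, bool]) -> str:
    """Walk from the root to a leaf, following the patient's answers."""
    while isinstance(tree, Node):
        tree = tree.if_yes if patient[tree.feature] else tree.if_no
    return tree.label


# Hypothetical tree: admit when at least two of the three findings are present.
bp_check = Node("systolic_bp_below_100", Leaf("admit"), Leaf("observe"))
TREE = Node(
    "unstable_angina",
    if_yes=Node("fluid_in_lung", Leaf("admit"), bp_check),
    if_no=Node("fluid_in_lung", bp_check, Leaf("observe")),
)

patient = {"unstable_angina": True, "fluid_in_lung": False, "systolic_bp_below_100": True}
print(classify(TREE, patient))  # prints: admit
```

Each question is asked only when the earlier answers leave the decision open, which is what makes a tree like this quick to apply at the bedside.
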
  8. Knowing what, but not why (知其然而不知其所以然), repeated from slide 5
  9. Technologies
     • Hadoop (HDFS, MapReduce, NoSQL)
     • Stream processing
     • Data curation at scale
     • Probabilistic databases
     • Analytics (OLAP, data mining, ML, statistics, math)
     The five Vs: Volume, Velocity, Variety, Veracity, Value
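
The MapReduce model behind Hadoop can be illustrated with the classic word-count example, simulated here in a single Python process. This is a sketch of the programming model only (map, shuffle, reduce), not Hadoop's API; the function names and sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(docs: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


docs = ["big data big value", "data velocity data variety"]
print(word_count(docs))
# prints: {'big': 2, 'data': 3, 'value': 1, 'velocity': 1, 'variety': 1}
```

On a real cluster, Hadoop tries to run each map task close to its HDFS block, and the shuffle moves intermediate pairs across the network; the sketch keeps only the data flow.
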
  10. Trends & Issues
      • Applications prevail over technologies
      • Data prevails over experts (專家末日, "the end of experts")
      • Data crunchers prevail over scientists/engineers
      • The big data divide
      • Privacy