environmental prediction, rise of the machines

NIWA Auckland Seminar, 28 August 2014

Nicolas Fauchereau

August 27, 2014

Transcript

  1. Introduction

    Climate Scientist at NIWA
    Interested in making sense of data → solutions to help people anticipate and adapt to climate variability and climate change
    [email protected] @NFauchereau https://github.com/nicolasfauchereau/
  2. Outline

    1. What is Machine Learning (ML)?
    2. Some ML fundamentals
    3. A brief typology of ML algorithms
    4. ML in industry
    5. ML in the environmental sciences
    6. Development of a ML-based seasonal forecasting scheme for the Pacific Islands
    7. Lessons learned and conclusions

    The tools: Python and the scikit-learn ML library
    https://speakerdeck.com/nicolasf
  3. “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.” Arthur Samuel, 1959.

    “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” Tom Mitchell, 1997.
  4. Brief history of Machine Learning

    • 50-70s: Artificial intelligence, pattern recognition (University)
    • 80-90s: Neural Networks, data-mining (applications: science, credit scoring, OCR)
    • mid-90s - today: convergence / explosion (applications: marketing, biotech, retail, telco, govt.)

    Digital Economy / BIG DATA / Theoretical Advances (e.g. Deep Learning)
  5. How does the learning happen?

    • equations with tuneable parameters
    • performance can be measured
    • optimisation
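
The three ingredients on this slide (tuneable parameters, a performance measure, optimisation) can be sketched with plain gradient descent. This is a minimal illustration and not code from the talk: the one-parameter model `y = w * x`, the toy data, and the names `mean_squared_error` and `fit_by_gradient_descent` are all assumptions made for the example.

```python
def mean_squared_error(w, xs, ys):
    """Performance measure: mean squared difference between predictions and observations."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_by_gradient_descent(xs, ys, learning_rate=0.01, n_steps=500):
    """Tune the single parameter w of the model y = w * x by gradient descent."""
    w = 0.0  # initial guess for the tuneable parameter
    history = []
    for _ in range(n_steps):
        # Derivative of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # step against the gradient
        history.append(mean_squared_error(w, xs, ys))
    return w, history


# Toy observations (hypothetical values, roughly y = 3 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.9, 6.2, 8.8, 12.1]

w, history = fit_by_gradient_descent(xs, ys)
print(f"fitted w = {w:.4f}, final error = {history[-1]:.4f}")
```

The `history` list records the error after each step, so plotting it shows the performance measure improving as the parameter is tuned.
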
  6. Simplest example: linear regression

    [Figure labels: tuneable parameters, errors]
    Parameters are tuned so as to minimise the errors (sum of squared differences between observations and predictions).
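
As a sketch of the least-squares fit described on this slide, the example below estimates a slope and an intercept with NumPy. The synthetic data (true slope 2, true intercept 1, Gaussian noise) and the variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Synthetic observations: y = 2 * x + 1 plus noise (hypothetical values).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix: one column for the slope, one column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Least squares: find the parameters that minimise the sum of squared errors.
params, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = params

# Errors (residuals) between observations and predictions.
residuals = y - X @ params
sse = float(np.sum(residuals ** 2))

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, SSE = {sse:.3f}")
```

Any other choice of slope and intercept gives a sum of squared errors at least as large as `sse`, which is what "tuned to minimise the errors" means here.
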
  7. Applications of Machine Learning

    • search engines
    • spam filters
    • recommendation systems
    • fraud detection
    • credit rating
    • targeted marketing
    • face / object recognition
    • …
  8. Appointed Yann LeCun and Rob Fergus, from NYU, to lead new artificial intelligence labs in New York, California and London.

    Hired Geoff Hinton and Andrew Ng (Ng has since joined Baidu).
  9. Examples of applications

    Ecology
    • species distribution (current, future …)
    • habitat modelling

    Climate sciences
    • pattern recognition, dimensionality reduction … (PCA, ICA, clustering …)
    • data mining (Large Ensemble of model outputs)
    • ‘short-term’ climate forecasting (statistical forecasts)
  10. Prediction of seasonal Mean Sea Level Anomalies in the Pacific

    with:
    Scott Stephens, Doug Ramsay, Rob Bell (NIWA)
    John Marra, William Sweet (NOAA)
    Judith Wells, Rashed Chowdhury (Univ. Hawaii)
  11. Coastal inundation = combination of:

    • high tides
    • anomalous Mean Sea Level
    • weather events (+ wave climate)

    Extreme sea-levels
    Mean Sea Level Anomaly (MSLA): climate dependent, ‘slow’ varying
    Improved “tide calendar”
  12. Target

    A: INITIAL DATA
    Hourly tide gauge records from January 1 1979 through December 31 2011. Note that all data are in mm.

    MSLA seasonal anomalies for 8 stations, data for 1981 - 2010
    • ‘raw’ MSLA anomalies (mm) → regression
    • discretized (quintiles) → classification
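
Discretizing a continuous target into quintiles, as this slide describes for the classification task, might look like the sketch below. The anomaly values are synthetic stand-ins (not the tide gauge data), and the names `msla_mm`, `edges` and `quintile` are hypothetical.

```python
import numpy as np

# Synthetic seasonal anomalies in mm (hypothetical values, 120 seasons).
rng = np.random.default_rng(42)
msla_mm = rng.normal(loc=0.0, scale=50.0, size=120)

# Quintile boundaries: the 20th, 40th, 60th and 80th percentiles.
edges = np.percentile(msla_mm, [20, 40, 60, 80])

# Class label 0 (lowest quintile) to 4 (highest quintile) for each season.
quintile = np.digitize(msla_mm, edges)

# Number of seasons falling in each class.
counts = np.bincount(quintile, minlength=5)
print("quintile edges (mm):", np.round(edges, 1))
print("seasons per class:", counts)
```

The continuous values in `msla_mm` would serve as a regression target, while the integer labels in `quintile` would serve as a classification target.
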
  13. Features

    Seasonal Sea-Surface-Temperature (SST) anomalies for the preceding season (e.g. Sep. - Nov. MSLA predicted using Jun. - Aug. SSTs)

    Dimensionality reduction via EOF (Empirical Orthogonal Functions) and Independent Component Analysis (ICA, Blind Source Separation)

    9 Principal Components = 70 %
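
EOF analysis is commonly computed with a singular value decomposition of the anomaly matrix. The sketch below shows that approach on a synthetic field and keeps enough components to reach a 70 % threshold. The data, the array sizes and the names `pcs`, `eofs` and `n_keep` are assumptions for illustration, and the ICA step mentioned on the slide is not covered here.

```python
import numpy as np

# Synthetic "SST" field: a few large-scale modes plus noise (hypothetical data).
rng = np.random.default_rng(0)
n_seasons, n_gridpoints = 120, 500
modes = rng.normal(size=(5, n_gridpoints))
amplitudes = rng.normal(size=(n_seasons, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
sst = amplitudes @ modes + rng.normal(size=(n_seasons, n_gridpoints))

# Anomalies: remove the time mean at each grid point.
anomalies = sst - sst.mean(axis=0)

# SVD of the (time x space) anomaly matrix.
u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

# Fraction of total variance explained by each component, and its running sum.
explained = s ** 2 / np.sum(s ** 2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative explained variance reaches 70 %.
n_keep = int(np.searchsorted(cumulative, 0.70) + 1)

pcs = u[:, :n_keep] * s[:n_keep]  # principal component time series (features)
eofs = vt[:n_keep]                # corresponding spatial patterns

print(f"{n_keep} components explain {cumulative[n_keep - 1]:.0%} of the variance")
```

The columns of `pcs` are the reduced set of features that would be passed to a regression or classification model in place of the full gridded field.
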
  14. Algorithms

    regression
    • Multiple Linear Regression (MLR)
    • Multivariate Adaptive Regression Splines (MARS)

    classification
    • Support Vector Machines (SVM)
    • Random Forests (RF)
    • …
  15. Algorithms

    regression
    • Multiple Linear Regression (MLR)
    • Multivariate Adaptive Regression Splines (MARS)

    classification
    • Support Vector Machines (SVM)
    • Random Forests (RF)
    • …

    Neural Networks (NARX)
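
Two of the classifiers listed above, Support Vector Machines and Random Forests, are available in scikit-learn. The sketch below fits both to a synthetic five-class problem. The feature matrix (a stand-in for 9 principal components), the target construction, the train/test split and the hyper-parameter values are all assumptions, not the configuration used in the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic features (stand-in for 9 principal components) and a noisy signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 9))
signal = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=120)

# Five-class target: quintile of the signal (labels 0 to 4).
y = np.digitize(signal, np.percentile(signal, [20, 40, 60, 80]))

# Simple chronological split: first 90 samples for training, last 30 for testing.
X_train, y_train = X[:90], y[:90]
X_test, y_test = X[90:], y[90:]

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)                    # learn from the training set
    accuracy[name] = model.score(X_test, y_test)   # fraction of correct classes

for name, value in accuracy.items():
    print(f"{name}: accuracy = {value:.2f}")
```

Both estimators are trained and scored with the same two method calls, which makes it straightforward to loop over several candidate algorithms.
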
  16. regression

    Cross-validated MARS for Guam
    [Figure residue from the data slide: records from January 1 1979 through December 31 2011; data are in mm.]
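
Cross-validation of a regression model can be sketched as follows. MARS itself is not implemented here: ordinary least squares is swapped in as a simple stand-in, and the data, the fold layout (contiguous blocks) and the function name `k_fold_rmse` are assumptions made for the example.

```python
import numpy as np


def k_fold_rmse(X, y, n_folds=5):
    """Return the out-of-sample RMSE of a linear model for each contiguous fold."""
    indices = np.arange(len(y))
    folds = np.array_split(indices, n_folds)
    scores = []
    for test_idx in folds:
        # Train on everything except the held-out fold.
        train_idx = np.setdiff1d(indices, test_idx)
        A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Predict the held-out fold and measure the error.
        A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        pred = A_test @ coef
        scores.append(float(np.sqrt(np.mean((pred - y[test_idx]) ** 2))))
    return scores


# Synthetic predictors and target in mm (hypothetical values).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = X @ np.array([30.0, -10.0, 5.0]) + rng.normal(scale=20.0, size=120)

scores = k_fold_rmse(X, y)
print("RMSE per fold (mm):", np.round(scores, 1))
print("mean RMSE (mm):", round(float(np.mean(scores)), 1))
```

Each score is computed on data the model did not see during fitting, so the mean gives an estimate of out-of-sample skill rather than goodness of fit.
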
  17. Lessons learned and conclusions

    • pre-processing, models, features, hyper-parameters, metrics, cross-validation procedures, …
    • testing different models is time-consuming: framework for automation
    • feature choice and feature engineering are key
  18. Environmental prediction: rise of the machines?

    • HPCs / cloud computing
    • Satellite remote sensing
    • In-situ sensor arrays
    • Model outputs
    • Mature and stable ML libraries

    Environmental sciences’ BIG DATA era?

    BIG data / compute power / accessible
  19. Python

    • dynamic, object-oriented programming language
    • fast growing in the ‘data science’ community
    • huge collection of libraries, from linear algebra to Bayesian analysis, visualisation, etc.
    • rapid prototyping to production
  20. scikit-learn

    • state of the art Machine Learning library
    • open-source (and free)
    • consistent API (Application Programming Interface)
    • comprehensive documentation
    • efficient algorithms
    • harnesses the power of the ‘Python scientific stack’
    • very active development
    • http://scikit-learn.org/stable/index.html
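
The "consistent API" point can be illustrated with a scikit-learn pipeline, in which pre-processing, dimensionality reduction and a model are chained and driven through the same `fit` / `predict` calls. The data, the choice of steps and the number of components below are assumptions for the sake of the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic predictors and target (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Chain scaling, PCA and a linear model into a single estimator.
model = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())

model.fit(X[:80], y[:80])            # every step is fitted on the training data
predictions = model.predict(X[80:])  # the same steps are applied to new data

print(predictions[:5])
```

Swapping `LinearRegression()` for another scikit-learn regressor leaves the rest of the code unchanged, which is the practical benefit of a shared estimator interface.
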