
Verifying the Forecast: How Climate Models are Developed and Tested

Keynote talk given at ESEC/FSE 2017, the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering.
Paderborn, Germany, Sept 7, 2017

Steve Easterbrook

September 07, 2017



Transcript

1. Verifying the Forecast: How Climate Models are Developed and Tested
    Steve Easterbrook
    Email: [email protected]
    Blog: www.easterbrook.ca/steve
    Twitter: @SMEasterbrook


2. A complex software eco-system…
    Alexander, K., Easterbrook, S. (2015). The software architecture of climate models: a graphical comparison of CMIP5 and EMICAR5 configurations. Geoscientific Model Development 8, 1221-1232.


3. Example: NCAR, Boulder
    Alexander, K., Easterbrook, S. (2015). The software architecture of climate models: a graphical comparison of CMIP5 and EMICAR5 configurations. Geoscientific Model Development 8, 1221-1232.


4. Outline
    1. What are climate models?
       In which we meet a 19th-century Swedish chemist and a famous computer scientist, and find out if butterflies cause hurricanes
    2. What is their purpose?
       In which we perform several dangerous experiments on the life support systems of planet Earth, and live to tell the tale
    3. What Software Engineering practices are used?
       In which we politely suggest that the question "does it work with FORTRAN?" helps keep the snake oil salesmen away…
    4. Are they fit for purpose?
       In which we measure a very low bug density, lose faith in software metrics, and encounter two remarkably effective V&V practices.


5. The First Computational Climate Model
    1895: Svante Arrhenius constructs an energy balance model to test his hypothesis that the ice ages were caused by a drop in CO2
    (Predicts a global temperature rise of 5.7°C if we double CO2)
    Arrhenius, S. (1896). On the Influence of Carbonic Acid in the Air upon the Temperature of the Ground. Philosophical Magazine and Journal of Science, 41(251)
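
The underlying idea still fits in a few lines of code. Here is a minimal sketch of a zero-dimensional energy balance with an Arrhenius-style logarithmic CO2 response; this is not Arrhenius's actual band-by-band calculation, and the sensitivity parameter S is an assumed placeholder, not a derived value:

```python
import math

S0 = 1361.0        # solar constant, W/m^2
albedo = 0.3       # planetary albedo
sigma = 5.670e-8   # Stefan-Boltzmann constant, W/m^2/K^4

# Radiative equilibrium: absorbed solar energy = emitted infrared
T_eff = ((S0 / 4) * (1 - albedo) / sigma) ** 0.25
print(f"Effective radiating temperature: {T_eff:.1f} K")   # about 255 K

# Arrhenius-style response: warming grows with the logarithm of CO2.
# S (warming per CO2 doubling) is assumed here, not derived.
S = 3.0
for ratio in (0.67, 1.0, 2.0):
    print(f"CO2 x {ratio:.2f}: dT = {S * math.log2(ratio):+.1f} K")
```

The 0.67 case mirrors Arrhenius's original question: could a drop in CO2 cool the planet enough to trigger an ice age?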


6. Schematic of Arrhenius's model
    Image source: Easterbrook, S. M. (2018) Computing the Climate. Cambridge University Press. Forthcoming.


7. First Numerical Simulation of Weather
    1950s: John von Neumann develops a killer app for ENIAC, the first programmable electronic computer: weather forecasting
    Imagines uses in weather control, geo-engineering, etc.
    Lynch, P. (2008). The ENIAC Forecasts: A Recreation. Bulletin of the American Meteorological Society, 1–11.


8. Lynch, P. (2008). The ENIAC Forecasts: A Recreation. Bulletin of the American Meteorological Society, 1–11.


9. Source: W. F. Ruddiman (2001) Earth's Climate: Past and Future.


10. Grid scale in a high resolution model
    Image Source: IPCC Fifth Assessment Report, Jan 2014. Working Group 1, Fig 1.14(b).


11. Run on a massively parallel supercomputer…
    The Yellowstone supercomputer at the NCAR-Wyoming Supercomputing Center, Cheyenne.
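
To see why such machines are needed, some illustrative back-of-envelope arithmetic helps; the resolution, level count, and variable count below are generic assumptions, not any specific model's configuration:

```python
# Why a global grid at high resolution demands massive parallelism.
resolution_km = 25                            # assumed horizontal grid spacing
earth_circumference_km = 40_000
nx = earth_circumference_km // resolution_km  # ~1600 cells east-west
ny = nx // 2                                  # ~800 cells north-south
levels = 50                                   # assumed vertical levels
cells = nx * ny * levels
variables = 30                                # assumed fields per grid cell
print(f"{cells:,} grid cells")                        # 64,000,000
print(f"{cells * variables:,} values per time step")  # 1,920,000,000
# At a time step of minutes, a century-long simulation updates those
# ~2 billion values millions of times over.
```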


12. (image only)


13. Ensemble Forecasts for Irma
    Image Source: https://www.wunderground.com/hurricane/atlantic/2017/tropical-storm-irma


14. Ensemble Forecasts for a Warmer Climate
    Image Source: https://www.gfdl.noaa.gov/global-warming-and-hurricanes/


15. A little Chaos Theory
    See: Lorenz, E. N. (1993). Our Chaotic Weather. In The Essence of Chaos (pp. 77–110).
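
The "butterfly effect" behind ensemble forecasting is easy to demonstrate. Here is a minimal sketch using the Lorenz-63 equations (a classic chaotic toy system, not a weather model): two runs starting one part in a billion apart eventually diverge completely, which is why forecasters run ensembles of perturbed runs rather than a single forecast:

```python
# Two Lorenz-63 trajectories from near-identical starting points,
# integrated with a simple forward-Euler step.
def step(state, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = state
    return (x + dt * s * (y - x),
            y + dt * (x * (r - z) - y),
            z + dt * (x * y - b * z))

run_a = (1.0, 1.0, 1.0)
run_b = (1.0 + 1e-9, 1.0, 1.0)   # perturbed by one part in a billion
for n in range(2001):
    if n % 500 == 0:
        sep = sum((p - q) ** 2 for p, q in zip(run_a, run_b)) ** 0.5
        print(f"t = {n * 0.01:5.1f}   separation = {sep:.2e}")
    run_a, run_b = step(run_a), step(run_b)
```

The separation grows by orders of magnitude as the integration proceeds: initial-condition error is amplified exponentially, so beyond some horizon only the statistics of an ensemble are meaningful.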


16. Outline
    1. What are climate models?
       In which we meet a 19th-century Swedish chemist and a famous computer scientist, and find out if butterflies cause hurricanes
    2. What is their purpose?
       In which we perform several dangerous experiments on the life support systems of planet Earth, and live to tell the tale
    3. What Software Engineering practices are used?
       In which we politely suggest that the question "does it work with FORTRAN?" helps keep the snake oil salesmen away…
    4. Are they fit for purpose?
       In which we measure a very low bug density, lose faith in software metrics, and encounter two remarkably effective V&V practices.


17. Fitness for purpose?
    ❍ Climate model purposes include:
    ● To explore the consequences of a current theory;
    ● To test a hypothesis about the observational system;
    ● To test a hypothesis about the calculational system;
    ● To provide homogenized datasets (e.g. re-analysis);
    ● To conduct thought experiments about different climates;
    ● To act as a comparator when debugging another model;
    ● To provide inputs to assessments that inform policymaking.
    [Diagram: 1) Study the Calculational System… 2) to gain insights on the Theoretical System… 3) …to make sense of the Observational System]


18. …To permit inter-disciplinary collaboration!


19. Coupled model: Overlapping Communities
    ● Atmospheric Dynamics and Physics
    ● Ocean Dynamics
    ● Sea Ice
    ● Land Surface Processes
    ● Atmospheric Chemistry
    ● Ocean Biogeochemistry


20. Understanding What-if Experiments
    E.g. How do volcanoes affect climate?
    Sources: (a) http://www.imk-ifu.kit.edu/829.php (b) IPCC Fourth Assessment Report, 2007. Working Group 1, Fig 9.5.


21. Can we limit warming to < +2ºC?
    From: M. R. Allen et al., Nature 458, 1163–1166 (2009).
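
The headline arithmetic of Allen et al. is strikingly simple: peak warming scales roughly linearly with cumulative carbon emitted. A minimal sketch, taking the paper's best-estimate response of about 2°C per trillion tonnes of carbon as an assumed parameter (their stated uncertainty range is considerably wider):

```python
ccr = 2.0   # assumed: degrees C of peak warming per trillion tonnes of carbon (TtC)

for cumulative in (0.5, 1.0, 1.5):
    print(f"{cumulative:.1f} TtC emitted -> ~{ccr * cumulative:.1f} C peak warming")
# Holding warming below +2 C therefore means holding total emissions,
# ever, to roughly one trillion tonnes of carbon.
```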


22. When will we reach +2°C?
    Image Source: http://www.climate-lab-book.ac.uk/2014/when-will-we-reach-2c/


23. Can we artificially cool the planet?
    [Plots: Global average near-surface temperature (°C); Arctic Sea Ice Extent (millions of km²)]
    From: Berdahl, M., et al. (2014). Arctic cryosphere response in the Geoengineering Model Intercomparison Project G3 and G4 scenarios. Journal of Geophysical Research: Atmospheres, 119(3), 1308–1321.


24. Outline
    1. What are climate models?
       In which we meet a 19th-century Swedish chemist and a famous computer scientist, and find out if butterflies cause hurricanes
    2. What is their purpose?
       In which we perform several dangerous experiments on the life support systems of planet Earth, and live to tell the tale
    3. What Software Engineering practices are used?
       In which we politely suggest that the question "does it work with FORTRAN?" helps keep the snake oil salesmen away…
    4. Are they fit for purpose?
       In which we measure a very low bug density, lose faith in software metrics, and encounter two remarkably effective V&V practices.


25. UK Met Office Hadley Centre (UKMO)
    Max-Planck-Institut für Meteorologie (MPI-M)
    National Center for Atmospheric Research (NCAR)
    Institut Pierre-Simon Laplace (IPSL)


26. Example: SW Evolution of the UK Met Office's UM
    Easterbrook, S. M., & Johns, T. C. (2009). Engineering the Software for Understanding Climate Change. Computing in Science and Engineering, 11(6), 65–74.


27. A complex software eco-system…
    Alexander, K., Easterbrook, S. (2015). The software architecture of climate models: a graphical comparison of CMIP5 and EMICAR5 configurations. Geoscientific Model Development 8, 1221-1232.


28. Example: NCAR vs UK Met Office
    Alexander, K., Easterbrook, S. (2015). The software architecture of climate models: a graphical comparison of CMIP5 and EMICAR5 configurations. Geoscientific Model Development 8, 1221-1232.


29. Example: Ocean Model Genealogy
    Image Source: http://efdl.as.ntu.edu.tw/research/timcom/


30. The Earth System Modeling Framework
    ❍ Superstructure: Component data structures and methods for coupling model components
    ❍ Infrastructure: Field data structures and methods for building model components, and utilities for coupling
    [Diagram: user code sits sandwiched between the ESMF Superstructure (AppDriver; component classes: GridComp, CplComp, State) and the ESMF Infrastructure (data classes: Bundle, Field, Grid, Array; utility classes: Time, Clock, LogErr, DELayout, VM, Config)]


31. The NUOPC architecture…
    Model:
    ● Implements a specific physical domain, e.g. atmosphere, ocean, wave, ice.
    Mediator:
    ● Scientific coupling code (flux calculations, accumulation, averaging, etc.) between (potentially multiple) Models.
    Connector:
    ● Connects pairs of components in one direction, e.g. Model to Model; Model to/from Mediator.
    ● Executes simple transforms (regridding, units).
    Driver:
    ● Provides a harness for Models, Mediators, and Connectors (supporting hierarchies).
    ● Coordinates initialize and run sequences.
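
To make the pattern concrete, here is a toy sketch of the driver/connector/model structure just described. All class and method names are invented for this illustration; this is not the ESMF/NUOPC API (which is a Fortran framework):

```python
class Model:
    """A component owning one physical domain's state."""
    def __init__(self, name, state):
        self.name, self.state = name, state
    def run(self, dt):
        self.state["t"] = self.state.get("t", 0) + dt   # stand-in physics

class Connector:
    """Moves one field between two components, in one direction."""
    def __init__(self, src, dst, field):
        self.src, self.dst, self.field = src, dst, field
    def transfer(self):
        self.dst.state[self.field] = self.src.state.get(self.field)

class Driver:
    """Harness that coordinates the run sequence of all components."""
    def __init__(self, models, connectors):
        self.models, self.connectors = models, connectors
    def run(self, steps, dt):
        for _ in range(steps):
            for m in self.models:
                m.run(dt)
            for c in self.connectors:   # exchange fields after each step
                c.transfer()

atmos = Model("atmosphere", {"sst": None})
ocean = Model("ocean", {"sst": 290.0})
Driver([atmos, ocean], [Connector(ocean, atmos, "sst")]).run(steps=3, dt=1800)
print(atmos.state)   # atmosphere now sees the ocean's surface temperature
```

The design point is separation of concerns: the Models know nothing about each other, so each scientific community can evolve its component independently while the coupling layer stays generic.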


32. Outline
    1. What are climate models?
       In which we meet a 19th-century Swedish chemist and a famous computer scientist, and find out if butterflies cause hurricanes
    2. What is their purpose?
       In which we perform several dangerous experiments on the life support systems of planet Earth, and live to tell the tale
    3. What Software Engineering practices are used?
       In which we politely suggest that the question "does it work with FORTRAN?" helps keep the snake oil salesmen away…
    4. Are they fit for purpose?
       In which we measure a very low bug density, lose faith in software metrics, and encounter two remarkably effective V&V practices.


33. Defect density
    Pipitone, J., Easterbrook, S. (2012). Assessing climate model software quality: a defect density analysis of three models. Geoscientific Model Development, 5(4), 1009–1022.
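
For readers unfamiliar with the metric: defect density is post-release defects normalized by code size. A minimal sketch of the computation; the numbers below are placeholders, not the paper's data:

```python
# Defect density as conventionally computed (the metric Pipitone &
# Easterbrook apply to climate models): defects per thousand source
# lines of code (KSLOC).
post_release_defects = 25    # assumed: bugs logged against one release
ksloc = 400.0                # assumed: size of that release in KSLOC
print(f"{post_release_defects / ksloc:.3f} defects/KSLOC")
```

The paper's finding, previewed in the outline, is that by this measure the climate models studied come out with remarkably low defect densities.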


34. Hypotheses for low defect rates
    ❍ Domain Expertise
    ● Developers are users and experts
    ❍ Slow, cautious development process
    ❍ Rigorous Development Process
    ● Code changes as scientific experiments, with peer review
    ❍ Narrow Usage Profile
    ● And hence potential for brittleness
    ❍ Intrinsic Defect Sensitivity / Tolerance
    ● Bugs are either obvious or irrelevant
    ❍ Successful Disregard (and hence higher technical debt)
    ● Scientists tolerate poor code & workarounds, if it doesn't affect the science
    Pipitone, J., Easterbrook, S. (2012). Assessing climate model software quality: a defect density analysis of three models. Geoscientific Model Development, 5(4), 1009–1022.


35. Few Defects Post-release
    ❍ Obvious errors (eliminated during pre-release testing):
    ● Model won't compile / won't run
    ● Model crashes during a run
    ● Model runs, but variables drift out of tolerance
    ● Runs don't bit-compare (when they should; see the sketch below)
    ❍ Subtle errors (model runs appear "valid"):
    ● Model does not simulate the intended physical processes (e.g. incorrect model configuration)
    ● The right results for the "wrong reasons" (e.g. over-tuning)
    ❍ "Acceptable Imperfections"
    ● All models are wrong!
    ● Processes omitted due to computational constraints
    ● Known errors tolerated because the effect is "close enough!"
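
The "bit-compare" check is worth spelling out: after a change that should not alter the science (e.g. a pure refactoring or an optimization), the new run's output must match a control run bit for bit. A minimal sketch; the file paths are assumptions for illustration:

```python
import hashlib

def digest(path, chunk=1 << 20):
    """Checksum a (possibly huge) model output file in streaming fashion."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Usage: compare a control run against a run of the changed code.
# if digest("control/output.dat") != digest("trial/output.dat"):
#     raise SystemExit("Runs do not bit-compare: investigate before merging")
```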


36. Experiment-Driven Development (EDD)
    [Diagram: a development loop: Model Weakness → Develop Hypothesis → Run Experiment → Interpret Results → OK? If not, try another hypothesis; if so, Peer Review → New Model Version]


37. Tools for EDD
    Image Source: UK Met Office


38. The Coupled Model Intercomparison Projects

                                   CMIP        CMIP2       CMIP3        CMIP5
                                   (1996 on)   (1997 on)   (2005-2006)  (2010-2011)
    Number of Experiments          1           2           12           110
    Centres Participating          16          18          15           24
    # of Distinct Models           19          24          21           45
    # of Runs (≈ Models x Expts)   19          48          211          841
    Total Dataset Size             1 GB        500 GB      36 TB        3.3 PB
    Total Downloads from archive   ?           ?           1.2 PB       (still growing)
    Number of Papers Published                 47          595          TBD
    Users                                                  6700         TBD

    See: http://www.easterbrook.ca/steve/2012/04/some-cmip5-statistics/


39. MIPs as Software Benchmarking
    ❍ Susan Sim's theory of Software Benchmarking:
    ● A benchmark "defines" the research paradigm (in the Kuhnian sense)
    ● Benchmarks (can) cause rapid scientific progress
    ● Benefits are both sociological and technological
    ❍ A software benchmark comprises:
    ● A Motivating Comparison
    ● A Task Sample
    ● Performance Measures (not necessarily quantitative)
    ❍ Critical Success Factors:
    ● Collaborative development of the benchmark
    ● Open, transparent & critical evaluation of tools against the benchmark
    ● Retirement of old benchmarks to prevent over-fitting
    Sim, S. E., Easterbrook, S. M., & Holt, R. C. (2003). Using benchmarking to advance research: a challenge to software engineering. In 25th IEEE Int. Conf. on Software Engineering (ICSE'03)


40. CMIP Model improvement
    Image Source: Reichler, T., & Kim, J. (2008). How Well Do Coupled Models Simulate Today's Climate? Bulletin of the American Meteorological Society, 89(3), 303–311.
    For more MIPs see: http://www.clivar.org/clivar-panels/former-panels/aamp/resources/mips


41. From models to modeling systems
    [Diagram: Scientific Question → Model Development, Selection & Configuration → A Climate Model Configuration → Running Model → Interpretation of Results → Papers & Reports.
    Typical model evaluations cover only the model itself; fitness-for-purpose validation of a modeling system covers the whole chain, asking: Is this model configuration appropriate to the question? Are the model outputs used appropriately?]


42. Key Success Factors
    ❍ Software developers = domain experts = users
    ● Bottom-up decision-making; experts control technical direction
    ● Shared ownership and commitment to quality
    ❍ Openness (code & data freely available*)
    ❍ Core set of effective SE tools
    ● Version control; bug tracking; automated testing; continuous integration
    ❍ Experiment-Driven Development
    ● Hypothesis testing, peer review, etc.
    ❍ Model Intercomparisons & ensembles
    ● …a form of software benchmarking


43. Questions from the Audience
    Image: https://www.flickr.com/photos/good_day/211972522/
