
Verifying the Forecast: How Climate Models are Developed and Tested


Keynote talk given at ESEC/FSE 2017, the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering.
Paderborn, Germany, Sept 7, 2017

Steve Easterbrook

September 07, 2017



Transcript

  1. Verifying the Forecast: How Climate Models are Developed and Tested
     Steve Easterbrook
     Email: [email protected]  Blog: www.easterbrook.ca/steve  Twitter: @SMEasterbrook
  2. A complex software eco-system…
     Alexander, K., & Easterbrook, S. (2015). The software architecture of climate models: a graphical comparison of CMIP5 and EMICAR5 configurations. Geoscientific Model Development, 8, 1221–1232.
  3. Example: NCAR, Boulder
     Alexander, K., & Easterbrook, S. (2015). The software architecture of climate models: a graphical comparison of CMIP5 and EMICAR5 configurations. Geoscientific Model Development, 8, 1221–1232.
  4. Outline
     1. What are climate models?
        In which we meet a 19th-century Swedish chemist and a famous computer scientist, and find out if butterflies cause hurricanes.
     2. What is their purpose?
        In which we perform several dangerous experiments on the life-support systems of planet Earth, and live to tell the tale.
     3. What Software Engineering practices are used?
        In which we politely suggest that the question "does it work with FORTRAN?" helps keep the snake-oil salesmen away…
     4. Are they fit for purpose?
        In which we measure a very low bug density, lose faith in software metrics, and encounter two remarkably effective V&V practices.
  5. The First Computational Climate Model
     1895: Svante Arrhenius constructs an energy balance model to test his hypothesis that the ice ages were caused by a drop in CO2. (Predicts a global temperature rise of 5.7°C if we double CO2.)
     Arrhenius, S. (1896). On the Influence of Carbonic Acid in the Air upon the Temperature of the Ground. Philosophical Magazine and Journal of Science, 41(251).
  6. Schematic of Arrhenius's model
     Image source: Easterbrook, S. M. (2018). Computing the Climate. Cambridge University Press. Forthcoming.
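The underlying idea is a zero-dimensional energy balance: absorbed sunlight must equal emitted infrared. The sketch below illustrates that balance in Python; it is a minimal illustration only, not a reconstruction of Arrhenius's actual 1896 calculation, and the emissivity values are illustrative assumptions.

```python
# Energy balance: absorbed solar = emitted infrared,
# (1 - albedo) * S / 4 = emissivity * sigma * T^4
# (illustrative values, not Arrhenius's actual calculation)

SIGMA = 5.670e-8    # Stefan-Boltzmann constant, W m^-2 K^-4
S = 1361.0          # solar constant, W m^-2
ALBEDO = 0.3        # planetary albedo

def equilibrium_temperature(emissivity):
    """Surface temperature (K) at which outgoing IR balances absorbed sunlight."""
    absorbed = (1 - ALBEDO) * S / 4
    return (absorbed / (emissivity * SIGMA)) ** 0.25

print(equilibrium_temperature(1.0))    # no greenhouse effect: ~255 K, frozen
print(equilibrium_temperature(0.612))  # greenhouse-reduced emissivity: ~288 K
```

Lowering the effective emissivity (a crude stand-in for adding CO2 to the atmosphere) raises the equilibrium surface temperature, which is the mechanism Arrhenius set out to quantify.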
  7. First Numerical Simulation of Weather
     1950s: John von Neumann develops a killer app for the first programmable electronic computer, ENIAC: weather forecasting. He imagines uses in weather control, geo-engineering, etc.
     Lynch, P. (2008). The ENIAC Forecasts: A Recreation. Bulletin of the American Meteorological Society, 1–11.
  8. Lynch, P. (2008). The ENIAC Forecasts: A Recreation. Bulletin of the American Meteorological Society, 1–11.
  9. Grid scale in a high-resolution model
     Image source: IPCC Fifth Assessment Report, Jan 2014. Working Group 1, Fig 1.14(b).
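A rough back-of-envelope calculation (my numbers, not the IPCC figure's) shows why high resolution is so expensive: halving the horizontal grid spacing doubles the points in each horizontal direction, and the CFL stability condition roughly halves the time step too, so cost grows by about 8x per halving, before counting extra vertical levels.

```python
# Back-of-envelope cost of refining a model grid (illustrative assumptions):
# halving the spacing doubles points in x and in y, and roughly halves the
# time step (CFL), so each halving multiplies cost by about 2 * 2 * 2 = 8.

def relative_cost(halvings):
    """Cost multiplier after `halvings` successive halvings of grid spacing."""
    return 8 ** halvings

for h in range(4):
    spacing_km = 200 / 2 ** h
    print(f"{spacing_km:6.1f} km grid -> ~{relative_cost(h)}x cost")
```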
  10. Run on a massively parallel supercomputer…
      The Yellowstone supercomputer at the NCAR-Wyoming Supercomputing Center, Cheyenne.

  12. A little Chaos Theory
      See: Lorenz, E. N. (1993). Our Chaotic Weather. In The Essence of Chaos (pp. 77–110).
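The sensitive dependence on initial conditions behind the "butterflies cause hurricanes" question is easy to demonstrate. Below is a minimal sketch using the classic Lorenz (1963) system with forward-Euler integration; it is a standard toy example, not a weather model. Two trajectories that start one part in a billion apart end up completely different.

```python
# Sensitive dependence on initial conditions in the Lorenz (1963) system.

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system one forward-Euler step."""
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.0 + 1e-9)   # perturb one coordinate by one part in a billion

for step in range(40001):
    if step % 10000 == 0:
        # Euclidean distance between the two trajectories
        dist = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        print(f"t = {step * 0.001:5.1f}   separation = {dist:.3e}")
    a, b = lorenz_step(a), lorenz_step(b)
```

The separation grows roughly exponentially until it saturates at the size of the attractor, which is why individual weather forecasts lose skill after days even though climate statistics remain predictable.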
  13. Outline (recap of slide 4).
  14. Fitness for purpose?
      ❍ Climate model purposes include:
      • To explore the consequences of a current theory;
      • To test a hypothesis about the observational system;
      • To test a hypothesis about the calculational system;
      • To provide homogenized datasets (e.g. re-analysis);
      • To conduct thought experiments about different climates;
      • To act as a comparator when debugging another model;
      • To provide inputs to assessments that inform policymaking.
      Diagram: (1) study this [the Calculational System] (2) to gain insights on this [the Theoretical System] (3) …to make sense of this [the Observational System].
  15. Coupled model (overlapping communities)
      Components: Atmospheric Dynamics and Physics; Ocean Dynamics; Sea Ice; Land Surface Processes; Atmospheric Chemistry; Ocean Biogeochemistry.
  16. Understanding: What-if Experiments
      E.g. How do volcanoes affect climate?
      Sources: (a) http://www.imk-ifu.kit.edu/829.php (b) IPCC Fourth Assessment Report, 2007. Working Group 1, Fig 9.5.
  17. Can we limit warming to < +2°C?
      From: Allen, M. R., et al. (2009). Warming caused by cumulative carbon emissions towards the trillionth tonne. Nature, 458, 1163–1166.
  18. Can we artificially cool the planet?
      From: Berdahl, M., et al. (2014). Arctic cryosphere response in the Geoengineering Model Intercomparison Project G3 and G4 scenarios. Journal of Geophysical Research: Atmospheres, 119(3), 1308–1321.
      Panels: global average near-surface temperature (°C); Arctic sea ice extent (millions of km²).
  19. Outline (recap of slide 4).
  20. UK Met Office Hadley Centre (UKMO); Max-Planck-Institut für Meteorologie (MPI-M); National Center for Atmospheric Research (NCAR); Institut Pierre-Simon Laplace (IPSL).
  21. Example: software evolution of the UK Met Office's Unified Model (UM)
      Easterbrook, S. M., & Johns, T. C. (2009). Engineering the Software for Understanding Climate Change. Computing in Science and Engineering, 11(6), 65–74.
  22. A complex software eco-system… (repeat of slide 2).
  23. Example: NCAR vs UK Met Office
      Alexander, K., & Easterbrook, S. (2015). The software architecture of climate models: a graphical comparison of CMIP5 and EMICAR5 configurations. Geoscientific Model Development, 8, 1221–1232.
  24. The Earth System Modeling Framework (ESMF)
      ❍ Superstructure: component data structures and methods for coupling model components.
      ❍ Infrastructure: field data structures and methods for building model components, and utilities for coupling.
      Diagram: user code sits between the ESMF Superstructure (AppDriver; component classes: GridComp, CplComp, State) and the ESMF Infrastructure (data classes: Bundle, Field, Grid, Array; utility classes: Time, Clock, LogErr, DELayout, VM, Config).
  25. The NUOPC architecture…
      ❍ Model: implements a specific physical domain, e.g. atmosphere, ocean, wave, ice.
      ❍ Mediator: scientific coupling code (flux calculations, accumulation, averaging, etc.) between (potentially multiple) Models.
      ❍ Connector: connects pairs of components in one direction, e.g. Model to Model, or Model to/from Mediator; executes simple transforms (regridding, units).
      ❍ Driver: provides a harness for Models, Mediators, and Connectors (supporting hierarchies); coordinates initialize and run sequences.
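To make the division of labour concrete, here is a toy sketch of the Driver/Model/Connector layering in Python. It mimics only the shape of the architecture: the class and method names are illustrative, not the real ESMF/NUOPC (Fortran) interfaces, and the Mediator is omitted for brevity.

```python
# Toy sketch of a NUOPC-style coupled-model harness (illustrative names only).

class Model:
    """A component for one physical domain (e.g. atmosphere or ocean)."""
    def __init__(self, name):
        self.name = name
        self.exports = {}   # fields this component offers to others
        self.imports = {}   # fields this component receives from others

    def initialize(self):
        # each toy domain exports a single coupling field
        field = "sst" if self.name == "ocean" else "wind_stress"
        self.exports[field] = 0.0

    def run(self, dt):
        # stand-in for the real dynamics: just evolve the exported fields
        for field in self.exports:
            self.exports[field] += 0.1 * dt

class Connector:
    """Moves fields one way between two components, applying simple transforms."""
    def __init__(self, src, dst, transform=lambda v: v):
        self.src, self.dst, self.transform = src, dst, transform

    def run(self):
        for name, value in self.src.exports.items():
            # in the real architecture: regridding, unit conversions, etc.
            self.dst.imports[name] = self.transform(value)

class Driver:
    """Harness that coordinates the initialize and run sequences."""
    def __init__(self, components, connectors):
        self.components, self.connectors = components, connectors

    def run(self, n_steps, dt):
        for c in self.components:
            c.initialize()
        for _ in range(n_steps):
            for conn in self.connectors:
                conn.run()      # exchange coupling fields
            for c in self.components:
                c.run(dt)       # advance each physical domain

atm, ocn = Model("atmosphere"), Model("ocean")
driver = Driver([atm, ocn], [Connector(atm, ocn), Connector(ocn, atm)])
driver.run(n_steps=4, dt=1800.0)
print(ocn.imports)   # the ocean now sees the atmosphere's wind stress
```

The design point this illustrates is that Models never call each other directly; all data flows through Connectors under the Driver's control, which is what lets centres swap components in and out.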
  26. Outline (recap of slide 4).
  27. Defect density
      Pipitone, J., & Easterbrook, S. (2012). Assessing climate model software quality: a defect density analysis of three models. Geoscientific Model Development, 5(4), 1009–1022.
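For reference, defect density is conventionally reported as post-release defects per thousand source lines of code (KLOC). A minimal sketch of the metric, using made-up placeholder numbers rather than the paper's data:

```python
# Defect density = post-release defects per thousand source lines of code.
# The inputs below are made-up placeholders, not data from the paper.

def defect_density(defects, sloc):
    """Defects per KLOC (thousand source lines of code)."""
    return defects / (sloc / 1000)

print(defect_density(defects=25, sloc=400_000))   # -> 0.0625 defects/KLOC
```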
  28. Hypotheses for low defect rates
      ❍ Domain expertise: developers are users and experts.
      ❍ Slow, cautious development process.
      ❍ Rigorous development process: code changes treated as scientific experiments, with peer review.
      ❍ Narrow usage profile (and hence potential for brittleness).
      ❍ Intrinsic defect sensitivity / tolerance: bugs are either obvious or irrelevant.
      ❍ Successful disregard (and hence higher technical debt): scientists tolerate poor code and workarounds, if it doesn't affect the science.
      Pipitone, J., & Easterbrook, S. (2012). Assessing climate model software quality: a defect density analysis of three models. Geoscientific Model Development, 5(4), 1009–1022.
  29. Few defects post-release
      ❍ Obvious errors (eliminated during pre-release testing):
      • Model won't compile / won't run.
      • Model crashes during a run.
      • Model runs, but variables drift out of tolerance.
      • Runs don't bit-compare when they should (see the sketch after this list).
      ❍ Subtle errors (model runs appear "valid"):
      • Model does not simulate the intended physical processes (e.g. incorrect model configuration).
      • The right results for the "wrong reasons" (e.g. over-tuning).
      ❍ "Acceptable imperfections":
      • All models are wrong!
      • Processes omitted due to computational constraints.
      • Known errors tolerated because the effect is "close enough!"
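The bit-compare check deserves a note: for changes that are not supposed to alter answers, outputs from the old and new builds are compared exactly, with no numerical tolerance. A minimal sketch of the idea; the file names and arrays here are hypothetical stand-ins:

```python
import numpy as np

def bit_identical(path_a, path_b):
    """True iff two raw model output files are byte-for-byte equal."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()

# For in-memory fields, compare exactly rather than within a tolerance:
reference = np.linspace(0.0, 1.0, 5)   # stand-in for the old build's output
candidate = reference.copy()           # stand-in for the refactored build's output
assert np.array_equal(reference, candidate), "runs do not bit-compare"
```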
  30. Experiment-Driven Development (EDD)
      Diagram: model weakness → develop hypothesis → run experiment → interpret results → peer review → OK? If yes: new model version; if no: try another hypothesis (and the cycle repeats).
  31. The Coupled Model Intercomparison Projects
                                        CMIP        CMIP2       CMIP3         CMIP5
                                        (1996 on)   (1997 on)   (2005-2006)   (2010-2011)
      Number of experiments             1           2           12            110
      Centres participating             16          18          15            24
      # of distinct models              19          24          21            45
      # of runs (≈ models × expts)      19          48          211           841
      Total dataset size                1 GB        500 GB      36 TB         3.3 PB
      Total downloads from archive      ?           ?           1.2 PB
                                                                (still growing)
      Number of papers published                    47          595           TBD
      Users                                                     6700          TBD
      See: http://www.easterbrook.ca/steve/2012/04/some-cmip5-statistics/
  32. MIPs as Software Benchmarking
      ❍ Susan Sim's theory of software benchmarking:
      • A benchmark "defines" the research paradigm (in the Kuhnian sense).
      • Benchmarks (can) cause rapid scientific progress.
      • Benefits are both sociological and technological.
      ❍ A software benchmark comprises:
      • A motivating comparison;
      • A task sample;
      • Performance measures (not necessarily quantitative).
      ❍ Critical success factors:
      • Collaborative development of the benchmark;
      • Open, transparent and critical evaluation of tools against the benchmark;
      • Retirement of old benchmarks to prevent over-fitting.
      Sim, S. E., Easterbrook, S. M., & Holt, R. C. (2003). Using benchmarking to advance research: a challenge to software engineering. In 25th IEEE Int. Conf. on Software Engineering (ICSE'03).
  33. CMIP model improvement
      Image source: Reichler, T., & Kim, J. (2008). How Well Do Coupled Models Simulate Today's Climate? Bulletin of the American Meteorological Society, 89(3), 303–311.
      For more MIPs see: http://www.clivar.org/clivar-panels/former-panels/aamp/resources/mips
  34. From models to modeling systems
      Diagram: scientific question → model development, selection & configuration (yielding a climate model configuration) → running model → interpretation of results → papers & reports.
      Typical model evaluations cover only part of this chain; fitness-for-purpose validation of a modeling system spans the whole of it, asking: Is this model configuration appropriate to the question? Are the model outputs used appropriately?
  35. Key Success Factors
      ❍ Software developers = domain experts = users:
      • Bottom-up decision-making; experts control technical direction.
      • Shared ownership and commitment to quality.
      ❍ Openness (code & data freely available*).
      ❍ Core set of effective SE tools: version control; bug tracking; automated testing; continuous integration (see the sketch after this list).
      ❍ Experiment-Driven Development: hypothesis testing, peer review, etc.
      ❍ Model intercomparisons & ensembles: …a form of software benchmarking.
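As an illustration of the "automated testing" item above, here is a sketch of the kind of short regression test such projects run in continuous integration: a brief model run compared against stored reference output. The model entry point and its trivial dynamics are hypothetical stand-ins, not any centre's actual test harness.

```python
import numpy as np

def run_model(steps):
    """Hypothetical stand-in for a short run of the real model."""
    state = np.zeros(8)
    for _ in range(steps):
        state = state + 0.5     # trivial placeholder dynamics
    return state

def test_short_run_matches_reference():
    # in practice the reference would be loaded from a stored known-good file
    reference = run_model(steps=10)
    current = run_model(steps=10)
    # bit-for-bit agreement expected when no answer-changing change has landed
    np.testing.assert_array_equal(current, reference)

test_short_run_matches_reference()
```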