Slide 1

Slide 1 text

Hack Weeks As A Model for Data Science Education and Collaboration. Daniela Huppenkothen, UW Astronomy. GitHub: dhuppenkothen · Twitter: Tiana_Athriel

Slide 2

Slide 2 text

http://www.pnas.org/content/early/2018/08/17/1717196115 + the other hack week organizers

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

+ the ethnographers at all three institutes (esp. Brittany Fiore-Gartland, Laura Norén, Stuart Geiger)

Slide 5

Slide 5 text

Part 1:

Slide 6

Slide 6 text

Part 1: ☕

Slide 7

Slide 7 text

Niels Bohr Institute, Copenhagen (1929) credit: Niels Bohr Archive

Slide 8

Slide 8 text

American Astronomical Society (2018) credit: AAS/CorporateEventImages/Phil McCarten

Slide 9

Slide 9 text

“The best thing about this meeting is the coffee breaks!”

Slide 10

Slide 10 text

“The best thing about this meeting is the coffee breaks!” • exchange ideas • collaboration • networking

Slide 11

Slide 11 text

Can we organize a workshop that is all coffee breaks?

Slide 12

Slide 12 text

Part 2: Language and Data Science

Slide 13

Slide 13 text

Astronomy vs. the World light curve?

Slide 14

Slide 14 text

Astronomy vs. the World light curve?

Slide 15

Slide 15 text

Fields are siloed

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

• How do we improve the exchange of knowledge?

Slide 18

Slide 18 text

• How do we improve the exchange of knowledge? • How do we remove barriers and stop fields reinventing the wheel?

Slide 19

Slide 19 text

• How do we improve the exchange of knowledge? • How do we remove barriers and stop fields reinventing the wheel? • How do we teach data science to domain scientists?

Slide 20

Slide 20 text

• How do we improve the exchange of knowledge? • How do we remove barriers and stop fields reinventing the wheel? • How do we teach data science to domain scientists? • How do we facilitate collaborations within and across fields?

Slide 21

Slide 21 text

• How do we improve the exchange of knowledge? • How do we remove barriers and stop fields reinventing the wheel? • How do we teach data science to domain scientists? • How do we facilitate collaborations within and across fields? • How do we facilitate exchange of ideas, knowledge and people between academia and industry?

Slide 22

Slide 22 text

• How do we improve the exchange of knowledge? • How do we remove barriers and stop fields reinventing the wheel? • How do we teach data science to domain scientists? • How do we facilitate collaborations within and across fields? • How do we facilitate exchange of ideas, knowledge and people between academia and industry? • How do we enable researchers to brainstorm and prototype practical solutions to their problems on a short timescale?

Slide 23

Slide 23 text

• How do we improve the exchange of knowledge? • How do we remove barriers and stop fields reinventing the wheel? • How do we teach data science to domain scientists? • How do we facilitate collaborations within and across fields? • How do we facilitate exchange of ideas, knowledge and people between academia and industry? • How do we enable researchers to brainstorm and prototype practical solutions to their problems on a short timescale? • How do we enable participation of a diverse range of researchers and make data science welcoming and inclusive?

Slide 24

Slide 24 text

• How do we improve the exchange of knowledge? • How do we remove barriers and stop fields reinventing the wheel? • How do we teach data science to domain scientists? • How do we facilitate collaborations within and across fields? • How do we facilitate exchange of ideas, knowledge and people between academia and industry? • How do we enable researchers to brainstorm and prototype practical solutions to their problems on a short timescale? • How do we enable participation of a diverse range of researchers and make data science welcoming and inclusive?

Slide 25

Slide 25 text

How do we teach researchers data science?

Slide 26

Slide 26 text

summer school?

Slide 27

Slide 27 text

How do we enable (cross-disciplinary) collaborations and networking?

Slide 28

Slide 28 text

conferences? credit: AAS/CorporateEventImages/Phil McCarten

Slide 29

Slide 29 text

How do we come up with new, innovative solutions to data analysis problems?

Slide 30

Slide 30 text

hackathons? credit: Alex Alspaugh/University of Washington

Slide 31

Slide 31 text

How do we make spaces of academic discourse inclusive and welcoming?

Slide 32

Slide 32 text

Can we combine all of these in a single event?

Slide 33

Slide 33 text

Can we combine all of these in a single event? (and have lots of ☕ )

Slide 34

Slide 34 text

Part 3: Hack Weeks

Slide 35

Slide 35 text

http://astrohackweek.org

Slide 36

Slide 36 text

http://astrohackweek.org Jake VanderPlas

Slide 37

Slide 37 text

What is a hack week?

Slide 38

Slide 38 text

#AstroHackWeek

Slide 39

Slide 39 text

#AstroHackWeek • 5-day workshop

Slide 40

Slide 40 text

#AstroHackWeek • 5-day workshop • ~50 participants

Slide 41

Slide 41 text

#AstroHackWeek • 5-day workshop • ~50 participants • tutorials and break-out sessions

Slide 42

Slide 42 text

#AstroHackWeek • 5-day workshop • ~50 participants • tutorials and break-out sessions • project work

Slide 43

Slide 43 text

#AstroHackWeek • 5-day workshop • ~50 participants • tutorials and break-out sessions • project work • Lots of ☕ and

Slide 44

Slide 44 text

#AstroHackWeek • 5-day workshop • ~50 participants • tutorials and break-out sessions • project work • Lots of ☕ and • participant-driven

Slide 45

Slide 45 text

#AstroHackWeek • 5-day workshop • ~50 participants • tutorials and break-out sessions • project work • Lots of ☕ and • participant-driven • experimental

Slide 46

Slide 46 text

credit: Anthony Arendt

Slide 47

Slide 47 text

http://www.pnas.org/content/early/2018/08/17/1717196115

Slide 48

Slide 48 text

http://www.pnas.org/content/early/2018/08/17/1717196115

Slide 49

Slide 49 text

A toolbox for organizing interactive, collaborative events

Slide 50

Slide 50 text

https://geohackweek.github.io https://neurohackweek.github.io https://oceanhackweek.github.io https://waterhackweek.github.io https://www.electrochem.org/233/hack-week

Slide 51

Slide 51 text

Part 4: Participant-driven workshops

Slide 52

Slide 52 text

diversity is excellence

Slide 53

Slide 53 text

“When you decline to create or curate a culture in your spaces, you’re responsible for what spawns in the vacuum.” — Leigh Alexander

Slide 54

Slide 54 text

https://medium.com/@dataethnography/hacked-ethnographic-fieldnotes-4e59bc95f4e5

Slide 55

Slide 55 text

participant-driven != unstructured

Slide 56

Slide 56 text

participant-driven != unstructured • design with the most vulnerable participants in mind • facilitate carefully • mitigate Impostor Phenomenon

Slide 57

Slide 57 text

Part 5: Learning At Hack Weeks

Slide 58

Slide 58 text

“At a summer school, the young learn from the old. At a hack week, the old learn from the young.” — David W. Hogg

Slide 59

Slide 59 text

“At a summer school, the young learn from the old. At a hack week, the old learn from the young.” — David W. Hogg “At a summer school, the young learn from the old. At a hack week, everyone learns from everyone else.” — Daniela Huppenkothen

Slide 60

Slide 60 text

Tutorials • practically oriented • interactive • make use of participants’ expertise credit: Alex Alspaugh/University of Washington

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

Learning through hacking credit: Alex Alspaugh/University of Washington

Slide 63

Slide 63 text

Part 6: Collaboration

Slide 64

Slide 64 text

hack (n): A hack is a small project with a very clear goal, which should be completed by the end of the time initially allocated to it

Slide 65

Slide 65 text

Astro Hack Week 2018 Wrap Up Slides, August 6-10, 2018

Pearse Murphy, Trinity College Dublin, Ireland. Challenge: Compute 512 FFTs on ~2 million points without killing my computer. What did we achieve: Found a Python wrapper for the FFTW library and implemented it. Unfortunately there was no significant speed-up. Concrete outcome: A “lazy” solution is to do 512/N FFTs on N computers at the same time and collect the data at the end. Thoughts: I might join a different hack - feature recognition with machine learning type of thing.

Andrei Igoshev, Technion, Israel. Deriving the posterior for a variable which depends on many measured values but is not measured itself. Posterior is derived.

Yusra AlSayyad (Princeton University/LSST). Goal: Explore HSC backgrounds. Produced some interesting eigen-backgrounds for the Y-band. Thanks to Rodrigo, Matthew, Nicolas for brainstorming with me.

FUN with GOOGLE CLOUD PLATFORM: Python App Engine. https://mrlbtestofpython.appspot.com Efşan Sökmen - Iain Murray. Thanks to: Eleni Petrakou and Andrei Igoshev.

PCA on Stellar Populations in the Southern Plane - VVV survey …. Using Gaussian Bandpass to Filter Data. Sean Morrison, Laboratoire d’Astrophysique de Marseille.

Improved parameter estimation: planet spectra and RT models. Statia Cook, Columbia Univ./AMNH. Help from: Iain Murray (!!), Lauren Anderson, Becky Steele, Brigitta Sipocz, Daniela Huppenkothen. ● Runs with emcee are slow, don’t always converge well, not sure if the method is good for my 10-15 model parameters ● Initial idea: test out something other than emcee ● Actual “hack”: try optimizing first, work with simplified data and model (for speed). Results: works better! I learned that using an optimizer first is a good idea :-)

How to make the most out of AstroHack Week. Pearse Murphy.

Recurrent Neural Net (GRU) and 1D CNN for early transient light curve classification. Daniel Muthukrishna.

Mohammadjavad Vakili, with lots of helpful practical advice from Cole Clifford. Inferring the central galaxy stellar mass-halo mass relation with neural nets. Right: Regression with a simple TensorFlow implementation of a fully connected NN. Left: Inferring P(Mstar | Mhalo) with a mixture density network.

Deep Time Series. Alexandar, Brigitta, Ellianna, Gilles, Nicolas, Pearse, Rodrigo, Rohan, Ruth, Tarun.
Goal: There is a lot of information about the mass, age and rotation period of a star in its light curve, but our physical models and the tools we use to extract this information are flawed. We postulate that we can do better with RNNs.
Learnings: - Data pre-processing is hard. - RNNs are cool. - RNNs are expensive - try other approaches first! Link to learnings document: https://tinyurl.com/yaxuw98z
Next Steps: 1. Run this architecture on 16K Kepler Red Giants star data 2. Apply a Generative Adversarial Network? 3. First few key features to investigate from Kepler data: mass, age, rotation period
Architecture diagram labels: Input (t): value, error, time; Batch Norm Layer; LSTM Layer; LSTM Layer; Fully Connected; state 1 (t-1); state 2 (t-1); Param est (t); Param est (t-1)

Lauren Anderson, Adrian Price-Whelan, Dan Foreman-Mackey, Iain Murray. Gradients of likelihood model to use HMC samplers, or various optimization stuff. General Optimizer: Success after ~1000 function calls. Optimizer with gradients: Fails after ~100 function calls. Toy problem: Simple Harmonic Oscillator. Initial guess for optimizer.

Cardboard Universe: tinyurl.com/3dexoplanets. Team: Matt, Ellie, David, Efsan, Stephanie, Brigitta, Yanett, Becky. Challenge: Zoom through stars and their exoplanets using Google Cardboard + Three.js. Achieved: In-browser prototype ready (randomized systems only) https://github.com/beckysteele/cardboard_universe Next steps: Connect Exoplanet Archive data to the 3D simulation, input a 360 deg view with a Milky Way background, and make it Google Cardboard-able.

Jeroen Bédorf - Leiden University/Observatory. Google APIs, challenges involved: finding credit card details, access permissions, service user roles, including credentials in the API request, accessing the results, enabling the correct APIs, installing the correct Python packages. https://github.com/jbedorf/astrohackweek_sentiment_tool

Rohan Pattnaik. Personal Hack Objectives: ● Compile a list of approaches to classify spectra from other instruments ● Get started with open source development

ArXiv.ninja. Dan F-M // Adrian P-W. github.com/dfm/arxiv.ninja

BIG DATA METHODS FOR EXTRACTING RELATIONS BETWEEN THE TIMING OF SOLAR FLARES AND PLANETARY POSITIONS. Eleni Petrakou. Indications exist for a relation between them. Goals of the week: Take a solid step in classification + Kickstart associative rules mining. ONE SOLID STEP IN CLASSIFICATION: Random forest; each of cycles 21-24 behaves differently; find one way to improve “universal training”. […] = At least one package running on my file without crashing. KICKSTART RULES: WEKA 3.8 running the Apriori algorithm.

AstroCapital. Amruta Jaodand, Daisy Mak, Lilianne Nakazono, Zach Akil, Norhasliza Yusof, Becky Steele, Mohammadjavad Vakili.
• Goal: Create a web page which educates and allows astronomers to communicate about tech-enabled all-things-astro.
• Why: • Lack of such a platform • Efficient, new ways to go from data to astronomy/science • Open platform for everyone • Share, contribute and stay updated! • Central astro-tech resource hub • Connecting ex-/non-astronomers to the astro community
• Status: Survey and Web
• What next: • Let us know if you can contribute to any sections (e.g. writing blogs)

makecite -> check_cite. Leon Trapman + Adrian Price-Whelan / Alexandar Mechev / Julia Melo Rodrigues de Aguiar / Brigitta Sipőcz + First pull request :)

Riccardo Buscicchio, University of Birmingham, UK. Challenge: Try not to be scared by numerical integration, i.e. evaluate […]. What did we achieve: recursive, almost-brute-force approach (thanks, Brigitta!). (soon-to-be) Concrete outcome: […]. Thoughts: Any clever implementation is welcome. Btw, non-gaussianities are fun!

ASTRO HACK WEEK LOCAL/REGIONAL EDITION. Lilianne, Stephanie, David. Goals:
● To further extend reach to people who want to learn about astrohack (tools, etc.) and its topics but couldn’t afford to come to international venues, have fewer resources or were not accepted to the workshop.
● To lessen language barriers. For example, not everyone speaks English, so in a regional/local setting where everyone speaks Portuguese it is easier to teach or run the workshop.
● To encourage people to learn new topics beyond their choice of study and engage them to use these topics for their perusal, expand skills and learning.
● To accomplish good activities for the astronomy society in the local country and in the general public as a whole.
MORE IDEAS? SUGGESTIONS? Link: https://docs.google.com/document/d/1xRjE6CGYTSHQ6K2jEnprLmj3u9fMVUUncIxX-pxdAPU/edit?usp=sharing

Motivator - Chrome Extension. Including great historical quotes from Ru Paul, Beyonce, your grandma, etc. Boris L, Nicolas A.

Astro Grad Admissions Optimization: questionnaire and output. Camila, Malavika, Pearce, Riccardo, Rodrigo, Sean, Statia, Tarun. Based on your priorities, the following assessment tools are recommended for admissions to your program: … Questions to include in letters of reference: “Briefly (in 5-6 sentences) describe a time that the candidate demonstrated initiative. This could include reaching out to potential mentors or collaborators, learning independently, or taking on tasks on their own.”
Evaluation Criteria Super | Application Stage | Interview Stage | Offer Stage
Physics Preparation | 35 | 35 | 35
Computational Skills | 35 | 35 | 35
Character Values | 30 | 30 | 30
Link to questionnaire: https://tinyurl.com/yaekl7v4 | Link to document: https://tinyurl.com/yb9vcb9o

Peer review and the blockchain. Eleni & Alexandar, Daniel, Yusra. A possible implementation of the peer-review system (as it is today) without journals, with blockchain. [More will be written in the doc...] https://docs.google.com/document/d/1fwMtRsYj2A-NHY3pgZJ38DHZvwhtkellXZq2UiMOoYQ

Sentiment analysis via Google NLP API - Jeroen. Steps: - Use GitHub API to pull in some comments - Created a Google Cloud Project, enabled NLP API - Created credentials - Use Google NLP API to parse the text. Some results for PR https://github.com/astropy/astropy/pull/7712:
“I believe the grouping should work for `Time` mixins, too now that sorting is working?” Score: 0.6, Magnitude: 1.2
“so if this should fail, could you add another example where shorting is failing for these columns?” Score: -0.7, Magnitude: 0.7
“note that at this point `keys` had to be an `ndarray`, so all the code below dealing with a pre-made index was never being run.” Score: -0.1, Magnitude: 0.3
Score: -1.0 is negative, 1.0 is positive. Magnitude: how strong the reaction is.

AHW 2019 and 2020 Venues. Lauren and Ellie. Finding venues for unique conferences is challenging, and conference venues are expensive. Keep AHW affordable: look for venues that include some budget for participants and small/non-existent conference costs. Venues are already booked for 2019 and looking at applications for 2020. Flatiron Institute, Banff International Research Station for Mathematical Innovation and Discovery, Casa Matematica Oaxaca, Ringberg. Other ideas/suggestions? Add them to this document please!

ScienceTheatre hack week. Ruth, Daniella, Pearse, Marie. ● Discussed motivation, goals and objectives ● Structure ● Venue ● Funding ● Program ● Expectations and outcomes. Document here: https://docs.google.com/document/d/1An1SW8h6SRIwmbiItnMsSG0MgBMMFGzwn_s46-oUElo/edit

citebot. RA & DFM. https://github.com/ruthangus/citebot

Adrian Price-Whelan. Succeeded in getting Brigitta to attempt sentiment analysis on GitHub issue and pull request comments (but see previous slide). Worked on infrastructure and an in-progress overhaul of the Astropy tutorials site.

Brigitta Sipőcz. Fail: ran into API limits after the 437th comment, gave up. Made sure __citation__ and __bibtex__ work. They do now. TODO: make sure makecite uses __citation__/__bibtex__ when available. Sentiment analysis of GitHub issue/PR comments.

Survey for tech/astro data preference. Amruta Jaodand, Daisy Mak, Lilianne Nakazono, Norhasliza Yusof. And YOU! We will launch our web page tomorrow!

Tutorials for formulating problems in a Bayesian way. Leon Trapman, Mohammadjavad Vakili, Iain Murray, Andrei Igoshev, Daniel Mortlock (community hack; 2018-08-09; IBM & Astro Hack Week)
1. Inferring distance to a star from a parallax measurement [A.I.; DONE]
2. Inferring cosmological parameters from power spectrum (with emulation) [M.V.]
3. Inferring luminosity of a star from parallax and flux measurements [A.I.; EXTENSION OF 1.]
4. Inferring the Solar System potential from a snapshot of planets’ kinematics [I.M.; PUBLISHED]
5. Inferring the mass of the Galactic halo from Magellanic clouds [I.M.; PUBLISHED]
6. Inferring the age of neutron stars from Galactic position, parallax and proper motion [A.I.]
7. Inferring dust content of a protoplanetary disk from an ALMA image [L.T.; SORT-OF-DONE]
8. Inferring whether an asteroid will hit the Earth [I.M., D.M.]
9. Inferring the properties of a merger from gravitational wave observations [A.I.]
10. Inferring which card is showing of white-white, white-black, black-black [I.M., D.M.]
11. Inferring the number density of galaxies from a survey [D.M.]

AHW 2018 Survey (Daniela Huppenkothen + Antonia Rowlinson) … is ready for you! (Link + password tomorrow morning!)
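
Item 10 in the Bayesian tutorial list above (which card is showing of white-white, white-black, black-black) is small enough to work through exactly. The sketch below is only an illustration, not the tutorial authors' solution. It assumes the usual setup, which the slide does not spell out: one of the three cards is drawn uniformly at random and one of its two faces is shown at random. The names `CARDS` and `posterior` are hypothetical.

```python
from fractions import Fraction

# Assumed setup (not stated on the slide): one of three cards is drawn
# uniformly at random, and one of its two faces is shown at random.
CARDS = {
    "white-white": ("white", "white"),
    "white-black": ("white", "black"),
    "black-black": ("black", "black"),
}


def posterior(observed_face):
    """Return P(card | observed face) using Bayes' rule with a uniform prior."""
    prior = Fraction(1, len(CARDS))
    # Likelihood of the observation: fraction of the card's faces that match.
    unnormalized = {
        name: prior * Fraction(faces.count(observed_face), len(faces))
        for name, faces in CARDS.items()
    }
    evidence = sum(unnormalized.values())
    return {name: weight / evidence for name, weight in unnormalized.items()}


if __name__ == "__main__":
    for name, prob in posterior("white").items():
        print(f"P({name} | white face showing) = {prob}")
```

Under these assumptions, a white face showing gives the white-white card a posterior probability of 2/3 and the white-black card 1/3, because the white-white card has two ways of showing white and the white-black card only one.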

Slide 66

Slide 66 text

Encourage Open Science

Slide 67

Slide 67 text

Part 7: Does it work?

Slide 68

Slide 68 text

• track long-term outcomes (papers, software, …) • evaluation via post-attendance surveys • ethnographic work • case studies • team photos • regular discussions across hack weeks

Slide 69

Slide 69 text

Encourage Open Science

Slide 70

Slide 70 text

Encourage Open Science

Slide 71

Slide 71 text

+ 3 maintainers + ~8 regular contributors + 6 Google Summer of Code Projects + 2 derivative OSS projects

Slide 72

Slide 72 text

Survey Results

Fig. 2. Post-workshop survey responses from the 2016 astro-, geo- and neuro- hack weeks. Response rates are in the panel titles. Results presented in three different domains: the development of technical skills (a – c), collaboration and teaching (d – f), and shifts in attitudes towards reproducibility and open science (g, h).

… for none of the hack weeks do minority participants differ significantly in their answers with respect to teaching outcomes, building valuable connections or the value of their contributions to their hack teams (p > 0.0007). For GHW, there is an …

… we proposed about the use and outcomes of hack weeks. The number of respondents is small and the effects likely subtle, and lack of significant differences may be due to statistical power in our sample. Furthermore, the most important independent …

Slide 73

Slide 73 text

Take-Away Lessons

Slide 74

Slide 74 text

build a community first credit: eScience Institute

Slide 75

Slide 75 text

build a culture that empowers people to ask fundamental (and trivial) questions credit: Alex Alspaugh/University of Washington

Slide 76

Slide 76 text

give participants structure, but freedom within that structure

Slide 77

Slide 77 text

Adapt concepts and ideas to your community’s needs

Slide 78

Slide 78 text

Experiment

Slide 79

Slide 79 text

Evaluate credit: eScience Institute

Slide 80

Slide 80 text

Share experiences

Slide 81

Slide 81 text

http://www.pnas.org/content/early/2018/08/17/1717196115 GitHub: dhuppenkothen · Twitter: Tiana_Athriel · Email: [email protected] + extensive supplementary materials + living checklist: https://docs.google.com/document/d/15cgFL4foZy3jFN9E_y_tkT_XKnbDiw75r5EoYD9CcDA/edit?usp=sharing

Slide 82

Slide 82 text

http://www.pnas.org/content/early/2018/08/17/1717196115 GitHub: dhuppenkothen · Twitter: Tiana_Athriel · Email: [email protected] + extensive supplementary materials + living checklist: https://docs.google.com/document/d/15cgFL4foZy3jFN9E_y_tkT_XKnbDiw75r5EoYD9CcDA/edit?usp=sharing Come and chat with us!

Slide 83

Slide 83 text

No content