Slide 1

Slide 1 text

Accelerating Scientific Computing with GPUs. Rodrigo Nemmen, Universidade de São Paulo, w/ Roberta D. Pereira, João Paulo Navarro (NVIDIA), Matheus Bernardino, Alfredo Goldmann, Ivan Almeida. M. Weiss, CfA. blackholegroup.org @nemmen

Slide 2

Slide 2 text

Index: 1. Challenge of simulating the universe 2. What is a GPU? 3. HOWTO: accelerate your science 4. My work: black holes, GPUs and deep learning

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Challenge:

Slide 5

Slide 5 text

Challenge: Do physical laws reproduce the observed universe?

Slide 6

Slide 6 text

Challenge: Do physical laws reproduce the observed universe? Computational approach: Simulate a universe following those laws and compare with the real one

Slide 7

Slide 7 text

The challenge of simulating the universe

Slide 8

Slide 8 text

The challenge of simulating the universe

Slide 9

Slide 9 text

Range of length scales for simulating (part of) the universe

Slide 10

Slide 10 text

Range of length scales for simulating (part of) the universe: gas cooling, electrodynamics, turbulence, … galactic dynamics

Slide 11

Slide 11 text

Range of length scales for simulating (part of) the universe: gas cooling, electrodynamics, turbulence, … galactic dynamics. Dynamical range = largest scale / smallest scale > 10^13

Slide 12

Slide 12 text

Range of length scales for fluid dynamics. Smallest scale = Kolmogorov scale (turbulent eddy size); largest scale = system size; dynamical range > 10^5

Slide 13

Slide 13 text

Incredible computational costs in astrophysics: many problems have remained numerically intractable for a long time

Slide 14

Slide 14 text

Computational feasibility of problem ∝ (computer calculations/sec) / (# of calculations to solve problem)

Slide 15

Slide 15 text

Computational feasibility of problem ∝ (computer calculations/sec) / (# of calculations to solve problem) ∝ e^t

Slide 16

Slide 16 text

Exponential evolution of technology

Slide 17

Slide 17 text

Kurzweil 2005. Calculations per second per $1000 vs. year

Slide 18

Slide 18 text

Kurzweil 2005. CPUs. Calculations per second per $1000 vs. year

Slide 19

Slide 19 text

Kurzweil 2005. CPUs and GPUs. Calculations per second per $1000 vs. year

Slide 20

Slide 20 text

Rise of GPU computing (Huang, GTC 2018): relative performance of GPUs vs. single-threaded performance

Slide 21

Slide 21 text

Rise of GPU computing: gamers

Slide 22

Slide 22 text

Rise of GPU computing: gamers, bitcoin mining

Slide 23

Slide 23 text

Rise of GPU computing: gamers, bitcoin mining, deep learning

Slide 24

Slide 24 text

The difference between CPU and GPU: CPU vs. GPU (massively parallel)

Slide 25

Slide 25 text

The difference between CPU and GPU: CPU vs. GPU (massively parallel). Intel i7 7700: 4.2 GHz, 8 cores, 0.04 TFLOPS, $355

Slide 26

Slide 26 text

The difference between CPU and GPU: CPU vs. GPU (massively parallel). Intel i7 7700: 4.2 GHz, 8 cores, 0.04 TFLOPS, $355. NVIDIA GTX 1080Ti: 3584 CUDA cores, 11.3 TFLOPS (FP32), $1279 (3× more expensive than the CPU)

Slide 27

Slide 27 text

The difference between CPU and GPU: CPU vs. GPU (massively parallel). Intel i7 7700: 4.2 GHz, 8 cores, 0.04 TFLOPS, $355. NVIDIA GTX 1080Ti: 3584 CUDA cores, 11.3 TFLOPS (FP32), $1279 (3× more expensive than the CPU). Potentially a 280× speedup.
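The CUDA fragment scattered across these slides boils down to one thread per array element; a minimal, self-contained sketch is below (the kernel indexing and bounds check follow the slide text; the host driver around the kernel is my assumption, not from the talk):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One GPU thread handles one array element (the massively parallel model)
__global__ void vecaddgpu(float *c, const float *a, const float *b, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;  // global thread ID
    if (id < n)                                      // make sure we do not go out of bounds
        c[id] = a[id] + b[id];
}

int main()
{
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory: accessible from both CPU and GPU
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int block = 256;
    int grid  = (n + block - 1) / block;   // enough blocks to cover all n elements
    vecaddgpu<<<grid, block>>>(c, a, b, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);           // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```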

Slide 28

Slide 28 text

Lesson: we need to learn and train our students on how to exploit GPUs for science

Slide 29

Slide 29 text

GPUs in Astrophysics

Slide 30

Slide 30 text

Exponential evolution of astrophysical fluid simulations: number of fluid elements vs. year, fit by y ∝ e^(t/t0) with t0 = 2.8 yrs. Data from: Ruffert+1994; Hawley+1995; Hawley+1996; Stone+1999; Igumenshchev 2000; Stone+2001; Hawley+2002; De Villiers+2003; McKinney+2004; Kuwabara+2005; Hawley+2006; Beckwith+2009; Tchekhovskoy+2011; Narayan+2012; McKinney+2012; Sorathia+2012; McKinney+2013; Hawley+2013; Kestener+2014; Gheller+2015; Benitez-Llambay+2016; Schneider+2017; Liska+2018; Schneider+2018

Slide 31

Slide 31 text

Exponential evolution of astrophysical fluid simulations: rise of GPU codes. Kestener+2014 RAMSES-GPU, N = 800×1600×800. Number of fluid elements vs. year, fit by y ∝ e^(t/t0) with t0 = 2.8 yrs. Data from: Ruffert+1994; Hawley+1995; Hawley+1996; Stone+1999; Igumenshchev 2000; Stone+2001; Hawley+2002; De Villiers+2003; McKinney+2004; Kuwabara+2005; Hawley+2006; Beckwith+2009; Tchekhovskoy+2011; Narayan+2012; McKinney+2012; Sorathia+2012; McKinney+2013; Hawley+2013; Kestener+2014; Gheller+2015; Benitez-Llambay+2016; Schneider+2017; Liska+2018; Schneider+2018

Slide 32

Slide 32 text

Astrophysical computations supercharged with GPUs

Slide 33

Slide 33 text

Cholla: hydrodynamics of galaxy evolution and galactic winds (Schneider & Robertson 2017). Resolution = 2048 × 512 × 512. MPI + CUDA, 16,000 GPUs (Titan). https://vimeo.com/228857009

Slide 34

Slide 34 text

Cholla: hydrodynamics of galaxy evolution and galactic winds (Schneider & Robertson 2017). Resolution = 2048 × 512 × 512. MPI + CUDA, 16,000 GPUs (Titan). https://vimeo.com/228857009

Slide 35

Slide 35 text

H-AMR ("hammer"): multi-GPU general relativistic MHD code to simulate black hole disks. Matthew Liska (U of Amsterdam), Alexander Tchekhovskoy (Northwestern); Liska+2018 (slide adapted from Tchekhovskoy, SPSAS-HighAstro '17). Based on the Godunov-type code HARM2D (Gammie et al. 2003). MPI + CUDA + OpenMP, with 85% parallel scaling to 4096 GPUs. 100-450× speedup compared to a CPU core. Features on par with state-of-the-art CPU codes (e.g., Athena++): adaptive mesh refinement (AMR) with local adaptive time-stepping, giving an additional speedup of ≳10× (or even 100×) for the hardest problems and making it ideal for studying typical, tilted systems. Ideal for getting computational time: a 9M GPU-hour allocation on the Blue Waters supercomputer ≳ 9B CPU core-hours. Science is no longer limited by computational resources.

Slide 36

Slide 36 text

H-AMR: MHD simulation of a black hole and tilted accretion disk (Liska+2019). Resolution = 2880 × 864 × 1200. MPI + CUDA, 450 GPUs (Blue Waters). https://www.youtube.com/watch?v=mbnG5_UTTdk

Slide 37

Slide 37 text

H-AMR: MHD simulation of a black hole and tilted accretion disk (Liska+2019). Resolution = 2880 × 864 × 1200. MPI + CUDA, 450 GPUs (Blue Waters). https://www.youtube.com/watch?v=mbnG5_UTTdk

Slide 38

Slide 38 text

GRay: ray tracing in curved spacetimes (Chan+15a,b, ApJ). Panels: radio 10 GHz, 1.3 mm, IR 2.1 μm, X-rays

Slide 39

Slide 39 text

GRay: ray tracing in curved spacetimes (Chan+15a,b, ApJ). Panels: radio 10 GHz, 1.3 mm, IR 2.1 μm, X-rays

Slide 40

Slide 40 text

Chan+2013; 2015a,b

Slide 41

Slide 41 text

Chan+2013; 2015a,b

Slide 42

Slide 42 text

Previously intractable problems are now possible with GPUs. A change of paradigm in computational astrophysics: science is no longer limited by hardware resources. GPUs are ideal for getting computational time: 1 GPU-hour = 10-20 CPU-hours

Slide 43

Slide 43 text

Is your problem appropriate for a GPU?

Slide 44

Slide 44 text

Is your problem appropriate for a GPU? Problem is too small: do not bother.

Slide 45

Slide 45 text

Is your problem appropriate for a GPU? Problem is too small: do not bother. Can divide into parallel chunks ✔ (e.g. loops can be split up).

Slide 46

Slide 46 text

Is your problem appropriate for a GPU? Problem is too small: do not bother. Can divide into parallel chunks ✔ (e.g. loops can be split up). Lots of communication ✘ (need to constantly read from memory).

Slide 47

Slide 47 text

Is your problem appropriate for a GPU? Problem is too small: do not bother. Can divide into parallel chunks ✔ (e.g. loops can be split up). Lots of communication ✘ (need to constantly read from memory). Too unpredictable ✘ (branch divergence breaks the SIMD paradigm).
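To make the branch-divergence point concrete, here is a hypothetical kernel fragment (not from the talk): when threads in the same warp disagree on a data-dependent condition, the warp executes both branches one after the other, losing much of its SIMD throughput.

```cuda
// Hypothetical kernel fragment illustrating branch divergence: neighbouring
// threads in a warp that take different branches are serialized.
__global__ void divergent(float *out, const float *in, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;
    if (in[id] > 0.0f)            // data-dependent, unpredictable branch
        out[id] = sqrtf(in[id]);  // some threads do this...
    else
        out[id] = 0.0f;           // ...while their warp-mates wait, then do this
}
```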

Slide 48

Slide 48 text

Not just “compile it for the GPU”: you need to change how your code is engineered

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

Your problem

Slide 51

Slide 51 text

Your problem

Slide 52

Slide 52 text

Your problem

Slide 53

Slide 53 text

Your problem

Slide 54

Slide 54 text

Your problem

Slide 55

Slide 55 text

Your problem

Slide 56

Slide 56 text

Solution ✔

Slide 57

Slide 57 text

Solution ✔

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

OpenACC Directives Overview. Your original Fortran or C code plus simple compiler hints; the compiler parallelises the code; works on many-core GPUs and multicore CPUs. The region inside the directives runs on the GPU, the rest on the CPU:
Program myscience
  ... serial code ...
  !$acc kernels
  do k = 1,n1
    do i = 1,n2
      ... parallel code ...
    enddo
  enddo
  !$acc end kernels
  ...
End Program myscience

Slide 60

Slide 60 text

OpenACC Directives Overview. Your original Fortran or C code plus simple compiler hints; the compiler parallelises the code; works on many-core GPUs and multicore CPUs. The !$acc kernels region becomes the GPU kernel, the rest runs on the CPU:
Program myscience
  ... serial code ...
  !$acc kernels
  do k = 1,n1
    do i = 1,n2
      ... parallel code ...
    enddo
  enddo
  !$acc end kernels
  ...
End Program myscience

Slide 61

Slide 61 text

OpenACC Directives Overview. Your original Fortran or C code plus simple compiler hints; the compiler parallelises the code; works on many-core GPUs and multicore CPUs. The !$acc kernels region becomes the GPU kernel, the rest runs on the CPU:
Program myscience
  ... serial code ...
  !$acc kernels
  do k = 1,n1
    do i = 1,n2
      ... parallel code ...
    enddo
  enddo
  !$acc end kernels
  ...
End Program myscience
CPU-GPU communication is a bottleneck.
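As a hedged illustration of that bottleneck (a fragment of my own, not code from the talk): the explicit CUDA data path that the OpenACC directives hide looks like the sketch below, and each cudaMemcpy crosses the PCIe bus, so shuttling arrays between CPU and GPU at every step can easily cost more than the kernels themselves.

```cuda
// Fragment (illustrative only): explicit host<->device transfers in CUDA.
// Keeping data resident on the GPU between kernel launches avoids paying
// this transfer cost repeatedly.
float *h_data = (float *)malloc(n * sizeof(float));     // CPU memory
float *d_data;
cudaMalloc(&d_data, n * sizeof(float));                 // GPU memory

cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice); // CPU -> GPU
// ... launch kernels operating on d_data, keeping it resident on the GPU ...
cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost); // GPU -> CPU
```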

Slide 62

Slide 62 text

How to adapt your problem to GPUs: difficulty increases from approaches with less development time to those with bigger speedup and flexibility

Slide 63

Slide 63 text

How to adapt your problem to GPUs. Example of serial code to be parallelised (SAXPY on array y):
for (int i = 0; i < n; ++i)
    y[i] = a*x[i] + y[i];
...
// Somewhere in main: call SAXPY on 1M elements
saxpy(1<<20, 2.0, x, y);

Slide 64

Slide 64 text

How to adapt your problem to GPUs. Example of serial code to be parallelised (a simple example: SAXPY in C):
void saxpy(int n, float a, float *x, float *restrict y)
{
    #pragma acc kernels
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}
GPU version with CUDA: vector-addition device code, launched as a CUDA kernel:
__global__ void vecaddgpu(float *c, float *a, float *b, int n)
{
    // Get global thread ID
    int id = blockIdx.x*blockDim.x + threadIdx.x;
    // Make sure we do not go out of bounds
    if (id < n) c[id] = a[id] + b[id];
}

Slide 65

Slide 65 text

How to adapt your problem to GPUs. Serial code vs. GPU version (CUDA).

Slide 66

Slide 66 text

How to adapt your problem to GPUs. Serial code vs. GPU version (CUDA):
// Kernel function (runs on GPU)
__global__ void multiply(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a*x[i];
}

Slide 67

Slide 67 text

How to adapt your problem to GPUs. Serial code vs. GPU version (CUDA):
// Kernel function (runs on GPU)
__global__ void multiply(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a*x[i];
}
Call from main:
// Allocate Unified Memory, accessible from CPU or GPU
cudaMallocManaged(&x, N*sizeof(float));
cudaMallocManaged(&y, N*sizeof(float));
// Run kernel on the GPU (the scalar a must also be passed)
multiply<<<1, 1>>>(N, a, x, y);
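The launch above runs the whole loop in a single GPU thread (<<<1, 1>>>). A more idiomatic completion is sketched below, with one element per thread; the grid/block sizes and the explicit scalar argument are my additions, not from the slide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread scales one element: y[i] = a * x[i]
__global__ void multiply(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i];
}

int main()
{
    const int N = 1 << 20;
    float *x, *y;
    // Unified Memory: accessible from CPU or GPU, as on the slide
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));
    for (int i = 0; i < N; i++) x[i] = 1.0f;

    int block = 256;
    multiply<<<(N + block - 1) / block, block>>>(N, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);   // expect 2.0
    cudaFree(x); cudaFree(y);
    return 0;
}
```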

Slide 68

Slide 68 text

blackholegroup.org. Gustavo Soares (PhD), Artur Vemado (MSc), Fabio Cafardo (PhD), Raniere Menezes (PhD), Ivan Almeida (MSc), Rodrigo Nemmen, Roberta Pereira (MSc), Edson Ponciano (undergrad), Matheus Bernardino (undergrad, CS). Apply to join my group

Slide 69

Slide 69 text

General relativistic MHD simulations: black hole weather forecast https://www.youtube.com/watch?v=bWg6vaf5WXw

Slide 70

Slide 70 text

General relativistic MHD simulations: black hole weather forecast https://www.youtube.com/watch?v=bWg6vaf5WXw

Slide 71

Slide 71 text

Let there be light from the hot plasma. Spectra: log(luminosity [erg/s]) vs. log(frequency / Hz). Courtesy: @mmosc_m

Slide 72

Slide 72 text

Let there be light from the hot plasma. Time series and spectra: log(luminosity [erg/s]) vs. log(frequency / Hz). Courtesy: @mmosc_m

Slide 73

Slide 73 text

Let there be light from the hot plasma. Time series and spectra: log(luminosity [erg/s]) vs. log(frequency / Hz). Courtesy: @mmosc_m

Slide 74

Slide 74 text

How radiative transfer works in astrophysics: essentially ray tracing in general relativity

Slide 75

Slide 75 text

1. Spacetime and plasma properties

Slide 76

Slide 76 text

1. Spacetime and plasma properties: metric g_μν

Slide 77

Slide 77 text

1. Spacetime and plasma properties: metric g_μν; ρ_0 rest-mass density; u internal energy; u^μ 4-velocity; b^μ magnetic field

Slide 78

Slide 78 text

2. Photon generation (Monte Carlo method): k^α = dx^α/dλ

Slide 79

Slide 79 text

2. Photon generation (Monte Carlo method): k^α = dx^α/dλ, with initial wave vectors k^α(λ_0)

Slide 80

Slide 80 text

3. Photon propagation: solve the geodesic equation in curved spacetime, dx^α/dλ = k^α (which defines the affine parameter λ) and dk^α/dλ = −Γ^α_μν k^μ k^ν. (The slide shows the geodesic-integration and absorption sections of the grmonty paper as background.)

Slide 81

Slide 81 text

3. Photon propagation: solve the geodesic equation in curved spacetime, dx^α/dλ = k^α (which defines the affine parameter λ) and dk^α/dλ = −Γ^α_μν k^μ k^ν. (The slide shows the geodesic-integration and absorption sections of the grmonty paper as background.)
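A minimal sketch of how the geodesic equation above can be integrated numerically with a classic fourth-order Runge-Kutta step (the paper excerpt on the slide mentions fourth-order Runge-Kutta). This is illustrative C, not grmonty's actual code, and the christoffel() routine is a flat-spacetime placeholder; a real code would evaluate the connection coefficients of its metric there.

```c
#include <string.h>

/* Placeholder: flat spacetime, all connection coefficients zero.
 * A real code would fill Gamma[a][mu][nu] for its metric at position x. */
static void christoffel(const double x[4], double Gamma[4][4][4])
{
    (void)x;
    memset(Gamma, 0, 64 * sizeof(double));
}

/* Right-hand side of the ODE system: dx^a/dl = k^a, dk^a/dl = -Gamma^a_{mu nu} k^mu k^nu */
static void geodesic_rhs(const double x[4], const double k[4],
                         double dx[4], double dk[4])
{
    double Gamma[4][4][4];
    christoffel(x, Gamma);
    for (int a = 0; a < 4; a++) {
        dx[a] = k[a];
        dk[a] = 0.0;
        for (int mu = 0; mu < 4; mu++)
            for (int nu = 0; nu < 4; nu++)
                dk[a] -= Gamma[a][mu][nu] * k[mu] * k[nu];
    }
}

/* Advance one photon (x, k) by an affine-parameter step dl using classic RK4 */
void geodesic_rk4_step(double x[4], double k[4], double dl)
{
    double dx1[4], dk1[4], dx2[4], dk2[4], dx3[4], dk3[4], dx4[4], dk4[4];
    double xt[4], kt[4];

    geodesic_rhs(x, k, dx1, dk1);
    for (int a = 0; a < 4; a++) { xt[a] = x[a] + 0.5*dl*dx1[a]; kt[a] = k[a] + 0.5*dl*dk1[a]; }
    geodesic_rhs(xt, kt, dx2, dk2);
    for (int a = 0; a < 4; a++) { xt[a] = x[a] + 0.5*dl*dx2[a]; kt[a] = k[a] + 0.5*dl*dk2[a]; }
    geodesic_rhs(xt, kt, dx3, dk3);
    for (int a = 0; a < 4; a++) { xt[a] = x[a] + dl*dx3[a]; kt[a] = k[a] + dl*dk3[a]; }
    geodesic_rhs(xt, kt, dx4, dk4);

    for (int a = 0; a < 4; a++) {
        x[a] += dl/6.0 * (dx1[a] + 2.0*dx2[a] + 2.0*dx3[a] + dx4[a]);
        k[a] += dl/6.0 * (dk1[a] + 2.0*dk2[a] + 2.0*dk3[a] + dk4[a]);
    }
}
```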

Slide 82

Slide 82 text

4. Photon absorption and scattering: solve the radiative transfer equation along each path, written in covariant form as (1/C) d/dλ (I_ν/ν^3) = j_ν/ν^2 − (ν α_ν,a) I_ν/ν^3. (The slide shows the geodesic and absorption sections of the grmonty paper as background.)

Slide 83

Slide 83 text

4. Photon absorption and scattering: solve the radiative transfer equation along each path, written in covariant form as (1/C) d/dλ (I_ν/ν^3) = j_ν/ν^2 − (ν α_ν,a) I_ν/ν^3, with the absorption term highlighted. (The slide shows the geodesic and absorption sections of the grmonty paper as background.)

Slide 84

Slide 84 text

4. Photon absorption and scattering: solve the radiative transfer equation along each path, written in covariant form as (1/C) d/dλ (I_ν/ν^3) = j_ν/ν^2 − (ν α_ν,a) I_ν/ν^3, with the absorption and scattering terms highlighted. (The slide shows the geodesic and absorption sections of the grmonty paper as background.)
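For readability, the two governing equations quoted in the paper excerpts on these slides (the geodesic equation and the covariant radiative transfer equation), retypeset as LaTeX:

```latex
% Geodesic equation: defines the affine parameter lambda and the photon trajectory
\frac{dx^{\alpha}}{d\lambda} = k^{\alpha}, \qquad
\frac{dk^{\alpha}}{d\lambda} = -\Gamma^{\alpha}_{\ \mu\nu}\, k^{\mu} k^{\nu}

% Covariant radiative transfer equation (emission and absorption along each ray)
\frac{1}{\mathcal{C}}\,\frac{d}{d\lambda}\!\left(\frac{I_{\nu}}{\nu^{3}}\right)
  = \frac{j_{\nu}}{\nu^{2}}
  - \left(\nu\,\alpha_{\nu,a}\right)\frac{I_{\nu}}{\nu^{3}}
```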

Slide 85

Slide 85 text

5. Count photons that leave the “box” to produce spectra or images: photons E1, E2, E3, E4 collected in the observer's orthonormal basis {ê_α}

Slide 86

Slide 86 text

5. Count photons that leave the “box” to produce spectra or images: photons E1, E2, E3, E4 collected in the observer's orthonormal basis {ê_α}. Observed spectrum: number of photons vs. energy.

Slide 87

Slide 87 text

5. Count photons that leave the “box” to produce spectra or images: photons E1, E2, E3, E4 collected in the observer's orthonormal basis {ê_α}. Observed spectrum: number of photons vs. energy. Observed image.

Slide 88

Slide 88 text

Radiative transfer scheme: Monte Carlo ray tracing in full general relativity (grmonty, Dolence+09 → gpu-monty; N, Bernardino, Goldmann, in prep.). Current features: only thermal synchrotron emission, inverse Compton scattering, 2D flows, OpenMP parallelization, arbitrary spacetimes. In progress: support for 3D flows, GPU acceleration, non-thermal distributions, bremsstrahlung ✅ ✅

Slide 89

Slide 89 text

GPU acceleration: collaboration with computer scientists. Work mostly by undergrad Matheus Bernardino. Each GPU thread (thread 1, thread 2, thread 3, …) does its own share of the work, indexed by the global thread ID (id = blockIdx.x*blockDim.x + threadIdx.x, with a bounds check).
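A hypothetical sketch of the parallelization strategy pictured on this slide, one GPU thread per Monte Carlo superphoton; the Photon struct and the track_photon() placeholder are my assumptions, not the actual gpu-monty code.

```cuda
// Hypothetical sketch: one GPU thread per Monte Carlo superphoton.
// track_photon() stands in for the real geodesic + radiative-transfer integration.
struct Photon { double x[4]; double k[4]; double weight; };

__device__ void track_photon(Photon *ph)
{
    // Placeholder body: a real code would integrate the photon trajectory and
    // attenuate its weight until it escapes the domain or is captured.
    ph->weight *= 1.0;
}

__global__ void trace_photons(Photon *photons, int n_photons)
{
    // Get global thread ID
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    // Make sure we do not go out of bounds
    if (id < n_photons)
        track_photon(&photons[id]);
}
```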

Slide 90

Slide 90 text

Phase I: Tests. Robust end-to-end test; testing code correctness (a race condition was found). Spectrum appropriate for Sagittarius A*. Matheus Bernardino, undergrad (CS), Git codebase contributor; Alfredo Goldmann, Professor (CS)

Slide 91

Slide 91 text

Phase II: Optimization Matheus Bernardino undergrad (CS)

Slide 92

Slide 92 text

Results: code runs 4× faster on a gamer GPU than on the CPU (32× faster than 1 CPU core). CPU: Intel i7, 2.8 GHz, 8 cores. GPU: NVIDIA GTX 1080

Slide 93

Slide 93 text

Results: code runs 4× faster on a gamer GPU than on the CPU (32× faster than 1 CPU core). CPU: Intel i7, 2.8 GHz, 8 cores. GPU: NVIDIA GTX 1080. Plenty of room for optimizations (should get to 100× speedup)

Slide 94

Slide 94 text

Lesson: collaborate w/ your CS friends, they can help you!

Slide 95

Slide 95 text

Using deep learning and GPUs for black hole weather forecasting

Slide 96

Slide 96 text

https://youtu.be/TmPfTpjtdgg Learning to play Breakout (Atari)

Slide 97

Slide 97 text

https://youtu.be/TmPfTpjtdgg Learning to play Breakout (Atari)

Slide 98

Slide 98 text

https://youtu.be/TmPfTpjtdgg After 600 matches…

Slide 99

Slide 99 text

https://youtu.be/TmPfTpjtdgg After 600 matches…

Slide 100

Slide 100 text

AI developed by Google beats the best Go player: AlphaGo Zero, AlphaZero (Silver+2016, 2017 Nature; Silver+2018 Science)

Slide 101

Slide 101 text

AI developed by Google beats the best Go player: AlphaGo Zero, AlphaZero (Silver+2016, 2017 Nature; Silver+2018 Science)

Slide 102

Slide 102 text

AI developed by Google beats the best Go player. AlphaZero: the algorithm learned by playing against itself. After 24 hours: beats the best human player. After 3 days: becomes invincible against the previous version of the code. After 21 days: invincible against all existing Go programs. (AlphaGo Zero, AlphaZero; Silver+2016, 2017 Nature; Silver+2018 Science)

Slide 103

Slide 103 text

Garry Kasparov, “Chess, a Drosophila of reasoning”, Science 2018: editorial on AlphaZero by the former world chess champion and author of Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins. (The slide shows a screenshot of the article.)

Slide 104

Slide 104 text

Machine learning predicts the future of a simple chaotic system for a long time (Pathak+2018, PRL). Reservoir computing technique: model-free prediction for the evolution of a large spatiotemporally chaotic system, y_t = −y y_x − y_xx − y_xxxx + μ cos(2πx/λ). During training: no access to the equation, only data. Panels: (a) actual data from the model, (b) ML prediction, (c) error in prediction; time in units of the Lyapunov time.

Slide 105

Slide 105 text

Machine learning predicts the future of a simple chaotic system for a long time (Pathak+2018, PRL). Reservoir computing technique: model-free prediction for the evolution of a large spatiotemporally chaotic system, y_t = −y y_x − y_xx − y_xxxx + μ cos(2πx/λ). During training: no access to the equation, only data. Panels: (a) actual data from the model, (b) ML prediction, (c) error in prediction; time in units of the Lyapunov time.

Slide 106

Slide 106 text

Machine learning predicts the future of a simple chaotic system for a long time (Pathak+2018, PRL). Reservoir computing technique: model-free prediction for the evolution of a large spatiotemporally chaotic system, y_t = −y y_x − y_xx − y_xxxx + μ cos(2πx/λ). During training: no access to the equation, only data. Panels: (a) actual data from the model, (b) ML prediction, (c) error in prediction; time in units of the Lyapunov time.

Slide 107

Slide 107 text

Machine learning predicts the future of a simple chaotic system for a long time (Pathak+2018, PRL). Reservoir computing technique: model-free prediction for the evolution of a large spatiotemporally chaotic system, y_t = −y y_x − y_xx − y_xxxx + μ cos(2πx/λ). During training: no access to the equation, only data. Panels: (a) actual data from the model, (b) ML prediction, (c) error in prediction; time in units of the Lyapunov time.

Slide 108

Slide 108 text

Machine learning predicts the future of a simple chaotic system for a long time (Pathak+2018, PRL). Reservoir computing technique: model-free prediction for the evolution of a large spatiotemporally chaotic system, y_t = −y y_x − y_xx − y_xxxx + μ cos(2πx/λ). During training: no access to the equation, only data. Panels: (a) actual data from the model, (b) ML prediction, (c) error in prediction; time in units of the Lyapunov time. Does this work for astrophysical chaotic systems?
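For readability, the model equation quoted on these slides (a Kuramoto-Sivashinsky-type PDE with periodic forcing, as used by Pathak+2018), retypeset as LaTeX:

```latex
% PDE quoted on the slide (Pathak+2018, PRL), retypeset for readability
y_t = -\, y\, y_x - y_{xx} - y_{xxxx} + \mu \cos\!\left(\frac{2\pi x}{\lambda}\right)
```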

Slide 109

Slide 109 text

Black hole blowback simulations (Almeida & N, MNRAS submitted, arXiv:1905.13708; Ivan Almeida). Wind powers similar to values required in AGN feedback: P_wind ∼ (10^-3 − 10^-2) Ṁc^2. Terminal velocities = 300-3000 km/s. Results consistent with the Akira LLAGN (Cheung+2016)

Slide 110

Slide 110 text

Black hole blowback simulations (Almeida & N, MNRAS submitted, arXiv:1905.13708; Ivan Almeida). Wind powers similar to values required in AGN feedback: P_wind ∼ (10^-3 − 10^-2) Ṁc^2. Terminal velocities = 300-3000 km/s. Results consistent with the Akira LLAGN (Cheung+2016)

Slide 111

Slide 111 text

No content

Slide 112

Slide 112 text

No content

Slide 113

Slide 113 text

Using deep learning for black hole weather forecasting (Pereira, N, Navarro, in prep.; Roberta D. Pereira, João Paulo Navarro). Deep learning: convolutional neural network, U-Net + ReLU. Plot: R^2 vs. time (GM/c^3)

Slide 114

Slide 114 text

Using deep learning for black hole weather forecasting (Pereira, N, Navarro, in prep.; Roberta D. Pereira, João Paulo Navarro). Deep learning: convolutional neural network, U-Net + ReLU. Plot: R^2 vs. time (GM/c^3)

Slide 115

Slide 115 text

Using deep learning for black hole weather forecasting (Pereira, N, Navarro, in prep.; Roberta D. Pereira, João Paulo Navarro). Deep learning: convolutional neural network, U-Net + ReLU. Plot: R^2 vs. time (GM/c^3). The neural network imagines the future well for Δt ≈ 3000 GM/c^3 ≈ 2 t_dyn(100 r_S). Limits of DL for multidimensional chaotic systems? “Artificial Alzheimer's”

Slide 116

Slide 116 text

Summary: GPUs in astrophysics. GPUs: not only for bitcoin mining, games and deep learning. We need to train our researchers and students in GPU computing techniques. Collaborate with computer scientists: they will love to teach you how to optimize things. Use GPUs if possible: more bang for your buck. Ideal for getting computational time: 1 GPU-hour > 10 CPU-hours. rodrigonemmen.com blackholegroup.org

Slide 117

Slide 117 text

If also interested in: High-energy astrophysics Simulations: accretion / jets AGN feedback (winds, jet) Come talk to me! 🙂 [email protected] @nemmen

Slide 118

Slide 118 text

E-mail: [email protected] · Web: rodrigonemmen.com · Twitter: @nemmen · Github: rsnemmen · Facebook: facebook.com/rodrigonemmen · Bitbucket: nemmen · Group: blackholegroup.org · figshare: bit.ly/2fax2cT