Slide 1

Slide 1 text

OSS Activity in pandas: Case Studies and the First Step Masaaki Horikoshi @ ARISE analytics

Slide 2

Slide 2 text

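Self-introduction • Masaaki Horikoshi @ ARISE analytics Inc. • OSS activity • https://github.com/sinhrks • Table (as of 2017/9/3) of GitHub Stars, Org. Members (Committers), and Contributors per project, including ggfortify (R) and pandas-ml; extracted values: 867, 11, 106, 15, 9, 2, 2, 3, 77, 286, 1,879, 10,699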

Slide 3

Slide 3 text

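Company introduction • ARISE analytics Inc. • Founded April 2017 • Business • Data analysis for telecommunications services, e-commerce, and similar domains • Development of recommendation engines and new solutions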

Slide 4

Slide 4 text

Contents • Python and scientific computing • Case studies of OSS activity in pandas • OSS activity: the first step

Slide 5

Slide 5 text

Python and scientific computing

Slide 6

Slide 6 text

What is Python? • A general-purpose programming language • In recent years it has also become widely used for scientific computing • Google Trends (Python / Japan)

Slide 7

Slide 7 text

IEEE Spectrum The 2017 Top Programming Language http://spectrum.ieee.org/computing/software/the-2017-top-prog

Slide 8

Slide 8 text

What is Python? • Guido van Rossum: • … The first sound bite I had for Python was, “Bridge the gap between the shell and C.” • So I never intended Python to be the primary language for programmers … It was intended to be a second language for people who were already experienced programmers… • We now have a large community of people using Python as an educational language … These people aren't and may never be professional programmers, but they still find some programming skills useful. • Python's Design Goals (A Conversation with Guido van Rossum, Part II) by Bill Venners 2003/01/20

Slide 9

Slide 9 text

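Scientific computing and Python • Why Python is used for scientific computing (personal view) • It is used in educational institutions • Existing C/Fortran assets are easy to reuse • Scientific-computing OSS has grown and matured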

Slide 10

Slide 10 text

Scientific computing packages

Slide 11

Slide 11 text

Scientific computing packages

Slide 12

Slide 12 text

Scientific computing community • Many community events are actively held in Japan and abroad • PyCon • PyData • SciPy • …

Slide 13

Slide 13 text

PyCon JP 2016 attendee survey • Q3. In which fields do you usually use Python? • (Bar chart; extracted values in descending order: 168, 127, 111, 94, 55, 51, 14, 12, 12, 2; extracted category labels: Private tools development, Web development, Machine learning, Research & Development, System management, Big Data, Desktop Application, Computer graphics, Game programming) • https://pycon.jp/2016/ja/files/32/pyconjp2016survey.htm

Slide 14

Slide 14 text

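Scientific computing events overseas • Attended SciPy 2017 in July 2017 • Among the world's largest events on data analysis and scientific research using Python • Attendees: about 700 • Duration: 9 days (including tutorials and sprints) • Venue: University of Texas (Austin)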

Slide 15

Slide 15 text

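Scenes from SciPy 2017 • Keynote: Drilling the Chicxulub Impact Structure / Sean Gulick, University of Texas • Concluded that the dinosaur extinction was very likely caused by the impact of a single meteorite • Python was used for the data analysis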

Slide 16

Slide 16 text

Case studies of OSS activity in pandas

Slide 17

Slide 17 text

So what is pandas, anyway?

Slide 18

Slide 18 text

What is pandas? • Provides fast, easy-to-use data structures for data analysis • Two-dimensional tabular format • The counterpart of R's “data.frame” • Author: Wes McKinney • License: BSD • Etymology: PANel DAta System

Slide 19

Slide 19 text

What is pandas?
import pandas as pd
df = pd.read_csv('adult.csv')
df
• Annotations on the displayed output: Read csv file, Columns, Index, Mixed data types • Adult Dataset taken from UCI ML Repository: Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science
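As a minimal sketch of what the `read_csv` call on this slide produces, the following reads a small in-memory CSV instead of `adult.csv`. The three rows are hypothetical sample values, not taken from the real dataset; only the column names reuse those that appear on the slides.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the real adult.csv has many more rows and columns.
csv_text = """age,marital-status,hours-per-week,income
39,Never-married,40,<=50K
50,Married-civ-spouse,13,<=50K
52,Married-civ-spouse,45,>50K
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # column labels
print(df.index)             # row labels (a default integer range here)
print(df.dtypes)            # one dtype per column: numeric and text columns coexist
```

The three printed attributes correspond to the "Columns", "Index", and "Mixed data types" annotations on the slide.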

Slide 20

Slide 20 text

What is pandas?
df[['age', 'marital-status']]
df.groupby('income')['hours-per-week'].mean()
• Annotations: Select (first expression); Groupby, Select, Aggregate (second expression)
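The two expressions on this slide can be tried on a small frame. This is a sketch with hypothetical rows; only the column names come from the slide.

```python
import pandas as pd

# Hypothetical sample rows using the column names shown on the slide.
df = pd.DataFrame({
    "age": [39, 50, 52, 31],
    "marital-status": ["Never-married", "Married", "Married", "Never-married"],
    "hours-per-week": [40, 13, 45, 55],
    "income": ["<=50K", "<=50K", ">50K", ">50K"],
})

# Select: a list of column names returns a DataFrame with just those columns.
selected = df[["age", "marital-status"]]

# Groupby -> Select -> Aggregate: mean working hours per income group.
mean_hours = df.groupby("income")["hours-per-week"].mean()

print(selected)
print(mean_hours)
```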

Slide 21

Slide 21 text

Why do we need pandas? Isn't SQL good enough?

Slide 22

Slide 22 text

Why pandas? • It lets you manipulate real-world (messy) data intuitively

Slide 23

Slide 23 text

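Why pandas? • Example: at a retail store, we want to know how campaigns and weather affect sales • The data sources are scattered… • Weather data is downloaded from the Web as CSV (columns: Date/time, Location, Weather; sample row: Meguro-ku, Tokyo / Sunny) • Campaigns are recorded in Excel by each store… (columns: Campaign name, Start date, End date; sample rows: XXX, YYY) • POS data is retrieved with SQL… (columns: Store ID, User ID, Receipt ID, Purchase date/time; sample rows: AA / aaa, BB / bbb)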

Slide 24

Slide 24 text

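Why pandas? • Example: at a retail store, we want to know how campaigns and weather affect sales • Many things to consider… • POS data (Store ID, User ID, Receipt ID, Purchase date/time): we want to aggregate by the minute • Campaign data (Campaign name, Start date, End date): do newspaper ads and in-store flyers have different effects? • Weather data (Date/time, Location, Weather): observations taken at fixed intervals must be linked to purchases by some suitable rule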
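Two of the considerations on this slide, aggregating by the minute and linking fixed-interval observations to purchases, can be sketched as follows. All table contents and column names (`purchased_at`, `observed_at`, `amount`, and so on) are hypothetical, and "use the most recent observation at or before the purchase" is only one possible linking rule, chosen here for illustration.

```python
import pandas as pd

# Hypothetical POS records.
purchases = pd.DataFrame({
    "purchased_at": pd.to_datetime([
        "2017-09-03 10:05:10",
        "2017-09-03 10:05:40",
        "2017-09-03 11:20:00",
    ]),
    "store_id": ["AA", "AA", "BB"],
    "amount": [500, 300, 1200],
})

# Hypothetical hourly weather observations.
weather = pd.DataFrame({
    "observed_at": pd.to_datetime(["2017-09-03 10:00", "2017-09-03 11:00"]),
    "weather": ["sunny", "rain"],
})

# Link each purchase to the latest observation at or before it.
# merge_asof requires both frames to be sorted by their key columns.
linked = pd.merge_asof(
    purchases,
    weather,
    left_on="purchased_at",
    right_on="observed_at",
    direction="backward",
)

# Aggregate sales per minute.
per_minute = purchases.set_index("purchased_at")["amount"].resample("1min").sum()

print(linked[["purchased_at", "amount", "weather"]])
print(per_minute[per_minute > 0])
```

Expressing the same "nearest earlier observation" join in plain SQL typically needs a correlated subquery or window functions, which is the kind of friction the slide is pointing at.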

Slide 25

Slide 25 text

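Where pandas fits • CRISP-DM: Cross Industry Standard Process for Data Mining • Phases: Business understanding, Data understanding, Data preparation, Modeling, Evaluation, Deployment • pandas is for understanding real-world data and getting it into a form that can be analyzed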

Slide 26

Slide 26 text

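pandas ecosystem • I/O • pandas-datareader • pandas-msgpack • pandas-gbq • Domain-specific • geopandas • xarray • Machine learning • sklearn-pandas • pandas-ml • Parallel processing • Dask • For developers • pandas-compat • (Underlined items on the slide are those hosted in the pandas/pydata repositories)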

Slide 27

Slide 27 text

What does a committer actually do?

Slide 28

Slide 28 text

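History of my OSS activity in pandas • (Chart: Number of Commits; the project follows a one pull request = one commit policy) • Milestones: first issue, started contributing in earnest (?), joined the core team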

Slide 29

Slide 29 text

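How to become a committer • It differs by project, but in the case of pandas: • Contributions that are sufficient in both quality and quantity • Active for at least one year → Decided by nomination and vote of the current core team members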

Slide 30

Slide 30 text

The role of a committer • Make decisions about: • The overall scope, vision and direction of the project. • Strategic collaborations with other organizations or individuals. • Specific technical issues, features, bugs and pull requests. • The services that are run by the project. • Cases where regular community discussion doesn’t produce consensus.

Slide 31

Slide 31 text

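A committer's work • Responding to issues • Code review • Merging pull requests • Setting up and operating the surrounding infrastructure (CI, etc.) • Releases (carried out by the release manager) • Teleconferences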

Slide 32

Slide 32 text

What matters • Product quality • Community quality • Building the mechanisms that sustain both

Slide 33

Slide 33 text

Product quality • Easy to use • Installation • Usability • Documentation • Quality is assured • Continuous integration • Performance tests

Slide 34

Slide 34 text

Installation • Anaconda • Wheels provided for various environments

Slide 35

Slide 35 text

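Usability • API • Consistency inside and outside the package • Namespaces split appropriately (accessors) • Backward compatibility • Deprecate first instead of changing immediately → change 2 major versions later • Guarantee that objects serialized in the past can still be loaded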
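A minimal sketch of the two API ideas on this slide. The `.str` and `.dt` accessors are real pandas namespaces; `old_mean` and `new_mean` are hypothetical functions invented here to illustrate the deprecate-first pattern and are not part of pandas.

```python
import warnings

import pandas as pd

# Accessors: string and datetime methods live in their own namespaces
# instead of crowding the Series API.
names = pd.Series(["alice", "bob"])
upper = names.str.upper()

dates = pd.Series(pd.to_datetime(["2017-09-03", "2018-01-15"]))
years = dates.dt.year


def new_mean(values):
    """Hypothetical replacement function."""
    return sum(values) / len(values)


def old_mean(values):
    """Hypothetical deprecated function: still works, but warns callers
    so they can migrate before the behavior is actually changed."""
    warnings.warn(
        "old_mean is deprecated; use new_mean instead",
        FutureWarning,
        stacklevel=2,
    )
    return new_mean(values)


print(upper.tolist(), years.tolist())
```

Warning first and changing later gives downstream code a release window to adapt, which is the point of the "2 major versions later" rule on the slide.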

Slide 36

Slide 36 text

Documentation • Official documentation • http://pandas.pydata.org/pandas-docs/stable/ • Books • Python for Data Analysis (Japanese edition) / Wes McKinney • Community-written documentation • Modern pandas / Tom Augspurger • https://tomaugspurger.github.io/modern-1.html

Slide 37

Slide 37 text

Continuous integration • Static checks (flake8) • Automated tests (Travis-CI, AppVeyor, Circle-CI) • Combinations of Python version x surrounding environment • Coverage checks (Codecov)

Slide 38

Slide 38 text

Performance tests • Airspeed Velocity • Run tests at a specified commit and compare between commits • More than 1,000 benchmarks are run • http://pandas.pydata.org/speed/pandas
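For orientation, Airspeed Velocity (asv) discovers benchmarks written as classes with a `setup` method and `time_*` methods. The class below is a hypothetical benchmark in that layout, not one taken from the pandas benchmark suite, and it is timed here with the standard-library `timeit` so the sketch runs without asv installed.

```python
import timeit

import numpy as np
import pandas as pd


class GroupByMean:
    """Hypothetical benchmark in the class layout asv discovers:
    setup() prepares data, each time_* method is the code being timed."""

    def setup(self):
        rng = np.random.default_rng(0)
        self.df = pd.DataFrame({
            "key": rng.integers(0, 100, size=10_000),
            "value": rng.random(10_000),
        })

    def time_groupby_mean(self):
        self.df.groupby("key")["value"].mean()


# Stand-in for the asv runner: prepare once, then time repeated calls.
bench = GroupByMean()
bench.setup()
elapsed = timeit.timeit(bench.time_groupby_mean, number=10)
print(f"10 runs took {elapsed:.4f} s")
```

Under asv, the same class would be run against specified commits so that timings can be compared between them, as described on the slide.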

Slide 39

Slide 39 text

What matters • Product quality • Community quality • Building the mechanisms that sustain both

Slide 40

Slide 40 text

Community quality • Being open • Easy to join • Supportive

Slide 41

Slide 41 text

Making it easy to join • Open communication (GitHub, Gitter) • Code of Conduct • Developer documentation (GitHub Wiki)

Slide 42

Slide 42 text

GitHub • Tag issues as appropriate so that discussions are easy to join • Use issue templates so that even beginners can file reports easily

Slide 43

Slide 43 text

Code of Conduct • A GitHub repository is maintained for governance • https://github.com/pandas-dev/pandas-governance

Slide 44

Slide 44 text

Developer documentation • For developers • Contribution guide • How to use Git • How to test • Coding style • How to write release notes • Specialities • … • For maintainers • Release checklist • How to publish the documentation • …

Slide 45

Slide 45 text

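Developer documentation • Specialities • A list of knowledgeable people per feature (including people outside the core team) • Used to send them notifications from related issues / pull requests • https://github.com/pandas-dev/pandas/wiki/Specialities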

Slide 46

Slide 46 text

Support from organizations and companies • Institutional Partners • Continuum Analytics • Two Sigma

Slide 47

Slide 47 text

OSS activity: the first step

Slide 48

Slide 48 text

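Benefits of OSS activity • It satisfies the desire for recognition • Code review by experts improves your skills • You come to understand the internal implementation and can write more efficient code • Getting your fixes merged into master reduces your personal maintenance burden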

Slide 49

Slide 49 text

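Getting started with OSS activity • Fixing code is not the only form of OSS activity • Promote the project (write blog posts, give talks) • For data-related projects, sharing analysis know-how that uses the OSS also counts • Answer questions (StackOverflow, GitHub Issues) • File issues • Send pull requests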

Slide 50

Slide 50 text

Getting started with OSS activity • Decide what to contribute to • A personal project • Your company's product, released as OSS • OSS you use at work • Well-known OSS

Slide 51

Slide 51 text

I want to be able to send pull requests!

Slide 52

Slide 52 text

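Your first pull request • Start with a single project/feature • Becoming proficient with the process/tools takes effort • Working out how the pieces of code relate to each other is hard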

Slide 53

Slide 53 text

Your first pull request • Start with something simple • Documentation revisions • Improvements to error messages • Easy bug fixes

Slide 54

Slide 54 text

Pull request workflow • Find an issue you want to fix • If no issue exists for it, it is better to file one first • Fix the code • Send the pull request

Slide 55

Slide 55 text

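Finding something to work on • Larger projects organize their issues with tags • Search for a target by area or difficulty • It is better to avoid issues that another user has already started on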

Slide 56

Slide 56 text

How to send a pull request • Follow the minimum rules • The explicitly stated ones are enough • Add tests • Follow the coding conventions • …

Slide 57

Slide 57 text

After sending a pull request • Review by developers and other users • (Depending on the project) updating the documentation and release notes • Merge

Slide 58

Slide 58 text

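What feels like a hurdle • English • Community-specific rules • How to use Git • Technical skill • If something is wrong, people will tell you, so there is no need to worry excessively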

Slide 59

Slide 59 text

What kinds of pull requests are there?

Slide 60

Slide 60 text

Types of pull requests • Documentation fixes • Bug fixes • New features, API changes • Refactoring • Adding tests • …

Slide 61

Slide 61 text

Pull request example (1/3): documentation fix • Describe the fix concisely • https://github.com/pandas-dev/pandas/pull/13312 • “This document looks kind of broken”

Slide 62

Slide 62 text

Pull request example (1/3): documentation fix • The change just inserted a blank line

Slide 63

Slide 63 text

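Pull request example (2/3): specification change • For a complex change, lay out the behavior before and after the fix • “The current behavior is …, but after the fix it becomes …. This is based on the rule that …” • https://github.com/pandas-dev/pandas/pull/13849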

Slide 64

Slide 64 text

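Pull request example (2/3): specification change • Settle the specification while taking in review comments • “Wouldn't it be better to raise an error in the XXX case?” • “True, but forbidding XXX would stop another feature from working.” • “Fair enough, then. What does everyone else think?” • (discussion continues)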

Slide 65

Slide 65 text

Pull request example (2/3): specification change • Update the release notes and documentation

Slide 66

Slide 66 text

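Pull request example (3/3): bug fix • If your intent does not come across, show the code • https://github.com/pandas-dev/pandas/pull/13312 • “The current behavior is …, and after the fix it becomes …. Fixing the … logic should probably do it.” • “Hmm, wouldn't it be better to fix a different place?” • “My English may be off…. I've sent the fix as a pull request, so please take a look.”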

Slide 67

Slide 67 text

Pull request example (3/3): bug fix • (When the impact is wide) just keep adding tests • Check that different classes behave the same way

Slide 68

Slide 68 text

Pull request example (3/3): bug fix • (When the impact is wide) just keep adding tests • Check combinations of types exhaustively • Check that each combination of types behaves the same as NumPy
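The idea of exhaustively checking type combinations against NumPy can be sketched as below. This is not the test from the pull request; the dtype list, the choice of addition as the operation, and the helper name `find_mismatches` are all assumptions made for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical subset of dtypes; a real test suite would cover many more.
DTYPES = ["int32", "int64", "float32", "float64"]


def find_mismatches():
    """Return the dtype pairs for which Series addition disagrees with
    NumPy array addition in either result dtype or values."""
    mismatches = []
    for left, right in itertools.product(DTYPES, repeat=2):
        a = np.array([1, 2, 3], dtype=left)
        b = np.array([4, 5, 6], dtype=right)

        expected = a + b                      # NumPy is the reference behavior
        result = pd.Series(a) + pd.Series(b)  # behavior under test

        same_dtype = result.dtype == expected.dtype
        same_values = np.array_equal(result.to_numpy(), expected)
        if not (same_dtype and same_values):
            mismatches.append((left, right))
    return mismatches


print(find_mismatches())
```

Generating the cases with `itertools.product` keeps the coverage exhaustive as the dtype list grows, at the cost of one loop instead of one hand-written test per pair.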

Slide 69

Slide 69 text

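Things I pay attention to • Communication • My English may not get through, but code will • Code changes • Avoid narrowly local fixes • Write proper tests • Will they prevent bugs from creeping in when someone other than me touches the code? • Write documentation as much as possible • (If the specification comes across clearly) someone will fix it up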

Slide 70

Slide 70 text

Still sounds a bit daunting…?

Slide 71

Slide 71 text

What you can do starting today • Try Watching a project you are interested in • You get a feel for the community's atmosphere • You might find an interesting issue / pull request

Slide 72

Slide 72 text

Summary

Slide 73

Slide 73 text

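Summary • Python and scientific computing • Case studies of OSS activity in pandas • OSS activity: the first step • Try joining the Python scientific computing community • Try Watching a project you are interested in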