Slide 1

Slide 1 text

A Case Study of OSS Activity in pandas / Masaaki Horikoshi @ ARISE analytics

Slide 2

Slide 2 text

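OSS activity overview (as of 2017/5/17) • Projects named in the text: ggfortify (R), pandas-ml (other projects were shown as logos and are not recoverable) • Metrics per project: GitHub Stars, Org. Members (Committers), Contributors • Values as extracted (assignment to project and metric not recoverable): 767, 10, 94, 14, 9, 2, 2, 3, 65, 265, 1,535, 9,608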

Slide 3

Slide 3 text

Contents • Activity in pandas • Pull request examples • First steps in OSS activity

Slide 4

Slide 4 text

What is pandas? • Provides fast, easy-to-use data structures for data analysis • 2-dimensional tabular format • The equivalent of "data.frame" in R • Author: Wes McKinney • License: BSD • Etymology: PANel DAta System

Slide 5

Slide 5 text

What is pandas?
import pandas as pd
df = pd.read_csv('adult.csv')
df
Annotations: Read csv file • Columns • Index • Mixed data types
Adult Dataset taken from UCI ML Repository: Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

Slide 6

Slide 6 text

What is pandas?
df[['age', 'marital-status']]
df.groupby('income')['hours-per-week'].mean()
Annotations: Select (first line) • Group by, Select, Aggregate (second line)
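The select and group-by/aggregate steps shown above can be tried without the adult.csv file. The following is a minimal sketch, assuming a tiny hand-made table: only the column names come from the slides, and the rows are invented for illustration rather than taken from the UCI dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows; only the column names come from the slides.
CSV_TEXT = """age,marital-status,hours-per-week,income
39,Never-married,40,<=50K
50,Married-civ-spouse,13,<=50K
38,Divorced,40,<=50K
52,Married-civ-spouse,45,>50K
31,Never-married,50,>50K
"""

# read_csv accepts file-like objects as well as file paths.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Select: indexing with a list of column names returns a DataFrame
# that contains only those columns.
selected = df[['age', 'marital-status']]

# Group by -> Select -> Aggregate: mean weekly hours per income group.
mean_hours = df.groupby('income')['hours-per-week'].mean()

print(selected.shape)
print(mean_hours)
```

The result of the last expression is a Series indexed by the distinct values of the income column, which is why a single group can be looked up with mean_hours['>50K'].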

Slide 7

Slide 7 text

PyData Ecosystem (Scipy stack) • [Diagram] Libraries: Bokeh, matplotlib, Scikit-learn, Statsmodel, NumPy, PyTables, SQLAlchemy, Ibis, SciPy, PySpark, Blaze, Dask, Jupyter, pandas, rpy • Categories: User Interface, Visualization, Big Data, IO, Computation, Machine Learning, Statistics, Other Programming Languages

Slide 8

Slide 8 text

Activity in pandas

Slide 9

Slide 9 text

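History • [Chart: Number of Commits over time, annotated with: first issue; started contributing in earnest (?); joined the core team] • Policy: 1 pull request = 1 commit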

Slide 10

Slide 10 text

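Becoming a committer • It varies by project; in the case of pandas: • Contributions that are sufficient in both quality and quantity • At least one year of activity → Decided by nomination and vote of the current core team members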

Slide 11

Slide 11 text

Role of committers • Make decisions about: • The overall scope, vision and direction of the project. • Strategic collaborations with other organizations or individuals. • Specific technical issues, features, bugs and pull requests. • The services that are run by the project. • Make decisions when regular community discussion doesn't produce consensus.

Slide 12

Slide 12 text

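Committer duties • Responding to issues • Code review (code review by non-committers is of course welcome too) • Merging pull requests • Setting up and operating supporting infrastructure (CI, etc.) • Releases (carried out by the release manager) • Teleconferences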

Slide 13

Slide 13 text

What matters • Product quality • Community quality → Building the mechanisms that support both

Slide 14

Slide 14 text

What matters ★ Product quality • Community quality → Building the mechanisms that support both

Slide 15

Slide 15 text

Product quality • Easy to use • Easy to install • Easy-to-understand API • Easy-to-understand documentation

Slide 16

Slide 16 text

Product quality • Quality is assured • Few bugs • Good performance

Slide 17

Slide 17 text

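Making it easy to use • Easier installation • Providing wheels for various platforms • Clearer API • Discussion on issues • Code review • Clearer documentation • Documentation review • Building & checking the dev-version documentation on Travis-CI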

Slide 18

Slide 18 text

Providing wheels for various platforms

Slide 19

Slide 19 text

Documentation • Official documentation • http://pandas.pydata.org/pandas-docs/stable/ • Books • Python for Data Analysis (Japanese edition) / Wes McKinney • Community-written documentation • Modern pandas / Tom Augspurger • https://tomaugspurger.github.io/modern-1.html

Slide 20

Slide 20 text

Assuring quality • Fewer bugs • Static checks (flake8) • Automated tests (Travis-CI, AppVeyor, Circle-CI) • Coverage checks (Codecov) • Better performance • Performance tests (Airspeed Velocity)

Slide 21

Slide 21 text

Automated tests • Travis-CI • Combinations of Python version × surrounding environment
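The "Python version × environment" idea amounts to running the test suite once per pair in a Cartesian product. A minimal sketch of how such a job list could be enumerated is shown here; the version numbers and environment names are hypothetical placeholders, since the slide does not list the actual matrix.

```python
from itertools import product

# Hypothetical axes; the slide does not list the actual versions
# or environments used by the project.
PYTHON_VERSIONS = ["2.7", "3.5", "3.6"]
ENVIRONMENTS = ["minimal-deps", "latest-deps"]


def build_matrix(versions, environments):
    """Return one job description per (Python version, environment) pair."""
    return [
        {"python": version, "env": env}
        for version, env in product(versions, environments)
    ]


jobs = build_matrix(PYTHON_VERSIONS, ENVIRONMENTS)
for job in jobs:
    print("python={python} env={env}".format(**job))
```

The number of jobs grows as the product of the axis sizes, which is one reason a second CI service can be useful for spreading the load.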

Slide 22

Slide 22 text

Automated tests • AppVeyor • Used for testing on Windows • Circle-CI • Used as a complement to Travis-CI

Slide 23

Slide 23 text

Performance tests • Airspeed Velocity • Runs benchmarks at specified commits & compares between commits • https://github.com/spacetelescope/asv

All benchmarks:

  before     after      ratio
  [5049b5 ]  [53ac28 ]
  293.20ns   290.10ns   0.99   attrs_caching.getattr_dataframe_index.time_getattr_dataframe_index
  3.13μs     3.08μs     0.98   attrs_caching.setattr_dataframe_index.time_setattr_dataframe_index
  7.45ms     7.23ms     0.97   binary_ops.frame_add.time_frame_add
  4.14ms     4.09ms     0.99   binary_ops.frame_add_no_ne.time_frame_add_no_ne
  4.28ms     4.40ms     1.03   binary_ops.frame_add_st.time_frame_add_st
  21.67ms    21.58ms    1.00   binary_ops.frame_float_div.time_frame_float_div
  5.74ms     5.84ms     1.02   binary_ops.frame_float_div_by_zero.time_frame_float_div_by_zero
  17.90ms    17.81ms    0.99   binary_ops.frame_float_floor_by_zero.time_frame_float_floor_by_zero
  10.49ms    9.97ms     0.95   binary_ops.frame_float_mod.time_frame_float_mod
  5.95ms     6.14ms     1.03   binary_ops.frame_int_div_by_zero.time_frame_int_div_by_zero
  10.64ms    10.64ms    1.00   binary_ops.frame_int_mod.time_frame_int_mod
  7.26ms     7.31ms     1.01   binary_ops.frame_mult.time_frame_mult
  4.14ms     4.10ms     0.99   binary_ops.frame_mult_no_ne.time_frame_mult_no_ne
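In the benchmark output above, the ratio column matches the "after" time divided by the "before" time, so values below 1.0 indicate that the newer commit is faster. The sketch below illustrates that comparison with the standard library only. It is not how asv itself is implemented: the helper names best_time and compare, and the two stand-in functions, are hypothetical.

```python
import timeit


def best_time(func, repeat=5, number=1000):
    """Best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def compare(before, after):
    """Ratio in the same sense as the table: after / before."""
    return after / before


# Hypothetical stand-ins for the same function at two different commits.
def add_before():
    return [x + 1 for x in range(100)]


def add_after():
    return [x + 1 for x in range(100)]


# Reproduce two ratios from the table using the reported timings.
print(round(compare(293.20, 290.10), 2))  # attrs_caching getattr row
print(round(compare(4.28, 4.40), 2))      # binary_ops frame_add_st row

# Time the stand-ins; the measured ratio varies from run to run.
measured = compare(best_time(add_before), best_time(add_after))
print("measured ratio: {:.2f}".format(measured))
```

Taking the minimum over several repeats is a common way to reduce noise from other processes, but small deviations from 1.0, like most rows in the table, are usually within measurement noise.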

Slide 24

Slide 24 text

What matters • Product quality ★ Community quality → Building the mechanisms that support both

Slide 25

Slide 25 text

Community quality • Being open • Easy to join • Supportive

Slide 26

Slide 26 text

Making it easy to participate • Collaboration tools • GitHub • Gitter • Documentation • Code of Conduct • Developer documentation (Wiki)

Slide 27

Slide 27 text

GitHub • Tag issues appropriately so that discussions are easy to join • Use issue templates so that even beginners can file reports easily

Slide 28

Slide 28 text

Code of Conduct • A dedicated GitHub repository is maintained for governance • https://github.com/pandas-dev/pandas-governance

Slide 29

Slide 29 text

Developer documentation • For developers • Contribution guide • How to use Git • How to run tests • Code style • How to write release notes • Specialities • … • For maintainers • Release checklist • How to publish the documentation • …

Slide 30

Slide 30 text

Teamwork • Specialities • A list of experts by feature area (including people outside the core team) • Used to notify them from related issues / pull requests • https://github.com/pandas-dev/pandas/wiki/Specialities

Slide 31

Slide 31 text

Support from organizations and companies • Institutional Partners • Continuum Analytics • Two Sigma

Slide 32

Slide 32 text

Pull request examples

Slide 33

Slide 33 text

Types of pull requests • Documentation fixes • Bug fixes • New features, API changes • Refactoring • Adding tests • …

Slide 34

Slide 34 text

Pull request examples • Documentation fix • https://github.com/pandas-dev/pandas/pull/13312 • API change • https://github.com/pandas-dev/pandas/issues/6511 • Test improvement • https://github.com/pandas-dev/pandas/issues/10373

Slide 35

Slide 35 text

First steps in OSS activity

Slide 36

Slide 36 text

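Benefits of OSS activity • It satisfies the desire for recognition • Code review by experts improves your skills • You come to understand the internal implementation and can write more efficient code • Getting your fixes merged into master reduces your personal maintenance burden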

Slide 37

Slide 37 text

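Getting started with OSS • Decide what to contribute to • Start a project of your own • Release your company's product as OSS • Contribute to OSS you use at work • Contribute to well-known OSS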

Slide 38

Slide 38 text

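Getting started with OSS • Decide how to contribute • Promote the project (write blog posts, give talks) • For data-related projects, sharing analysis know-how that uses the OSS also counts • Answer questions (StackOverflow, GitHub Issues) • File issues • Send pull requests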

Slide 39

Slide 39 text

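Your first pull request • Start with a single project / feature • Getting used to the process and tools takes effort • Working out how the pieces of code relate to each other is hard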

Slide 40

Slide 40 text

Your first pull request • Start with something simple • Revising documentation • Improving error messages • Fixing easy issues

Slide 41

Slide 41 text

What you can do starting today • Try watching a project that interests you • You may find an interesting issue / pull request