A Case Study of OSS Activity in pandas

Sinhrks
May 19, 2017

Stapy x RIKEN AIP Open Source Study Group
https://startpython.connpass.com/event/55579/

Transcript

  1. A Case Study of OSS Activity in pandas • Masaaki Horikoshi @ ARISE analytics

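  2. OSS activity overview (as of 2017/5/17) • Chart columns: GitHub Stars, Org. Members (Committers), Contributors • Named projects: ggfortify (R), pandas-ml • Values as extracted, in order: 767, 10, 94, 14, 9, 2, 2, 3, 65, 265, 1,535, 9,608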
  3. Contents • Activity in pandas • Pull request examples • First steps in OSS activity

  4. What is pandas? • Provides fast, easy-to-use data structures for data analysis • Two-dimensional tabular format • The counterpart of R's "data.frame" • Author: Wes McKinney • License: BSD • Name origin: PANel DAta System
  5. What is pandas?

    import pandas as pd
    df = pd.read_csv('adult.csv')
    df

    Figure labels: Read csv file • Columns • Index • Mixed data types
    Adult Dataset taken from UCI ML Repository: Lichman, M. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
  6. What is pandas?

    df[['age', 'marital-status']]                   # Select
    df.groupby('income')['hours-per-week'].mean()   # Groupby, Select, Aggregate
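The select and group-by steps on this slide can be tried without the original file. The following is a minimal sketch, assuming a made-up four-row stand-in for adult.csv: the column names come from the slides, but the rows and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for adult.csv; the values are invented for illustration.
csv_text = """age,marital-status,hours-per-week,income
39,Never-married,40,<=50K
50,Married-civ-spouse,13,<=50K
31,Never-married,50,>50K
42,Married-civ-spouse,40,>50K
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Select: indexing with a list of column names returns a DataFrame.
selected = df[['age', 'marital-status']]

# Group by 'income', select one column, then aggregate with the mean.
mean_hours = df.groupby('income')['hours-per-week'].mean()

print(selected)
print(mean_hours)
```

The result of the group-by is a Series indexed by the distinct values of `income`, so individual groups can be looked up with `mean_hours['>50K']`.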

  7. PyData Ecosystem (Scipy stack) • Libraries: Bokeh, matplotlib, Scikit-learn, Statsmodels, NumPy, PyTables, SQLAlchemy, Ibis, SciPy, PySpark, Blaze, Dask, Jupyter, pandas, rpy2 • Categories: User Interface, Visualization, Big Data, IO, Computation, Machine Learning, Statistics, Other Programming Languages
  8. Activity in pandas

  9. History • Chart: Number of Commits (operated as 1 pull request = 1 commit) • First issue • Started contributing in earnest (?) • Joined the core team

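  10. Becoming a committer • It differs by project; in the case of pandas: • Contributions that are sufficient in both quality and quantity • Active for at least one year → Decided by nomination and vote of the current core team members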

  11. The role of a committer • Make decisions about: • The overall scope, vision and direction of the project. • Strategic collaborations with other organizations or individuals. • Specific technical issues, features, bugs and pull requests. • The services that are run by the project. • Make decisions when regular community discussion doesn't produce consensus.
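  12. Committer duties • Answering issues • Code review (reviews from non-committers are of course welcome too) • Merging pull requests • Setting up and operating supporting infrastructure (CI, etc.) • Releases (carried out by the release manager) • Conference calls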
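  13. What matters • Product quality • Community quality → Building the mechanisms that support them

  14. What matters ★ Product quality • Community quality → Building the mechanisms that support them

  15. Product quality • Easy to use • Easy to install • Easy-to-understand API • Easy-to-understand documentation

  16. Product quality • Quality is assured • Few bugs • Good performance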

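  17. Making it easy to use • Easier installation • Providing wheels for various platforms • Clearer API • Discussion on issues • Code review • Clearer documentation • Documentation review • Building and checking the dev-version documentation on Travis-CI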
  18. Providing wheels for various platforms

  19. Documentation • Official documentation • http://pandas.pydata.org/pandas-docs/stable/ • Books • Pythonによるデータ分析入門 (Python for Data Analysis) / Wes McKinney • Community-written documentation • Modern pandas / Tom Augspurger • https://tomaugspurger.github.io/modern-1.html
  20. Assuring quality • Fewer bugs • Static checks (flake8) • Automated tests (Travis-CI, AppVeyor, Circle-CI) • Coverage checks (Codecov) • Better performance • Performance tests (Airspeed Velocity)
  21. Automated tests • Travis-CI • Combinations of Python versions x surrounding environments
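A build matrix of this kind is the Cartesian product of its axes. Here is a minimal sketch, assuming hypothetical Python versions and dependency sets; the actual pandas CI configuration is not shown in the slides.

```python
from itertools import product

# Hypothetical axes, chosen only for illustration.
python_versions = ["2.7", "3.5", "3.6"]
dependency_sets = ["oldest-supported", "latest", "no-optional-deps"]

# One CI job per (Python version, dependency set) pair.
build_matrix = [
    {"python": py, "deps": deps}
    for py, deps in product(python_versions, dependency_sets)
]

for job in build_matrix:
    print(f"python={job['python']} deps={job['deps']}")
```

Because the number of jobs is the product of the axis sizes, each extra axis value adds a full row of jobs. That is one reason a project may spread its builds across several CI services.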

  22. Automated tests • AppVeyor • Used for testing on Windows • Circle-CI • Used as a complement to Travis-CI

  23. Performance tests • Airspeed Velocity • Run benchmarks at specified commits and compare between commits • https://github.com/spacetelescope/asv

    All benchmarks:

        before      after     ratio
      [5049b5 ]  [53ac28 ]
       293.20ns   290.10ns     0.99  attrs_caching.getattr_dataframe_index.time_getattr_dataframe_index
         3.13μs     3.08μs     0.98  attrs_caching.setattr_dataframe_index.time_setattr_dataframe_index
         7.45ms     7.23ms     0.97  binary_ops.frame_add.time_frame_add
         4.14ms     4.09ms     0.99  binary_ops.frame_add_no_ne.time_frame_add_no_ne
         4.28ms     4.40ms     1.03  binary_ops.frame_add_st.time_frame_add_st
        21.67ms    21.58ms     1.00  binary_ops.frame_float_div.time_frame_float_div
         5.74ms     5.84ms     1.02  binary_ops.frame_float_div_by_zero.time_frame_float_div_by_zero
        17.90ms    17.81ms     0.99  binary_ops.frame_float_floor_by_zero.time_frame_float_floor_by_zero
        10.49ms     9.97ms     0.95  binary_ops.frame_float_mod.time_frame_float_mod
         5.95ms     6.14ms     1.03  binary_ops.frame_int_div_by_zero.time_frame_int_div_by_zero
        10.64ms    10.64ms     1.00  binary_ops.frame_int_mod.time_frame_int_mod
         7.26ms     7.31ms     1.01  binary_ops.frame_mult.time_frame_mult
         4.14ms     4.10ms     0.99  binary_ops.frame_mult_no_ne.time_frame_mult_no_ne
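The benchmark names in the output above follow the pattern module.class.method. asv calls a benchmark's `setup()` first, then times each method whose name starts with `time_`. The following is a minimal sketch of such a benchmark, not the actual pandas benchmark code: the frame size is an assumption, and the manual `timeit` driver and the `ratio` helper are hypothetical additions so that the block runs without asv installed.

```python
import timeit

import numpy as np
import pandas as pd


class frame_add:
    """asv-style benchmark class (illustrative; sizes are assumptions)."""

    def setup(self):
        # Build the inputs outside the timed section.
        rng = np.random.RandomState(0)
        self.df = pd.DataFrame(rng.randn(1000, 100))
        self.df2 = pd.DataFrame(rng.randn(1000, 100))

    def time_frame_add(self):
        # Only the body of this method is what a benchmark run would time.
        self.df + self.df2


def ratio(before, after):
    """Hypothetical helper: the 'ratio' column is consistent with after / before."""
    return after / before


# Manual driver standing in for asv, so the sketch is runnable on its own.
bench = frame_add()
bench.setup()
elapsed = timeit.timeit(bench.time_frame_add, number=10) / 10
print(f"{elapsed * 1e3:.3f} ms per call")

# Check the helper against one row of the table (7.45ms before, 7.23ms after).
print(round(ratio(7.45, 7.23), 2))
```

A ratio below 1 means the second commit was faster on that benchmark, and a ratio above 1 means it was slower.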
  24. What matters • Product quality ★ Community quality → Building the mechanisms that support them

  25. Community quality • Being open • Easy to join • Supportive

  26. Making it easy to participate • Collaboration tools • GitHub • Gitter • Documentation • Code of Conduct • Developer documentation (Wiki)
  27. GitHub • Tag issues as appropriate so that discussions are easy to join • Use issue templates so that even beginners can file reports easily

  28. Code of Conduct • A GitHub repository is maintained for governance • https://github.com/pandas-dev/pandas-governance

  29. Developer documentation • For developers • Contribution guide • How to use Git • How to run tests • Code style • How to write release notes • Specialities • … • For maintainers • Release checklist • How to publish the documentation • …
  30. Teamwork • Specialities • A list of experts by feature area (including people outside the core team) • Notify them from related issues / pull requests • https://github.com/pandas-dev/pandas/wiki/Specialities
  31. Support from organizations and companies • Institutional Partners • Continuum Analytics • Two Sigma
  32. Pull request examples

  33. Types of pull requests • Documentation fixes • Bug fixes • New features, API changes • Refactoring • Adding tests • …
  34. Pull request examples • Documentation fix • https://github.com/pandas-dev/pandas/pull/13312 • API change • https://github.com/pandas-dev/pandas/issues/6511 • Test improvement • https://github.com/pandas-dev/pandas/issues/10373
  35. First steps in OSS activity

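  36. Benefits of OSS activity • It satisfies the desire for recognition • Code review by experts improves your skills • You come to understand the internal implementation and can write more efficient code • Getting your fixes merged into master reduces your own maintenance burden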
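  37. Getting started with OSS activity • Decide what to contribute to • Start a project on your own • Release your company's product as OSS • Contribute to OSS you use at work • Contribute to well-known OSS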
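  38. Getting started with OSS activity • Decide how to contribute • Promote it (write blog posts, give talks) • For data-related projects, analysis know-how using the OSS also counts • Answer questions (StackOverflow, GitHub Issues) • File issues • Send pull requests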
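  39. Your first pull request • Start with a single project / feature • Getting used to the process and tools takes effort • Working out how the pieces of code relate to each other is hard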

  40. Your first pull request • Start with something simple • Revising documentation • Improving error messages • Fixing easy issues

  41. What you can do starting today • Try watching a project you are interested in • You might find an interesting issue / pull request