Ranking the artifacts in the Maven central repository #渋谷java

KOMIYA Atsushi
September 30, 2017

Slides from the 20th #渋谷java meetup. The talk is about ranking the artifacts on the Maven central repository using PageRank.
https://shibuya-java.connpass.com/event/65433/

Transcript

  1. Ranking the artifacts in the
    Maven central repository
    渋谷Java #20, 2017-09-30
    KOMIYA Atsushi

  2. @komiya_atsushi

  3. Today’s topic

  4. Let's try ranking the artifacts in
    everyone's beloved
    Maven central repository

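  5. Ranking artifacts
    • The Maven central repository holds more than 200,000 artifacts
      (not counting different versions of the same artifact)
    • When choosing a library to build into an application,
      I want to pick one with a proven track record
        • I am the type who prefers to go along with the majority
    • I want a ranking of artifacts!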

  6. http://search.maven.org/#stats

  7. How do we rank
    artifacts?

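  8. Focus on the dependencies between artifacts
    • Hypothesis: "the more artifacts depend on a given artifact,
      the more important that artifact is"
    • One option is to use each artifact's incoming reference count as the metric
        • The same idea as citation counts for academic papers
    • Is there a better metric than a plain incoming reference count?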

  9. Dependencies can be
    represented as a
    directed graph

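  10. PageRank
    • That Google thing
    • Measures the importance of a web page from the link relationships between pages
    • Link relationships can be represented as a directed graph
    • Represent the dependencies between artifacts as a directed graph
        • Node: artifact, edge: dependency
        • Edges point from the depending artifact to the artifact it depends on
    • The higher an artifact's score, the more important it can be considered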
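
To make the edge direction concrete, here is a minimal sketch of textbook PageRank by power iteration over a tiny dependency graph. It is not the implementation used in the talk (which relies on GraphX later on), and GraphX normalizes scores differently. The damping factor 0.85, the iteration count, and the four sample artifacts are assumptions chosen for illustration.

```java
import java.util.Arrays;

final class PageRankSketch {
    /**
     * edges[i] = {from, to} means "artifact `from` depends on artifact `to`".
     * Returns one score per node; the scores sum to 1.
     */
    static double[] pageRank(int nodeCount, int[][] edges, double damping, int iterations) {
        int[] outDegree = new int[nodeCount];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        double[] rank = new double[nodeCount];
        Arrays.fill(rank, 1.0 / nodeCount);

        for (int it = 0; it < iterations; it++) {
            // Rank held by nodes without outgoing edges is spread over all nodes.
            double dangling = 0.0;
            for (int v = 0; v < nodeCount; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }
            double base = (1.0 - damping) / nodeCount + damping * dangling / nodeCount;

            double[] next = new double[nodeCount];
            Arrays.fill(next, base);
            // Each artifact passes its rank, split evenly, to the artifacts it depends on.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical artifacts: 0 = app-a, 1 = app-b, 2 = lib-core, 3 = lib-util
        int[][] edges = { {0, 2}, {1, 2}, {1, 3}, {3, 2} };
        double[] scores = pageRank(4, edges, 0.85, 50);
        for (int i = 0; i < scores.length; i++) {
            System.out.println(i + " " + scores[i]);
        }
    }
}
```

In this sample, lib-core (node 2) receives rank from every other artifact and ends up with the highest score, which matches the hypothesis that heavily depended-on artifacts are the important ones.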

  11. Collecting data from the
    Maven central repository

  12. How do we collect it?
    • Crawl every single POM file under
      https://repo1.maven.org/maven2/ ?
        • Counting every version, there are more than 2 million artifacts…
    • I only want to download the POM of the latest version
    • However, working out which version is the latest
      from the version strings alone is tedious

  13. Use the index file
    • As it turns out, an index file covering every version of every artifact
      on the central repository is provided
        • https://maven.apache.org/repository/central-index.html
    • It consists of two files: a .properties file and
      a gzip-compressed file (over 300 MB)
    • It is updated weekly

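  14. What the index file does and does not provide
    • Information available from the index file (partial list)
        • Group ID
        • Artifact ID
        • Version
        • Classifier (the sources / javadoc / linux-x86_64 thing)
        • Last-modified timestamp of the artifact's file
            • This should be enough to identify the latest version of each artifact
    • Information not available from the index file
        • Dependencies between artifacts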
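
The idea of using the last-modified timestamp to find the latest version could be sketched as follows. `IndexEntry` is a hypothetical holder for the fields listed above, not a class from indexer-reader, and the sample entries are made up. Note that "most recently modified" is only a heuristic: a maintenance release on an older branch can be newer than the highest version.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LatestVersionSketch {
    /** Hypothetical holder for the index fields listed on the slide. */
    static final class IndexEntry {
        final String groupId;
        final String artifactId;
        final String version;
        final String classifier; // null for the main artifact
        final long lastModified; // epoch milliseconds

        IndexEntry(String groupId, String artifactId, String version,
                   String classifier, long lastModified) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.classifier = classifier;
            this.lastModified = lastModified;
        }
    }

    /** Keeps, per groupId:artifactId, the main-artifact entry modified most recently. */
    static Map<String, IndexEntry> latestByArtifact(List<IndexEntry> entries) {
        Map<String, IndexEntry> latest = new LinkedHashMap<>();
        for (IndexEntry e : entries) {
            if (e.classifier != null) {
                continue; // skip sources / javadoc / platform-specific files
            }
            String key = e.groupId + ":" + e.artifactId;
            IndexEntry current = latest.get(key);
            if (current == null || e.lastModified > current.lastModified) {
                latest.put(key, e);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<IndexEntry> entries = Arrays.asList(
                new IndexEntry("com.example", "lib", "1.0", null, 100L),
                new IndexEntry("com.example", "lib", "1.1", null, 200L),
                new IndexEntry("com.example", "lib", "1.1", "sources", 300L),
                new IndexEntry("com.example", "app", "0.9", null, 150L));
        latestByArtifact(entries)
                .forEach((key, e) -> System.out.println(key + " -> " + e.version));
    }
}
```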

  15. Scanning the index file
    • Use indexer-reader
        • group: 'org.apache.maven.indexer'
        • name: 'indexer-reader'
    • For concrete usage, see the implementation at the URL below
        • http://bit.ly/maven-indexer-demo

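  16. Dependencies between artifacts
    • There seems to be no way to get them other than reading
      the POM files on the Maven central repository
    • No choice, then: crawl them by brute force
    • Limiting the crawl to the latest version of each artifact
      makes it somewhat more bearable
        • That is still more than 200,000, though…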

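  17. Reading POM files
    • Use maven-model
        • group: 'org.apache.maven'
        • name: 'maven-model'

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.List;
    import org.apache.maven.model.Dependency;
    import org.apache.maven.model.Model;
    import org.apache.maven.model.io.xpp3.MavenXpp3Reader;
    public static void demo() throws Exception {
        try (InputStream in = new FileInputStream("path/to/pom.xml")) {
            Model model = new MavenXpp3Reader().read(in);
            // The declared dependencies can be read from the model
            List<Dependency> dependencies = model.getDependencies();
        }
    }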

  18. Computing PageRank

  19. Implement it myself? No!

  20. Using Apache Spark / GraphX
    • GraphX
        • Provides an API for handling graphs and
          running computations on them on top of Spark
        • PageRank is already implemented, just like that ❤
    • Given the size of this graph, it can be computed in local mode

  21. Using Apache Spark / GraphX
    import org.apache.spark.SparkContext
    import org.apache.spark.graphx.GraphLoader

    def run(sc: SparkContext): Unit = {
      // File expressing each dependency as two numeric artifact IDs separated by a space
      val graph = GraphLoader.edgeListFile(sc, "path/to/dependency-graph.txt")
      // Compute PageRank
      val ranking = graph.pageRank(0.0001).vertices
      // Mapping from an artifact's numeric ID to its GAV (groupId|artifactId|version)
      val artifacts = sc.textFile("path/to/artifacts.txt").map { line =>
        val values = line.split(",")
        (values(0).toLong, values(1))
      }
      // Replace the numeric artifact IDs with GAVs and write the result to a file
      artifacts.join(ranking).map { case (id, (gav, rank)) => (gav, rank) }
        .sortBy(_._2, ascending = false)
        .map(t => t._1 + "," + t._2)
        .saveAsTextFile("path/to/result")
    }
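
The two input files read by the code above could be produced along these lines. This is a sketch under assumptions: the class and method names are hypothetical, and the only facts taken from the slide are the formats (one "sourceId destinationId" pair per line for the edge list, and "id,GAV" per line for the mapping, with GAV written as groupId|artifactId|version). Since the Scala code splits mapping lines on a comma, a GAV must not contain one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EdgeListSketch {
    private final Map<String, Long> ids = new LinkedHashMap<>();
    private final List<String> edgeLines = new ArrayList<>();

    /** Returns the numeric ID for a GAV, assigning the next free ID on first use. */
    long idOf(String gav) {
        Long id = ids.get(gav);
        if (id == null) {
            id = (long) ids.size();
            ids.put(gav, id);
        }
        return id;
    }

    /** Records an edge pointing from the depending artifact to the one it depends on. */
    void addDependency(String fromGav, String toGav) {
        long from = idOf(fromGav);
        long to = idOf(toGav);
        edgeLines.add(from + " " + to);
    }

    /** Lines for the edge list file: "sourceId destinationId". */
    List<String> edgeLines() {
        return edgeLines;
    }

    /** Lines for the mapping file: "id,groupId|artifactId|version". */
    List<String> artifactLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ids.entrySet()) {
            lines.add(entry.getValue() + "," + entry.getKey());
        }
        return lines;
    }

    public static void main(String[] args) {
        EdgeListSketch graph = new EdgeListSketch();
        // Made-up sample artifacts
        graph.addDependency("com.example|app|1.0", "com.example|lib|2.0");
        graph.addDependency("com.example|tool|0.1", "com.example|lib|2.0");
        graph.edgeLines().forEach(System.out::println);
        graph.artifactLines().forEach(System.out::println);
    }
}
```

Both lists can then be written out with `java.nio.file.Files.write` to the paths the Spark job reads.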

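  22. The dependency graph
    • Maven dependencies have a "scope"
        • compile, provided, runtime, test, system, import
    • Compute PageRank for each of the following scopes (or combinations of scopes)
        • All scopes
        • compile
        • test
        • All scopes, inverted (edges point from the artifact depended on to the depending artifact)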
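
Building the per-scope and inverted graphs can be sketched as a filter and a flip over the edge list. The `Edge` class and the sample artifacts are hypothetical. The one Maven rule relied on is that a dependency declared without a scope defaults to compile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ScopeGraphSketch {
    /** Hypothetical edge: `from` depends on `to` with the scope declared in the POM. */
    static final class Edge {
        final String from;
        final String to;
        final String scope; // null when the POM omits <scope>

        Edge(String from, String to, String scope) {
            this.from = from;
            this.to = to;
            this.scope = scope;
        }
    }

    /** Maven treats a missing scope as compile. */
    static String effectiveScope(String declared) {
        return (declared == null || declared.isEmpty()) ? "compile" : declared;
    }

    /** Keeps only the edges whose effective scope matches. */
    static List<Edge> withScope(List<Edge> edges, String scope) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (effectiveScope(e.scope).equals(scope)) {
                result.add(e);
            }
        }
        return result;
    }

    /** Flips every edge so it points from the artifact depended on to the depending one. */
    static List<Edge> inverted(List<Edge> edges) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            result.add(new Edge(e.to, e.from, e.scope));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample edges
        List<Edge> all = Arrays.asList(
                new Edge("app", "lib", null),
                new Edge("app", "test-lib", "test"),
                new Edge("lib", "logging-api", "compile"));
        System.out.println("compile edges: " + withScope(all, "compile").size());
        System.out.println("test edges: " + withScope(all, "test").size());
        System.out.println("first inverted edge: " + inverted(all).get(0).from
                + " -> " + inverted(all).get(0).to);
    }
}
```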

  23. Let's look at the actual rankings

  24. About the ranking results
    • Only the top 10 or top 20 are shown here
    • Results down to the top 100 are published at the URL below
      (Google Sheets)
        • http://bit.ly/PackageRank

  25. Ranking: all scopes

  26. Ranking: all scopes (#1~10)
    PageRank  group           artifact         version
    …         junit           junit            …
    …         org.scala-lang  scala-compiler   …
    …         org.slf4j       slf4j-api        …alpha…
    …         org.mockito     mockito-core     …
    …         org.testng      testng           …
    …         org.scalatest   scalatest_…      …
    …         org.mockito     mockito-all      …beta…
    …         javax.servlet   servlet-api      …alpha…
    …         ch.qos.logback  logback-classic  …
    …         org.objenesis   objenesis        …
    http://bit.ly/PackageRank

  27. Ranking: all scopes (#11~20)
    PageRank  group           artifact           version
    …         javax.servlet   javax.servlet-api  …
    …         org.assertj     assertj-core       …
    …         log4j           log4j              …
    …         org.osgi        org.osgi.core      …
    …         org.slf4j       slf4j-log4j12      …alpha…
    …         org.scala-lang  scala-library      …
    …         net.bytebuddy   byte-buddy         …
    …         org.scalatest   scalatest_…        …
    …         net.bytebuddy   byte-buddy-agent   …
    …         org.slf4j       slf4j-simple       …alpha…
    http://bit.ly/PackageRank

  28. Trends at the top of the ranking
    • Testing
        • junit, testng, scalatest, assertj, mockito
    • Languages
        • Scala (scala-compiler, scala-library)
    • Logging
        • slf4j, logback, log4j (not log4j2)
    • Others
        • objenesis, byte-buddy, servlet-api, org.osgi.core…

  29. Ranking: compile

  30. Ranking: compile
    PageRank  group                     artifact        version
    …         org.scala-lang            scala-library   …
    …         org.slf4j                 slf4j-api       …alpha…
    …         junit                     junit           …
    …         com.google.guava          guava           …
    …         org.antlr                 antlr-runtime   …
    …         org.antlr                 stringtemplate  …
    …         com.google.code.gson      gson            …
    …         org.jetbrains             annotations     …
    …         com.google.code.findbugs  jsr305          …
    …         org.jetbrains.kotlin      kotlin-stdlib   …
    http://bit.ly/PackageRank-compile

  31. Ranking: compile
    PageRank  group                     artifact        version
    …         org.scala-lang            scala-library   …
    …         org.slf4j                 slf4j-api       …alpha…
    …         junit                     junit           …
    …         com.google.guava          guava           …
    …         org.antlr                 antlr-runtime   …
    …         org.antlr                 stringtemplate  …
    …         com.google.code.gson      gson            …
    …         org.jetbrains             annotations     …
    …         com.google.code.findbugs  jsr305          …
    …         org.jetbrains.kotlin      kotlin-stdlib   …
    http://bit.ly/PackageRank-compile

  32. Ranking: test

  33. Ranking: test
    PageRank  group               artifact         version
    …         junit               junit            …
    …         org.mockito         mockito-core     …
    …         org.slf4j           slf4j-api        …alpha…
    …         org.testng          testng           …
    …         org.scalatest       scalatest_…      …
    …         org.mockito         mockito-all      …beta…
    …         ch.qos.logback      logback-classic  …
    …         org.assertj         assertj-core     …
    …         org.slf4j           slf4j-log4j12    …alpha…
    …         org.spockframework  spock-core       …groovy…
    http://bit.ly/PackageRank-test

  34. Ranking: all scopes (inverted)

  35. Ranking: all scopes (inverted)
    PageRank  group                                                     artifact                                    version
    …         org.apache.clerezza                                       platform.launcher.storageless.parent        …incubating
    …         org.qi4j.library                                          org.qi4j.library.shiro-web                  …
    …         com.github.livesense                                      org.liveSense.assemblies                    …
    …         org.apache.polygene.libraries                             org.apache.polygene.library.shiro-web       …
    …         com.github.snowdream.android                              widget                                      …
    …         org.bluestemsoftware.open.eoa.example.application.spring  order-manager-application                   …
    …         org.bluestemsoftware.open.eoa.example.application.spring  warehouse-manager-application               …
    …         kr.pe.kwonnam.spymemcached-extra-transcoders              spymemcached-extra-transcoders-core         …
    …         org.apache.servicemix.bundles                             org.apache.servicemix.bundles.aws-java-sdk  …_…
    …         me.tatarka.gsonvalue                                      gsonvalue                                   …
    http://bit.ly/PackageRank-inverted

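  36. Summary
    • Computed PageRank from the dependencies between artifacts
      and used it to rank the artifacts
        • The results look reasonably plausible… maybe?
    • Next, I would like to compute PageRank over only
      recently published artifacts with a short history
        • That might surface the artifacts that are currently trending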