Ranking the artifacts in the Maven central repository #渋谷java

Slides from the 20th #渋谷java meetup. The talk tries ranking the artifacts on the Maven central repository using PageRank.
https://shibuya-java.connpass.com/event/65433/

KOMIYA Atsushi

September 30, 2017

Transcript

  1. Ranking the artifacts in the Maven central repository. 渋谷Java #20, 2017-09-30, KOMIYA Atsushi
  2. @komiya_atsushi

  3. None
  4. Today’s topic

  5. Let's try ranking the artifacts in everyone's favorite Maven central repository

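  6. Ranking artifacts
     • The Maven central repository holds more than 200,000 artifacts (not counting different versions)
     • When choosing a library to build into an application, I want one with a proven track record
     • I am the type who prefers to go along with the majority
     • I want a ranking of artifacts!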
  7. http://search.maven.org/#stats

  8. How do we rank artifacts?

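  9. Focus on the dependencies between artifacts
     • Hypothesis: "the more artifacts depend on an artifact, the more important that artifact is"
     • One option is to use each artifact's number of incoming references as the metric
       • The same idea as citation counts for papers
     • Is there a better metric than a plain incoming-reference count?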
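As a baseline for the "incoming reference count" idea on slide 9, the count can be computed directly from a list of dependency edges. This is a minimal sketch, not code from the talk: the class name `ReferenceCount` and the edge representation (a two-element array holding the dependent and the dependency) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReferenceCount {
    /**
     * Each edge is {dependent, dependency}.
     * Returns, for every dependency, how many edges point at it.
     */
    static Map<String, Integer> countIncoming(List<String[]> edges) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] edge : edges) {
            // edge[1] is the artifact being depended on
            counts.merge(edge[1], 1, Integer::sum);
        }
        return counts;
    }
}
```

A count like this treats every dependent as equally significant, which is the limitation the slide's last bullet asks about.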
  10. None
  11. Dependencies can be represented as a directed graph

  12. PageRank !

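  13. PageRank
     • That thing from Google
     • Measures the importance of a web page from the link relationships between pages
     • Link relationships can be represented as a directed graph
     • Represent the dependencies between artifacts as a directed graph
       • Node: artifact, edge: dependency
       • Edges point from the dependent artifact to the artifact it depends on
     • The higher an artifact's score, the more important it can be considered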
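To make the scoring on slide 13 concrete, here is a small textbook-style PageRank by power iteration over an edge list. It is only an illustrative sketch: the talk itself relies on GraphX, whose implementation and score normalization differ, and the class name `SimplePageRank`, the damping factor 0.85, and the handling of nodes without outgoing edges are assumptions of this example.

```java
import java.util.Arrays;

final class SimplePageRank {
    /**
     * Nodes are numbered 0..n-1; each edge is {from, to}, meaning "from depends on to".
     * Returns scores that sum to 1 (damping factor 0.85, fixed number of iterations).
     */
    static double[] compute(int n, int[][] edges, int iterations) {
        final double damping = 0.85;

        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            // Score held by nodes without outgoing edges is spread over all nodes.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);

            // Every node passes its score, split evenly, to the nodes it depends on.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }
}
```

With edges `{0, 2}` and `{1, 2}`, node 2 (depended on by both others) ends up with the highest score, matching the reading that a higher score means a more important artifact.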
  14. Collecting data from the Maven central repository

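  15. How do we collect it?
     • Crawl every single POM file from https://repo1.maven.org/maven2/ ?
       • Counting every version, there are more than 2,000,000 artifacts…
     • I only want to download the POM of the latest version
       • However, working out which artifact is the latest version from its (string) version information is tedious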
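The last bullet of slide 15 can be illustrated with a short example: plain string comparison orders "1.10.0" before "1.9.0", and even a numeric comparison only covers the simplest version strings. The helper below is a hypothetical sketch (the name `VersionCompare` is an assumption) and deliberately does not handle qualifiers such as `-alpha1` or `-SNAPSHOT`, which is part of why the problem is tedious.

```java
final class VersionCompare {
    /**
     * Compares purely numeric, dot-separated versions such as "1.10.0".
     * Missing components count as 0, so "1.0" equals "1.0.0".
     * Qualifiers (for example "-alpha1") are NOT supported and make parseInt throw.
     */
    static int compareNumeric(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int length = Math.max(x.length, y.length);
        for (int i = 0; i < length; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }
}
```

Here `"1.10.0".compareTo("1.9.0")` is negative (the string comparison gets it wrong), while `VersionCompare.compareNumeric("1.10.0", "1.9.0")` is positive.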
  16. Use the index file
     • It turns out an index file covering every version of every artifact on the central repository is provided
       • https://maven.apache.org/repository/central-index.html
     • It consists of two files: a .properties file and a gzip-compressed file (over 300 MB)
     • It is updated weekly
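  17. What the index file does and does not tell you
     • Information available from the index file (partial list)
       • Group ID
       • Artifact ID
       • Version
       • Classifier (sources / javadoc / linux-x86_64 and the like)
       • Last-modified date and time of the artifact's file
       • With this, it should be possible to identify the latest version of each artifact
     • Information not available from the index file
       • Dependencies between artifacts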
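One way to read "identify the latest version" on slide 17 is to keep, for each groupId:artifactId pair, the entry with the newest last-modified timestamp. The sketch below assumes the index records have already been read into a simple holder class; `LatestVersions.Entry` is a hypothetical type for this example, not part of indexer-reader. Note also that the most recently modified version is not guaranteed to be the highest version number, for instance when an old branch receives a maintenance release.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LatestVersions {
    /** Hypothetical holder for the index fields listed on the slide. */
    static final class Entry {
        final String groupId;
        final String artifactId;
        final String version;
        final String classifier;  // null for the main artifact
        final long lastModified;  // epoch milliseconds

        Entry(String groupId, String artifactId, String version,
              String classifier, long lastModified) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.classifier = classifier;
            this.lastModified = lastModified;
        }
    }

    /** Returns the most recently modified main entry per "groupId:artifactId". */
    static Map<String, Entry> latestPerArtifact(List<Entry> entries) {
        Map<String, Entry> latest = new HashMap<>();
        for (Entry e : entries) {
            if (e.classifier != null) {
                continue; // skip sources / javadoc / platform-specific files
            }
            String key = e.groupId + ":" + e.artifactId;
            latest.merge(key, e, (a, b) -> a.lastModified >= b.lastModified ? a : b);
        }
        return latest;
    }
}
```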
  18. Scanning the index file
     • Use indexer-reader
       • group: 'org.apache.maven.indexer'
       • name: 'indexer-reader'
     • For concrete usage, see the implementation at the following URL
       • http://bit.ly/maven-indexer-demo
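  19. Dependencies between artifacts
     • There seems to be no way to get them other than reading the POM files on the Maven central repository
     • No choice, so crawl them by brute force
       • Restricting the crawl to the latest version of each artifact helps somewhat
       • That is still more than 200,000 POMs, though…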
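For the crawl described on slide 19, each POM can be addressed directly from its coordinates, because the repository follows the standard Maven 2 layout (group ID dots become path separators, followed by artifact ID, version, and `artifactId-version.pom`). A minimal sketch, where the class name `PomUrl` is an assumption and the base URL is the one given on slide 15:

```java
final class PomUrl {
    static final String BASE = "https://repo1.maven.org/maven2/";

    /** Builds the POM URL for the given coordinates using the Maven 2 repository layout. */
    static String of(String groupId, String artifactId, String version) {
        return BASE
                + groupId.replace('.', '/') + "/"
                + artifactId + "/"
                + version + "/"
                + artifactId + "-" + version + ".pom";
    }
}
```

For example, `PomUrl.of("org.slf4j", "slf4j-api", "1.7.25")` yields `https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.pom`.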
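  20. Reading POM files
     • Use maven-model
       • group: 'org.apache.maven'
       • name: 'maven-model'

      import java.io.FileInputStream;
      import java.io.InputStream;
      import java.util.List;
      import org.apache.maven.model.Dependency;
      import org.apache.maven.model.Model;
      import org.apache.maven.model.io.xpp3.MavenXpp3Reader;

      public static void demo() throws Exception {
          try (InputStream in = new FileInputStream("path/to/pom.xml")) {
              Model model = new MavenXpp3Reader().read(in);
              // The declared dependencies can be obtained from the model
              List<Dependency> dependencies = model.getDependencies();
          }
      }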
  21. Computing PageRank

  22. Implement it myself? No!

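  23. Use Apache Spark / GraphX
     • GraphX
       • Provides an API for handling graphs and running computations on them on top of Spark
       • PageRank is already implemented, just like that ❤
     • Given the size of this graph, it can be computed in local mode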
  24. Use Apache Spark / GraphX

      import org.apache.spark.SparkContext
      import org.apache.spark.graphx.GraphLoader

      def run(sc: SparkContext): Unit = {
        // File that expresses each dependency as two numeric artifact IDs separated by a space
        val graph = GraphLoader.edgeListFile(sc, "path/to/dependency-graph.txt")

        // Compute PageRank
        val ranking = graph.pageRank(0.0001).vertices

        // Mapping from the numeric artifact ID to its GAV (groupId|artifactId|version)
        val artifacts = sc.textFile("path/to/artifacts.txt").map { line =>
          val values = line.split(",")
          (values(0).toLong, values(1))
        }

        // Replace the numeric artifact IDs with GAVs and write the result to a file
        artifacts.join(ranking).map { case (id, (gav, rank)) => (gav, rank) }
          .sortBy(_._2, ascending = false)
          .map(t => t._1 + "," + t._2)
          .saveAsTextFile("path/to/result")
      }
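The Spark job on slide 24 reads two input files: an edge list with one `sourceId destinationId` pair per line, and a mapping file with one `id,GAV` pair per line. The talk does not show how those files were produced, so the following is only a sketch of one possible way to generate their contents; the class name `EdgeListWriter` and its methods are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EdgeListWriter {
    // GAV string ("groupId|artifactId|version") -> numeric ID, in insertion order
    private final Map<String, Long> ids = new LinkedHashMap<>();
    private final List<String> edgeLines = new ArrayList<>();

    /** Returns the numeric ID for a GAV, assigning the next free ID on first use. */
    private long idOf(String gav) {
        Long id = ids.get(gav);
        if (id == null) {
            id = (long) ids.size();
            ids.put(gav, id);
        }
        return id;
    }

    /** Records an edge pointing from the dependent artifact to its dependency. */
    void addDependency(String dependentGav, String dependencyGav) {
        long from = idOf(dependentGav);
        long to = idOf(dependencyGav);
        edgeLines.add(from + " " + to);
    }

    /** Lines for the edge list file: "sourceId destinationId". */
    List<String> edgeLines() {
        return edgeLines;
    }

    /** Lines for the mapping file: "id,GAV". */
    List<String> artifactLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ids.entrySet()) {
            lines.add(entry.getValue() + "," + entry.getKey());
        }
        return lines;
    }
}
```

Using `|` inside the GAV keeps the mapping lines safe to split on `,`, which is what the Scala code above does.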
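  25. The dependency graph
     • Maven dependencies have a "scope"
       • compile, provided, runtime, test, system, import
     • Compute PageRank for each of the following scopes (or combinations of scopes)
       • All
       • compile
       • test
       • All, with edges reversed (pointing from the depended-on artifact to the dependent one)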
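The four graph variants on slide 25 can be derived from a single edge list by filtering on scope or flipping edge direction. The sketch below is an assumption about how that could be done, not the talk's code: `ScopedEdges.Edge` is a hypothetical type, and a missing scope is treated as `compile`, which is Maven's documented default.

```java
import java.util.ArrayList;
import java.util.List;

final class ScopedEdges {
    /** Hypothetical edge type: 'from' declares a dependency on 'to' with the given scope. */
    static final class Edge {
        final String from;
        final String to;
        final String scope; // null when the POM omits <scope>

        Edge(String from, String to, String scope) {
            this.from = from;
            this.to = to;
            this.scope = scope;
        }
    }

    /** Keeps only the edges of one scope; a missing scope counts as "compile". */
    static List<Edge> withScope(List<Edge> edges, String scope) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            String effective = (e.scope == null) ? "compile" : e.scope;
            if (effective.equals(scope)) {
                kept.add(e);
            }
        }
        return kept;
    }

    /** Flips every edge so that it points from the dependency back to the dependent. */
    static List<Edge> reversed(List<Edge> edges) {
        List<Edge> flipped = new ArrayList<>();
        for (Edge e : edges) {
            flipped.add(new Edge(e.to, e.from, e.scope));
        }
        return flipped;
    }
}
```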
  26. Let's look at the actual rankings

  27. About the ranking results
     • Only the Top 10 or Top 20 are shown here
     • Results down to the Top 100 are published at the following URL (Google Spreadsheet)
       • http://bit.ly/PackageRank
  28. Ranking: all scopes

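  29. Ranking: all scopes (#1~10)
      (Slide columns: PageRank, group, artifact, version. The PageRank scores and version numbers are not recoverable from the transcript.)

       #   group             artifact          version
       1   junit             junit             …
       2   org.scala-lang    scala-compiler    …
       3   org.slf4j         slf4j-api         …alpha…
       4   org.mockito       mockito-core      …
       5   org.testng        testng            …
       6   org.scalatest     scalatest_…       …
       7   org.mockito       mockito-all       …beta…
       8   javax.servlet     servlet-api       …alpha…
       9   ch.qos.logback    logback-classic   …
      10   org.objenesis     objenesis         …

      http://bit.ly/PackageRank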
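  30. Ranking: all scopes (#11~20)
      (Slide columns: PageRank, group, artifact, version. The PageRank scores and version numbers are not recoverable from the transcript.)

       #   group             artifact            version
      11   javax.servlet     javax.servlet-api   …
      12   org.assertj       assertj-core        …
      13   log4j             log4j               …
      14   org.osgi          org.osgi.core       …
      15   org.slf4j         slf4j-log4j12       …alpha…
      16   org.scala-lang    scala-library       …
      17   net.bytebuddy     byte-buddy          …
      18   org.scalatest     scalatest_…         …
      19   net.bytebuddy     byte-buddy-agent    …
      20   org.slf4j         slf4j-simple        …alpha…

      http://bit.ly/PackageRank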
  31. Trends at the top of the ranking
     • Testing
       • junit, testng, scalatest, assertj, mockito
     • Languages
       • Scala (scala-compiler, scala-library)
     • Logging
       • slf4j, logback, log4j (not log4j2)
     • Others
       • objenesis, byte-buddy, servlet-api, org.osgi.core…
  32. Ranking: compile

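  33. Ranking: compile
      (Slide columns: PageRank, group, artifact, version. The PageRank scores and version numbers are not recoverable from the transcript.)

       #   group                      artifact         version
       1   org.scala-lang             scala-library    …
       2   org.slf4j                  slf4j-api        …alpha…
       3   junit                      junit            …
       4   com.google.guava           guava            …
       5   org.antlr                  antlr-runtime    …
       6   org.antlr                  stringtemplate   …
       7   com.google.code.gson       gson             …
       8   org.jetbrains              annotations      …
       9   com.google.code.findbugs   jsr305           …
      10   org.jetbrains.kotlin       kotlin-stdlib    …

      http://bit.ly/PackageRank-compile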
  34. Ranking: compile (the table from slide 33 again, this time with a ❗ highlight)
      http://bit.ly/PackageRank-compile
  35. Ranking: test

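  36. Ranking: test
      (Slide columns: PageRank, group, artifact, version. The PageRank scores and version numbers are not recoverable from the transcript.)

       #   group                artifact          version
       1   junit                junit             …
       2   org.mockito          mockito-core      …
       3   org.slf4j            slf4j-api         …alpha…
       4   org.testng           testng            …
       5   org.scalatest        scalatest_…       …
       6   org.mockito          mockito-all       …beta…
       7   ch.qos.logback       logback-classic   …
       8   org.assertj          assertj-core      …
       9   org.slf4j            slf4j-log4j12     …alpha…
      10   org.spockframework   spock-core        …groovy…

      http://bit.ly/PackageRank-test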
  37. Ranking: all scopes (reversed)

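  38. Ranking: all scopes (reversed)
      (Slide columns: PageRank, group, artifact, version. The PageRank scores and version numbers are not recoverable from the transcript, and the separators inside the names below are a best-effort reconstruction.)

       #   group / artifact / version
       1   org.apache.clerezza / platform.launcher.storageless.parent / …incubating…
       2   org.qi4j.library / org.qi4j.library.shiro-web / …
       3   com.github.livesense / org.liveSense.assemblies / …
       4   org.apache.polygene.libraries / org.apache.polygene.library.shiro-web / …
       5   com.github.snowdream.android / widget / …
       6   org.bluestemsoftware.open.eoa.example.application.spring / order-manager-application / …
       7   org.bluestemsoftware.open.eoa.example.application.spring / warehouse-manager-application / …
       8   kr.pe.kwonnam.spymemcached-extra-transcoders / spymemcached-extra-transcoders-core / …
       9   org.apache.servicemix.bundles / org.apache.servicemix.bundles.aws-java-sdk / …_…
      10   me.tatarka.gsonvalue / gson-value / …

      http://bit.ly/PackageRank-inverted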
  39. Summary

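  40. Summary
     • Computed PageRank from the dependencies between artifacts and tried ranking the artifacts
       • The results look reasonably plausible… maybe?
     • Next, I would like to compute PageRank restricted to recently published artifacts with a short history
       • That might make it possible to find artifacts that are currently trending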
  41. Thank you!