Slides from the 20th #渋谷java (Shibuya Java) meetup. The talk is about trying to rank the artifacts on the Maven central repository using PageRank. https://shibuya-java.connpass.com/event/65433/
Ranking the artifacts of the Maven central repository
Shibuya Java #20, 2017-09-30, KOMIYA Atsushi
@komiya_atsushi
Today’s topic
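Let's try ranking the artifacts of everyone's beloved Maven central repository
Ranking artifacts
• More than 200,000 artifacts exist on the Maven central repository (not counting different versions)
• When choosing a library to build into an application, I want to pick one with a proven track record of use
• I am the type who would rather go along with the majority
• I want a ranking of artifacts!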
http://search.maven.org/#stats
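How do we rank artifacts?
Focus on the dependencies between artifacts
• Consider the hypothesis: "the more artifacts depend on an artifact, the more important that artifact is"
• One possible metric is the number of incoming references per artifact
  • The same idea as citation counts for academic papers
• Is there a better metric than a simple count of incoming references?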
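As a rough illustration of the "number of incoming references" metric, the following Java sketch counts how many dependency edges point at each artifact. The class name `ReferenceCount` and the artifact names (`app-a`, `lib-x`, and so on) are made up for this example and are not part of the talk's data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReferenceCount {
    /**
     * Counts, for each artifact, how many edges point at it.
     * Each edge is a two-element array {dependent, dependency}.
     */
    static Map<String, Integer> countIncoming(List<String[]> edges) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] edge : edges) {
            counts.merge(edge[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: app-a and app-b depend on lib-x; lib-x depends on lib-y.
        List<String[]> edges = Arrays.asList(
                new String[] {"app-a", "lib-x"},
                new String[] {"app-b", "lib-x"},
                new String[] {"lib-x", "lib-y"});
        System.out.println(countIncoming(edges));
    }
}
```

Artifacts that nothing depends on simply do not appear in the resulting map. The count treats every dependent as equally important, which is the limitation that motivates looking for a better metric.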
Dependencies can be represented as a directed graph
PageRank !
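PageRank
• That thing from Google
  • Measures the importance of a Web page from the link relationships between pages
  • Link relationships can be represented as a directed graph
• Represent the dependencies between artifacts as a directed graph
  • Node: artifact, edge: dependency
  • Edges point from the depending artifact to the artifact being depended on
• The higher an artifact's score, the more important it can be interpreted to be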
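To make the idea concrete, here is a minimal textbook-style PageRank (power iteration with a damping factor) over a tiny dependency graph. It is only a sketch under assumptions: the four-node graph, the damping factor 0.85, and the iteration count are illustrative choices, and the implementation in GraphX may differ in details such as score normalization and the handling of nodes without outgoing edges.

```java
import java.util.Arrays;

class PageRankSketch {
    /**
     * edges[i] = {from, to} means artifact `from` depends on artifact `to`.
     * Returns one score per node; the scores sum to 1.
     */
    static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            // Score held by nodes without outgoing edges is spread evenly over all nodes.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);
            // Every node passes its score, split evenly, to the artifacts it depends on.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 = app-a, 1 = app-b, 2 = lib-x, 3 = lib-y.
        int[][] edges = {{0, 2}, {1, 2}, {2, 3}};
        System.out.println(Arrays.toString(pageRank(4, edges, 0.85, 50)));
    }
}
```

In this toy graph `lib-y` has only one dependent but still ends up above `lib-x`, because its single dependent is itself heavily depended on. A plain count of incoming references would rank the two the other way round.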
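Collecting data from the Maven central repository
How do we collect it?
• Crawl every single POM file from https://repo1.maven.org/maven2/ ?
  • Counting every version, there are more than 2 million artifacts in total…
• I only want to download the POM of each latest version
  • However, determining which artifact is the latest version from the (string) version information is tedious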
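The trouble with string version information can be seen in a small Java sketch: plain string comparison orders "1.10" before "1.9", and a naive numeric comparator (the hypothetical `NUMERIC_SEGMENTS` below) fails as soon as it has to parse a qualifier such as `-alpha2`. The version strings are illustrative only. Maven's own ordering rules are more involved; the `ComparableVersion` class in the `maven-artifact` library implements them, and it is left out here only to keep the example free of dependencies.

```java
import java.util.Comparator;

class VersionOrderSketch {
    /** Compares dotted numeric versions segment by segment; qualifiers are NOT handled. */
    static final Comparator<String> NUMERIC_SEGMENTS = (a, b) -> {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            // Missing trailing segments are treated as 0, so "1.0" equals "1".
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    };

    public static void main(String[] args) {
        // Lexicographic comparison: '1' sorts before '9', so "1.10" looks older than "1.9".
        System.out.println("1.10".compareTo("1.9") < 0);
        // Numeric comparison per segment gets this pair right.
        System.out.println(NUMERIC_SEGMENTS.compare("1.10", "1.9") > 0);
        // A qualifier breaks the naive approach once its segment has to be parsed.
        try {
            NUMERIC_SEGMENTS.compare("1.8.0-alpha2", "1.8.0");
        } catch (NumberFormatException e) {
            System.out.println("cannot parse qualifier: " + e.getMessage());
        }
    }
}
```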
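Use the index file
• As it turns out, an index file covering every version of every artifact on the central repository is provided
  • https://maven.apache.org/repository/central-index.html
• It consists of two files: a .properties file and a gzip-compressed file (about 300 MB)
• It is updated weekly
Information the index file does / does not provide
• Information available from the index file (partial list)
  • Group ID
  • Artifact ID
  • Version
  • Classifier (the sources / javadoc / linux-x86_64 kind of thing)
  • Last-modified date and time of the artifact's file
    • This should be enough to identify the latest version of each artifact
• Information not available from the index file
  • Dependencies between artifacts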
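One way to read "this should be enough to identify the latest version" is to group the index records by group ID and artifact ID and keep the record with the newest last-modified time. The sketch below assumes a hypothetical `Entry` class holding the fields the index provides; it is not the indexer-reader API. Treating the newest timestamp as the latest version is itself an assumption, since a maintenance release of an older line could be uploaded later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LatestVersionSketch {
    /** Hypothetical holder for one index record; not a class from indexer-reader. */
    static final class Entry {
        final String groupId;
        final String artifactId;
        final String version;
        final String classifier;  // null for the main artifact
        final long lastModified;  // epoch milliseconds

        Entry(String groupId, String artifactId, String version,
              String classifier, long lastModified) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.classifier = classifier;
            this.lastModified = lastModified;
        }
    }

    /** Keeps, per "groupId:artifactId", the main-artifact record with the newest timestamp. */
    static Map<String, Entry> latestPerArtifact(List<Entry> entries) {
        Map<String, Entry> latest = new HashMap<>();
        for (Entry e : entries) {
            if (e.classifier != null) {
                continue;  // skip sources, javadoc, platform-specific files, ...
            }
            latest.merge(e.groupId + ":" + e.artifactId, e,
                    (a, b) -> a.lastModified >= b.lastModified ? a : b);
        }
        return latest;
    }

    public static void main(String[] args) {
        // Made-up records for illustration.
        List<Entry> entries = Arrays.asList(
                new Entry("com.example", "lib", "1.0", null, 100L),
                new Entry("com.example", "lib", "1.1", null, 200L),
                new Entry("com.example", "lib", "1.1", "sources", 300L));
        Entry e = latestPerArtifact(entries).get("com.example:lib");
        System.out.println(e.groupId + ":" + e.artifactId + ":" + e.version);
    }
}
```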
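Scanning the index file
• Use indexer-reader
  • group: 'org.apache.maven.indexer'
  • name: 'indexer-reader'
• For concrete usage, see the implementation at the following URL
  • http://bit.ly/maven-indexer-demo
Dependencies between artifacts
• There seems to be no way to get them other than reading the POM files on the Maven central repository
• Nothing else for it, so crawl them by brute force
  • Limiting the crawl to the latest version of each artifact makes it somewhat more bearable
  • That is still more than 200,000, though…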
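Reading POM files
• Use maven-model
  • group: 'org.apache.maven'
  • name: 'maven-model'

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.List;
    import org.apache.maven.model.Dependency;
    import org.apache.maven.model.Model;
    import org.apache.maven.model.io.xpp3.MavenXpp3Reader;

    public static void demo() throws Exception {
        try (InputStream in = new FileInputStream("path/to/pom.xml")) {
            Model model = new MavenXpp3Reader().read(in);
            // The dependencies declared in the POM can be obtained from the model
            List<Dependency> dependencies = model.getDependencies();
        }
    }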
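Computing PageRank
Implement it myself? No!
Use Apache Spark / GraphX
• GraphX
  • Provides an API for handling and computing on graphs on top of Spark
  • PageRank is quietly already implemented ❤
• Given the size of the graph, it can be computed in local mode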
Use Apache Spark / GraphX

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx.GraphLoader

    def run(sc: SparkContext): Unit = {
      // File describing the dependencies: each line lists two artifacts,
      // in numeric form, separated by a space
      val graph = GraphLoader.edgeListFile(sc, "path/to/dependency-graph.txt")

      // Compute PageRank
      val ranking = graph.pageRank(0.0001).vertices

      // Mapping from an artifact's numeric form to its GAV (groupId|artifactId|version)
      val artifacts = sc.textFile("path/to/artifacts.txt").map { line =>
        val values = line.split(",")
        (values(0).toLong, values(1))
      }

      // Replace each artifact's numeric form with its GAV and write the result to a file
      artifacts.join(ranking)
        .map { case (id, (gav, rank)) => (gav, rank) }
        .sortBy(_._2, ascending = false)
        .map(t => t._1 + "," + t._2)
        .saveAsTextFile("path/to/result")
    }
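Dependency graph
• Maven dependencies have a "scope"
  • compile, provided, runtime, test, system, import
• Compute PageRank for each of the following scopes (or combinations of scopes)
  • All
  • compile
  • test
  • All (with the edges reversed: depended on → depending)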
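The scope variants can be thought of as different edge lists fed to the same PageRank computation. The Java sketch below, under stated assumptions, filters dependencies by scope, optionally reverses the edges, and emits space-separated numeric ID pairs, which is the line format read by GraphX's `GraphLoader.edgeListFile`. The `Dep` class, the artifact names, and the numbering scheme are hypothetical; a missing scope is treated as `compile`, which is Maven's default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EdgeListSketch {
    /** Hypothetical record of one dependency: `from` depends on `to` with the given scope. */
    static final class Dep {
        final String from;
        final String to;
        final String scope;  // null when the POM does not state a scope

        Dep(String from, String to, String scope) {
            this.from = from;
            this.to = to;
            this.scope = scope;
        }
    }

    /** Returns the numeric ID of an artifact, assigning the next free ID on first use. */
    static long idOf(Map<String, Long> ids, String gav) {
        Long id = ids.get(gav);
        if (id == null) {
            id = (long) ids.size();
            ids.put(gav, id);
        }
        return id;
    }

    /**
     * Builds "src dst" lines. `scopes` == null keeps every scope.
     * With `reversed`, edges run from the artifact depended on to the dependent.
     */
    static List<String> edgeLines(List<Dep> deps, Set<String> scopes,
                                  boolean reversed, Map<String, Long> ids) {
        List<String> lines = new ArrayList<>();
        for (Dep d : deps) {
            String scope = d.scope == null ? "compile" : d.scope;
            if (scopes != null && !scopes.contains(scope)) {
                continue;
            }
            long src = idOf(ids, d.from);
            long dst = idOf(ids, d.to);
            lines.add(reversed ? dst + " " + src : src + " " + dst);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up dependencies for illustration.
        List<Dep> deps = Arrays.asList(
                new Dep("g:app:1.0", "g:lib:2.0", null),
                new Dep("g:app:1.0", "g:testlib:1.0", "test"),
                new Dep("g:lib:2.0", "g:util:3.0", "compile"));
        Map<String, Long> ids = new LinkedHashMap<>();
        System.out.println(edgeLines(deps, Collections.singleton("compile"), false, ids));
        System.out.println(ids);  // the ID-to-GAV mapping needed to label the results
    }
}
```

The `ids` map plays the role of the ID-to-GAV mapping file: it is what allows the numeric vertex IDs in the PageRank output to be turned back into artifact coordinates.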
Let's look at the actual rankings
About the ranking results
• Only the Top 10 or Top 20 are presented here
• Results down to the Top 100 are published at the following URL (Google Spreadsheet)
  • http://bit.ly/PackageRank
Ranking: all scopes
Ranking: all scopes (#1~10)
# | group | artifact | version
1 | junit | junit | …
2 | org.scala-lang | scala-compiler | …
3 | org.slf4j | slf4j-api | …alpha…
4 | org.mockito | mockito-core | …
5 | org.testng | testng | …
6 | org.scalatest | scalatest_… | …
7 | org.mockito | mockito-all | …beta…
8 | javax.servlet | servlet-api | …alpha…
9 | ch.qos.logback | logback-classic | …
10 | org.objenesis | objenesis | …
http://bit.ly/PackageRank
Ranking: all scopes (#11~20)
# | group | artifact | version
11 | javax.servlet | javax.servlet-api | …
12 | org.assertj | assertj-core | …
13 | log4j | log4j | …
14 | org.osgi | org.osgi.core | …
15 | org.slf4j | slf4j-log4j12 | …alpha…
16 | org.scala-lang | scala-library | …
17 | net.bytebuddy | byte-buddy | …
18 | org.scalatest | scalatest_… | …
19 | net.bytebuddy | byte-buddy-agent | …
20 | org.slf4j | slf4j-simple | …alpha…
http://bit.ly/PackageRank
What is at the top of the ranking
• Testing
  • junit, testng, scalatest, assertj, mockito
• Languages
  • Scala (scala-compiler, scala-library)
• Logging
  • slf4j, logback, log4j (not log4j2)
• Others
  • objenesis, byte-buddy, servlet-api, org.osgi.core…
Ranking: compile
Ranking: compile
# | group | artifact | version
1 | org.scala-lang | scala-library | …
2 | org.slf4j | slf4j-api | …alpha…
3 | junit | junit | …
4 | com.google.guava | guava | …
5 | org.antlr | antlr-runtime | …
6 | org.antlr | stringtemplate | …
7 | com.google.code.gson | gson | …
8 | org.jetbrains | annotations | …
9 | com.google.code.findbugs | jsr305 | …
10 | org.jetbrains.kotlin | kotlin-stdlib | … ❗
http://bit.ly/PackageRank-compile
Ranking: test
Ranking: test
# | group | artifact | version
1 | junit | junit | …
2 | org.mockito | mockito-core | …
3 | org.slf4j | slf4j-api | …alpha…
4 | org.testng | testng | …
5 | org.scalatest | scalatest_… | …
6 | org.mockito | mockito-all | …beta…
7 | ch.qos.logback | logback-classic | …
8 | org.assertj | assertj-core | …
9 | org.slf4j | slf4j-log4j12 | …alpha…
10 | org.spockframework | spock-core | …groovy…
http://bit.ly/PackageRank-test
Ranking: all scopes (reversed edges)
Ranking: all scopes (reversed edges)
# | group | artifact | version
1 | org.apache.clerezza | platform.launcher.storageless.parent | …incubating
2 | org.qi4j.library | org.qi4j.library.shiro-web | …
3 | com.github.livesense | org.liveSense.assemblies | …
4 | org.apache.polygene.libraries | org.apache.polygene.library.shiro-web | …
5 | com.github.snowdream.android | widget | …
6 | org.bluestemsoftware.open.eoa.example.application.spring | ordermanagerapplication | …
7 | org.bluestemsoftware.open.eoa.example.application.spring | warehousemanagerapplication | …
8 | kr.pe.kwonnam.spymemcached-extra-transcoders | spymemcached-extra-transcoders-core | …
9 | org.apache.servicemix.bundles | org.apache.servicemix.bundles.aws-java-sdk | …_…
10 | me.tatarka.gsonvalue | gsonvalue | …
http://bit.ly/PackageRank-inverted
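Summary
Summary
• Computed PageRank from the dependencies between artifacts and tried ranking the artifacts with it
  • The results look fairly reasonable… maybe?
• Next, I would like to compute PageRank over only recently published artifacts with a short history
  • That might make it possible to spot the artifacts that are currently trending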
Thank you!