Why a new CPAN client cpm is fast

Slide 1

Slide 1 text

Why a new CPAN client cpm is fast
Shoichi Kaji

Slide 2

Slide 2 text

Me
• Shoichi Kaji
• Tokyo, Japan
• pause/github: skaji
• Perl5: cpm, App::FatPacker::Simple, Mojo::SlackRTM
• Perl6: mi6, Frinfon, evalbot in Slack :)

Slide 3

Slide 3 text

Agenda
• What is cpm, and why?
• cpanm vs. cpm
• The internals of cpm
• Divide the installation process into pieces
• Learn from the Go language
• Roadmap

Slide 4

Slide 4 text

Q: What is cpm?

Slide 5

Slide 5 text

A: It’s yet another CPAN client

Slide 6

Slide 6 text

Why a new CPAN client?
• Yes, I always use cpanm to install CPAN modules. It’s awesome!
• Because cpanm installs modules in series, it takes quite a lot of time to install a module that has many dependencies

Slide 7

Slide 7 text

I want to install CPAN modules as fast as possible

Slide 8

Slide 8 text

Why a new CPAN client?
• So I created cpm
• Actually, cpm is not an entirely new CPAN client: it runs cpanm in parallel, so it can install CPAN modules much faster

Slide 9

Slide 9 text

How fast? cpanm VS cpm installing Plack

Slide 10

Slide 10 text

cpanm

Slide 11

Slide 11 text

cpm

Slide 12

Slide 12 text

cpanm: 30 sec
cpm: 10 sec
cpm is 3x faster than cpanm!

Slide 13

Slide 13 text

Why is cpm so fast?
The internals of cpm

Slide 14

Slide 14 text

First, let’s start simple

    $ cat modules | xargs cpanm

Can we just use xargs to parallelize cpanm?
NO, WE CAN’T.

Slide 15

Slide 15 text

The problem with

    $ cat modules | xargs cpanm

• The modules to be installed are not determined in advance.
• Even if you have a list of modules to be installed, the cpanm workers will break unless you synchronize them
• So we have to
  (1) divide the installation process of a CPAN module into pieces that can be executed individually
  (2) synchronize the cpanm workers in some way

Slide 16

Slide 16 text

(1) Divide the installation process of CPAN modules

    sub installing_process {
        my $module = shift;

        # 1. resolve
        # query cpanmetadb
        my $dist_url = resolve($module);

        # 2. fetch (and extract)
        # wget && tar xzf && read META.json
        my ($dir, @configure_deps) = fetch($dist_url);
        install_module($_) for @configure_deps;

        # 3. configure
        # perl Makefile.PL/Build.PL && read MYMETA.json
        my @deps = configure($dir);
        install_module($_) for @deps;

        # 4. install
        # make install (or ./Build install)
        install($dir);
    }

I divided the process into 4 jobs:
* resolve
* fetch
* configure
* install
which can be executed independently
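One way to picture the division above is as a table of follow-up jobs: each finished job tells the master what can be queued next. The following is only a sketch under assumptions, not cpm's actual code. The job records, the names `resolve_jobs` and `follow_up_jobs`, and the result fields (`url`, `dir`, `configure_deps`, `deps`) are all hypothetical. It also leaves out the bookkeeping a real scheduler would need to hold back a configure or install job until its dependencies are installed.

```perl
use strict;
use warnings;

# Hypothetical helper: one "resolve" job per module name.
sub resolve_jobs {
    return map +{ type => 'resolve', module => $_ }, @_;
}

# Given a finished job and its result, return the jobs that can be queued next.
# The record layout is an assumption made for this sketch.
sub follow_up_jobs {
    my ($job, $result) = @_;
    my $module = $job->{module};

    if ($job->{type} eq 'resolve') {
        # resolve yields a distribution URL, so fetching comes next
        return ({ type => 'fetch', module => $module, url => $result->{url} });
    }
    if ($job->{type} eq 'fetch') {
        # fetch yields an extracted directory plus configure-time dependencies
        return (
            resolve_jobs(@{ $result->{configure_deps} || [] }),
            { type => 'configure', module => $module, dir => $result->{dir} },
        );
    }
    if ($job->{type} eq 'configure') {
        # configure yields the remaining dependencies
        return (
            resolve_jobs(@{ $result->{deps} || [] }),
            { type => 'install', module => $module, dir => $job->{dir} },
        );
    }
    return;    # "install" is the final step
}

# Usage with made-up module names and paths:
my @next = follow_up_jobs(
    { type => 'fetch', module => 'Foo' },
    { dir => '/tmp/Foo', configure_deps => ['Foo::Builder'] },
);
print join(', ', map { "$_->{type} $_->{module}" } @next), "\n";
```

Because every job is a small record with its own inputs, any idle worker can pick it up, which is what makes the four steps schedulable in parallel.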

Slide 17

Slide 17 text

(2) synchronize cpanm workers

Slide 18

Slide 18 text

Take a look at the Go language…

Go introduces two concurrency primitives:
* goroutines
* channels
They are very simple but powerful.

    func work(in <-chan string, out chan<- string) {
        for {
            job := <-in
            _ = job // do work with job
            out <- "result"
        }
    }

    func main() {
        in := make(chan string)
        out := make(chan string)
        go work(in, out)

        in <- "job"
        result := <-out
        _ = result // use the result
    }

Slide 19

Slide 19 text

Take a look at the Go language…

    func main() {
        in1 := make(chan string)
        out1 := make(chan string)
        go work(in1, out1)

        in2 := make(chan string)
        out2 := make(chan string)
        go work(in2, out2)

        in1 <- "job1"
        in2 <- "job2"

        select {
        case result1 := <-out1:
            _ = result1 // do something with result1
        case result2 := <-out2:
            _ = result2 // do something with result2
        }
    }

It is very easy to add more workers.
You can use select to wait on multiple channels simultaneously.

Slide 20

Slide 20 text

Can we adapt this idea to Perl5?

Slide 21

Slide 21 text

Of course, we can.

Slide 22

Slide 22 text

Go <-> Perl5

    Go          Perl5
    goroutine   fork(2)
    channel     pipe(2)
    select      select(2)
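To make the mapping concrete, here is a minimal Perl sketch of the single-worker Go program from the earlier slide: one fork(2) stands in for the goroutine and two pipe(2) calls stand in for the `in` and `out` channels. The function name `round_trip` and the line-based message format are assumptions made for illustration, not something taken from cpm.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Send one job to a forked worker and return its one-line result.
sub round_trip {
    my ($job) = @_;
    pipe(my $job_r, my $job_w) or die "pipe: $!";    # "in" channel: master -> worker
    pipe(my $res_r, my $res_w) or die "pipe: $!";    # "out" channel: worker -> master

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Worker: the counterpart of `go work(in, out)`.
        close $job_w;
        close $res_r;
        $res_w->autoflush(1);
        while (defined(my $line = <$job_r>)) {
            chomp $line;
            print {$res_w} "result for $line\n";
        }
        POSIX::_exit(0);    # leave without running the master's END blocks
    }

    # Master: keep only its own ends of the two pipes.
    close $job_r;
    close $res_w;
    $job_w->autoflush(1);
    print {$job_w} "$job\n";    # in <- "job"
    my $result = <$res_r>;      # result := <-out
    close $job_w;               # EOF ends the worker's loop
    waitpid $pid, 0;
    die "worker exited without a result" unless defined $result;
    chomp $result;
    return $result;
}

print round_trip("job"), "\n";
```

Unlike a goroutine, the forked worker is a separate process with its own memory, so jobs and results have to be serialized onto the pipe; this sketch uses one line of text per message.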

Slide 23

Slide 23 text

The internals of cpm

[Diagram: a Master process connected by pipes to three cpanm workers, waiting on the pipes with select]

cpanm worker
1. get job via pipe
2. work, work, work!
3. send result via pipe

Master
1. prepare pipes for workers by pipe(2)
2. launch workers by fork(2) and connect them with pipes
3. loop {
     calculate jobs and send jobs to idle workers.
     if all workers are busy, then wait for them and receive results by select(2)
   }
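The master and worker steps above can be sketched in Perl roughly as follows. This is a simplified illustration under assumptions, not cpm's implementation: `spawn_worker` and `run_jobs` are hypothetical names, the job queue is a fixed list (in cpm, results would feed new jobs back into the queue), and the "work" is a placeholder string instead of a real cpanm step. It uses the core IO::Select module as a wrapper around select(2).

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use POSIX ();

# Fork one worker. @$others holds master-side handles of earlier workers,
# which the child closes so that only the master keeps them open.
sub spawn_worker {
    my ($others) = @_;
    pipe(my $job_r, my $job_w) or die "pipe: $!";    # master -> worker
    pipe(my $res_r, my $res_w) or die "pipe: $!";    # worker -> master
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $_ for $job_w, $res_r, @$others;
        $res_w->autoflush(1);
        while (defined(my $job = <$job_r>)) {        # 1. get job via pipe
            chomp $job;
            my $result = "finished $job";            # 2. work (placeholder)
            print {$res_w} "$result\n";              # 3. send result via pipe
        }
        POSIX::_exit(0);
    }
    close $_ for $job_r, $res_w;
    $job_w->autoflush(1);
    return { pid => $pid, to => $job_w, from => $res_r };
}

sub run_jobs {
    my ($worker_count, @queue) = @_;
    die "need at least one worker" unless $worker_count > 0;

    my @workers;
    for (1 .. $worker_count) {
        push @workers, spawn_worker([ map { ($_->{to}, $_->{from}) } @workers ]);
    }
    my %by_fileno = map { (fileno($_->{from}) => $_) } @workers;
    my $select    = IO::Select->new(map { $_->{from} } @workers);
    my @idle      = @workers;
    my @results;

    while (@queue or @idle < @workers) {
        # Send jobs to idle workers.
        while (@queue and @idle) {
            my $worker = shift @idle;
            print { $worker->{to} } shift(@queue), "\n";
        }
        # Wait for busy workers and receive results. Each worker has at most
        # one job in flight, so at most one line is pending per pipe.
        for my $fh ($select->can_read) {
            my $line = <$fh>;
            die "worker exited unexpectedly" unless defined $line;
            chomp $line;
            push @results, $line;
            push @idle, $by_fileno{ fileno $fh };
        }
    }

    close $_->{to} for @workers;           # EOF tells every worker to exit
    waitpid $_->{pid}, 0 for @workers;
    return @results;
}

# Usage: results arrive in completion order, so sort them for display.
print "$_\n" for sort(run_jobs(3, qw(job1 job2 job3 job4 job5)));
```

Keeping one job in flight per worker is a deliberate simplification: it lets the master mix select(2) with buffered line reads safely, at the cost of a round trip to the master between jobs.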

Slide 24

Slide 24 text

Roadmap
• Last year I talked with Tatsuhiko Miyagawa about cpanm 2.0 (Menlo)
• Then he said, “Why don’t you merge cpm into cpanm itself?”
• I was very happy to hear that!

Slide 25

Slide 25 text

Roadmap
• So if you all find cpm useful and stable, then cpm should be merged into cpanm 2.0
• Before merging, there are some problems that need to be resolved:
  • The log file is very messy
• I would greatly appreciate your feedback!

Slide 26

Slide 26 text

Try cpm now

    $ cpanm -nq App::cpm

Thanks!