Why a new CPAN client cpm is fast

Why a new CPAN client cpm is fast Shoichi Kaji

Me
• Shoichi Kaji
• Tokyo, Japan
• PAUSE/GitHub: skaji
• Perl5: cpm, App::FatPacker::Simple, Mojo::SlackRTM
• Perl6: mi6, Frinfon, evalbot in Slack :)

Agenda
• What is cpm, and why?
• cpanm VS cpm
• The internals of cpm
• divide the installation process into pieces
• learn from the Go language
• Roadmap

Q: What is cpm?

A: It’s yet another CPAN client

Why a new CPAN client?
• Yes, I always use cpanm to install CPAN modules. It’s awesome!
• Because cpanm installs modules in series, it takes quite a lot of time to install a module that has many dependencies

I want to install CPAN modules as fast as possible

Why a new CPAN client?
• So I created cpm
• Actually, cpm is not an entirely new CPAN client: it runs cpanm in parallel, so it can install CPAN modules much faster

How fast? cpanm VS cpm installing Plack

cpanm: 30 sec
cpm: 10 sec
cpm is 3x faster than cpanm!

Why is cpm so fast?
The internals of cpm

First, let’s keep it simple
$ cat modules | xargs cpanm
Can we just use xargs to parallelize cpanm?
NO, WE CAN’T.

The problem with $ cat modules | xargs cpanm
• The modules to be installed are not determined in advance.
• Even if you have a list of modules to be installed, the cpanm workers will break unless you synchronize them
• So we have to
• (1) divide the installation process of a CPAN module into pieces that can be executed individually
• (2) synchronize the cpanm workers in some way
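A minimal sketch of the first point, written in Go because the talk borrows its concurrency ideas from that language. The module names and the deps table are made up for illustration. In a real install, this information only becomes available after a distribution has been fetched and configured. The seen set stands in for the synchronization a parallel installer needs so that the same module is not scheduled twice.

```go
package main

import "fmt"

// deps is made-up data standing in for what META.json / MYMETA.json
// would reveal only after a distribution is fetched and configured.
var deps = map[string][]string{
	"Foo": {"Bar", "Baz"},
	"Bar": {"Baz"},
	"Baz": nil,
}

// discover walks dependencies breadth-first, starting from the requested
// modules, and returns every module that ends up needing installation.
// The seen set makes sure each module is scheduled only once.
func discover(requested []string) []string {
	seen := map[string]bool{}
	queue := append([]string(nil), requested...)
	var order []string
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		if seen[m] {
			continue
		}
		seen[m] = true
		order = append(order, m)
		// These entries are only known after looking inside m.
		queue = append(queue, deps[m]...)
	}
	return order
}

func main() {
	// Only "Foo" was requested, but three modules must be installed.
	fmt.Println(discover([]string{"Foo"}))
}
```

The list handed to xargs would contain only Foo, while the work that actually has to be done grows as each distribution is inspected.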

(1) Divide installing process of CPAN modules

    sub installing_process {
        my $module = shift;

        # 1. resolve
        # query cpanmetadb
        my $dist_url = resolve($module);

        # 2. fetch (and extract)
        # wget && tar xzf && read META.json
        my ($dir, @configure_deps) = fetch($dist_url);
        install_module($_) for @configure_deps;

        # 3. configure
        # perl Makefile.PL/Build.PL && read MYMETA.json
        my @deps = configure($dir);
        install_module($_) for @deps;

        # 4. install
        # make install (or ./Build install)
        install($dir);
    }

I divided the process into 4 jobs:
* resolve
* fetch
* configure
* install
which are independent
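One way to picture the four jobs as independent units is a small state machine in which each finished job yields the next job for the same target. This is only a sketch in Go. JobKind, Job and next are hypothetical names, not cpm's actual code, and the dependencies discovered by the fetch and configure steps are left out.

```go
package main

import "fmt"

// JobKind names one of the four steps from the slide.
type JobKind int

const (
	Resolve JobKind = iota
	Fetch
	Configure
	Install
)

// String returns the step name used on the slide.
func (k JobKind) String() string {
	return [...]string{"resolve", "fetch", "configure", "install"}[k]
}

// Job is a hypothetical unit of work that a single worker can run on its own.
type Job struct {
	Kind   JobKind
	Target string // module or distribution the step applies to
}

// next returns the job that follows j for the same target,
// and false once the install step has been reached.
func next(j Job) (Job, bool) {
	if j.Kind == Install {
		return Job{}, false
	}
	return Job{Kind: j.Kind + 1, Target: j.Target}, true
}

func main() {
	// Walk one module through all four steps in order.
	j := Job{Kind: Resolve, Target: "Plack"}
	for ok := true; ok; j, ok = next(j) {
		fmt.Println(j.Kind, j.Target)
	}
}
```

Because every job is a plain value, any idle worker can pick it up, which is what makes the parallel scheduling possible.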

(2) synchronize cpanm workers

Take a look at the Go language…
Go introduces two concurrency primitives:
* goroutines
* channels
They are very simple but powerful.

    package main

    import "fmt"

    // work receives jobs from in and sends one result per job to out.
    func work(in <-chan string, out chan<- string) {
        for {
            job := <-in
            // do work with job
            out <- "result of " + job
        }
    }

    func main() {
        in := make(chan string)
        out := make(chan string)
        go work(in, out) // start the worker as a goroutine
        in <- "job"
        result := <-out
        fmt.Println(result)
    }

Take a look at the Go language…

    // work is the function from the previous slide.
    func main() {
        in1 := make(chan string)
        out1 := make(chan string)
        go work(in1, out1)

        in2 := make(chan string)
        out2 := make(chan string)
        go work(in2, out2)

        in1 <- "job1"
        in2 <- "job2"

        // select proceeds with whichever worker answers first.
        select {
        case result1 := <-out1:
            fmt.Println(result1) // do something with result1
        case result2 := <-out2:
            fmt.Println(result2) // do something with result2
        }
    }

It is very easy to add more workers.
You can use select to wait on multiple channels simultaneously.

Can we apply this idea to Perl5?

Of course, we can.

go <-> Perl5

    go          Perl5
    goroutine   fork(2)
    channel     pipe(2)
    select      select(2)

The internals of cpm

[Diagram: a Master process uses select over pipes connected to three cpanm workers]

cpanm worker
1. get job via pipe
2. work, work, work!
3. send result via pipe

Master
1. prepare pipes for workers by pipe(2)
2. launch workers by fork(2) and connect them with pipes
3. loop { calculate jobs and send them to idle workers; if all workers are busy, wait for them and receive results by select(2) }
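The master loop above can be sketched with goroutines and channels, the Go counterparts of fork(2) and pipe(2) in the talk's mapping. This is an illustration under assumptions, not cpm's implementation. The names result, worker and runMaster are hypothetical, the job list is fixed up front (in cpm new jobs are calculated as results arrive), and a single shared result channel stands in for select(2) over many pipes.

```go
package main

import "fmt"

// result is what a worker reports back to the master.
type result struct {
	worker int
	output string
}

// worker gets jobs via in, "works" on them and sends each outcome via out.
func worker(id int, in <-chan string, out chan<- result) {
	for job := range in {
		out <- result{worker: id, output: "done: " + job}
	}
}

// runMaster sends jobs to idle workers and, when every worker is busy
// (or nothing is left to send), waits for one result to come back.
func runMaster(jobs []string, nWorkers int) []string {
	out := make(chan result)
	ins := make([]chan string, nWorkers)
	idle := make([]int, 0, nWorkers)
	for i := range ins {
		ins[i] = make(chan string)
		go worker(i, ins[i], out)
		idle = append(idle, i)
	}

	var outputs []string
	busy := 0
	for len(jobs) > 0 || busy > 0 {
		// Send pending jobs to idle workers.
		for len(jobs) > 0 && len(idle) > 0 {
			w := idle[len(idle)-1]
			idle = idle[:len(idle)-1]
			ins[w] <- jobs[0]
			jobs = jobs[1:]
			busy++
		}
		// Wait for any worker to finish, then mark it idle again.
		r := <-out
		outputs = append(outputs, r.output)
		idle = append(idle, r.worker)
		busy--
	}
	for _, in := range ins {
		close(in) // lets the worker goroutines exit
	}
	return outputs
}

func main() {
	jobs := []string{"resolve Plack", "fetch Plack", "configure Plack"}
	// Completion order may differ from submission order.
	for _, o := range runMaster(jobs, 2) {
		fmt.Println(o)
	}
}
```

Only idle workers are ever sent a job, so the master never blocks on a worker that is still waiting to report a result.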

Roadmap
• Last year I talked with Tatsuhiko Miyagawa about cpanm 2.0 (menlo)
• Then he said “why don’t you merge cpm into cpanm itself?”
• I was very happy to hear that!

Roadmap
• So if you all find cpm useful and stable, then cpm should be merged into cpanm 2.0
• Before merging, there are some problems that need to be resolved:
• The log file is very messy
• I would greatly appreciate your feedback!

Try cpm now
$ cpanm -nq App::cpm
Thanks!
