
pmux


Pmux is a lightweight file-based MapReduce system, written in Ruby.


maebashi

June 02, 2013

Transcript

  1. pmux
     maebashi @ IIJ
     Copyright (c) 2013 Internet Initiative Japan Inc.
  2. Today's Talk

     Hadoop
  3. What is MapReduce?

     Represent a problem as a Map step and a Reduce step:
     (1) Map: extract, convert
     (2) Reduce: aggregate, summarize
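To make the two steps concrete, here is a minimal single-process sketch in Ruby (the language pmux itself is written in). It is not pmux code: `map_reduce`, `word_mapper` and `sum_reducer` are hypothetical names, and the in-memory grouping stands in for what a distributed system does across machines.

```ruby
# Minimal in-memory model of MapReduce (illustrative only, not pmux code).
def map_reduce(inputs, mapper, reducer)
  # Map: each input record yields zero or more [key, value] pairs.
  pairs = inputs.flat_map { |record| mapper.call(record) }
  # Group all values by key; a distributed system does this across nodes.
  grouped = pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
  # Reduce: collapse each key's list of values into a single result.
  grouped.to_h { |key, values| [key, reducer.call(key, values)] }
end

# Hypothetical word-count mapper and reducer.
word_mapper = ->(line) { line.split.map { |word| [word, 1] } }
sum_reducer = ->(_key, counts) { counts.sum }

counts = map_reduce(["a b a", "b c"], word_mapper, sum_reducer)
p counts # counts == {"a"=>2, "b"=>2, "c"=>1}
```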
  4. What is GlusterFS?
  5. Locate files based solely on their names

     (in the case of a distributed volume)
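The idea on this slide can be sketched as follows: because the location is computed from the file name, any client can find a file without asking a central metadata server. This is a deliberate simplification under stated assumptions: CRC32 modulo the number of bricks stands in for GlusterFS's own hash function and per-directory hash-range layout, and the brick names are made up.

```ruby
require "zlib"

# Hypothetical brick list; a real volume's bricks come from its configuration.
BRICKS = ["node1:/export/brick1", "node2:/export/brick1", "node3:/export/brick1"].freeze

# Choose a brick from the file name alone. CRC32 modulo the brick count is a
# stand-in for GlusterFS's actual DHT hash and layout ranges.
def brick_for(filename, bricks = BRICKS)
  bricks[Zlib.crc32(File.basename(filename)) % bricks.size]
end

# Every client that runs this computes the same answer independently.
puts brick_for("/mnt/glusterfs/access.log")
```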
  6. What is pmux? (1)

     • Stands for "pipeline multiplexer"
     • https://github.com/iij/pmux
     • https://github.com/iij/pmux/wiki
  7. What is pmux? (2)

     • File-based map/reduce tool
     • Uses Unix standard input/output as the interface

     Example: distributed grep over files on GlusterFS

     $ pmux --mapper="grep PATTERN" *.log
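Because a mapper is just a command that reads standard input and writes standard output, running it over one file can be sketched with Ruby's `Open3.capture2`. The helper name `run_mapper` is hypothetical, and passing the whole file as `stdin_data` is a simplification; this shows the interface, not how pmux actually spawns processes.

```ruby
require "open3"

# Feed one file to a shell mapper on stdin and return what it prints on stdout.
# `run_mapper` is a hypothetical helper illustrating the stdin/stdout contract.
def run_mapper(command, path)
  out, _status = Open3.capture2("sh", "-c", command, stdin_data: File.read(path))
  # The exit status is ignored on purpose: grep exits 1 when nothing matches.
  out
end
```

For example, `Dir.glob("*.log").each { |f| print run_mapper("grep PATTERN", f) }` is a single-machine version of the distributed grep above.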
  8. Install

     Run the pmux command on this host
  9. Execution Overview

     (1) MapReduce without a reduce phase
  10. 1. Look up target files

     Run the pmux command on this host; it reads trusted.glusterfs.pathinfo from the files' extended attributes (xattr).
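Step 1 can be sketched as reading that virtual extended attribute with the `getfattr` tool (from the attr package) and pulling host names out of its value. The value format shown in the comment is an assumption based on typical output for a distributed volume; check it against `getfattr` on your own mount before relying on the regex. `pathinfo`, `hosts_from_pathinfo` and the sample host are hypothetical.

```ruby
require "open3"

# Assumed shape of the value (verify on your own volume):
#   (<DISTRIBUTE:vol-dht> <POSIX(/export/brick1):node1:/export/brick1/a.log>)
POSIX_ENTRY = /<POSIX\(([^)]*)\):([^:>]+):([^>]+)>/

# Return [host, path-on-brick] pairs found in a pathinfo value.
def hosts_from_pathinfo(value)
  value.scan(POSIX_ENTRY).map { |_brick, host, path| [host, path] }
end

# Read the virtual xattr through a GlusterFS FUSE mount using getfattr.
def pathinfo(file)
  out, status = Open3.capture2("getfattr", "-n", "trusted.glusterfs.pathinfo",
                               "--only-values", "--absolute-names", file)
  raise "getfattr failed for #{file}" unless status.success?
  out
end
```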
  11. 2. Invoke pmux on each node

     Diagram labels: dispatcher, worker
  12. 3. Assign map tasks to nodes

     Tasks are assigned to nodes (workers) dynamically.
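Dynamic assignment means there is no fixed plan: each worker takes the next task when it becomes idle, so faster or less loaded nodes end up running more tasks. A minimal sketch using a closed `Queue` with threads standing in for remote nodes; the `dispatch` name, worker names and task block are hypothetical, and pmux's actual dispatcher/worker protocol is not shown on the slides.

```ruby
# Pull-based dynamic scheduling sketch. Threads stand in for remote worker
# nodes; `dispatch`, the worker names, and the task block are hypothetical.
def dispatch(tasks, workers, &run_task)
  queue = Queue.new
  tasks.each { |t| queue << t }
  queue.close # once closed and empty, pop returns nil instead of blocking

  results = Queue.new
  threads = workers.map do |worker|
    Thread.new do
      # An idle worker takes the next task; nothing is pinned to a node upfront.
      while (task = queue.pop)
        results << [worker, task, run_task.call(worker, task)]
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

done = dispatch(%w[a.log b.log c.log d.log], %w[node1 node2]) do |_worker, file|
  file.upcase # placeholder for "run the mapper on this file"
end
done.each { |worker, file, out| puts "#{worker} ran #{file} -> #{out}" }
```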
  13. 4. Send results back to the dispatcher
  14. Execution Overview

     (2) With a reduce phase
  15. 3. Assign map tasks to nodes
  16. 4. Mapper produces tmp files

     The mapper produces temporary files containing intermediate results.
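One common way to lay out intermediate results, sketched here under assumptions, is one temporary file per reducer, with each line routed by a hash of its key so that all lines sharing a key reach the same reducer. The tab-separated `key<TAB>value` format, the CRC32 hash, the `map<ID>-part<R>.tmp` names and `partition_output` itself are illustrative choices, not pmux's documented behavior.

```ruby
require "zlib"
require "tmpdir"

# Split one mapper's "key<TAB>value" lines into one temp file per reducer,
# choosing the file by a hash of the key (hypothetical layout and naming).
def partition_output(lines, num_reducers, dir, map_id)
  files = Array.new(num_reducers) do |r|
    File.open(File.join(dir, "map#{map_id}-part#{r}.tmp"), "w")
  end
  lines.each do |line|
    key = line.split("\t", 2).first
    files[Zlib.crc32(key) % num_reducers].write(line)
  end
  files.map { |f| f.close; f.path }
end

Dir.mktmpdir do |dir|
  paths = partition_output(["a\t1\n", "b\t1\n", "a\t1\n"], 2, dir, 0)
  paths.each { |p| puts "#{File.basename(p)}: #{File.read(p).inspect}" }
end
```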
  17. 5. Shuffle

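The shuffle brings each reducer the pieces that belong to it. A sketch, assuming each mapper wrote its share for reducer R to a file named `map<ID>-part<R>.tmp` in one shared directory (a hypothetical convention): the reducer collects those files from every mapper and sorts the lines so identical keys become adjacent, much as `sort` does in front of `uniq -c` in example (1) below. `shuffle_for` is a hypothetical name.

```ruby
# Shuffle sketch: reducer R collects its part file from every mapper and sorts
# the lines so identical keys sit next to each other. Assumes the hypothetical
# naming "map<ID>-part<R>.tmp" for intermediate files in one shared directory.
def shuffle_for(reducer_id, dir)
  parts = Dir.glob(File.join(dir, "map*-part#{reducer_id}.tmp")).sort
  parts.flat_map { |path| File.readlines(path) }.sort
end
```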
  18. 6. Assign reduce tasks to nodes
  19. 7. Send results back to the dispatcher
  20. Example (1): count of status codes

     Extract the status code from Apache log files and count each one:

     $ pmux --mapper='grep PAT | cut -d" " -f 9' \
            --reducer='sort | uniq -c' /mnt/glusterfs/*.log

     176331 200
     106360 206
     809 400
     21852 403
     533 404
     27 406
     805 416
     25 500
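For readers less familiar with `cut`, the same computation in a single Ruby process looks like this. `cut -d" " -f 9` keeps the ninth space-separated field, which is the status code in Apache's common and combined log formats, and `sort | uniq -c` counts each distinct value. The `grep PAT` filter is left out, and `status_counts` is a hypothetical name.

```ruby
# Single-process equivalent of the mapper and reducer above (PAT filter omitted).
def status_counts(lines)
  # cut -d" " -f 9: split on single spaces; field 9 is index 8.
  codes = lines.map { |line| line.chomp.split(/ /, -1)[8] }.compact
  # sort | uniq -c: count each code, ordered by code, as [count, code] pairs.
  codes.tally.sort.map { |code, n| [n, code] }
end
```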
  21. Example (2): word count

     Command line:

     $ pmux --mapper=map.rb --reducer=reduce.rb \
            --file=map.rb --file=reduce.rb \
            /mnt/glusterfs/*.txt

     map.rb:

     #! /usr/bin/ruby -an
     # -n loops over input lines; -a splits each line into $F
     $F.each {|f| print "#{f}\t1\n"}

     reduce.rb:

     #! /usr/bin/ruby -an
     # BEGIN runs once before the input loop, END once after it
     BEGIN {$c = Hash.new 0}
     $c[$F[0]] += $F[1].to_i
     END {$c.each {|k, v| print "#{k} #{v}\n"}}
  22. Performance

     Input: packet capture logs (made by tcpdump), for example:

     14:00:00.416011 IP 21.44.60.29.http > 170.73.162.175.58546: . 3523999974:3524001422(1448) ack 3401170238 win 1716 <nop,nop,timestamp 1070614671 1955062367>

     Task: extract the most frequently appearing IP address in each file.
     8344 files, 500K lines/file, about 4 billion lines in total.
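The stated total follows from the per-file figures, taking 500K as 500,000:

```latex
8344 \times 500{,}000 = 4{,}172{,}000{,}000 \approx 4.2 \times 10^{9}\ \text{lines}
```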
  23. Map command

     --mapper='egrep -o "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" | sort | uniq -c | sort -nr | head -1'
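What this mapper computes for each file can be written out in Ruby as below. It uses the same loose dotted-quad pattern as the `egrep` expression (so it would also match strings like `999.1.1.1`), counts each match, and keeps the most frequent one. `most_frequent_ip` is a hypothetical name, and its tie-breaking may differ from `sort -nr`.

```ruby
# Same loose pattern as the egrep expression: four dot-separated digit runs.
IPV4_LIKE = /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/

# egrep -o | sort | uniq -c | sort -nr | head -1, returned as [ip, count].
def most_frequent_ip(lines)
  counts = lines.flat_map { |line| line.scan(IPV4_LIKE) }.tally
  counts.max_by { |ip, n| [n, ip] } # nil when no address was found
end
```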
  24. Result

     1 node, without pmux: 8 hr 49 min 6 sec
  25. Result

     1 node, without pmux: 8 hr 49 min 6 sec
     60 nodes (each node has 8 cores): 1 min 45 sec
     About 300 times faster
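The "300 times" figure can be checked from the two wall-clock times, writing $T_1$ for the single-node run and $T_{60}$ for the 60-node run:

```latex
\begin{aligned}
T_{1}  &= 8 \cdot 3600 + 49 \cdot 60 + 6 = 31746\ \text{s} \\
T_{60} &= 1 \cdot 60 + 45 = 105\ \text{s} \\
T_{1} / T_{60} &= 31746 / 105 \approx 302
\end{aligned}
```

The 60 nodes provide $60 \times 8 = 480$ cores in total.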
  26. Related tools

     • pmux-gw (pmux-gateway): HTTP interface for pmux
     • pmux-logview: visualizer for pmux job progress
  27. pmux gateway
  28. pmux-logview