glusterfs-pmux-en

maebashi
November 13, 2013

Transcript

  1. © 2013 Internet Initiative Japan Inc.

    Pmux: a lightweight MapReduce framework Using GlusterFS
    https://speakerdeck.com/maebashi/glusterfs-pmux-en
  2. About me

    •  Takahiro Maebashi
    •  Internet Initiative Japan Inc. (IIJ)
    •  ITpro, IT Verification Lab: "The distributed file system GlusterFS: what happens in situations like this?"
       http://itpro.nikkeibp.co.jp/article/COLUMN/20130104/447701/
  3. GlusterFS servers in IZmo

    •  The racks within the module are set at an angle
  4. What is MapReduce?

    Represent problems as Map and Reduce steps
    (1) Map – extract, convert
    (2) Reduce – aggregate, summarize
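The two steps can be illustrated with a minimal in-process sketch. The sample log lines and the method names `map_step` and `reduce_step` are made up for illustration and are not part of pmux: the Map step extracts one value per line, and the Reduce step aggregates those values into counts.

```ruby
# Map step: extract the first word (here, a log level) from each line.
def map_step(lines)
  lines.map { |line| line.split.first }.compact
end

# Reduce step: aggregate the extracted values into per-value counts.
def reduce_step(values)
  values.each_with_object(Hash.new(0)) { |value, counts| counts[value] += 1 }
end

# Hypothetical sample input, invented for this sketch.
SAMPLE_LINES = [
  "ERROR disk full",
  "INFO service started",
  "ERROR connection reset",
]

counts = reduce_step(map_step(SAMPLE_LINES))
counts.each { |level, n| puts "#{n} #{level}" }
```

Because the Map step looks at each line independently, it can run on many files at once; only the Reduce step needs to see all intermediate values together.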
  5. What is GlusterFS? (2)

    (in the case of a distributed volume) locates files based solely on their names
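The idea of locating a file from its name alone can be sketched as follows. This is only an illustration of name-based placement: the brick names are hypothetical, CRC32 is a stand-in hash, and GlusterFS's actual hash function and layout are not reproduced here.

```ruby
require "zlib"

# Hypothetical brick list, invented for this sketch.
BRICKS = ["server0:/brick", "server1:/brick", "server2:/brick"]

# Choose a brick using nothing but the file name.
# CRC32 modulo the brick count is an assumption made for illustration;
# it is NOT the hashing scheme GlusterFS uses.
def brick_for(file_name, bricks = BRICKS)
  bricks[Zlib.crc32(file_name) % bricks.size]
end

puts brick_for("access_log.20131020")
```

The point of the sketch is that no central metadata lookup is needed: any client that knows the file name and the brick list computes the same location.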
  6. What is pmux? (1)

    •  stands for pipeline multiplexer
    •  written in Ruby
    •  https://github.com/iij/pmux
    •  https://forge.gluster.org/pmux
  7. What is pmux? (2)

    •  file-based map/reduce tool
    •  uses Unix standard input/output as the interface
    Example: distributed grep over files on GlusterFS
    $ pmux --mapper="grep PATTERN" *.log
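What "standard input/output as the interface" means for a mapper can be shown with a local sketch. The helper name `run_mapper` is hypothetical and this is not how pmux itself launches mappers on remote nodes; it only shows that any command reading stdin and writing stdout can serve as a map step.

```ruby
require "open3"

# Feed the content of one file to an arbitrary shell command on stdin
# and return whatever the command writes to stdout.
# (Local illustration only; pmux runs mappers on the storage nodes.)
def run_mapper(command, path)
  output, _status = Open3.capture2(command, stdin_data: File.read(path))
  output
end
```

With this, `run_mapper("grep PATTERN", "some.log")` returns the matching lines of a single file, which is the per-file work that the `--mapper` option describes.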
  8. Install

    $ gem install pmux
    $ gem install pmux
    $ gem install gflocator
    $ sudo gflocator
  9. Step 1: look up target files

    run pmux command on this host
    read trusted.glusterfs.pathinfo from xattr
  10. Extended file attributes (xattr)

    •  a file system feature that enables users to associate computer files with metadata (Wikipedia)
    •  GlusterFS uses extended attributes as a mechanism for external interaction with translators
  11. Extended file attributes (2)

    $ sudo getfattr -n trusted.glusterfs.pathinfo \
        access_log.20131020
    # file: access_log.20131020
    trusted.glusterfs.pathinfo="(<DISTRIBUTE:d2r2-dht> (<REPLICATE:d2r2-replicate-0> <POSIX(/glusterfs/brick/d2r2):ex01.example.com:/glusterfs/brick/d2r2/log/0000/access_log.20131020> <POSIX(/glusterfs/brick/d2r2):ex00.example.com:/glusterfs/brick/d2r2/log/0000/access_log.20131020>))"
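The pathinfo value above names the hosts that hold each replica and the file's path on the brick. A minimal sketch of extracting that information follows; the regular expression and the method name `replica_locations` are assumptions based only on the single example shown on this slide, not the parsing code used by pmux or gflocator.

```ruby
# Matches entries of the form <POSIX(brick_root):host:path_on_brick>.
# Derived from the one sample value on the slide; other layouts are not covered.
POSIX_ENTRY = /<POSIX\([^)]*\):([^:>]+):([^>]+)>/

# Return one {host:, path:} hash per replica found in the pathinfo value.
def replica_locations(pathinfo)
  pathinfo.scan(POSIX_ENTRY).map { |host, path| { host: host, path: path } }
end

PATHINFO = "(<DISTRIBUTE:d2r2-dht> (<REPLICATE:d2r2-replicate-0> " \
           "<POSIX(/glusterfs/brick/d2r2):ex01.example.com:/glusterfs/brick/d2r2/log/0000/access_log.20131020> " \
           "<POSIX(/glusterfs/brick/d2r2):ex00.example.com:/glusterfs/brick/d2r2/log/0000/access_log.20131020>))"

replica_locations(PATHINFO).each { |loc| puts "#{loc[:host]} #{loc[:path]}" }
```

Knowing which hosts store a file is what makes it possible to run the map task on a node that can read the file from its local disk.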
  12. Step 3: assign map tasks to nodes

    tasks are assigned to nodes (workers) dynamically
    (diagram labels: dispatcher, worker)
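One common way to obtain dynamic assignment is a shared work queue from which each worker takes the next task as soon as it is free. The sketch below uses threads as stand-ins for worker nodes; the method name `dispatch` is hypothetical, and the slides do not state whether pmux's dispatcher works this way internally.

```ruby
# Hand out tasks dynamically: each worker pulls the next task when it is idle,
# so a worker that finishes quickly simply takes more tasks.
# Threads stand in for remote worker nodes in this sketch.
def dispatch(tasks, worker_names)
  queue = Queue.new
  tasks.each { |task| queue << task }
  queue.close # after close, pop returns nil once the queue is empty (Ruby 2.3+)

  assignments = Hash.new { |hash, key| hash[key] = [] }
  lock = Mutex.new

  threads = worker_names.map do |name|
    Thread.new do
      while (task = queue.pop)
        lock.synchronize { assignments[name] << task }
      end
    end
  end
  threads.each(&:join)
  assignments
end

result = dispatch(%w[a.log b.log c.log d.log e.log], %w[worker0 worker1])
result.each { |worker, files| puts "#{worker}: #{files.join(' ')}" }
```

Which worker receives which file varies from run to run; the only guarantee is that every task is handed out exactly once.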
  13. Step 5: mapper produces tmp files

    mapper produces temporary files containing intermediate results
    (diagram labels: dispatcher, worker)
  14. example (1): count of status codes

    extract the status code from Apache log files and count occurrences
    $ pmux --mapper='cut -d" " -f 9' \
        --reducer='sort|uniq -c' /mnt/glusterfs/*.log
    176331 200
    106360 206
       809 400
     21852 403
       533 404
        27 406
       805 416
        25 500
  15. example (2): word count

    command line:
    $ pmux --mapper=map.rb --reducer=reduce.rb \
        --file=map.rb --file=reduce.rb \
        /mnt/glusterfs/*.txt
    map.rb:
    #! /usr/bin/ruby -an
    # emit "<word>\t1" for every whitespace-separated word of the input line
    $F.each {|f| print "#{f}\t1\n"}
    reduce.rb:
    #! /usr/bin/ruby -an
    # sum the counts per word, then print the totals at the end of input
    BEGIN {$c = Hash.new 0}
    $c[$F[0]] += $F[1].to_i
    END {$c.each {|k, v| print "#{k} #{v}\n"}}
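The data flow of this word count can be traced locally with a small sketch. The methods `map_words` and `reduce_counts` mirror the logic of map.rb and reduce.rb as plain functions, and the two input texts are invented stand-ins for files; how pmux actually moves the intermediate lines between nodes is not modeled.

```ruby
# Same logic as map.rb: one "word\t1" line per whitespace-separated word.
def map_words(text)
  text.split.map { |word| "#{word}\t1" }
end

# Same logic as reduce.rb: sum the second field per word.
def reduce_counts(intermediate_lines)
  intermediate_lines.each_with_object(Hash.new(0)) do |line, counts|
    word, n = line.split
    counts[word] += n.to_i
  end
end

# Hypothetical file contents; each element stands for one input file.
FILES = ["apple banana apple\n", "banana cherry\napple\n"]

# Each file is mapped on its own; the reducer sees all intermediate lines.
intermediate = FILES.flat_map { |text| map_words(text) }
reduce_counts(intermediate).each { |word, n| puts "#{word} #{n}" }
```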
  16. Performance

    packet capture logs (made by tcpdump) like this:
    14:00:00.416011 IP 21.44.60.29.http > 170.73.162.175.58546: . 3523999974:3524001422(1448) ack 3401170238 win 1716 <nop,nop,timestamp 1070614671 1955062367>
    extract the most frequently appearing IP address in each file
    8344 files, 500K lines/file, total 4 billion lines
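The per-file work in this benchmark can be sketched as below. The slides do not show the mapper that was actually used, and they do not say whether source or destination addresses were counted; this sketch assumes the source address, and the regular expression is derived only from the sample line above.

```ruby
# Captures the source IPv4 address that follows "IP " in a tcpdump line,
# dropping the trailing port or service name (e.g. ".http").
# Counting the source address is an assumption made for this sketch.
SRC_IP = /\bIP (\d+\.\d+\.\d+\.\d+)[.\s]/

# Return [ip, count] for the most frequent source address, or nil if none matched.
def most_frequent_ip(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = SRC_IP.match(line)
    counts[match[1]] += 1 if match
  end
  counts.max_by { |_ip, n| n }
end
```

Because each file is processed independently and yields a single small result, this workload splits cleanly into one map task per file.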
  17. result

    1 node, without pmux:              8 hr 49 min 6 sec
    60 nodes (each node has 8 cores):  1 min 45 sec
    300 times faster
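The "300 times" figure can be checked directly from the two elapsed times on the slide:

```ruby
# Elapsed times from the slide, converted to seconds.
single_node_seconds = 8 * 3600 + 49 * 60 + 6 # 8 hr 49 min 6 sec = 31746 s
pmux_seconds        = 1 * 60 + 45            # 1 min 45 sec      = 105 s

speedup = single_node_seconds.to_f / pmux_seconds
puts format("speedup: %.1fx", speedup) # prints "speedup: 302.3x"
```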