Slide 1

Shell Script Rewrite Overview
Allen Wittenauer

Slide 2

Twitter: @_a__w_ (1 a 2 w 1)
Email: aw @ apache.org

Slide 3

What is the shell code?

bin/*
etc/hadoop/*sh
libexec/*
sbin/*

Slide 4

Slide 5

CUTTING, DOUG 1710 554 6239 2005 APACHE SOFTWARE FOUNDATION

Slide 6

https://www.flickr.com/photos/new_and_used_tires/6549497793/

Slide 7

Slide 8

https://www.flickr.com/photos/hkuchera/5084213883

Slide 9

Slide 10

Slide 11

https://www.flickr.com/photos/83633410@N07/7658225516/

Slide 12

“[The scripts] finally got to you, didn’t they?”

Slide 13

Primary Goals
  Consistency
  Code and Config Simplification
  De-clash Parameters
  Documentation

Secondary Goals
  Backward Compatibility
  “Lost” Ideas and Fixes

Slide 14

https://www.flickr.com/photos/k6mmc/2176537668/

Slide 15

Tuesday, August 19, 2014: majority committed into trunk

... followed by many fixes & enhancements from the community

Slide 16

https://www.flickr.com/photos/ifindkarma/9304374538/
https://www.flickr.com/photos/liveandrock/2650732780/

Slide 17

Old:

  hadoop -> hadoop-config.sh -> hadoop-env.sh
  yarn   -> yarn-config.sh   -> yarn-env.sh
  hdfs   -> hdfs-config.sh   -> hadoop-env.sh

New:

  hadoop -> hadoop-config.sh -> hadoop-functions.sh
                             -> hadoop-env.sh
  yarn   -> yarn-config.sh   -> hadoop-config.sh -> (above)
                             -> yarn-env.sh
  hdfs   -> hdfs-config.sh   -> hadoop-config.sh -> (above)

Slide 18

Old:

  yarn-env.sh:   JAVA_HOME=xyz
  hadoop-env.sh: JAVA_HOME=xyz
  mapred-env.sh: JAVA_HOME=xyz

New:

  hadoop-env.sh: JAVA_HOME=xyz

  OS X:
  JAVA_HOME=$(/usr/libexec/java_home)
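The "set JAVA_HOME once" idea above can be sketched as a small guard in hadoop-env.sh. Only the macOS `/usr/libexec/java_home` helper comes from the slide; the non-macOS fallback path below is a placeholder assumption.

```shell
#!/usr/bin/env bash
# Sketch: define JAVA_HOME in one place (hadoop-env.sh) instead of three.
# /usr/libexec/java_home is the macOS helper from the slide; the Linux
# fallback path is a hypothetical placeholder, not a Hadoop default.
if [[ -z "${JAVA_HOME}" ]]; then
  if [[ -x /usr/libexec/java_home ]]; then
    JAVA_HOME=$(/usr/libexec/java_home)
  else
    JAVA_HOME=/usr/lib/jvm/default-java   # placeholder for other platforms
  fi
fi
export JAVA_HOME
echo "JAVA_HOME=${JAVA_HOME}"
```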

Slide 19

Old:

  xyz_OPT="-Xmx4g" hdfs namenode
  java … -Xmx1000 … -Xmx4g …

  Command line size: ~2500 bytes

New:

  xyz_OPT="-Xmx4g" hdfs namenode
  java … -Xmx4g …

  Command line size: ~1750 bytes
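The old command line carried both the default `-Xmx1000` and the user's `-Xmx4g` because options were blindly appended (the JVM simply lets the last flag win). A guard like the one below captures the fix; `hadoop_add_param` here is a simplified sketch in the spirit of hadoop-functions.sh, not the real implementation.

```shell
#!/usr/bin/env bash
# Sketch: only append an option if the variable doesn't already carry a
# matching flag, so user settings suppress the built-in default.
hadoop_add_param() {
  # append $3 to the variable named $1 unless it already contains $2
  local varname=$1 pattern=$2 value=$3
  if [[ "${!varname}" != *"${pattern}"* ]]; then
    eval "${varname}=\"\${${varname}} ${value}\""
  fi
}

HADOOP_OPTS=""
hadoop_add_param HADOOP_OPTS -Xmx "-Xmx4g"    # user's setting goes in
hadoop_add_param HADOOP_OPTS -Xmx "-Xmx1000"  # default is now skipped
echo "${HADOOP_OPTS}"
```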

Slide 20

$ TOOL_PATH=blah:blah:blah hadoop distcp /old /new
Error: could not find or load main class org.apache.hadoop.tools.DistCp

Old:

$ bash -x hadoop distcp /old /new
+ this=/home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/bin/hadoop
+++ dirname -- /home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/bin/hadoop
++ cd -P -- /home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/bin
++ pwd -P
+ bin=/home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/bin
+ DEFAULT_LIBEXEC_DIR=/home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/bin/../libexec
+ HADOOP_LIBEXEC_DIR=/home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/bin/../libexec
+ [[ -f /home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/bin/../libexec/hadoop-config.sh ]]
…

Slide 21

New:

$ TOOL_PATH=blah:blah:blah hadoop --debug distcp /tmp/ /1
DEBUG: HADOOP_CONF_DIR=/home/aw/HADOOP/conf
DEBUG: Initial CLASSPATH=/home/aw/HADOOP/conf
…
DEBUG: Append CLASSPATH: /home/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/*
DEBUG: Injecting TOOL_PATH into CLASSPATH
DEBUG: Rejected CLASSPATH: blah:blah:blah (does not exist)
…
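The `--debug` output above can be produced by a tiny conditional printer. The sketch below shows the shape of such a helper; the gate variable name (`HADOOP_SHELL_SCRIPT_DEBUG`) is stated as an assumption, not confirmed by the slide.

```shell
#!/usr/bin/env bash
# Sketch: DEBUG lines print only when the --debug gate variable is set,
# so normal runs stay quiet. Variable name is an assumption.
hadoop_debug() {
  if [[ -n "${HADOOP_SHELL_SCRIPT_DEBUG}" ]]; then
    echo "DEBUG: $*" 1>&2
  fi
}

hadoop_debug "this is silent"              # gate unset: no output
HADOOP_SHELL_SCRIPT_DEBUG=true
hadoop_debug "Injecting TOOL_PATH into CLASSPATH"   # printed to stderr
```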

Slide 22

Old:

  hdfs help

Slide 23

https://www.flickr.com/photos/joshuamckenty/2297179486/

Slide 24

New:

  hdfs help

Slide 25

Old:

  hadoop thisisnotacommand
  == stack trace

New:

  hadoop thisisnotacommand
  == hadoop help

Slide 26

Old:

  sbin/hadoop-daemon.sh start namenode
  sbin/yarn-daemon.sh start resourcemanager

New:

  bin/hdfs --daemon start namenode
  bin/yarn --daemon start resourcemanager

+ common daemon start/stop/status routines

Slide 27

hdfs namenode vs hadoop-daemon.sh namenode

Old:
  - effectively different code paths
  - no pid vs pid
  - wait for socket for failure

New:
  - same code path
  - hadoop-daemon.sh cmd => hdfs --daemon cmd
  - both generate pid
  - hdfs --daemon status namenode
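The shared start/status handling can be pictured as one routine keyed off a pid file. The toy below demonstrates the status check only; file locations and logic are illustrative assumptions, not the actual hadoop-functions.sh code.

```shell
#!/usr/bin/env bash
# Toy version of a common --daemon status routine: every daemon writes a
# pid file, and one shared check answers "running or stopped?".
pidfile=$(mktemp)

daemon_status() {
  if [[ -f "${pidfile}" ]] && kill -0 "$(cat "${pidfile}")" 2>/dev/null; then
    echo "running"
  else
    echo "stopped"
  fi
}

sleep 30 &                  # stand-in for a real daemon process
echo $! > "${pidfile}"
status1=$(daemon_status)

kill "$(cat "${pidfile}")" 2>/dev/null
wait 2>/dev/null
rm -f "${pidfile}"
status2=$(daemon_status)

echo "${status1} -> ${status2}"
```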

Slide 28

Old:

  “mkdir: cannot create ”
  “chown: cannot change permission of ”

New:

  “WARNING:  does not exist. Creating.”
  “ERROR: Unable to create . Aborting.”
  “ERROR: Cannot write to .”

Slide 29

Old:

  (foo) > (foo).out
  rm (foo).out

  = Open file handle

New:

  (foo) >> (foo).out
  rm (foo).out

  = Closed file handle
  = rotatable .out files

Slide 30

Old:

  sbin/*-daemons.sh -> slaves.sh blah
  (several hundred ssh processes later)
  *crash*

New:

  sbin/*-daemons.sh -> hadoop-functions.sh
  slaves.sh -> hadoop-functions.sh
  pdsh or (if enabled) xargs -P
  *real work gets done*
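The bounded fan-out idea can be sketched with `xargs -P`: feed the worker list in and run at most N commands at once, instead of backgrounding one ssh per host. In real use the command would be `ssh`; `echo` stands in here so the sketch runs anywhere, and the host file contents are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: bounded-parallelism fan-out over a host list with xargs -P,
# the alternative (alongside pdsh) that replaced the ssh storm.
hostfile=$(mktemp)
printf 'node1\nnode2\nnode3\n' > "${hostfile}"

# In real use: xargs -P 10 -I{} ssh {} "<command>" < "${hostfile}"
result=$(xargs -P 2 -I{} echo "would run on {}" < "${hostfile}" | sort)
echo "${result}"

rm -f "${hostfile}"
```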

Slide 31

Old:

  egrep -c '^#' hadoop-branch-2/…/*-env.sh
    hadoop-env.sh: 59
    mapred-env.sh: 21
    yarn-env.sh: 60

New:

  egrep -c '^#' hadoop-trunk/…/*-env.sh
    hadoop-env.sh: 333
    mapred-env.sh: 40
    yarn-env.sh: 112
    + hadoop-layout.sh.example: 77
    + hadoop-user-functions.sh.example: 109

Slide 32


But wait! There’s more!

Slide 33

HADOOP_namenode_USER=hdfs
  hdfs namenode only works as hdfs

Fun: HADOOP_fs_USER=aw
  hadoop fs only works as aw

hadoop --loglevel WARN
  => WARN,whatever

hadoop --loglevel DEBUG --daemon start
  => start daemon in DEBUG mode
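A per-subcommand user lock like `HADOOP_namenode_USER` can be enforced with a small indirect-expansion check. The function below is an illustrative reconstruction of the idea, not Hadoop's actual code.

```shell
#!/usr/bin/env bash
# Sketch: if HADOOP_<subcommand>_USER is set, refuse to run the
# subcommand as anyone else.
hadoop_verify_user() {
  local subcmd=$1
  local uservar="HADOOP_${subcmd}_USER"
  if [[ -n "${!uservar}" && "${!uservar}" != "${USER}" ]]; then
    echo "ERROR: ${subcmd} can only be run as ${!uservar}" 1>&2
    return 1
  fi
  return 0
}

USER=aw
HADOOP_namenode_USER=hdfs
hadoop_verify_user namenode 2>/dev/null || echo "refused: namenode"
HADOOP_fs_USER=aw
hadoop_verify_user fs && echo "allowed: fs"
```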

Slide 34

Old:

  HADOOP_HEAPSIZE=15234            <--- M only
  JAVA_HEAP_MAX="hahahah you set something in HADOOP_HEAPSIZE"

New:

  HADOOP_HEAPSIZE_MAX=15g
  HADOOP_HEAPSIZE_MIN=10g          <--- units!
  JAVA_HEAP_MAX removed =>
    no Xmx settings == Java default

Slide 35

Old:

  Lots of different yet same variables for settings

New:

  Deprecated ~60 variables
  ${HDFS|YARN|KMS|HTTPFS|*}_{foo} => HADOOP_{foo}
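The deprecation mapping can be sketched as a shim that warns and copies the old variable onto the `HADOOP_*` name. The function name follows the hadoop-functions.sh naming style but is shown here as an assumption, and the example variable pair is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: map a deprecated per-project variable onto its HADOOP_* name,
# warning the user once on the way through.
hadoop_deprecate_envvar() {
  local oldvar=$1 newvar=$2
  if [[ -n "${!oldvar}" ]]; then
    echo "WARNING: ${oldvar} has been replaced by ${newvar}." 1>&2
    eval "${newvar}=\"\${${oldvar}}\""
  fi
}

YARN_CONF_DIR=/etc/hadoop/conf
hadoop_deprecate_envvar YARN_CONF_DIR HADOOP_CONF_DIR 2>/dev/null
echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR}"
```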

Slide 36

Old:

  “I wonder what's in HADOOP_CLIENT_OPTS?”
  “I want to override just this one thing in *-env.sh.”

New:

  ${HOME}/.hadooprc
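The `.hadooprc` hook amounts to sourcing a tiny per-user file late in startup, so one variable can be overridden without copying all of `*-env.sh`. The demo below uses a temporary `HOME` purely to stay self-contained; the override value is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: source ${HOME}/.hadooprc, if present, after the stock env files.
HOME=$(mktemp -d)
echo 'HADOOP_CLIENT_OPTS="-Xmx2g"' > "${HOME}/.hadooprc"

if [[ -f "${HOME}/.hadooprc" ]]; then
  . "${HOME}/.hadooprc"
fi
echo "HADOOP_CLIENT_OPTS=${HADOOP_CLIENT_OPTS}"
```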

Slide 37

shellprofile.d

bash snippets to easily inject:
  classpath
  JNI
  Java command line options
  ... and more!
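The mechanism can be simulated in a few lines: a profile snippet registers itself, and the framework later calls an optional per-profile hook if one was defined. The helper and hook names below are modeled on the hadoop-functions.sh style but are a sketch, not the actual implementation, and the jar path is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal simulation of the shellprofile.d hook idea.
HADOOP_SHELL_PROFILES=""
CLASSPATH=""

hadoop_add_profile() {
  HADOOP_SHELL_PROFILES="${HADOOP_SHELL_PROFILES} $1"
}

hadoop_add_classpath() {
  CLASSPATH="${CLASSPATH:+${CLASSPATH}:}$1"
}

# A snippet as might live in shellprofile.d/myextras.sh:
hadoop_add_profile myextras
_myextras_hadoop_classpath() {
  hadoop_add_classpath "/opt/myjars/*"   # hypothetical jar directory
}

# The framework runs every registered classpath hook that exists:
for profile in ${HADOOP_SHELL_PROFILES}; do
  if declare -f "_${profile}_hadoop_classpath" > /dev/null; then
    "_${profile}_hadoop_classpath"
  fi
done

echo "CLASSPATH=${CLASSPATH}"
```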

Slide 38

https://www.flickr.com/photos/83633410@N07/7658230838/

Slide 39

Power Users Rejoice:
Function Overrides

Slide 40

Default *.out log rotation:

function hadoop_rotate_log
{
  local log=$1;
  local num=${2:-5};

  if [[ -f "${log}" ]]; then # rotate logs
    while [[ ${num} -gt 1 ]]; do
      let prev=${num}-1
      if [[ -f "${log}.${prev}" ]]; then
        mv "${log}.${prev}" "${log}.${num}"
      fi
      num=${prev}
    done
    mv "${log}" "${log}.${num}"
  fi
}

namenode.out.1 -> namenode.out.2
namenode.out   -> namenode.out.1

Slide 41

Put a replacement rotate function w/gzip support in hadoop-user-functions.sh!

function hadoop_rotate_log
{
  local log=$1;
  local num=${2:-5};

  if [[ -f "${log}" ]]; then
    while [[ ${num} -gt 1 ]]; do
      let prev=${num}-1
      if [[ -f "${log}.${prev}.gz" ]]; then
        mv "${log}.${prev}.gz" "${log}.${num}.gz"
      fi
      num=${prev}
    done
    mv "${log}" "${log}.${num}"
    gzip -9 "${log}.${num}"
  fi
}

namenode.out.1.gz -> namenode.out.2.gz
namenode.out      -> namenode.out.1
gzip -9 namenode.out.1 -> namenode.out.1.gz

Slide 42


What if we wanted to log every daemon start in syslog?

Slide 43

Default daemon starter:

function hadoop_start_daemon
{
  local command=$1
  local class=$2
  shift 2

  hadoop_debug "Final CLASSPATH: ${CLASSPATH}"
  hadoop_debug "Final HADOOP_OPTS: ${HADOOP_OPTS}"

  export CLASSPATH
  exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
}

Slide 44

Put a replacement start function in hadoop-user-functions.sh!

function hadoop_start_daemon
{
  local command=$1
  local class=$2
  shift 2

  hadoop_debug "Final CLASSPATH: ${CLASSPATH}"
  hadoop_debug "Final HADOOP_OPTS: ${HADOOP_OPTS}"

  export CLASSPATH
  logger -i -p local0.notice -t hadoop "Started ${command}"
  exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
}

Slide 45


Secure Daemons

Slide 46


What if we could start them as non-root?

Slide 47

Setup:

sudoers (either /etc/sudoers or in LDAP):
  hdfs ALL=(root:root) NOPASSWD: /usr/bin/jsvc

hadoop-env.sh:
  HADOOP_SECURE_COMMAND=/usr/sbin/sudo

Slide 48

# hadoop-user-functions.sh: (partial code below)

function hadoop_start_secure_daemon
{
  …
  jsvc="${JSVC_HOME}/jsvc"

  if [[ "${USER}" != "${HADOOP_SECURE_USER}" ]]; then
    hadoop_error "You must be ${HADOOP_SECURE_USER} in order to start a secure ${daemonname}"
    exit 1
  fi
  …
  exec /usr/sbin/sudo "${jsvc}" "-Dproc_${daemonname}" \
    -outfile "${daemonoutfile}" -errfile "${daemonerrfile}" \
    -pidfile "${daemonpidfile}" -nodetach -home "${JAVA_HOME}" \
    -user "${HADOOP_SECURE_USER}" \
    -cp "${CLASSPATH}" ${HADOOP_OPTS} "${class}" "$@"
}

Slide 49

$ hdfs datanode
  sudo launches jsvc as root
  jsvc launches secure datanode

In order to get --daemon start to work, one other function needs to get replaced*, but that’s a SMOP, now that you know how!

* - hadoop_start_secure_daemon_wrapper assumes it is running as root

Slide 50

Lots more, but out of time... e.g.:

  Internals for contributors
  Unit tests
  API documentation
  Other projects in the works
  ...

Reminder: This is in trunk. Ask vendors their plans!

Slide 51

https://www.flickr.com/photos/nateone/3768979925

Slide 52

Altiscale copyright 2015. All rights reserved.