Taming the elephant – Hadoop Operations Automation

Hadoop helps solve many Big Data problems by distributing processing across many machines. This creates real value for both business and engineering teams: many terabytes of data can be processed and analyzed quickly and efficiently, supporting near-real-time decision making. Hadoop's scale and distributed model also create new challenges for Operations departments, which must be prepared to provision, install, configure, maintain, and scale large clusters quickly and reliably to keep up with the business's Big Data needs.

We tackled this topic in a recent presentation at the LA Chef/Hadoop Joint Meetup: http://www.ustream.tv/recorded/33117271

Chris Hemphill

May 20, 2013

Transcript

  1. Agenda

    • Shopzilla: who are we?
    • Hadoop primer
    • Hadoop at Shopzilla
    • What challenges do we face?
    • What are we doing to solve them?
    • Details, details, details

    Thursday, April 18, 2013
  2. Shopzilla, Inc. is a leading source for connecting buyers and sellers online.

    • A global audience of over 40 million shoppers each month
    • 100 million products from tens of thousands of retailers
  3. Hadoop

    The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
    • Modules: Common, HDFS, YARN, MapReduce
    • Daemons: NameNode, Secondary NameNode, JobTracker, DataNodes with TaskTrackers
  4. Add-ons

    • Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters.
    • Avro™: A data serialization system.
    • Cassandra™: A scalable multi-master database with no single points of failure.
    • Chukwa™: A data collection system for managing large distributed systems.
    • HBase™: A scalable, distributed database that supports structured data storage for large tables.
    • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
    • Mahout™: A scalable machine learning and data mining library.
    • Pig™: A high-level data-flow language and execution framework for parallel computation.
    • ZooKeeper™: A high-performance coordination service for distributed applications.
  5. Cloudera Distribution

    • 5 clusters: 2 production, 2 staging/QA, 1 POC
    • Cluster sizes range from 129 nodes down to 16 nodes
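  6. Hostnames

    hadoop poc nn dev 001, hadoop poc dn dev 021, hadoop poc jt dev 001, hadoop poc bastion dev 001
    Fields, in order: "hadoop", cluster name, node role, environment, node number. Written without spaces, these become hadooppocnndev001, hadooppocdndev021, hadooppocjtdev001 and hadooppocbastiondev001.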
  7. Build Flow - Preflight (PXE, dhcp, Asset Tracker)

    • Does 00:1B:21:2C:C0:E8 belong to a known asset?
    • Is "kickstart" enabled?
    • What is the hostname of the asset?
    • What OS and version is the host set to use?
    • Get the IP address via DNS.
    • Get the asset ID.
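The preflight questions can be sketched as a single check against an asset record. This is a minimal sketch, not Shopzilla's Asset Tracker code. The in-memory `ASSETS` table, its field names, and the `preflight` helper are all hypothetical stand-ins. The DNS lookup uses the standard `socket.gethostbyname`, and it is injectable so the logic can be exercised offline.

```python
import socket

# Hypothetical stand-in for the Asset Tracker, keyed by lowercase MAC address.
# Field names and values are illustrative, not the real tracker schema.
ASSETS = {
    "00:1b:21:2c:c0:e8": {
        "asset_id": 28353,
        "kickstart": "Enabled",
        "hostname": "hadooppocdndev021.shopzilla.sea",
        "os_name": "Ubuntu",
        "os_version": "12.04",
    },
}


def preflight(mac, resolve=socket.gethostbyname):
    """Answer the preflight questions for a PXE-booting MAC; raise if one fails."""
    asset = ASSETS.get(mac.lower())
    if asset is None:                                    # known asset?
        raise LookupError("unknown asset: %s" % mac)
    kickstart = asset.get("kickstart")
    if not kickstart or kickstart == "Diskless":         # kickstart enabled?
        raise ValueError("kickstart not enabled for %s" % mac)
    hostname = asset["hostname"]                         # hostname of the asset
    return {
        "asset_id": asset["asset_id"],                   # asset ID
        "hostname": hostname,
        "os": (asset["os_name"], asset["os_version"]),   # OS and version
        "ip": resolve(hostname),                         # IP address via DNS
    }
```

The `Diskless` exclusion mirrors the `KickstartEnable != 'Diskless'` test in the preseed template on slide 12.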
  8. Build Flow - PXE (PXE, dhcp, Asset Tracker ✓)

    PXE boot options:
        default Ubuntu12.04-x86_64.vmlinuz
        append console-setup/layout=us preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw console=ttyS1,19200n8 \
            locale=en_US text hostname=x netcfg/dhcp_timeout=120 initrd=Ubuntu12.04-x86_64.initrd.img BOOTIF=01-84-8F-69-FE-F6-5C \
            auto=true interface=auto serial nofb
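PXELINUX looks for a per-host boot file before falling back to `default`. It tries the `01-` ARP-type prefix plus the MAC in lowercase, dash-separated hex. It then tries the client IP in uppercase hex, dropping one trailing digit per attempt. The sketch below lists those names, assuming a PXELINUX-style `pxelinux.cfg/` directory; the deck does not say which PXE loader was used. It also reproduces the `BOOTIF=01-…` form seen in the `append` line. Both helper names are hypothetical, and the UUID lookup that PXELINUX tries first is left out.

```python
import ipaddress


def pxelinux_config_names(mac, ip):
    """Candidate pxelinux.cfg/ file names in lookup order (UUID step omitted)."""
    by_mac = "01-" + mac.lower().replace(":", "-")
    ip_hex = "%08X" % int(ipaddress.IPv4Address(ip))
    # Full hex IP first, then drop one trailing hex digit at a time.
    by_ip = [ip_hex[:n] for n in range(len(ip_hex), 0, -1)]
    return [by_mac] + by_ip + ["default"]


def bootif(mac):
    """BOOTIF kernel argument in the form shown in the append line above."""
    return "BOOTIF=01-" + mac.upper().replace(":", "-")
```

For 00:1B:21:2C:C0:E8 at 10.101.173.35, the first names tried would be `01-00-1b-21-2c-c0-e8` and then `0A65AD23`.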
  9. Build Flow - PXE (continued)

    Same boot options as slide 8; the diagram adds two checkmarks (✓ ✓) for 00:1B:21:2C:C0:E8 as it boots.
  10. Build Flow - Preseed

    From the boot options on slide 8, 00:1B:21:2C:C0:E8 fetches its preseed file from:
        preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw
  11. Build Flow - Preseed (00:1B:21:2C:C0:E8, Asset Tracker)

    preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw
    ‣ Perl + Template module to build the preseed file
    ‣ Takes the asset ID as the argument
    ‣ Collects further data from the asset tracker
  12. Build Flow - Preseed (continued)

    The preseed template (Template Toolkit syntax, filled from the Asset Tracker) picks a partition layout by OS, hardware model, and hostname. NameNodes and JobTrackers get a BIOS boot partition, /boot, about 8 GB of swap, and one large root filesystem:
        [% IF KickstartEnable AND KickstartEnable != 'Diskless'; %]
        [% IF OSName AND OSName == 'Ubuntu' AND OSVersion AND (matches = OSVersion.match('12.04')); %]
        [% IF Model AND Model == 'PowerEdge C6220' AND Name AND (matches = Name.match("hadoop.*(nn|jt).*.sea$")); %]
        d-i partman-auto/disk string /dev/sda
        d-i partman-auto/method string regular
        d-i partman-auto/purge_lvm_from_device boolean true
        d-i partman-auto/expert_recipe string \
            boot-root :: \
                1 1 1 free \
                    method{ biosgrub } \
                . \
                40 50 100 ext3 \
                    $primary{ } $bootable{ } \
                    method{ format } format{ } \
                    use_filesystem{ } \
                    filesystem{ ext3 } \
                    mountpoint{ /boot } \
                . \
                8000 70 8000 linux-swap \
                    method{ swap } format{ } \
                . \
                500 10000 1000000000 ext4 \
                    method{ format } format{ } \
                    use_filesystem{ } \
                    filesystem{ ext4 } \
                    mountpoint{ / } \
                .
  13. Build Flow - Preseed (continued)

    DataNodes instead get a 25-50 GB root filesystem, and the rest of /dev/sda becomes /data/1, mounted with noatime:
        [% ELSIF Model AND Model == 'PowerEdge C6220' AND Name AND ((matches = Name.match("hadoop.*dn.*.sea$"))); %]
        d-i partman-auto/disk string /dev/sda
        d-i partman-auto/method string regular
        d-i partman-auto/purge_lvm_from_device boolean true
        d-i partman-auto/expert_recipe string \
            boot-root :: \
                1 1 1 free \
                    method{ biosgrub } \
                . \
                40 50 100 ext3 \
                    $primary{ } $bootable{ } \
                    method{ format } \
                    format{ } \
                    use_filesystem{ } \
                    filesystem{ ext3 } \
                    mountpoint{ /boot } \
                . \
                25600 60 51200 ext4 \
                    method{ format } \
                    format{ } \
                    use_filesystem{ } \
                    filesystem{ ext4 } \
                    mountpoint{ / } \
                . \
                8000 70 8000 linux-swap \
                    method{ swap } format{ } \
                . \
                500 10000 1000000000 ext4 \
                    method{ format } format{ } \
                    use_filesystem{ } \
                    filesystem{ ext4 } \
                    mountpoint{ /data/1 } \
                    options/noatime{ noatime } \
                . \
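The template's branching can be sketched in Python to show which partition recipe a host receives. This is an illustrative re-expression of the Template Toolkit conditions on slides 12 and 13, not the actual ubgen.cgi, which is Perl. The function name and the `"master"`/`"datanode"` labels are hypothetical names for the two layouts.

```python
import re


def pick_partition_recipe(asset):
    """Mirror the preseed template's IF/ELSIF chain; return a recipe label or None."""
    ks = asset.get("KickstartEnable")
    if not ks or ks == "Diskless":
        return None
    # Template Toolkit's .match() is a regex search, so re.search is the analogue.
    if asset.get("OSName") != "Ubuntu" or not re.search(r"12.04", asset.get("OSVersion", "")):
        return None
    if asset.get("Model") != "PowerEdge C6220":
        return None
    name = asset.get("Name", "")
    if re.search(r"hadoop.*(nn|jt).*.sea$", name):
        return "master"     # biosgrub, /boot, swap, one large /
    if re.search(r"hadoop.*dn.*.sea$", name):
        return "datanode"   # biosgrub, /boot, 25-50 GB /, swap, /data/1 (noatime)
    return None             # not covered by the two branches shown on the slides
```

Because the `nn|jt` test runs first, the order of the branches matters. A cluster name that happened to contain `nn` would send a DataNode down the master branch.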
  14. Build Flow - Preseed (continued)

    Limitation: the preseed recipe can only manage one volume, so the remaining disks are set up after install by hadoop-disks.sh:
        #!/usr/bin/env bash
        # Yes, this is really dumb and sucks, but according to all the docs we could
        # find, you can't do multiple disks that aren't in an LVM.
        HOSTNAME=$(hostname --fqdn)
        if echo "$HOSTNAME" | grep -Eq 'hadoop.*dn.*.sea$' || echo "$HOSTNAME" | grep -Eq 'hadoop.*dev.*.sea'; then
            # Disk that holds /boot, with the partition number stripped (e.g. /dev/sda)
            USEDDEV=$(mount | grep '/boot' | awk '{gsub(/[0-9]/, ""); print $1}')
            # Every whole disk (/dev/sda, /dev/sdb, ...)
            ALLDEV=$(ls /dev/sd* | grep "sd[a-z]$")
            DEVCNT=$(echo "$ALLDEV" | wc -l)
            DATACNT=2   # /data/1 is created by the preseed recipe
            for sd in $(echo "$ALLDEV" | grep -v "$USEDDEV"); do
                echo ';' | sfdisk "$sd"   # one partition spanning the whole disk
                sleep 2
                mkfs.ext4 -m1 -O dir_index,extent,sparse_super "${sd}1"
                # mount each /dev/sd* to /data/#
                echo "${sd}1 /data/$DATACNT ext4 noatime 0 2" >> /etc/fstab
                mkdir -p "/data/$DATACNT"
                DATACNT=$((DATACNT + 1))
            done
        fi
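The script's core logic decides which disks get formatted and which /data/N mount point each one receives. That logic can be sketched as a pure function so it can be checked without touching real devices. This is a sketch that mirrors the shell loop: it takes the disk list and the /boot device as inputs instead of reading `mount` and `/dev`, and it only produces the fstab lines. `plan_data_disks` is a hypothetical name.

```python
import re


def plan_data_disks(all_disks, boot_device, first_index=2):
    """Return (partition, mountpoint, fstab_line) for every whole disk except the boot disk.

    /data/1 already lives on the boot disk (from the preseed recipe), so numbering
    starts at /data/2 and follows the order in which the disks are listed.
    """
    # Like `grep "sd[a-z]$"`: keep whole disks, skip partitions such as /dev/sda1.
    disks = [d for d in all_disks if re.search(r"sd[a-z]$", d)]
    # Like `grep -v $USEDDEV`: drop the disk that holds /boot.
    disks = [d for d in disks if boot_device not in d]
    plan = []
    for n, disk in enumerate(disks, start=first_index):
        part = disk + "1"  # sfdisk creates a single partition spanning the disk
        mnt = "/data/%d" % n
        plan.append((part, mnt, "%s %s ext4 noatime 0 2" % (part, mnt)))
    return plan
```

With four disks this yields /data/1 through /data/4, which matches the `volumes=` list the hosts later report to the provisioning API.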
  15. Build Flow - Preseed (continued)

    The preseed points the host at the Chef server and runs the post-install scripts:
        chef chef/chef_server_url string http://chef.shopzilla.com:4000/
        d-i preseed/late_command string in-target wget -q http://ks.shopzilla.laxhq/ksdone.cgi?id=[% id %];\
            in-target sh -c '/usr/bin/curl http://ks.shopzilla.laxhq/chef/chef-configure.sh | sh';\
            in-target sh -c '/usr/bin/curl http://ks.shopzilla.laxhq/chef/hadoop-disks.sh | sh'
    The last command fetches and runs hadoop-disks.sh, the disk script from slide 14.
  16. Build Flow - Preseed: chef-configure.sh, part 1

    Install certificates and configure the client ('/usr/bin/curl http://ks.shopzilla.laxhq/chef/chef-configure.sh | sh'):
        #!/bin/sh
        /etc/init.d/chef-client stop
        echo > /var/log/chef/client.log
        set -e
        mkdir -p /etc/chef /root/.chef
        cat > /etc/chef/validation.pem <<EOF
        -----BEGIN RSA PRIVATE KEY-----
        ...
        -----END RSA PRIVATE KEY-----
        EOF
        cat > /root/.chef/kickstart.pem <<EOF
        -----BEGIN RSA PRIVATE KEY-----
        ...
        -----END RSA PRIVATE KEY-----
        EOF
        cat > /root/.chef/knife.rb <<EOF
        log_level              :info
        log_location           STDOUT
        node_name              'kickstart'
        client_key             '/root/.chef/kickstart.pem'
        validation_client_name 'chef-validator'
        validation_key         '/etc/chef/validation.pem'
        chef_server_url        'http://chef.shopzilla.com:4000/'
        cache_type             'BasicFile'
        cache_options( :path => '/root/.chef/checksums' )
        EOF
  17. Build Flow - Preseed: chef-configure.sh, part 2

    Set the node's Chef environment and register the node:
        # /bin/sh may not set HOSTNAME (bash does); fall back to the FQDN.
        HOSTNAME=${HOSTNAME:-$(hostname --fqdn)}
        ENV=prod
        if echo "$HOSTNAME" | grep -Eq 'stage[0-9]{3}'
        then
            ENV=stage
        fi
        echo "node_name '$HOSTNAME'" >> /etc/chef/client.rb
        cat > /etc/chef/node.json <<EOF
        {
          "name": "$HOSTNAME",
          "chef_environment": "$ENV",
          "json_class": "Chef::Node",
          "automatic": { },
          "normal": { },
          "chef_type": "node",
          "default": { },
          "override": { },
          "run_list": [ "role[base]" ]
        }
        EOF
        knife client delete $HOSTNAME -y --config /root/.chef/knife.rb --key /root/.chef/kickstart.pem || true
        knife node delete $HOSTNAME -y --config /root/.chef/knife.rb --key /root/.chef/kickstart.pem || true
        knife node from file /etc/chef/node.json --config /root/.chef/knife.rb --key /root/.chef/kickstart.pem
        exit 0
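The environment test and the node.json generation can be sketched in Python, which makes the logic easy to check. This is only a sketch: `build_node_json` is a hypothetical helper, not part of the deck. As in the shell version, any hostname without `stage` followed by three digits lands in the `prod` environment.

```python
import json
import re


def build_node_json(hostname):
    """Build the Chef node document that chef-configure.sh writes to /etc/chef/node.json."""
    env = "stage" if re.search(r"stage[0-9]{3}", hostname) else "prod"
    node = {
        "name": hostname,
        "chef_environment": env,
        "json_class": "Chef::Node",
        "automatic": {},
        "normal": {},
        "chef_type": "node",
        "default": {},
        "override": {},
        "run_list": ["role[base]"],
    }
    return json.dumps(node, indent=2)
```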
  18. Build Flow - Chef (hadooppocdndev021): parse the hostname

        if node.name.match(/^hadoop.*(dn|nn|jt|bastion).*[0-9]../)
          hostptrn   = node.hostname.gsub(/^hadoop/, "")
          hostenv    = hostptrn.match(/prod[0-9]..|dev[0-9]..|stage[0-9]..|qa[0-9]../)[0].gsub(/[0-9]../, "")
          hostrole   = hostptrn.gsub(/#{hostenv}[0-9]../, "").match(/(jt|nn|dn|bastion)$/)[0]
          hostnum    = hostptrn.match(/[0-9]../)[0]
          cluster    = hostptrn.gsub(/#{hostenv}[0-9]../, "").gsub(/#{hostrole}$/, "")
          primarynn  = node.name.gsub(/(jt|nn|dn|bastion)(#{hostenv}[0-9]..)/, "nn#{hostenv}001")
          jobtracker = node.name.gsub(/(jt|nn|dn|bastion)(#{hostenv}[0-9]..)/, "jt#{hostenv}001")
          pribastion = node.name.gsub(/(jt|nn|dn|bastion)(#{hostenv}[0-9]..)/, "bastion#{hostenv}001")

    Result: hostenv = dev, hostrole = dn, hostnum = 021, cluster = poc
  19. Build Flow - Chef (hadooppocdndev021): parse the hostname (continued)

    Same code as slide 18. Full result:
        hostenv = dev, hostrole = dn, hostnum = 021, cluster = poc
        primarynn = hadooppocnndev001, jobtracker = hadooppocjtdev001, pribastion = hadooppocbastiondev001
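The same parsing can be sketched in Python with the recipe's own regular expressions, so the results on this slide can be checked. This is a port for illustration, not code from the deck, and `parse_hadoop_hostname` is a hypothetical name. Like the recipe, it assumes the "hadoop" + cluster + role + environment + number naming scheme from slide 6.

```python
import re

ROLES = r"(jt|nn|dn|bastion)"


def parse_hadoop_hostname(name):
    """Split a Hadoop hostname into the fields the Chef recipe derives."""
    if not re.match(r"^hadoop.*" + ROLES + r".*[0-9]..", name):
        return None
    short = name.split(".")[0]                               # node.hostname
    hostptrn = re.sub(r"^hadoop", "", short)                 # "pocdndev021"
    env_tok = re.search(r"prod[0-9]..|dev[0-9]..|stage[0-9]..|qa[0-9]..", hostptrn).group(0)
    hostenv = re.sub(r"[0-9]..", "", env_tok)                # "dev021" -> "dev"
    rest = re.sub(hostenv + r"[0-9]..", "", hostptrn)        # "pocdn"
    hostrole = re.search(ROLES + r"$", rest).group(0)        # "dn"
    hostnum = re.search(r"[0-9]..", hostptrn).group(0)       # "021"
    cluster = re.sub(hostrole + r"$", "", rest)              # "poc"
    peer = ROLES + "(" + hostenv + r"[0-9]..)"               # role + env + number
    return {
        "hostenv": hostenv, "hostrole": hostrole, "hostnum": hostnum, "cluster": cluster,
        "primarynn": re.sub(peer, "nn" + hostenv + "001", name),
        "jobtracker": re.sub(peer, "jt" + hostenv + "001", name),
        "pribastion": re.sub(peer, "bastion" + hostenv + "001", name),
    }
```

When given an FQDN, the peer names keep the domain, so `hadooppocdndev021.shopzilla.sea` maps to a primary NameNode of `hadooppocnndev001.shopzilla.sea`.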
  20. Build Flow - Chef (hadooppocdndev021): create the Hadoop users and groups

        group "hdfs" do
          gid 50101
        end

        group "mapred" do
          gid 50102
        end

        user "hdfs" do
          uid     50101
          gid     "hdfs"
          shell   "/bin/bash"
          home    "/var/lib/hdfs"
          comment "Hadoop HDFS"
        end

        user "mapred" do
          uid     50102
          gid     "mapred"
          shell   "/bin/bash"
          home    "/var/lib/hadoop-mapreduce"
          comment "Hadoop MapReduce"
        end

        group "hadoop" do
          gid     50100
          members ['hdfs', 'mapred']
        end
  21. Build Flow - Chef (hadooppocdndev021): install the Hadoop packages

        common_packages = [
          'oracle-j2sdk1.6', 'cloudera-manager-agent', 'bigtop-utils', 'bigtop-jsvc', 'bigtop-tomcat',
          'hadoop', 'hadoop-hdfs', 'hadoop-httpfs', 'hadoop-mapreduce', 'hadoop-client', 'hadoop-hdfs-fuse',
          'hbase', 'hive', 'oozie', 'pig', 'hue-plugins', 'hue-common', 'hue-shell', 'hue', 'sqoop', 'flume-ng',
          'hadoop-lzo', 'lzop', 'liblzo2-dev', 'bc'
        ]
        nn_packages      = [ 'cloudera-manager-server', 'cloudera-manager-server-db' ]
        dn_packages      = [ ]
        bastion_packages = [ 'elephantbird', 'hue-server', 'hivelibs' ]
  22. Build Flow - Chef (hadooppocdndev021): install the Hadoop packages (continued)

    With the package lists from slide 21:
        ########## Packages #############
        common_packages.each do |pkg|
          package pkg do
            action :install
          end
        end

        ########## bastion packages
        if hostrole == "bastion"
          bastion_packages.each do |pkg|
            package pkg do
              action :install
            end
          end
        end
  23. Build Flow - Chef (hadooppocdndev021): install the Hadoop packages (continued)

    With the package lists from slide 21:
        ########## Name node packages
        if hostrole == "nn"
          nn_packages.each do |pkg|
            package pkg do
              action :install
            end
          end
          include_recipe "mysql::server"
          include_recipe "database::mysql"
        end
  24. Build Flow - Chef (hadooppocdndev021): configurations

    Create the SCM manager DB if it doesn't exist, enable & start the DB service, then enable & start the SCM manager:
        ########## Primary name node only
        if hostrole == "nn" && hostnum == "001"
          if Dir['/var/lib/cloudera-scm-server-db/data/*'].empty?
            execute "scm_db_init" do
              command "/etc/init.d/cloudera-scm-server-db initdb"
            end
          end
          service "cloudera-scm-server-db" do
            supports :status => true, :restart => true
            action [ :enable, :start ]
          end
          service "cloudera-scm-server" do
            supports :status => true, :restart => true
            action [ :enable, :start ]
          end
        end
  25. Build Flow - Chef (hadooppocdndev021): configurations (continued)

    Update the Cloudera SCM agent config to point to the primary name node, then enable & start the SCM agent service:
        ########## DNs, NNs & JTs - config & start SCM agent & point it to the primary name node
        if hostrole == "dn" || hostrole == "nn" || hostrole == "jt"
          ruby_block "Update SCM config.ini" do
            block do
              cluster_scm = primarynn
              rc = Chef::Util::FileEdit.new("/etc/cloudera-scm-agent/config.ini")
              rc.search_file_replace_line(/^server_host=localhost/, "server_host=#{primarynn}")
              rc.write_file
            end
          end
          service "cloudera-scm-agent" do
            supports :status => true, :restart => true
            action [ :enable, :start ]
          end
        end
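The `ruby_block` above rewrites each line that matches `^server_host=localhost`. A minimal Python equivalent works on a string, so it can be checked without touching `/etc`. `point_agent_at` is a hypothetical helper, and the sample `config.ini` contents in the usage are illustrative only.

```python
import re


def point_agent_at(config_text, server_host):
    """Rewrite `server_host=localhost` lines so the agent reports to the primary NameNode."""
    out = []
    for line in config_text.splitlines(keepends=True):
        if re.match(r"^server_host=localhost", line):
            line = "server_host=%s\n" % server_host
        out.append(line)
    return "".join(out)
```

Running it a second time changes nothing, because the rewritten line no longer matches `localhost`. The Chef resource has the same property, which is why it is safe on every converge.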
  26. Build Flow - Chef (hadooppocdndev021): MySQL configurations

        node['mysql']['server_debian_password'] = "..."
        node['mysql']['server_repl_password'] = "..."
        node['mysql']['server_root_password'] = "..."

        ### Cloudera SCM / Hive server settings
        node['mysql']['tunable']['key_buffer'] = "16M"
        node['mysql']['tunable']['key_buffer_size'] = "32M"
        node['mysql']['tunable']['max_allowed_packet'] = "16M"
        node['mysql']['tunable']['thread_stack'] = "128K"
        node['mysql']['tunable']['thread_cache_size'] = "64"
        node['mysql']['tunable']['query_cache_limit'] = "8M"
        node['mysql']['tunable']['query_cache_size'] = "64M"
        node['mysql']['tunable']['query_cache_type'] = "1"
        node['mysql']['tunable']['max_connections'] = "600"
        node['mysql']['tunable']['read_buffer_size'] = "2M"
        node['mysql']['tunable']['read_rnd_buffer_size'] = "16M"
        node['mysql']['tunable']['sort_buffer_size'] = "8M"
        node['mysql']['tunable']['join_buffer_size'] = "8M"
        node['mysql']['tunable']['innodb_file_per_table'] = "1"
        node['mysql']['tunable']['innodb_flush_log_at_trx_commit'] = "2"
        node['mysql']['tunable']['innodb_log_buffer_size'] = "64M"
        node['mysql']['tunable']['innodb_buffer_pool_size'] = "2048M"
        node['mysql']['tunable']['innodb_thread_concurrency'] = "8"
        node['mysql']['tunable']['innodb_flush_method'] = "O_DIRECT"
        node['mysql']['tunable']['character-set-server'] = "latin1"
        node['mysql']['tunable']['collation-server'] = "latin1_swedish_ci"

        dbconn = {:host => "localhost", :username => 'root', :password => node['mysql']['server_root_password']}
  27. Build Flow - Chef (hadooppocdndev021): MySQL configurations (continued)

        if hostrole == "bastion"
          if hostnum == "001"
            hivepwd = "..."
            mysql_database 'hive' do
              connection dbconn
              action :create
            end
            mysql_database_user 'hive' do
              connection    dbconn
              password      hivepwd
              host          '%'
              database_name 'hive'
              action        :grant
            end
          end
          template "/etc/hive/conf/hive-site.xml" do
            source "hadoop/hive-site.xml.erb"
            variables({ :pribastion => pribastion, :namenode => primarynn })
          end
        end
  28. Build Flow - Chef (hadooppocdndev021): Hive configurations

        template "/etc/hive/conf/hive-site.xml" do
          source "hadoop/hive-site.xml.erb"
          variables({ :pribastion => pribastion, :namenode => primarynn })
        end

    hadoop/hive-site.xml.erb:
        <?xml version="1.0"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://<%= @pribastion %>:3306/hive?createDatabaseIfNotExist=true</value>
            <description>JDBC connect string for a JDBC metastore</description>
          </property>
          ...
  29. Preseed → Chef → API

    • Preseed: install base OS ✓
    • Chef: install packages ✓
    • API: configure the cluster
  30. Build Flow - API (hadooppocdndev021)

    • REST API
    • Basic HTTP authentication
    • Takes & returns JSON
    • HTTP:
        POST - Create entries
        GET - Read entries
        PUT - Update entries
        DELETE - Delete entries
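Calling the API from Python needs only the standard library. The sketch below builds a Basic-authenticated JSON request. It is not part of Cloudera's client library: `cm_request` and `cm_call` are hypothetical helpers, and the `admin`/`admin` credentials in the usage are placeholders. Path segments such as `POC DEV Cluster` are percent-encoded before sending.

```python
import base64
import json
import urllib.parse
import urllib.request


def cm_request(base_url, path, user, password, method="GET", body=None):
    """Build a Basic-auth JSON request against the Cloudera Manager REST API."""
    # Spaces in cluster names must be percent-encoded; "/" is kept as a separator.
    url = base_url.rstrip("/") + "/" + urllib.parse.quote(path.lstrip("/"))
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("ascii")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    return req


def cm_call(req):
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `cm_call(cm_request("http://hadooppocnndev001.shopzilla.sea:7180/api/v2", "clusters/POC DEV Cluster/services", "admin", "admin"))`. POST and PUT calls pass `method=` and a `body` dictionary.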
  31. Build Flow - API (hadooppocdndev021)

    Chef reports the host's details to the provisioning service:
        require 'net/http'
        require 'uri'

        ruby_block "Hadoop Provision" do
          block do
            # Get list of /data mounts
            hostvolumes = `mount | grep '/data/[0-9].' | awk '{print $3}' | sort -V`.split(/\n/).join(',')
            # Adjacent string literals are joined; a backslash-newline inside a single
            # literal would keep the next line's indentation in the URL.
            result = Net::HTTP.get(URI.parse("http://ks.shopzilla.laxhq/hadoop/hadoopprov.py?" \
              "hostname=#{node.name}&" \
              "cluster=#{cluster}&" \
              "hostnum=#{hostnum}&" \
              "role=#{hostrole}&" \
              "env=#{hostenv}&" \
              "primarynn=#{primarynn}&" \
              "volumes=#{hostvolumes}&" \
              "memory=#{node.memory.total}"))
          end
        end

    Values: hostenv = dev, hostrole = dn, hostnum = 021, cluster = poc, primarynn = hadooppocnndev001, jobtracker = hadooppocjtdev001, pribastion = hadooppocbastiondev001
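On the receiving side, hadoopprov.py gets these values as ordinary query-string parameters. The sketch below builds and parses that query with the standard library. It assumes hadoopprov.py reads its parameters with standard CGI-style parsing, which the deck does not show. `provision_url` and `parse_provision_args` are hypothetical helpers.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PROV_URL = "http://ks.shopzilla.laxhq/hadoop/hadoopprov.py"


def provision_url(hostname, cluster, hostnum, role, env, primarynn, volumes, memory):
    """Build the provisioning request that the Chef ruby_block sends."""
    params = [("hostname", hostname), ("cluster", cluster), ("hostnum", hostnum),
              ("role", role), ("env", env), ("primarynn", primarynn),
              ("volumes", ",".join(volumes)), ("memory", memory)]
    return PROV_URL + "?" + urlencode(params)


def parse_provision_args(url):
    """Decode the query string back into single values, as the service would see it."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
```

`urlencode` escapes the `/` and `,` in the volume list; any standard parser decodes them again, and `hostnum` stays the string `"021"`.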
  32. Build Flow - API: http://ks.shopzilla.laxhq/hadoop/hadoopprov.py (hadooppocdndev021 → hadoopprov.py → hadooppocnndev001)

    Python interface to the Cloudera API. Request from hadooppocdndev021:
        hostname=hadooppocdndev021.shopzilla.sea
        cluster=poc
        hostnum=021
        role=dn
        env=dev
        primarynn=hadooppocnndev001.shopzilla.sea
        volumes=/data/1,/data/2,/data/3,/data/4
        memory=67108864kB
  33. Build Flow - API: http://ks.shopzilla.laxhq/hadoop/hadoopprov.py (hadooppocnndev001 → hadoopprov.py)

    Python interface to the Cloudera API. Request from hadooppocnndev001:
        hostname=hadooppocnndev001.shopzilla.sea
        cluster=poc
        hostnum=001
        role=nn
        env=dev
        primarynn=hadooppocnndev001.shopzilla.sea
        volumes=/data/1
        memory=67108864kB
  34. Build Flow - API: what hadoopprov.py does

    1. Assemble a cluster name: "<cluster> <env> Cluster" = "POC DEV Cluster".
    2. Check if the host is checked into SCM.
    3. Check if the host already has roles assigned. If it does, abort.
    4. Get a list of configured clusters from SCM. Is "POC DEV Cluster" one of them?
    5. Get a list of configured services from SCM.
    6. Pull down our configuration templates from Git.
    7. NN: if there is no cluster "POC DEV Cluster", create it.
       DN: (1) if there is a "POC DEV Cluster", add the host to it; (2) assign the node to the running services; (3) calculate map/reduce slots based on the host's RAM & data volumes.
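Steps 1 and 7 reduce to a small decision. The sketch below expresses that logic in Python. `cluster_name`, `plan_action` and the returned action labels are hypothetical, and the host and role checks from steps 2 and 3 are assumed to have already passed. The deck does not say what happens in the other cases, such as a DataNode that arrives before its cluster exists; this sketch simply does nothing for them.

```python
def cluster_name(cluster, env):
    """Step 1: "<cluster> <env> Cluster", e.g. poc/dev -> "POC DEV Cluster"."""
    return "%s %s Cluster" % (cluster.upper(), env.upper())


def plan_action(role, cluster, env, existing_clusters):
    """Step 7: the NameNode creates a missing cluster; DataNodes join an existing one."""
    name = cluster_name(cluster, env)
    exists = name in existing_clusters
    if role == "nn" and not exists:
        return ("create_cluster", name)
    if role == "dn" and exists:
        return ("add_host", name)
    return ("nothing", name)
```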
  35. Build Flow - API: 2. Check if the host is checked into SCM

    GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/hosts/hadooppocdndev021.shopzilla.sea
        {
          ...
          "hostname" : "hadooppocdndev021.shopzilla.sea",
          "ipAddress" : "10.101.173.35",
          "lastHeartbeat" : "2013-04-04T12:52:45.764Z",
          ...
          "roleRefs" : [ {
            "roleName" : "poc-hdfs-021",
            "serviceName" : "poc-hdfs",
            "clusterName" : "POC DEV Cluster"
          }, {
            "roleName" : "poc-mapred-021",
            "serviceName" : "poc-mapred",
            "clusterName" : "POC DEV Cluster"
          } ]
        }
    OR
        { "message" : "Host 'hadooppocdndev021.shopzilla.sea' not found." }

    3. Check if the host already has roles assigned. If it does, abort: "roleRefs" should be empty.
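Steps 2 and 3 can be expressed as a check on the host lookup's JSON. The sketch below assumes the two response shapes shown on this slide. `host_ready` is a hypothetical helper.

```python
import json


def host_ready(response_body):
    """Return True only if the host is checked into SCM and has no roles yet."""
    host = json.loads(response_body)
    if "hostname" not in host:       # {"message": "Host '...' not found."}
        return False
    return not host.get("roleRefs")  # abort when roles are already assigned
```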
  36. Build Flow - API: 4. Get a list of configured clusters from SCM; 5. Get a list of configured services from SCM

    GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters
        { "items" : [ { "name" : "POC DEV Cluster", "version" : "CDH4", "maintenanceMode" : false, "maintenanceOwners" : [ ] } ] }
    GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services
        { "items" : [
            { "name" : "poc-hdfs", "type" : "HDFS", "displayName" : "poc-hdfs", ... "serviceState" : "STARTED", ... },
            { "name" : "poc-mapred", "type" : "MAPREDUCE", ... }
        ] }
  37. Build Flow - API: 6. Pull down our config templates from Git

    Templates: default_hadoop.json, poc_hadoop.json, ods_hadoop.json
    For hadooppocdndev021 it will load poc_hadoop.json, or defaults_hadoop.json if POC does not have its own.
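The fallback rule is a simple lookup. The sketch below works on an already-fetched local checkout of the templates; fetching from Git is left out. It assumes the `<cluster>_hadoop.json` naming. The deck spells the fallback both `default_hadoop.json` and `defaults_hadoop.json`, so the default file name is a parameter here. Both helper names are hypothetical.

```python
import json
import os


def pick_template(available, cluster, default_name="default_hadoop.json"):
    """Choose <cluster>_hadoop.json if it exists, otherwise the default template."""
    candidate = "%s_hadoop.json" % cluster
    return candidate if candidate in available else default_name


def load_cluster_template(template_dir, cluster, default_name="default_hadoop.json"):
    """Load the chosen template from a local checkout of the templates repository."""
    name = pick_template(os.listdir(template_dir), cluster, default_name)
    with open(os.path.join(template_dir, name)) as fh:
        return json.load(fh)
```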
  38. Build Flow - API: 6. Pull down our config templates from Git (continued)

    HDFS service-wide configurations:
        "hdfs_cfg": {
            "roleTypeConfigs": [
                {
                    "roleType": "DATANODE",
                    "items" : [
                        { "name": "dfs_balance_bandwidthPerSec", "value": 52428800 },
                        { "name": "dfs_datanode_du_reserved", "value": 10737418240 },
                        { "name": "dfs_datanode_max_xcievers", "value": 8192 },
                        { "name": "dfs_data_dir_list", "value": "/data/1/dfs/dn" },
                        { "name": "datanode_java_heapsize", "value": 2147483648 }
                    ]
                },
                {
                    "roleType": "NAMENODE",
                    "items" : [
                        { "name": "fs_trash_interval", "value": 1440 },
                        { "name": "dfs_name_dir_list", "value": "/data/1/dfs/nn" }
                    ]
                }
            ],
            "items": [
                { "name": "dfs_block_size", "value": 134217728 },
                { "name": "dfs_client_use_datanode_hostname", "value": "true" },
                { "name": "dfs_permissions_supergroup", "value": "supergroup" }
            ]
        },
  39. Build Flow - API: 6. Pull down our config templates from Git (continued)

    Map/Reduce service-wide configurations:
        "mapred_cfg": {
            "roleTypeConfigs": [
                {
                    "roleType": "GATEWAY",
                    "items": [
                        { "name": "mapred_child_java_opts_max_heap", "value": 2147483648 },
                        { "name": "io_sort_mb", "value": 256 },
                        { "name": "mapred_map_tasks_speculative_execution", "value": "false" },
                        { "name": "mapred_reduce_tasks_speculative_execution", "value": "false" }
                    ]
                },
                {
                    "roleType": "TASKTRACKER",
                    "items": [
                        { "name": "task_tracker_java_heapsize", "value": 2147483648 }
                    ]
                },
                {
                    "roleType": "JOBTRACKER",
                    "items": [
                        { "name": "webinterface_private_actions", "value": "false" },
                        { "name": "mapred_jobtracker_taskScheduler", "value": "org.apache.hadoop.mapred.FairScheduler" },
                        { "name": "jobtracker_mapred_local_dir_list", "value": "/data/1/mapred/jt" },
                        { "name": "mapred_job_tracker_handler_count", "value": "48" }
                    ]
                }
            ],
            "items": [
                { "name": "hdfs_service", "value": "ds-hdfs" },
                { "name": "io_file_buffer_size", "value": 65536 }
            ]
        },
  40. Build Flow - API: 6. Pull down our config templates from Git (continued)

    In-house preferences:
        "sz_prefs": {
            "prefs": {
                "mapred_ratio": "2:1",
                "ram_reserver_gb": 6,
                "def_services": ["HDFS", "MAPREDUCE", "ZOOKEEPER"],
                "dfs_fault_tolerance_perc": 50
            }
        }
  41. Build Flow - API: 7. NN - If there is no cluster "POC DEV Cluster", create it using configs from templates

    Load poc_hadoop.json, or defaults_hadoop.json if POC does not have its own, and use its "sz_prefs": { "def_services": ["HDFS","MAPREDUCE","ZOOKEEPER"] ... }:
    POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters
        {
          "items" : [{
            "name": "POC DEV Cluster",
            "version": "CDH4",
            "services": [
              { "name": "poc-mapred", "type": "MAPREDUCE", "clusterRef": {"clusterName": "POC DEV Cluster"} },
              { "name": "poc-hdfs", "type": "HDFS", "clusterRef": {"clusterName": "POC DEV Cluster"} },
              ...
            ]
          }]
        }
  42. Build Flow - API: 7. NN - If there is no cluster "POC DEV Cluster", create it using configs from templates (continued)

    PUT - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs
        "hdfs_cfg": {
            "roleTypeConfigs": [ {
                "roleType": "DATANODE",
                "items" : [ { "name": "dfs_balance_bandwidthPerSec", "value": 52428800 }, ... ]
            ... }
    PUT - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred
        "mapred_cfg": {
            "roleTypeConfigs": [ {
                "roleType": "GATEWAY",
                "items": [ { "name": "mapred_child_java_opts_max_heap", "value": 2147483648 }, ... ]
            ... }
    Repeat for any other services.
  43. Build Flow - API: 7. NN - If there is no cluster "POC DEV Cluster", create it using configs from #4 (continued)

    POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/roles
        { "items": [{
            "type": "NAMENODE",
            "name": "poc-hdfs-NN-001",
            "hostRef": {"hostId": "hadooppocnndev001.shopzilla.sea"}
        }] }
  44. Build Flow - API: 7. DN - If there is a "POC DEV Cluster", add the host to it

    hadooppocdndev021 looks up the cluster and its services:
    GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/
        { "items" : [ { "name" : "POC DEV Cluster", "version" : "CDH4", "maintenanceMode" : false, "maintenanceOwners" : [ ] } ] }
    GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster
        { "items" : [
            { "name" : "poc-hdfs", "type" : "HDFS", "displayName" : "poc-hdfs", ... "serviceState" : "STARTED", ... },
            { "name" : "poc-mapred", "type" : "MAPREDUCE", ... }
        ] }
  45. Build Flow - API: 7. DN - If there is a "POC DEV Cluster", add the host to it - HDFS

    POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/roles
        { "items": [{ "type": "DATANODE", "name": "poc-hdfs-021", "hostRef": {"hostId": "hadooppocdndev021.shopzilla.sea"} }] }

        # hostvolumes = /data/1,/data/2,/data/3,/data/4
        # hdfsvolumes = /data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn
        { "items": [ { "name": "dfs_data_dir_list", "value": hdfsvolumes } ] }
    PUT - http://hadooppocn..:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/roles/poc-hdfs-021/config
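Deriving the per-role directory lists from the reported mounts is string work. The sketch reproduces the `hdfsvolumes` value on this slide and the `mapredvolumes` value used for the TaskTracker on the next slide. It also wraps name/value pairs in the `{"items": [...]}` shape the config PUTs use. Both helper names are hypothetical.

```python
def role_dirs(hostvolumes, suffix):
    """Append a role-specific suffix to every /data mount, keeping the comma format."""
    return ",".join(v.rstrip("/") + "/" + suffix for v in hostvolumes.split(",") if v)


def role_config(items):
    """Wrap (name, value) pairs in the {"items": [...]} shape the config PUTs use."""
    return {"items": [{"name": name, "value": value} for name, value in items]}
```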
  46. Build Flow - API: 7. DN - If there is a "POC DEV Cluster", add the host to it - MAPREDUCE

    POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred/roles
        { "items": [{ "type": "TASKTRACKER", "name": "poc-mapred-021", "hostRef": {"hostId": "hadooppocdndev021.shopzilla.sea"} }] }

        # volumes=/data/1,/data/2,/data/3,/data/4
        # mapredvolumes=/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local,/data/4/mapred/local
        # memory=67108864kB
        { "items": [
            { "name": "tasktracker_mapred_local_dir_list", "value": mapredvolumes },
            { "name": "mapred_tasktracker_map_tasks_maximum", "value": mapslots },
            { "name": "mapred_tasktracker_reduce_tasks_maximum", "value": redslots }
        ] }
    PUT - http://hado......:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred/roles/poc-mapred-021/config
  47. Build Flow - API: 7. DN - If there is a "POC DEV Cluster", add the host to it - MAPREDUCE (continued)

    mapslots and redslots in the TaskTracker config are calculated from the available RAM, using the ratio provided in the config:
        "sz_prefs": { "prefs": { "mapred_ratio": "2:1", "ram_reserve_gb": 6, ... } }
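The deck shows only the inputs to the slot calculation, not the formula itself. One plausible sketch, under stated assumptions, works in three steps. First, subtract `ram_reserve_gb` from the host's RAM. Second, divide the remainder into slots sized by the 2 GB `mapred_child_java_opts_max_heap` from the MapReduce template. Third, split the slots by `mapred_ratio`. The per-slot size and the integer rounding are assumptions, not Shopzilla's code, and `calc_slots` and `parse_memory` are hypothetical names.

```python
def parse_memory(value):
    """Parse the `memory` parameter, e.g. "67108864kB", into kilobytes."""
    return int(value.lower().rstrip("kb"))


def calc_slots(memory_kb, ratio="2:1", ram_reserve_gb=6, slot_gb=2):
    """Split usable RAM into (map, reduce) slots; the formula is an assumption."""
    total_gb = memory_kb // (1024 * 1024)            # 67108864 kB -> 64 GB
    usable_gb = max(total_gb - ram_reserve_gb, 0)    # keep RAM back for the OS and daemons
    slots = usable_gb // slot_gb                     # one slot per assumed 2 GB child heap
    map_part, red_part = (int(x) for x in ratio.split(":"))
    mapslots = slots * map_part // (map_part + red_part)
    redslots = slots - mapslots                      # rounding remainder goes to reducers
    return mapslots, redslots
```

For the 64 GB DataNode in these slides, the sketch leaves 58 GB usable, which gives 29 slots split into 19 map and 10 reduce slots.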
  48. Build Flow - API: Client configuration (hadooppocbastiondev001 → hadoopprov.py → hadooppocnndev001)

    GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services
        { "items" : [
            { "name" : "poc-hdfs", "type" : "HDFS", "displayName" : "poc-hdfs", ... "serviceState" : "STARTED", ... },
            { "name" : "poc-mapred", "type" : "MAPREDUCE", ... }
        ] }
    Loop over the services & generate configuration links:
        {
          "HDFS" : "http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/clientConfig",
          "MAPREDUCE" : "http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred/clientConfig"
        }
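Generating the links is a loop over the services list. The sketch assumes the services response shown on this slide, and `client_config_links` is a hypothetical helper. The URLs are returned unescaped, matching the slide. The Chef side escapes the spaces in `POC DEV Cluster` with `URI.escape` before downloading.

```python
import json


def client_config_links(base_url, cluster, services_body):
    """Map each service type to its clientConfig download URL."""
    links = {}
    for svc in json.loads(services_body)["items"]:
        links[svc["type"]] = "%s/clusters/%s/services/%s/clientConfig" % (
            base_url, cluster, svc["name"])
    return links
```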
  49. Build Flow - API: Client configuration (continued)

    Back in Chef, the bastion downloads each generated link (clientconfigs) and installs the client configs:
        directory "/tmp/hadoopconf" do
          action :create
        end

        if clientconfigs.key?('HDFS')
          remote_file "/tmp/hadoopconf/hdfs-config.zip" do
            source URI.escape(clientconfigs['HDFS'])
          end
        end

        if clientconfigs.key?('MAPREDUCE')
          remote_file "/tmp/hadoopconf/mapred-config.zip" do
            source URI.escape(clientconfigs['MAPREDUCE'])
          end
        end

        execute "Copy hadoop configs" do
          command "unzip -o '/tmp/hadoopconf/*.zip' -d /tmp/hadoopconf/ && mv /tmp/hadoopconf/hadoop-conf/* /etc/hadoop/conf/"
          action :run
        end
  50. Preseed → Chef → API

    • Preseed: install base OS ✓
    • Chef: install packages ✓
    • API: configure the cluster ✓
  51. API - General

    • http://cloudera.github.io/cm_api/ : Cloudera's Python library and complete documentation
    • Configuration samples from your own cluster:
        http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/config
            JSON dump of all the configurations you set (non-defaults).
        http://hadooppo....shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/config?view=full
            Complete JSON dump of all possible configurations.