Slide 1

Automating the Hadoop Stack: a view into Shopzilla’s automation at scale. Thursday, April 18, 2013.

Slide 2

Agenda
• Shopzilla: who are we?
• Hadoop primer
• Hadoop at Shopzilla
• What challenges do we face?
• What are we doing to solve them?
• Details, details, details

Slide 3

Chris Hemphill, Director of Systems Engineering

Slide 4

Roman Gazaryants, Hadoop Administrator

Slide 5

Shopzilla, Inc. is a leading source for connecting buyers and sellers online, with a global audience of over 40 million shoppers each month and 100 million products from tens of thousands of retailers.

Slide 6

(Image-only slide.)

Slide 7

Hadoop
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
Modules: Common, HDFS, YARN, MapReduce
Daemons: NameNode, Secondary NameNode, JobTracker, DataNodes with TaskTrackers

Slide 8

Add-ons
• Ambari™: a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters.
• Avro™: a data serialization system.
• Cassandra™: a scalable multi-master database with no single points of failure.
• Chukwa™: a data collection system for managing large distributed systems.
• HBase™: a scalable, distributed database that supports structured data storage for large tables.
• Hive™: a data warehouse infrastructure that provides data summarization and ad hoc querying.
• Mahout™: a scalable machine learning and data mining library.
• Pig™: a high-level data-flow language and execution framework for parallel computation.
• ZooKeeper™: a high-performance coordination service for distributed applications.

Slide 9

Cloudera

Slide 10

Cloudera Distribution
5 clusters: 2 production, 2 staging/QA, 1 POC
Cluster sizes range from 129 nodes down to 16.

Slide 11

(Image-only slide.)

Slide 12

CHALLENGES
• Consistency
• Scaling quickly
• Scaling reliably
• Mixed hardware
• Rollback

Slide 13

SOLUTIONS
• Asset management
• Hostname consistency
• Kickstart/Pre-seed
• Chef
• Cloudera Hadoop API

Slide 14

(Image-only slide.)

Slide 15

1. Install base OS
2. Install packages
3. Configure the cluster

Slide 16

Preseed → install base OS
Chef → install packages
API → configure the cluster

Slide 17

Hostnames
hadooppocnndev001
hadooppocdndev021
hadooppocjtdev001
hadooppocbastiondev001

Slide 18

Hostnames
hadoop | poc | nn | dev | 001
hadoop | poc | dn | dev | 021
hadoop | poc | jt | dev | 001
hadoop | poc | bastion | dev | 001
(prefix | cluster name | node role | environment | node #)
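
Because the decomposition is purely mechanical, the parts can be recovered with a single regular expression. A minimal sketch in Python (the role and environment lists come from the Chef recipe shown later; the function name is ours):

import re

# hadoop + <cluster> + <role> + <env> + <3-digit node number>
HOST_RE = re.compile(r"^hadoop(?P<cluster>.+?)(?P<role>nn|dn|jt|bastion)"
                     r"(?P<env>prod|dev|stage|qa)(?P<num>[0-9]{3})$")

def parse_hostname(hostname):
    m = HOST_RE.match(hostname)
    return m.groupdict() if m else None

print(parse_hostname("hadooppocdndev021"))
# {'cluster': 'poc', 'role': 'dn', 'env': 'dev', 'num': '021'}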

Slide 19

Build Flow
00:1B:21:2C:C0:E8 → PXE → DHCP

Slide 20

Build Flow - Preflight
PXE → DHCP → Asset Tracker

Slide 21

Build Flow - Preflight (PXE → DHCP → Asset Tracker)
• Does 00:1B:21:2C:C0:E8 belong to a known asset?
• Is “kickstart” enabled?
• What is the hostname of the asset?
• What OS and version is the host set to use?
• Get the IP address via DNS.
• Get the asset ID.

Slide 22

Build Flow - PXE (PXE → DHCP → Asset Tracker ✓)
PXE boot options:

default Ubuntu12.04-x86_64.vmlinuz
append console-setup/layout=us preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw console=ttyS1,19200n8 \
  locale=en_US text hostname=x netcfg/dhcp_timeout=120 initrd=Ubuntu12.04-x86_64.initrd.img BOOTIF=01-84-8F-69-FE-F6-5C \
  auto=true interface=auto serial nofb

Slide 23

Build Flow - PXE (00:1B:21:2C:C0:E8; PXE ✓ → DHCP ✓ → Asset Tracker ✓)
Same PXE boot options as the previous slide.

Slide 24

Build Flow - Preseed (00:1B:21:2C:C0:E8)
From the PXE append line, the installer fetches its preseed file from:
preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw

Slide 25

Build Flow - Preseed (00:1B:21:2C:C0:E8 → Asset Tracker)
preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw
‣ Perl + Template module builds the preseed file
‣ Takes the asset ID as its argument
‣ Collects further data from the asset tracker
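
The generator itself is Perl, which the deck does not reproduce; the lookup-then-render shape it describes can be sketched in Python (the asset record and field names below are illustrative stand-ins, not the real asset-tracker schema):

# Hypothetical asset-tracker record for asset ID 28353.
ASSETS = {28353: {"Name": "hadooppocdndev021", "OSName": "Ubuntu",
                  "OSVersion": "12.04", "Model": "PowerEdge C6220",
                  "KickstartEnable": "Yes"}}

def render_preseed(asset_id):
    """Collect asset data, then emit a preseed tailored to the host."""
    asset = ASSETS[asset_id]
    lines = ["d-i netcfg/get_hostname string %s" % asset["Name"]]
    # Model/role conditionals select a partitioning recipe (next slides).
    if asset["Model"] == "PowerEdge C6220" and "dn" in asset["Name"]:
        lines.append("d-i partman-auto/disk string /dev/sda")
    return "\n".join(lines)

print(render_preseed(28353))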

Slide 26

Build Flow - Preseed (00:1B:21:2C:C0:E8 → Asset Tracker)
preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw

[% IF KickstartEnable AND KickstartEnable != 'Diskless'; %]
[% IF OSName AND OSName == 'Ubuntu' AND OSVersion AND (matches = OSVersion.match('12.04')); %]
[% IF Model AND Model == 'PowerEdge C6220' AND Name AND (matches = Name.match("hadoop.*(nn|jt).*.sea$")); %]
d-i partman-auto/disk string /dev/sda
d-i partman-auto/method string regular
d-i partman-auto/purge_lvm_from_device boolean true
d-i partman-auto/expert_recipe string \
  boot-root :: \
    1 1 1 free \
      method{ biosgrub } \
    . \
    40 50 100 ext3 \
      $primary{ } $bootable{ } \
      method{ format } format{ } \
      use_filesystem{ } \
      filesystem{ ext3 } \
      mountpoint{ /boot } \
    . \
    8000 70 8000 linux-swap \
      method{ swap } format{ } \
    . \
    500 10000 1000000000 ext4 \
      method{ format } format{ } \
      use_filesystem{ } \
      filesystem{ ext4 } \
      mountpoint{ / } \
    .

Slide 27

Build Flow - Preseed (00:1B:21:2C:C0:E8 → Asset Tracker)
preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw

[% ELSIF Model AND Model == 'PowerEdge C6220' AND Name AND ((matches = Name.match("hadoop.*dn.*.sea$"))); %]
d-i partman-auto/disk string /dev/sda
d-i partman-auto/method string regular
d-i partman-auto/purge_lvm_from_device boolean true
d-i partman-auto/expert_recipe string \
  boot-root :: \
    1 1 1 free \
      method{ biosgrub } \
    . \
    40 50 100 ext3 \
      $primary{ } $bootable{ } \
      method{ format } \
      format{ } \
      use_filesystem{ } \
      filesystem{ ext3 } \
      mountpoint{ /boot } \
    . \
    25600 60 51200 ext4 \
      method{ format } \
      format{ } \
      use_filesystem{ } \
      filesystem{ ext4 } \
      mountpoint{ / } \
    . \
    8000 70 8000 linux-swap \
      method{ swap } format{ } \
    . \
    500 10000 1000000000 ext4 \
      method{ format } format{ } \
      use_filesystem{ } \
      filesystem{ ext4 } \
      mountpoint{ /data/1 } \
      options/noatime{ noatime } \
    . \

Slide 28

Build Flow - Preseed (00:1B:21:2C:C0:E8 → Asset Tracker)
Limitation: partman can only manage one volume, so a script partitions the remaining data disks.

#!/usr/bin/env bash
# Yes, this is really dumb and sucks, but you can't do multiple disks that
# aren't in an LVM according to all the docs we could find.
HOSTNAME=`hostname --fqdn`
if echo $HOSTNAME | egrep 'hadoop.*dn.*.sea$' || echo $HOSTNAME | egrep 'hadoop.*dev.*.sea'; then
  USEDDEV=`mount | grep '/boot' | awk '{gsub('/[0-9]/',"");print $1}'`  # the OS disk, partition digit stripped
  ALLDEV=`ls /dev/sd* | grep "sd[a-z]$"`                                # all whole-disk devices
  DEVCNT=`echo "$ALLDEV" | wc -l`
  DATACNT=2
  for sd in `echo "$ALLDEV" | grep -v $USEDDEV`; do
    echo ';' | sfdisk "$sd"        # one partition spanning the disk
    sleep 2
    mkfs.ext4 -m1 -O dir_index,extent,sparse_super "$sd"1
    # mount each /dev/sd* to /data/#
    echo "${sd}1 /data/$DATACNT ext4 noatime 0 2" >> /etc/fstab
    mkdir -p /data/$DATACNT
    DATACNT=`expr $DATACNT + 1`
  done
fi

Slide 29

Build Flow - Preseed (00:1B:21:2C:C0:E8 → Asset Tracker)
preseed/url=http://ks.shopzilla.laxhq/ubgen.cgi?id=28353&type=hw
The preseed then points the host at Chef and runs the disk script above (served as hadoop-disks.sh):

chef chef/chef_server_url string http://chef.shopzilla.com:4000/
d-i preseed/late_command string in-target wget -q http://ks.shopzilla.laxhq/ksdone.cgi?id=[% id %];\
  in-target sh -c '/usr/bin/curl http://ks.shopzilla.laxhq/chef/chef-configure.sh | sh';\
  in-target sh -c '/usr/bin/curl http://ks.shopzilla.laxhq/chef/hadoop-disks.sh | sh'

Slide 30

Build Flow - Preseed (00:1B:21:2C:C0:E8)
'/usr/bin/curl http://ks.shopzilla.laxhq/chef/chef-configure.sh | sh'
Install certificates and configure the client:

#! /bin/sh
/etc/init.d/chef-client stop
echo > /var/log/chef/client.log
set -e
mkdir -p /etc/chef /root/.chef
cat > /etc/chef/validation.pem <<EOF
...
EOF
cat > /root/.chef/kickstart.pem <<EOF
...
EOF
cat > /root/.chef/knife.rb <<EOF
...
cache_options( :path => '/root/.chef/checksums' )
EOF

Slide 31

Build Flow - Preseed (00:1B:21:2C:C0:E8)
'/usr/bin/curl http://ks.shopzilla.laxhq/chef/chef-configure.sh | sh'
Pick the Chef environment from the hostname and seed the node attributes:

ENV=prod
if echo $HOSTNAME | grep -Eq 'stage[0-9]{3}'
then
  ENV=stage
fi
echo "node_name '$HOSTNAME'" >> /etc/chef/client.rb
cat > /etc/chef/node.json <<EOF
...
EOF

Slide 32

Preseed → install base OS
Chef → install packages
API → configure the cluster

Slide 33

Preseed → install base OS ✓
Chef → install packages
API → configure the cluster

Slide 34

Build Flow - Chef (hadooppocdndev021)
Parse the hostname:

if node.name.match(/^hadoop.*(dn|nn|jt|bastion).*[0-9]../)
  hostptrn = node.hostname.gsub(/^hadoop/,"")
  hostenv = hostptrn.match(/prod[0-9]..|dev[0-9]..|stage[0-9]..|qa[0-9]../)[0].gsub(/[0-9]../,"")
  hostrole = hostptrn.gsub(/#{hostenv}[0-9]../,"").match(/(jt|nn|dn|bastion)$/)[0]
  hostnum = hostptrn.match(/[0-9]../)[0]
  cluster = hostptrn.gsub(/#{hostenv}[0-9]../,"").gsub(/#{hostrole}$/,"")
  primarynn = node.name.gsub(/(jt|nn|dn|bastion)(#{hostenv}[0-9]..)/,"nn#{hostenv}001")
  jobtracker = node.name.gsub(/(jt|nn|dn|bastion)(#{hostenv}[0-9]..)/,"jt#{hostenv}001")
  pribastion = node.name.gsub(/(jt|nn|dn|bastion)(#{hostenv}[0-9]..)/,"bastion#{hostenv}001")

Result: hostenv = dev, hostrole = dn, hostnum = 021, cluster = poc

Slide 35

Build Flow - Chef (hadooppocdndev021)
Parse the hostname (same code as the previous slide); the derived values:
hostenv = dev
hostrole = dn
hostnum = 021
cluster = poc
primarynn = hadooppocnndev001
jobtracker = hadooppocjtdev001
pribastion = hadooppocbastiondev001

Slide 36

Build Flow - Chef (hadooppocdndev021)
Create the Hadoop users and groups:

group "hdfs" do
  gid 50101
end
group "mapred" do
  gid 50102
end
user "hdfs" do
  uid 50101
  gid "hdfs"
  shell "/bin/bash"
  home "/var/lib/hdfs"
  comment "Hadoop HDFS"
end
user "mapred" do
  uid 50102
  gid "mapred"
  shell "/bin/bash"
  home "/var/lib/hadoop-mapreduce"
  comment "Hadoop MapReduce"
end
group "hadoop" do
  gid 50100
  members ['hdfs','mapred']
end

Slide 37

Build Flow - Chef (hadooppocdndev021)
Install the Hadoop packages:

common_packages = [
  'oracle-j2sdk1.6','cloudera-manager-agent','bigtop-utils','bigtop-jsvc','bigtop-tomcat',
  'hadoop','hadoop-hdfs','hadoop-httpfs','hadoop-mapreduce','hadoop-client','hadoop-hdfs-fuse',
  'hbase','hive','oozie','pig','hue-plugins','hue-common','hue-shell','hue','sqoop','flume-ng',
  'hadoop-lzo','lzop','liblzo2-dev','bc'
]
nn_packages = [
  'cloudera-manager-server','cloudera-manager-server-db',
]
dn_packages = [
]
bastion_packages = [
  'elephantbird','hue-server','hivelibs'
]

Slide 38

Build Flow - Chef (hadooppocdndev021)
Install the Hadoop packages (package lists as on the previous slide):

########## Packages #############
common_packages.each do |pkg|
  package pkg do
    action :install
  end
end

########## bastion packages
if hostrole == "bastion"
  bastion_packages.each do |pkg|
    package pkg do
      action :install
    end
  end
end

Slide 39

Build Flow - Chef (hadooppocdndev021)
Install the Hadoop packages (continued):

########## Name node packages
if hostrole == "nn"
  nn_packages.each do |pkg|
    package pkg do
      action :install
    end
  end
  include_recipe "mysql::server"
  include_recipe "database::mysql"
end

Slide 40

Build Flow - Chef (hadooppocdndev021)
Configurations: create the SCM manager DB if it doesn't exist, then enable & start the DB service and the SCM manager.

########## Primary name node only
if hostrole == "nn" && hostnum == "001"
  if Dir['/var/lib/cloudera-scm-server-db/data/*'].empty?
    execute "scm_db_init" do
      command "/etc/init.d/cloudera-scm-server-db initdb"
    end
  end
  service "cloudera-scm-server-db" do
    supports :status => true, :restart => true
    action [ :enable, :start ]
  end
  service "cloudera-scm-server" do
    supports :status => true, :restart => true
    action [ :enable, :start ]
  end
end

Slide 41

Build Flow - Chef (hadooppocdndev021)
Configurations: update the Cloudera SCM agent config to point to the primary name node, then enable & start the agent service.

########## DNs & NNs - Config & start SCM agent & point it to primary name node
if hostrole == "dn" || hostrole == "nn" || hostrole == "jt"
  ruby_block "Update SCM config.ini" do
    block do
      cluster_scm = primarynn
      rc = Chef::Util::FileEdit.new("/etc/cloudera-scm-agent/config.ini")
      rc.search_file_replace_line(/^server_host=localhost/, "server_host=#{primarynn}")
      rc.write_file
    end
  end
  service "cloudera-scm-agent" do
    supports :status => true, :restart => true
    action [ :enable, :start ]
  end
end

Slide 42

Build Flow - Chef (hadooppocdndev021)
MySQL configurations:

node['mysql']['server_debian_password'] = "..."
node['mysql']['server_repl_password'] = "..."
node['mysql']['server_root_password'] = "..."

### Cloudera SCM / Hive server settings
node['mysql']['tunable']['key_buffer'] = "16M"
node['mysql']['tunable']['key_buffer_size'] = "32M"
node['mysql']['tunable']['max_allowed_packet'] = "16M"
node['mysql']['tunable']['thread_stack'] = "128K"
node['mysql']['tunable']['thread_cache_size'] = "64"
node['mysql']['tunable']['query_cache_limit'] = "8M"
node['mysql']['tunable']['query_cache_size'] = "64M"
node['mysql']['tunable']['query_cache_type'] = "1"
node['mysql']['tunable']['max_connections'] = "600"
node['mysql']['tunable']['read_buffer_size'] = "2M"
node['mysql']['tunable']['read_rnd_buffer_size'] = "16M"
node['mysql']['tunable']['sort_buffer_size'] = "8M"
node['mysql']['tunable']['join_buffer_size'] = "8M"
node['mysql']['tunable']['innodb_file_per_table'] = "1"
node['mysql']['tunable']['innodb_flush_log_at_trx_commit'] = "2"
node['mysql']['tunable']['innodb_log_buffer_size'] = "64M"
node['mysql']['tunable']['innodb_buffer_pool_size'] = "2048M"
node['mysql']['tunable']['innodb_thread_concurrency'] = "8"
node['mysql']['tunable']['innodb_flush_method'] = "O_DIRECT"
node['mysql']['tunable']['character-set-server'] = "latin1"
node['mysql']['tunable']['collation-server'] = "latin1_swedish_ci"

dbconn = {:host => "localhost", :username => 'root', :password => node['mysql']['server_root_password']}

Slide 43

Build Flow - Chef (hadooppocdndev021)
MySQL configurations: create the Hive metastore database and user, then render hive-site.xml.

if hostrole == "bastion"
  if hostnum == "001"
    hivepwd = "..."
    mysql_database 'hive' do
      connection dbconn
      action :create
    end
    mysql_database_user 'hive' do
      connection dbconn
      password hivepwd
      host '%'
      database_name 'hive'
      action :grant
    end
  end
  template "/etc/hive/conf/hive-site.xml" do
    source "hadoop/hive-site.xml.erb"
    variables({
      :pribastion => pribastion,
      :namenode => primarynn
    })
  end
end

Slide 44

Build Flow - Chef (hadooppocdndev021)
Hive configurations:

template "/etc/hive/conf/hive-site.xml" do
  source "hadoop/hive-site.xml.erb"
  variables({
    :pribastion => pribastion,
    :namenode => primarynn
  })
end

From hive-site.xml.erb:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://<%= @pribastion %>:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
...

Slide 45

Preseed → install base OS ✓
Chef → install packages ✓
API → configure the cluster

Slide 46

Build Flow - API (hadooppocdndev021)
• REST API
• Basic HTTP authentication
• Takes & returns JSON
• HTTP verbs: POST creates entries, GET reads entries, PUT updates entries, DELETE deletes entries
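
A minimal call against an API of this shape, sketched with Python's requests library (the admin credentials are placeholders):

import requests

r = requests.get("http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters",
                 auth=("admin", "admin"))   # basic HTTP authentication
print(r.json())                             # requests and responses are JSON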

Slide 47

Build Flow - API (hadooppocdndev021)
Values parsed earlier: hostenv = dev, hostrole = dn, hostnum = 021, cluster = poc, primarynn = hadooppocnndev001, jobtracker = hadooppocjtdev001, pribastion = hadooppocbastiondev001

ruby_block "Hadoop Provision" do
  block do
    # Get list of /data mounts
    hostvolumes = `mount | grep '/data/[0-9].' | awk '{print $3}' | sort -V`.split(/\n/).join(',')
    result = Net::HTTP.get(URI.parse("http://ks.shopzilla.laxhq/hadoop/hadoopprov.py?\
hostname=#{node.name}&\
cluster=#{cluster}&\
hostnum=#{hostnum}&\
role=#{hostrole}&\
env=#{hostenv}&\
primarynn=#{primarynn}&\
volumes=#{hostvolumes}&\
memory=#{node.memory.total}"))
  end
end

Slide 48

Build Flow - API
http://ks.shopzilla.laxhq/hadoop/hadoopprov.py (Python interface to the Cloudera API)
Parameters sent for hadooppocdndev021:
hostname=hadooppocdndev021.shopzilla.sea
cluster=poc
hostnum=021
role=dn
env=dev
primarynn=hadooppocnndev001.shopzilla.sea
volumes=/data/1,/data/2,/data/3,/data/4
memory=67108864kB

Slide 49

Build Flow - API
http://ks.shopzilla.laxhq/hadoop/hadoopprov.py (Python interface to the Cloudera API)
Parameters sent for hadooppocnndev001:
hostname=hadooppocnndev001.shopzilla.sea
cluster=poc
hostnum=001
role=nn
env=dev
primarynn=hadooppocnndev001.shopzilla.sea
volumes=/data/1
memory=67108864kB

Slide 50

Build Flow - API (hadooppocdndev021 → hadoopprov.py → hadooppocnndev001)
1. Assemble a cluster name: "<cluster> <env> Cluster" = "POC DEV Cluster".
2. Check if the host is checked into SCM.
3. Check if the host already has roles assigned. If it does, abort.
4. Get a list of configured clusters from SCM. Is "POC DEV Cluster" one of them?
5. Get a list of configured services from SCM.
6. Pull down our configuration templates from Git.
7. NN: if there is no cluster "POC DEV Cluster", create it.
   DN: 1. If there is a "POC DEV Cluster", add the host to it.
       2. Assign the node to the running services.
       3. Calculate map/reduce slots based on the host's RAM & data volumes.
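
A condensed sketch of that decision flow (steps 1-4 and 7), using the CM API v2 routes shown on the following slides; the function and credentials are our own illustration, not the actual hadoopprov.py:

import json
import requests

CM_API = "http://hadooppocnndev001.shopzilla.sea:7180/api/v2"
AUTH = ("admin", "admin")   # placeholder credentials

def provision(hostname, cluster, env, role):
    # 1. Assemble the cluster name, e.g. "POC DEV Cluster".
    cluster_name = "{0} {1} Cluster".format(cluster.upper(), env.upper())
    # 2. Is the host checked into SCM?
    r = requests.get("{0}/hosts/{1}".format(CM_API, hostname), auth=AUTH)
    if r.status_code != 200:
        raise RuntimeError("host has not checked into SCM yet")
    # 3. Abort if the host already has roles assigned.
    if r.json().get("roleRefs"):
        raise RuntimeError("host already has roles; aborting")
    # 4. Does the target cluster already exist?
    clusters = requests.get(CM_API + "/clusters", auth=AUTH).json()["items"]
    exists = any(c["name"] == cluster_name for c in clusters)
    # 7. NN creates the cluster; DN joins it (role assignment: later slides).
    if role == "nn" and not exists:
        body = {"items": [{"name": cluster_name, "version": "CDH4"}]}
        requests.post(CM_API + "/clusters", auth=AUTH, data=json.dumps(body),
                      headers={"content-type": "application/json"})
    return cluster_name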

Slide 51

Build Flow - API
2. Check if the host is checked into SCM.
3. Check if the host already has roles assigned ("roleRefs" should be empty); if it does, abort.

GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/hosts/hadooppocdndev021.shopzilla.sea

{
  ...
  "hostname" : "hadooppocdndev021.shopzilla.sea",
  "ipAddress" : "10.101.173.35",
  "lastHeartbeat" : "2013-04-04T12:52:45.764Z",
  ...
  "roleRefs" : [ {
    "roleName" : "poc-hdfs-021",
    "serviceName" : "poc-hdfs",
    "clusterName" : "POC DEV Cluster"
  }, {
    "roleName" : "poc-mapred-021",
    "serviceName" : "poc-mapred",
    "clusterName" : "POC DEV Cluster"
  } ]
}

OR

{
  "message" : "Host 'hadooppocdndev021.shopzilla.sea' not found."
}

Slide 52

Build Flow - API
4. Get a list of configured clusters from SCM.
GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters

{
  "items" : [ {
    "name" : "POC DEV Cluster",
    "version" : "CDH4",
    "maintenanceMode" : false,
    "maintenanceOwners" : [ ]
  } ]
}

5. Get a list of configured services from SCM.
GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services

{
  "items" : [ {
    "name" : "poc-hdfs",
    "type" : "HDFS",
    "displayName" : "poc-hdfs",
    ...
    "serviceState" : "STARTED",
    ...
  }, {
    "name" : "poc-mapred",
    "type" : "MAPREDUCE",
    ...
  } ]
}

Slide 53

Build Flow - API
6. Pull down our config templates from Git: default_hadoop.json, poc_hadoop.json, ods_hadoop.json.
Will load poc_hadoop.json, or default_hadoop.json if POC does not have its own.

Slide 54

Build Flow - API
6. Pull down our config templates from Git.
HDFS service-wide configurations:

{
  "hdfs_cfg": {
    "roleTypeConfigs": [
      {
        "roleType": "DATANODE",
        "items": [
          { "name": "dfs_balance_bandwidthPerSec", "value": 52428800 },
          { "name": "dfs_datanode_du_reserved", "value": 10737418240 },
          { "name": "dfs_datanode_max_xcievers", "value": 8192 },
          { "name": "dfs_data_dir_list", "value": "/data/1/dfs/dn" },
          { "name": "datanode_java_heapsize", "value": 2147483648 }
        ]
      },
      {
        "roleType": "NAMENODE",
        "items": [
          { "name": "fs_trash_interval", "value": 1440 },
          { "name": "dfs_name_dir_list", "value": "/data/1/dfs/nn" }
        ]
      }
    ],
    "items": [
      { "name": "dfs_block_size", "value": 134217728 },
      { "name": "dfs_client_use_datanode_hostname", "value": "true" },
      { "name": "dfs_permissions_supergroup", "value": "supergroup" }
    ]
  }
}

Slide 55

Build Flow - API
6. Pull down our config templates from Git.
Map/Reduce service-wide configurations:

{
  "mapred_cfg": {
    "roleTypeConfigs": [
      {
        "roleType": "GATEWAY",
        "items": [
          { "name": "mapred_child_java_opts_max_heap", "value": 2147483648 },
          { "name": "io_sort_mb", "value": 256 },
          { "name": "mapred_map_tasks_speculative_execution", "value": "false" },
          { "name": "mapred_reduce_tasks_speculative_execution", "value": "false" }
        ]
      },
      {
        "roleType": "TASKTRACKER",
        "items": [
          { "name": "task_tracker_java_heapsize", "value": 2147483648 }
        ]
      },
      {
        "roleType": "JOBTRACKER",
        "items": [
          { "name": "webinterface_private_actions", "value": "false" },
          { "name": "mapred_jobtracker_taskScheduler", "value": "org.apache.hadoop.mapred.FairScheduler" },
          { "name": "jobtracker_mapred_local_dir_list", "value": "/data/1/mapred/jt" },
          { "name": "mapred_job_tracker_handler_count", "value": "48" }
        ]
      }
    ],
    "items": [
      { "name": "hdfs_service", "value": "ds-hdfs" },
      { "name": "io_file_buffer_size", "value": 65536 }
    ]
  }
}

Slide 56

Build Flow - API
6. Pull down our config templates from Git.
In-house preferences:

{
  "sz_prefs": {
    "prefs": {
      "mapred_ratio": "2:1",
      "ram_reserve_gb": 6,
      "def_services": ["HDFS","MAPREDUCE","ZOOKEEPER"],
      "dfs_fault_tolerance_perc": 50
    }
  }
}

Slide 57

Build Flow - API
7. NN: if there is no cluster "POC DEV Cluster", create it using configs from the templates (load poc_hadoop.json, or default_hadoop.json if POC does not have its own).

"sz_prefs": {
  "def_services": ["HDFS","MAPREDUCE","ZOOKEEPER"]
  ...

POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters

{
  "items" : [{
    "name": "POC DEV Cluster",
    "version": "CDH4",
    "services": [
      {
        "name": "poc-mapred",
        "type": "MAPREDUCE",
        "clusterRef": {"clusterName": "POC DEV Cluster"}
      },{
        "name": "poc-hdfs",
        "type": "HDFS",
        "clusterRef": {"clusterName": "POC DEV Cluster"}
      },
      ...
    ]
  }]
}

Slide 58

Build Flow - API
7. NN: if there is no cluster "POC DEV Cluster", create it using configs from the templates.

PUT - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs

"hdfs_cfg": {
  "roleTypeConfigs": [
    {
      "roleType": "DATANODE",
      "items" : [
        { "name": "dfs_balance_bandwidthPerSec", "value": 52428800 },
        ...
      ]
    }
  ...
}

PUT - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred

"mapred_cfg": {
  "roleTypeConfigs": [
    {
      "roleType": "GATEWAY",
      "items": [
        { "name": "mapred_child_java_opts_max_heap", "value": 2147483648 },
        ...
      ]
    }
  ...
}

Repeat for any other services.

Slide 59

Build Flow - API
7. NN: if there is no cluster "POC DEV Cluster", create it using the template configs, then assign the NameNode role.

POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/roles

{
  "items": [{
    "type": "NAMENODE",
    "name": "poc-hdfs-NN-001",
    "hostRef": {"hostId": "hadooppocnndev001.shopzilla.sea"}
  }]
}

Slide 60

Build Flow - API (hadooppocdndev021)
7. DN: if there is a "POC DEV Cluster", add the host to it.

GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/

{
  "items" : [ {
    "name" : "POC DEV Cluster",
    "version" : "CDH4",
    "maintenanceMode" : false,
    "maintenanceOwners" : [ ]
  } ]
}

GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster

{
  "items" : [ {
    "name" : "poc-hdfs",
    "type" : "HDFS",
    "displayName" : "poc-hdfs",
    ...
    "serviceState" : "STARTED",
    ...
  }, {
    "name" : "poc-mapred",
    "type" : "MAPREDUCE",
    ...
  } ]
}

Slide 61

Build Flow - API (hadooppocdndev021)
7. DN: if there is a "POC DEV Cluster", add the host to it - HDFS.

POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/roles

{
  "items": [{
    "type": "DATANODE",
    "name": "poc-hdfs-021",
    "hostRef": {"hostId": "hadooppocdndev021.shopzilla.sea"}
  }]
}

PUT - http://hadooppocn..:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/roles/poc-hdfs-021/config

# hostvolumes=/data/1,/data/2,/data/3,/data/4
# hdfsvolumes=/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn
{
  "items": [
    { "name": "dfs_data_dir_list", "value": hdfsvolumes }
  ]
}
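
The per-role data directory list is just the mounted /data volumes with the HDFS suffix appended; for example, in Python:

hostvolumes = "/data/1,/data/2,/data/3,/data/4"
hdfsvolumes = ",".join(v + "/dfs/dn" for v in hostvolumes.split(","))
# -> "/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn"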

Slide 62

Build Flow - API (hadooppocdndev021)
7. DN: if there is a "POC DEV Cluster", add the host to it - MAPREDUCE.

POST - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred/roles

{
  "items": [{
    "type": "TASKTRACKER",
    "name": "poc-mapred-021",
    "hostRef": {"hostId": "hadooppocdndev021.shopzilla.sea"}
  }]
}

PUT - http://hado......:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred/roles/poc-mapred-021/config

# volumes=/data/1,/data/2,/data/3,/data/4
# mapredvolumes=/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local,/data/4/mapred/local
# memory=67108864kB
{
  "items": [
    { "name": "tasktracker_mapred_local_dir_list", "value": mapredvolumes },
    { "name": "mapred_tasktracker_map_tasks_maximum", "value": mapslots },
    { "name": "mapred_tasktracker_reduce_tasks_maximum", "value": redslots }
  ]
}

Slide 63

Build Flow - API (hadooppocdndev021)
7. DN - MAPREDUCE, continued: mapslots and redslots are calculated from the available RAM, using the ratio and reserve provided in the config (the PUT of the role config is shown on the previous slide).

"sz_prefs": {
  "prefs": {
    "mapred_ratio": "2:1",
    "ram_reserve_gb": 6
  }
}
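
The deck doesn't show the slot arithmetic itself; one plausible reading of the 2:1 ratio and 6 GB reserve, sketched in Python (the RAM-per-slot knob is an assumption, not a documented preference):

def calc_slots(memory_kb, mapred_ratio="2:1", ram_reserve_gb=6, gb_per_slot=2):
    """Split usable RAM into map and reduce slots at the given ratio."""
    usable_gb = memory_kb / 1024.0 / 1024.0 - ram_reserve_gb
    total_slots = int(usable_gb / gb_per_slot)
    maps, reduces = (int(x) for x in mapred_ratio.split(":"))
    mapslots = total_slots * maps // (maps + reduces)
    redslots = total_slots - mapslots
    return mapslots, redslots

# memory=67108864kB (64 GB) -> 58 GB usable -> 29 slots -> 19 map, 10 reduce
print(calc_slots(67108864))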

Slide 64

Build Flow - API
Client configuration: hadooppocbastiondev001 → hadoopprov.py → hadooppocnndev001

Slide 65

Build Flow - API (hadooppocbastiondev001)
Client configuration: loop over the services & generate configuration links.

GET - http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services

{
  "items" : [ {
    "name" : "poc-hdfs",
    "type" : "HDFS",
    "displayName" : "poc-hdfs",
    ...
    "serviceState" : "STARTED",
    ...
  }, {
    "name" : "poc-mapred",
    "type" : "MAPREDUCE",
    ...
  } ]
}

Result:

{
  "HDFS" : "http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/clientConfig",
  "MAPREDUCE" : "http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-mapred/clientConfig"
}

Slide 66

Build Flow - API (hadooppocbastiondev001)
Back in Chef: download each service's client config and unpack it into /etc/hadoop/conf.

directory "/tmp/hadoopconf" do
  action :create
end
if clientconfigs.key?('HDFS')
  remote_file "/tmp/hadoopconf/hdfs-config.zip" do
    source URI.escape(clientconfigs['HDFS'])
  end
end
if clientconfigs.key?('MAPREDUCE')
  remote_file "/tmp/hadoopconf/mapred-config.zip" do
    source URI.escape(clientconfigs['MAPREDUCE'])
  end
end
execute "Copy hadoop configs" do
  command "unzip -o '/tmp/hadoopconf/*.zip' -d /tmp/hadoopconf/ && mv /tmp/hadoopconf/hadoop-conf/* /etc/hadoop/conf/"
  action :run
end

Slide 67

Preseed → install base OS ✓
Chef → install packages ✓
API → configure the cluster ✓

Slide 68

API - General
http://cloudera.github.io/cm_api/
• Cloudera's Python library
• Complete documentation

Configuration samples from your own cluster:
http://hadooppocnndev001.shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/config
• JSON dump of all the configurations you set (non-defaults).
http://hadooppo....shopzilla.sea:7180/api/v2/clusters/POC DEV Cluster/services/poc-hdfs/config?view=full
• Complete JSON dump of all possible configurations.
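
For comparison, the same lookups through Cloudera's cm_api client rather than raw HTTP; a minimal sketch (credentials are placeholders, and the exact return shape of get_config can vary by client version):

from cm_api.api_client import ApiResource

api = ApiResource("hadooppocnndev001.shopzilla.sea", server_port=7180,
                  username="admin", password="admin", version=2)

cluster = api.get_cluster("POC DEV Cluster")
for service in cluster.get_all_services():
    print("%s (%s): %s" % (service.name, service.type, service.serviceState))

# Service-wide HDFS configuration; view="full" includes the defaults.
svc_config, role_type_configs = cluster.get_service("poc-hdfs").get_config(view="full")
print(svc_config)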