
Intro. to No Man's Land

Lei Yang
September 26, 2011

Transcript

  1. Before we start...

    This is a purely technical discussion; don’t bring politics into it. That is, leave out questions such as:
    • Which dept. should be in charge?
    • Why not develop in PHP/Java, since nobody else in the company can program in Ruby?
    • How do we integrate NML into XX system?

    Tuesday, September 27, 2011
  2. Goal

    Out-of-band server management
    • Extremely configurable OS install via SOL (Serial over LAN)
    • An intelligent system that controls the whole process with minimal human intervention
    • Build an open-source matrix of server/OS distro combinations
  3. Status

    Member: me, wangjunyan (docs)
    Subproject member: lijiehui (LXC: Linux container environment)
    github: https://github.com/op-sdo-com/nml
    Fork us!
  4. ipmitool: IBM, Dell, HP

    IBM & Dell are easy to deal with. They don’t lay any unnecessary abstraction on top of IPMI.
    HP closed the IPMI port (UDP 623) starting with iLO2, forcing customers to use the web-based iLO. Upgrading iLO2’s firmware to 2.06 brings UDP 623 back.
    Recommendation: download the Linux firmware package and unpack it, then ssh to your iLO system and issue
        cd /map1/firmware1; load -source http://server_ip/ilo_206.bin
  5. Work through

    10.132.17.100-150 (prod. IP range)
    10.132.17.200-250 (IPMI IP range)
    One-to-one mapping (dynamic IP allocation is just impossible for now, but this can be improved).
    The current solution is neither secure nor sufficiently isolated.
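The one-to-one mapping between the two ranges can be sketched in Ruby as below. This is only an illustration: the slide does not say how the pairs are chosen, so the fixed offset of 100 in the last octet and the helper name `ipmi_addr_for` are assumptions.

```ruby
require 'ipaddr'
require 'socket'

# Production range taken from the slide.
PROD_RANGE = (IPAddr.new('10.132.17.100')..IPAddr.new('10.132.17.150'))

# Assumption: IPMI address = production address + 100 (.100-.150 -> .200-.250).
# The slide only states that the mapping is one-to-one.
IPMI_OFFSET = 100

# Hypothetical helper: return the IPMI address paired with a production address.
def ipmi_addr_for(prod_ip)
  addr = IPAddr.new(prod_ip)
  unless PROD_RANGE.cover?(addr)
    raise ArgumentError, "#{prod_ip} is outside the production range"
  end
  IPAddr.new(addr.to_i + IPMI_OFFSET, Socket::AF_INET).to_s
end

puts ipmi_addr_for('10.132.17.100')
```

A static rule like this keeps the pairing out of any database, which fits the "no dynamic allocation for now" remark; a lookup table would work equally well if the real pairs are irregular.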
  6. Work through

    1. Set the machine to boot from PXE, then restart it:
        ipmitool -I lanplus -U ibm3550 -H 10.132.17.200 -P XX chassis bootdev pxe
        ipmitool -I lanplus -U ibm3550 -H 10.132.17.200 -P XX chassis power cycle
    2. Configure the DHCP server to reply by MAC and refuse any other DHCP request (!!)
    PS: dhcp3 supports dynamic configuration updates via OMAPI. See man dhcpd.conf.
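Step 1 is easy to script. The sketch below only builds the two argument lists shown on the slide; it uses no ipmitool flags beyond those, and the helper names `ipmitool_argv` and `pxe_reboot_commands` are made up for illustration.

```ruby
# Hypothetical helper: argv for one ipmitool call over the lanplus interface.
def ipmitool_argv(host, user, password, *subcommand)
  ['ipmitool', '-I', 'lanplus', '-U', user, '-H', host, '-P', password, *subcommand]
end

# The two steps from the slide: select PXE as the boot device, then power cycle.
def pxe_reboot_commands(host, user, password)
  [
    ipmitool_argv(host, user, password, 'chassis', 'bootdev', 'pxe'),
    ipmitool_argv(host, user, password, 'chassis', 'power', 'cycle')
  ]
end

# Print the commands instead of running them.
pxe_reboot_commands('10.132.17.200', 'ibm3550', 'XX').each do |argv|
  puts argv.join(' ')
end
```

To actually run them (this needs ipmitool installed and a reachable BMC), pass each array to `system(*argv)` and stop on the first failure. Passing an array rather than a string avoids shell quoting problems with passwords.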
  7. Architecture

    NML encapsulates all the intelligence in HTTP. DHCP and iPXE configurations are kept to a minimum. Centralized configuration is easy to maintain.
  8. Work through

        host aoti_200 {
          # eth0, eth1
          hardware ethernet 00:1A:64:99:E7:50;
          # hardware ethernet 00:1A:64:99:E7:52;
          fixed-address 10.132.17.109;
          server-name "10.132.17.108";
          if exists user-class and option user-class = "iPXE" {
            filename "http://10.132.17.108/nml/ipxe";
          } else {
            filename "undionly.kpxe";
          }
        }
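Since every host needs the same stanza with different values, it can be generated. A minimal sketch, assuming one MAC per host; the function `dhcp_host_stanza` and its keyword names are hypothetical and not part of NML.

```ruby
# Hypothetical generator for one dhcpd.conf host stanza like the one on the slide.
# A plain PXE ROM is handed undionly.kpxe; once iPXE is running it identifies
# itself with user-class "iPXE" and is pointed at the HTTP script instead.
def dhcp_host_stanza(name:, mac:, ip:, server:)
  <<~CONF
    host #{name} {
      hardware ethernet #{mac};
      fixed-address #{ip};
      server-name "#{server}";
      if exists user-class and option user-class = "iPXE" {
        filename "http://#{server}/nml/ipxe";
      } else {
        filename "undionly.kpxe";
      }
    }
  CONF
end

puts dhcp_host_stanza(name: 'aoti_200', mac: '00:1A:64:99:E7:50',
                      ip: '10.132.17.109', server: '10.132.17.108')
```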
  9. Work through

    iPXE vs. PXE
    • iPXE liberates us from TFTP (stupid UDP).
    • iPXE supports HTTP (even iSCSI), so the system scales.
    • iPXE lays the foundation for an automatic assessment management platform.
  10. Work through

        #!ipxe
        chain http://nml.snda.com/nml/chain/${manufacturer}/${product}/${uuid}?mac=${net0/mac}

    ${manufacturer}, ${product}, ${uuid} and ${net0/mac} are variables exposed by the BIOS. Humans make mistakes, but the BIOS does not.
    PS: This is probably the earliest stage at which hardware info can be obtained. Early == Accurate
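On the server side, the expanded chain URL carries everything needed to identify the machine. The sketch below shows one way to pull the four values back out using only Ruby's standard library; `parse_chain_request` is a hypothetical helper, and the UUID in the example is made up.

```ruby
require 'uri'

# Hypothetical parser for the request iPXE sends after expanding the chain URL.
# Returns nil when the path does not have the /nml/chain/<a>/<b>/<c> shape.
def parse_chain_request(url)
  uri = URI.parse(url)
  m = %r{\A/nml/chain/([^/]+)/([^/]+)/([^/]+)\z}.match(uri.path)
  return nil unless m

  query = URI.decode_www_form(uri.query.to_s).to_h
  {
    manufacturer: URI.decode_www_form_component(m[1]),
    product:      URI.decode_www_form_component(m[2]),
    uuid:         m[3],
    mac:          query['mac']
  }
end

p parse_chain_request(
  'http://nml.snda.com/nml/chain/IBM/x3550/' \
  '00000000-0000-0000-0000-000000000000?mac=00:1a:64:99:e7:50'
)
```

Keying everything on the UUID rather than on a hand-maintained hostname is what makes the "Early == Accurate" point work: the identifier comes from the firmware, not from a spreadsheet.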
  11. Work through

    From now on, all network communication is done through HTTP. This is also where the intelligence comes in:
        get '/nml/pxelinux.cfg/:uuid' do
          uuid = params[:uuid]
          install(uuid, get_ipaddr(uuid), get_gateway(uuid), get_hostname(uuid),
                  get_iface(uuid), get_baudrate(uuid), get_release(uuid))
        end
  12. Work through

        def install(uuid, ipaddr, gateway, hostname, iface, baudrate, release)
          indent = ' ' * 4
          head = "serial 0 #{baudrate}\ntimeout 50\nlabel pxeboot"
          tail = "default ubuntu-installer/amd64/boot-screens/vesamenu.c32"
          kernel = indent + "kernel %s/linux" % [release]
          # static ip configuration, avoid dhcp in the preseeding stage
          configs = [
            "console-tools/archs=skip-config",
            "console-keymaps-at/keymap=us",
            "vga=normal",
            "netcfg/confirm_static=true",
            "netcfg/disable_dhcp=true",
            "netcfg/get_hostname=#{hostname}",
            "netcfg/get_domain=.nml",
            "netcfg/get_nameservers=%s" % [@@dns],
            "netcfg/get_ipaddress=#{ipaddr}",
            "netcfg/get_netmask=255.255.255.0",
            "netcfg/get_gateway=#{gateway}",
            "console=ttyS0,#{baudrate}n8",
            "interface=#{iface}",
            "initrd=#{release}/initrd.gz",
            "auto url=http://%s/%s/preseed/#{uuid}" % [@@master, @@base]
          ]
          append = indent + 'append ' + configs.join(' ') + ' -- quiet'
          [head, kernel, append, tail].join("\n") + "\n"
        end
  13. Architecture

    What is preseed? Preseed is Debian’s counterpart to kickstart. A kickstart file holds the answers to the questions you would be asked when installing a system manually. Almost every distro provides a kickstart-like system.
  14. Architecture

    NML tries to provide maximum flexibility from the bottom up. Policy makers decide how to use it.
    Maximum flexibility == each machine can pull its own configuration set.
    NML tries hard to be OS/hardware independent. (Goal 3: build a matrix)
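"Each machine can pull its own configuration set" boils down to a per-UUID lookup with site-wide defaults. A minimal sketch, assuming an in-memory Hash stands in for whatever store NML really uses; the UUID keys, hostnames, release names and default values below are made up, and `config_for` is a hypothetical helper.

```ruby
# Assumption: site-wide defaults that any machine may override (policy).
DEFAULTS = { release: 'ubuntu-10.04', iface: 'eth0', baudrate: 115200 }.freeze

# Assumption: per-machine overrides keyed by UUID (illustrative values only).
MACHINES = {
  'uuid-a' => { ipaddr: '10.132.17.109', hostname: 'aoti-109' },
  'uuid-b' => { ipaddr: '10.132.17.110', hostname: 'aoti-110', release: 'centos-6.0' }
}.freeze

# Mechanism: merge the defaults with whatever this machine overrides.
def config_for(uuid)
  overrides = MACHINES.fetch(uuid) { raise KeyError, "unknown machine #{uuid}" }
  DEFAULTS.merge(overrides)
end

p config_for('uuid-a')
p config_for('uuid-b')
```

The split mirrors the mechanism/policy distinction: the merge is the mechanism, and the contents of the two tables are policy that ops can keep as uniform or as varied as they like.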
  15. Architecture

    I know real-world ops desperately want consistency, but that is policy. NML focuses on mechanism.
    Why does flexibility matter? Any real-world examples?
    1. Let the system generate a distinct password for every machine. I love elegant solutions to security.
    2. Gain access to the partition manager (ext3, ext4, btrfs and LVM!).
    3. Move the prelinux script to the preseeding stage to ensure continuous integration of company policy. (Lesson: policies can never be applied without powerful infra.)
    4. Automatic network interface configuration. The Ubuntu installer smartly applies the network configuration to /etc/network/interfaces, and CentOS’s anaconda does the equivalent.
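Example 1 (a distinct password per machine) can be sketched with Ruby's standard `SecureRandom`. The helper name `generate_passwords` and the length of 16 are assumptions; hashing the password before it is written into a preseed file, and storing it somewhere safe, are left out.

```ruby
require 'securerandom'

# Hypothetical helper: one random alphanumeric password per machine UUID.
# SecureRandom.alphanumeric draws from A-Z, a-z and 0-9.
def generate_passwords(uuids, length: 16)
  uuids.to_h { |uuid| [uuid, SecureRandom.alphanumeric(length)] }
end

passwords = generate_passwords(%w[uuid-a uuid-b uuid-c])
passwords.each_key { |uuid| puts "#{uuid}: generated" }
```

Because the installer fetches its preseed per UUID, each machine can receive its own credential without any manual step.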
  16. Architecture

    Preseed/Kickstart vs. image clone
    • Preseeding is slow. Although the installer can use a yum/apt mirror to speed up package downloads, the retrieve-prepare-configure cycle as a whole can’t be optimized further.
    • Image cloning is suitable for creating VMs, but it’s too dumb to do anything intelligent.
    But we want the best of both worlds! Solution:
        n_preseed = normalize(uuid.preseed, uuid.hardware)
        if n_preseed.exists?
          n_preseed.clone(server_ip, uuid)
        else
          install(uuid)
        end
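The clone-or-install decision can be made concrete with a content-derived image key. This is a sketch under stated assumptions, not NML's implementation: it assumes two machines may share an image when their preseed text (ignoring comments and blank lines) and hardware model match, and the names `image_key` and `provision_action` are hypothetical.

```ruby
require 'digest'
require 'set'

# Assumption: normalizing means dropping comments and blank lines from the
# preseed and lowercasing the hardware model, then hashing the result.
def image_key(preseed, hardware)
  lines = preseed.lines.map(&:strip).reject { |l| l.empty? || l.start_with?('#') }
  Digest::SHA256.hexdigest([hardware.strip.downcase, *lines].join("\n"))
end

# Clone when an image for this key already exists, otherwise do a full install
# (after which the freshly installed machine could be captured under the key).
def provision_action(preseed, hardware, image_store)
  key = image_key(preseed, hardware)
  image_store.include?(key) ? [:clone, key] : [:install, key]
end

store = Set.new
action, key = provision_action("d-i netcfg/get_hostname string a\n", 'IBM x3550', store)
puts action        # first machine of its kind: full install
store << key
puts provision_action("d-i netcfg/get_hostname string a\n", 'IBM x3550', store).first
```

The first machine with a given preseed and hardware combination pays the slow install once; every later match is cloned.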
  17. Architecture

    Multicast clone: 40 new servers in 10 minutes. This is INSANE.
    Clonezilla (multicast) + DRBL
  18. Architecture

    1. A yum/apt mirror ensures a 99% cache hit rate; all packages are pulled from the LAN. The local master only maintains a cache.
    2. Why not mirror the upstream repo directly?
       1. The bandwidth of an upstream mirror is likely to fluctuate (e.g., us.archive.ubuntu.com).
       2. Most packages will never be downloaded. In fact, a standard installation of CentOS 6.0 needs fewer than 380 packages, whereas a full-fledged repo contains 15K (2.5%).
    3. Repo implementations
       1. Yum: nginx error_page + proxy_pass + ppull.rb; upstream mirror: mirrors.sdo.com (Why not proxy_cache? Because nginx has some issues with range requests when proxy_cache is enabled.)
       2. Apt: apt-cacher-ng; upstream mirror: mirror.lupaworld.com
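The caching idea in point 1 (serve from the LAN, go upstream only on a miss) can be sketched as a small pull-through cache. This is an illustration of the general technique, not the actual nginx + ppull.rb setup; the class name `PullThroughCache` is hypothetical and the upstream fetch is injected as a block so the sketch runs without network access.

```ruby
require 'fileutils'

# Hypothetical pull-through cache: a file is fetched from upstream at most once,
# stored under cache_dir, and served locally from then on.
class PullThroughCache
  attr_reader :hits, :misses

  # The block receives a relative path and must return the file body.
  def initialize(cache_dir, &fetcher)
    @cache_dir = cache_dir
    @fetcher = fetcher
    @hits = 0
    @misses = 0
  end

  def get(rel_path)
    # Refuse paths that would escape the cache directory.
    raise ArgumentError, 'path escapes cache dir' if rel_path.split('/').include?('..')

    local = File.join(@cache_dir, rel_path)
    if File.file?(local)
      @hits += 1
      return File.binread(local)
    end

    @misses += 1
    body = @fetcher.call(rel_path)
    FileUtils.mkdir_p(File.dirname(local))
    File.binwrite(local, body)
    body
  end
end
```

With fewer than 380 of roughly 15K packages ever requested, a cache like this stores only a small slice of the repo, which is the argument in point 2 against a full mirror.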
  19. The Matrix

    Columns (OS distros): Ubuntu 10.04, Ubuntu 11.04, CentOS 5.6, CentOS 6.0, FreeBSD, Gentoo, Fedora, Debian, Arch Linux, Windows ?
    Rows (servers):
    IBM x3550: Y Y Y Y
    HP ProLiant DL360 G5:
    IBM x3550 M2:
    Dell PowerEdge R410:
    HP ProLiant DL385 G2:
    IBM BladeCenter LS22:
    • Y means both i386 and amd64 passed
    • Y* means M[ij] needs extra configuration
  20. Architecture

    1. Why does hardware depend on the OS distro?
       Every OS distro may bring surprises. E.g. the radeon card driver in Ubuntu 11.04 (codename natty) is incompatible with the IBM x3550: you get a kernel panic after installation.
    2. What is the purpose of supporting all Linux distros?
       • We want Total World Domination
       • NML is about mechanism, not policy
       • Linode supports all distros on Xen! Our task is easier.
    3. Is it time-consuming to support all Linux distros?
       Just do it.
  21. TODO

    1. Sinatra -> Rails
    2. DHCP relay
    3. DHCP dynamic configuration update
    4. Merge DRBL’s dhcp config and the normal dhcpd.conf into one