Day Of Jenkins 2017. Dealing with agent connectivity issues (Simplified)

This is a simplified version of my Remoting Troubleshooting talk. Original version: https://speakerdeck.com/onenashev/day-of-jenkins-2017-dealing-with-agent-connectivity-issues

Oleg Nenashev

June 01, 2017

Transcript

  1. Dark side of Jenkins. Troubleshooting Remoting issues*

    Oleg Nenashev, CloudBees, Inc.
    Day of Jenkins Oslo, June 01, 2017
    * Hardcore Light Edition. More: http://bit.ly/day-of-jenkins-remoting-hardcore
  2. @oleg_nenashev, #DayOfJenkins - 2017 © 2017 CloudBees, Inc. All Rights Reserved.

    About me: @oleg_nenashev, oleg-nenashev, LibreCores project, St. Petersburg Polytechnic University, Jenkins meetups, Google Summer of Code
  3. Oleg’s “Hall of Shame”(c)

    • Jenkins Core
    • Windows Service Wrapper
    • Plugins
    • Remoting
  4. What is Remoting?

    • Distributed builds: a success factor of Hudson/Jenkins
    • Remoting: the engine under the hood of Jenkins
    • Agents (FKA “slaves”) are powered by Remoting
    • In-house implementation
  5. Agenda

    • How do agents work?
    • What to do if they do not?

    Disclaimer:
    • The presentation represents the speaker’s personal opinion
    • My opinion may differ from the official position of CloudBees or the Jenkins community
    • The Jenkins terms “agent” and “slave” are equivalent
  6. Goals

    • Remoting introduction
    • Jenkins & Remoting troubleshooting 101
    • Improving Remoting: low-hanging fruits
    • What’s HOT in Remoting?
  7. No Goals

    • Advanced Remoting troubleshooting
    • Common Remoting issues
    • Platform specifics (Windows, AIX, Docker, etc.)
    • Code dive
  8. Want this sticker? Ask questions!
  9. Remoting. What does it do?

    • Agent executable (slave.jar)
    • Master communication protocols
    • Classloading
    • Remote I/O Streams
    • Monitoring of agents
  10. Oleg vs. Remoting

    • Dealing with Remoting since 2008
    • Maintained own fork at Synopsys
    • Became Remoting maintainer at CloudBees
    • Maintain Remoting during working hours
    • Deal with support escalations
  11. Does Jenkins run builds on agents?
  13. Build in Jenkins

    Diagram labels: Master, Agent, RPC calls, System calls, RemoteInputStream/RemoteOutputStream, Missing classes, Running build - Context
  14. Remoting Usage in Jenkins

    • Master ⇔ Agents
    • Master ⇔ CLI (Deprecated)
      • https://jenkins.io/blog/2017/04/11/new-cli/
    • Agent ⇔ Maven in Maven Project Plugin
      • via Maven Interceptors
    • CloudBees Jenkins Enterprise:
      • Client master ⇔ CloudBees Jenkins Operations Center
  15. Remoting protocols

    Agents & Maven Project Plugin:
    • JNLP1: deprecated protocol
    • JNLP2: NIO, no encryption
    • JNLP3: no NIO, encrypted, unstable
    • JNLP4: NIO, encrypted via TLS

    Jenkins CLI (deprecated remoting mode):
    • CLI1: no encryption
    • CLI2: encrypted
  16. Protocol configuration (2.19.1+)
  17. Recommended configuration (Jenkins 2.46.2+)

    • JNLP1: deprecated protocol
    • JNLP2: NIO, no encryption
    • JNLP3: no NIO, encrypted, unstable
    • JNLP4: NIO, encrypted via TLS, stable
    • CLI1: no encryption
    • CLI2: encrypted
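The protocol list above can be read as a small table of properties. The following is a minimal sketch of that table in Java, not code from Jenkins or Remoting: the class name `ProtocolTable`, the enum, and the selection rule (NIO, encrypted, neither deprecated nor unstable) are assumptions made for illustration, and the NIO/encryption flags for JNLP1 are assumed because the slide only calls it deprecated.

```java
import java.util.ArrayList;
import java.util.List;

public class ProtocolTable {

    /** One row per agent protocol; the flags mirror the slide text. */
    enum AgentProtocol {
        // JNLP1: the slide only says "deprecated"; the other flags are assumptions.
        JNLP1(false, false, true, false),
        JNLP2(true, false, false, false),
        JNLP3(false, true, false, true),
        JNLP4(true, true, false, false);

        final boolean nio;
        final boolean encrypted;
        final boolean deprecated;
        final boolean unstable;

        AgentProtocol(boolean nio, boolean encrypted, boolean deprecated, boolean unstable) {
            this.nio = nio;
            this.encrypted = encrypted;
            this.deprecated = deprecated;
            this.unstable = unstable;
        }
    }

    /** Hypothetical rule: keep protocols that are NIO-based, encrypted, not deprecated, not unstable. */
    static List<AgentProtocol> recommended() {
        List<AgentProtocol> result = new ArrayList<>();
        for (AgentProtocol p : AgentProtocol.values()) {
            if (p.nio && p.encrypted && !p.deprecated && !p.unstable) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(recommended()); // prints [JNLP4]
    }
}
```

Under this rule only JNLP4 survives, which matches the "stable" marker it carries on the slide.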
  18. Why JNLP4?

    All protocols:
    • Poor diagnosability when an issue happens

    JNLP1/JNLP2:
    • Known issues in connection management
    • RejectedExecutionException in ExecutorService kills ALL connections (.../remoting/pull/156)

    JNLP3:
    • Does not “just work”…
    • Errata: .../remoting/blob/master/docs/protocols.md - jnlp3-connect-errata

    JNLP4:
    • No big ones so far…
    • Jenkins 2.27+
  19. JNLP-agent

    Diagram labels: Master JVM, Agent JVM, HTTP/HTTPS, /tcpAgentListener, remoting.jar, jenkins.war, JNLP-protocol
    • Docker: jenkinsci/jnlp-slave
    • Swarm Plugin: bundled remoting.jar
  20. SSH-agent

    Diagram labels: Master JVM, Agent JVM, SSH Server, jenkins.war, STDOUT/STDERR, SSH-connect, SSH, JRE/JDK, remoting.jar, settings
    • SSH Slaves Plugin
    • CloudBees NIO SSH Slaves Plugin
    • Docker: jenkinsci/ssh-slave
    • Remoting auto-update from master
  21. My Top-5 of Remoting Issues

    • Depends on TCP
    • Runaway processes in Windows
    • Outdated Remoting
    • No logging by default
    • Traffic prioritization
  22. Remoting issues are tough!

    • Networking issues in a custom protocol
    • Diagnosability is not perfect
  23. Lifehack: Asking for help is fine

    • Jenkins community
    • Colleagues
    • Commercial support
  25. Dealing with Remoting issues

    • Check your setup
    • Check Release notes
    • Ask for Help
    • Submit issue to the bugtracker
  26. Dealing with Remoting issues

    • Check your setup
      • Jenkins versions, Remoting versions, Java versions
    • Check Release notes
    • Ask for Help
    • Submit issue to the bugtracker
  27. Remoting version != Jenkins version

    • Remoting versioning is independent
    • Remoting version on agent side may differ
  28. Remoting Versions

    Out of the box (OOTB):
    • SSH agents: update to the version on master
    • Windows Service agents: no auto-update till Jenkins 2.50
    • JNLP agents: no auto-update

    Consequences of outdated Remoting:
    • No bugfixes
    • No new protocols (e.g. JNLP4)
    • Worse diagnosability
    • Potential compatibility issues
  29. How to check the Remoting version?

    • System Information on the agent page
    • Version Column Plugin: https://plugins.jenkins.io/versioncolumn
  30. Java versions

    Vendor:
    • Jenkins needs some native libs
    • IBM Java has known compatibility issues

    Target platform:
    • Do NOT use 32-bit Java on 64-bit Windows
    • https://github.com/kohsuke/winp#platform-support
    • 32-bit Java is bundled in Jenkins Windows Installers :(

    Version:
    • Jenkins’ Java requirements apply to agents as well
    • Otherwise: undefined behavior
  31. How to monitor Java versions?

    • Version Column Plugin again!
    • Since: 2.0-beta-1 (Experimental update center)
  32. Configuring Java monitors

    ${JENKINS_URL}/computer/configure

    Built-in strategies:
    • Agent JVM version is greater than or equal to the Master’s supported one (default)
    • Agent JVM major.minor version is equal to the Master one (paranoid)
    • Agent JVM version is exactly equal to the Master one (paranoid++)
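The three strategies above differ only in how strictly the agent's Java version is compared. A minimal sketch of that comparison follows; it is not the Version Column Plugin's implementation, and the class `JavaVersionMonitor`, the strategy names, the parsing rule, and the sample version strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class JavaVersionMonitor {

    /** Hypothetical names for the default, "paranoid" and "paranoid++" strategies. */
    enum Strategy { AT_LEAST_MINIMUM, SAME_MAJOR_MINOR, EXACT_MATCH }

    /** Extracts the numeric components of a version string, e.g. "1.8.0_131" -> [1, 8, 0, 131]. */
    static List<Integer> parse(String version) {
        List<Integer> parts = new ArrayList<>();
        for (String token : version.split("[^0-9]+")) {
            if (!token.isEmpty()) {
                parts.add(Integer.parseInt(token));
            }
        }
        return parts;
    }

    /** Compares two versions component by component; missing components count as 0. */
    static int compare(String left, String right) {
        List<Integer> a = parse(left);
        List<Integer> b = parse(right);
        for (int i = 0; i < Math.max(a.size(), b.size()); i++) {
            int x = i < a.size() ? a.get(i) : 0;
            int y = i < b.size() ? b.get(i) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns the first two numeric components, padded with zeros if needed. */
    static List<Integer> majorMinor(String version) {
        List<Integer> parts = parse(version);
        while (parts.size() < 2) {
            parts.add(0);
        }
        return parts.subList(0, 2);
    }

    static boolean accepts(Strategy strategy, String agent, String master, String minimumSupported) {
        switch (strategy) {
            case AT_LEAST_MINIMUM:
                return compare(agent, minimumSupported) >= 0;
            case SAME_MAJOR_MINOR:
                return majorMinor(agent).equals(majorMinor(master));
            default:
                return agent.equals(master);
        }
    }

    public static void main(String[] args) {
        // Sample values only; real version strings come from the agent and master JVMs.
        String agent = "1.8.0_131";
        String master = "1.8.0_121";
        String minimum = "1.8";
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + accepts(s, agent, master, minimum));
        }
    }
}
```

With these sample values the first two strategies accept the agent and the exact-match strategy rejects it, which illustrates why the stricter monitors are labeled "paranoid".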
  33. Dealing with Remoting issues

    • Check your setup
    • Check Release notes
    • Ask for Help
    • Submit issue to the bugtracker
  34. https://github.com/jenkinsci/remoting/blob/master/CHANGELOG.md
  35. Dealing with Remoting issues

    • Check your setup
      • Remoting versions, Java versions
    • Check Release notes
      • Search by stacktraces works well
    • Ask for Help
      • jenkinsci-users, IRC, StackOverflow, etc.
    • Submit issue to the bugtracker
      • And work with reviewers
  36. Community contacts

    • https://jenkins.io/mailing-lists/
    • https://jenkins.io/chat/
    • https://stackoverflow.com/questions/tagged/jenkins
  37. https://stackoverflow.com/questions/tagged/jenkins
  39. Jenkins JIRA is NOT a support site

    • It is a bugtracker
    • You should be prepared
    • Guide: http://bit.ly/jenkins-reporting-issues
  40. Remoting issues

    • Project: JENKINS
    • (!) SECURITY for security issues
    • Component: “remoting”
    • Not sure? Use “_unsorted”
  41. Submitting JIRA issues. What do we need?

    • Master: Core version, Logs, ?, Stackdumps
    • Agent: Remoting version, Logs, ?, Stackdumps
  42. When the Agent fails…

    • Master: Core version, Logs, ?, Stackdumps
    • Agent: Remoting version, Logs, ?, Stackdumps
  43. Problem: No logging by default

    Out of the box (OOTB):
    • SSH agents: no logging
    • Windows Service agents: logging with logrotate; collected by Support Core when the agent is online
    • JNLP agents: no logging
  44. Enabling logging in SSH/JNLP agents

    GOOD: “-slaveLog” parameter
    • Tee STDOUT/STDERR to a file
    • No Logrotate Support

    BAD: STDOUT/STDERR redirect
    • Shell-dependent
    • SSH agents: patch command suffix

    NOT UGLY: JUL property file
    • NOT Documented as well
    • Some logs go to STDOUT/STDERR
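To make the "JUL property file" option more concrete, here is a minimal sketch of what such a configuration could contain and how `java.util.logging` applies it. It is not taken from the Remoting documentation: the class `AgentLoggingSketch`, the file name pattern `agent.%g.log`, the size limit, the file count, and the logger name are assumptions. The same property text could be saved to a file and handed to a JVM through the standard `java.util.logging.config.file` system property.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class AgentLoggingSketch {

    /** Builds the text of a JUL property file that logs to rotating files in logDir. */
    static String julProperties(Path logDir) {
        // Forward slashes keep the path intact on Windows, where "\" is an escape in property files.
        String pattern = logDir.toString().replace('\\', '/') + "/agent.%g.log";
        return String.join("\n",
                "handlers=java.util.logging.FileHandler",
                ".level=INFO",
                "java.util.logging.FileHandler.pattern=" + pattern,
                "java.util.logging.FileHandler.limit=1048576", // assumed: rotate after about 1 MB
                "java.util.logging.FileHandler.count=3",       // assumed: keep three generations
                "java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter",
                "");
    }

    /** Applies the configuration in-process, logs one line, and returns the log file content. */
    static String demo() {
        try {
            Path logDir = Files.createTempDirectory("agent-logs");
            byte[] config = julProperties(logDir).getBytes(StandardCharsets.ISO_8859_1);
            LogManager.getLogManager().readConfiguration(new ByteArrayInputStream(config));
            Logger.getLogger("sketch.agent").info("agent connected");
            // Generation 0 is the file currently being written.
            byte[] written = Files.readAllBytes(logDir.resolve("agent.0.log"));
            return new String(written, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Unlike a plain STDOUT/STDERR redirect, the `limit` and `count` properties give basic rotation, although, as the slide notes, anything the agent prints directly to STDOUT/STDERR would bypass this handler.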
  45. What to do NOW?

    Keep Jenkins up to date:
    • 2.46+ for Remoting patches
    • For Windows: LTS 2.60.1+

    Set up agent logging:
    • JNLP agents need to be configured
    • Logging by default: coming soon

    Work on infrastructure:
    • Adjust network settings
    • Optimize Remoting throughput
  46. Dealing with network

    TCP Retransmission timeout:
    • *nix: https://unix.stackexchange.com/questions/210367/changing-the-tcp-rto-value-in-linux
    • Windows: https://support.microsoft.com/en-us/help/170359/how-to-modify-the-tcp-ip-maximum-retransmission-time-out

    Network configuration:
    • External monitoring
    • Independent management and storage networks

    Reducing [peak] throughput
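As one possible reading of "external monitoring", a scheduled job could probe whether a TCP connection to the master's agent port can still be opened. The sketch below is an assumption-laden illustration, not a Jenkins feature: the class `TcpProbe`, its method names, and the one-second timeout are hypothetical, and the self-check talks to a throwaway local listener instead of a real master.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpProbe {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    /** Self-check against a temporary loopback listener, so the sketch runs without a Jenkins master. */
    static boolean probeLocalListener() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            return isReachable(loopback.getHostAddress(), listener.getLocalPort(), 1000);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("local listener reachable: " + probeLocalListener());
    }
}
```

A probe like this only shows that a connection can be established; it says nothing about retransmission timeouts or throughput, which still need the OS-level tuning linked above.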
  47. Reducing Remoting network throughput

    Diagram labels: Master, Node, Access from UI, Storage
    • Temporary Data
    • Logs
    • Artifacts
  48. Reducing Remoting network throughput

    Low Hanging Fruits:
    • Less console logging
    • Persisted JAR cache, esp. in Cloud Nodes
    • External Artifact publishers
    • Pipeline: Local WS instead of stash/unstash
  49. Stash() replacement. External Workspace Manager

    https://github.com/jenkinsci/external-workspace-manager-plugin
  50. Example

    https://github.com/jenkinsci/external-workspace-manager-plugin
  51. What’s next?

    • Better Diagnosability
    • Better Stability
  52. Ongoing changes

    • Windows services
    • Work directories in Remoting (JENKINS-44108)
    • Logging on agents by default (JENKINS-39369)
    • Documentation
    • https://github.com/jenkinsci/remoting/
  53. Windows - New Service Wrapper

    • Jenkins 2.50+
    • For new agents…
    • Remoting auto-update on agent side
    • Runaway Process Killer
    • Update is required
    • http://bit.ly/jenkins-winsw2-upgrade
    • https://github.com/kohsuke/winsw/
  54. Work Directories (JENKINS-44108)

    • Logging by default
    • Independent JAR Cache
    • Workspace status checks
  55. Work Directories (JENKINS-44108)

    Long adoption… ETA: September LTS
  56. Moonshots

    Remoting:
    • TCP-robust Remoting?
    • Autoupdate of ALL JNLP agents? (JENKINS-44099)
    • Update of Remoting in Master without the core upgrade?

    Traffic optimization:
    • External logging
    • Pluggable storage
  57. Takeaways

    Jenkins 2 is not just about Pipelines
    • Keep updating!

    Remoting: a risk factor in Jenkins
    • INFRA issues: a frequent root cause
    • You need to design large-scale instances
    • Remoting can be stabilized

    Asking for help is fine

    Contributing is not just about code
    • Share your experience
  58. Links

    • Remoting
      • Docs: https://github.com/jenkinsci/remoting
      • Knowledge Base: https://go.cloudbees.com
      • Changelog: http://bit.ly/remoting-changelog
    • Windows services:
      • https://github.com/kohsuke/winsw/
  59. Thank you!

    Contacts:
    • E-mail: [email protected]
    • GitHub: oleg-nenashev
    • Twitter: @oleg_nenashev
  60. Examples (I have slides!):

    • My agent hangs and does nothing
    • I run agents on Windows. Any extra features there?
    • I see “channel is already closed” in logs. What to do?

    Contacts:
    • E-mail: [email protected]
    • GitHub: oleg-nenashev
    • Twitter: @oleg_nenashev