Full Stack Kanban - BCS London Lean Kanban Days 2016

It is my experience that the fundamental patterns of Lean Kanban are all-pervasive. They apply to managing software development, to the network of organisational dependencies and to computer software itself.

These four guys sitting round a table are building a product to move videos around in the Broadcast Supply Chain. But how do we know how they are doing?

We look for measurable outputs and outcomes: capacity, throughput and cycle time. We also need to know what’s working and what’s not.

In Hoshin Kanri, to which I was introduced by Marathon Man Karl Scotland’s Lean Kanban Central Europe talk last year, this is called Zone Control. As a CTO, having visibility of Zone Control across the organisation is a huge help in understanding its technology needs.

In a similar vein to Hoshin Kanri, Enterprise Service Planning creates mental models of the organisation that help manage WIP, flow and dependencies.

The management of WIP is fundamental to Lean Kanban and something that has become habitual to me as a manager. So I was a little taken aback when Jez Humble, presenting the findings of the 2015 State of DevOps Report, stated that it had found a negligible correlation between WIP and IT Performance.

Why should this be? It turns out that managing WIP can have a positive impact on IT Performance but only when married with effective visualisation and monitoring of applications and infrastructure to inform business decisions.

This fitted with my experience of working with the team around the table to build out this visualisation and monitoring alongside the management of WIP.

Those four guys are part of an extended team into which we pull (lean thinking again) the infrastructure expertise of Skelton Thatcher Consulting Limited.

These guys rock.

When we first engaged them a year ago I showed them our software architecture. What was missing from this was an indication of how work flowed through the system. This was a bit of a light bulb moment for me. The model I used to manage software delivery could be applied to the software itself.

We modelled the processes required by our system and, since they are capacity-constrained resources, identified the queues that form upstream of them.
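As a rough illustration of why a queue forms upstream of a capacity-constrained process, here is a minimal sketch. It is not our system's model: the function name, the per-tick arrival counts and the fixed capacity are all assumptions made for the example.

```python
def simulate_queue(arrivals, capacity):
    """Return the queue length left at the end of each tick.

    arrivals: jobs arriving in each tick (hypothetical numbers)
    capacity: maximum jobs the downstream process can handle per tick
    """
    queue = 0
    history = []
    for arriving in arrivals:
        queue += arriving                  # new work joins the queue
        processed = min(queue, capacity)   # the process is capacity constrained
        queue -= processed
        history.append(queue)
    return history


# Demand above capacity: the queue grows every tick.
overloaded = simulate_queue([3, 3, 3, 3], capacity=2)
# Demand within capacity: the queue stays empty.
comfortable = simulate_queue([1, 1, 1, 1], capacity=2)
print(overloaded, comfortable)
```

When demand stays above capacity the queue never drains, which is the pattern to look for upstream of each process.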

By working with Skelton Thatcher we were then able to create a visualisation of the queues and work in progress, and to use it both to tell us the health of the system and to inform service and capacity planning.

To do this we started with our Resque queues. Useful as the Resque Web UI is, it doesn’t tell you a lot about throughput, capacity or cycle time.

By connecting Resque to Hosted Graphite via Collectd we were able to create a Cumulative Flow Diagram of our system.
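To show the kind of data that sits behind a Cumulative Flow Diagram, here is a minimal sketch. It assumes cumulative counters sampled at regular intervals; the `Sample` fields and the numbers are hypothetical and are not the metrics that collectd-resque actually emits.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: int         # seconds since the start of observation
    enqueued: int  # cumulative jobs enqueued so far
    started: int   # cumulative jobs picked up by a worker so far
    finished: int  # cumulative jobs completed so far


def cfd_bands(samples):
    """Return (t, queued, in_progress, done) rows, one per sample."""
    return [
        (s.t, s.enqueued - s.started, s.started - s.finished, s.finished)
        for s in samples
    ]


def throughput(samples):
    """Average jobs finished per second between the first and last sample."""
    first, last = samples[0], samples[-1]
    return (last.finished - first.finished) / (last.t - first.t)


samples = [
    Sample(0, 0, 0, 0),
    Sample(60, 10, 4, 2),
    Sample(120, 20, 8, 6),
    Sample(180, 30, 12, 10),
]
for row in cfd_bands(samples):
    print(row)
print(throughput(samples))
```

Stacking the three bands over time gives the diagram. In this made-up data the queued band keeps widening while the in-progress band stays flat, which is what a bottleneck looks like.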

A practice that Skelton Thatcher advocate and proposed to us was the use of logging to track the state of work as it flowed through the system. This is something the BBC do for iPlayer – as spoken about by Stephen Godwin at this year’s QCon London.

We pass a correlation ID along the chain, log the enqueueing, start and end of each piece of work, and aggregate these logs in LogEntries. With a simple query we can then get the cycle time for a piece of work. It’s early days for us with this, so expect to see some further developments.
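The idea can be sketched in a few lines. This is an illustration only, not our logging code or a LogEntries query: the JSON log format, the field names (`ts`, `correlation_id`, `event`) and the event names are assumptions.

```python
import json
from datetime import datetime

# Hypothetical log lines, one JSON record per state change.
LOG_LINES = [
    '{"ts": "2016-04-19T10:00:00", "correlation_id": "abc", "event": "enqueued"}',
    '{"ts": "2016-04-19T10:00:30", "correlation_id": "abc", "event": "started"}',
    '{"ts": "2016-04-19T10:01:00", "correlation_id": "def", "event": "enqueued"}',
    '{"ts": "2016-04-19T10:02:00", "correlation_id": "abc", "event": "finished"}',
    '{"ts": "2016-04-19T10:03:00", "correlation_id": "def", "event": "started"}',
]


def cycle_times(lines):
    """Group events by correlation ID and time each completed piece of work."""
    events = {}
    for line in lines:
        rec = json.loads(line)
        by_event = events.setdefault(rec["correlation_id"], {})
        by_event[rec["event"]] = datetime.fromisoformat(rec["ts"])

    result = {}
    for cid, ev in events.items():
        # Only work that has been enqueued, started and finished has a cycle time.
        if all(name in ev for name in ("enqueued", "started", "finished")):
            result[cid] = {
                "queue_seconds": (ev["started"] - ev["enqueued"]).total_seconds(),
                "cycle_seconds": (ev["finished"] - ev["enqueued"]).total_seconds(),
            }
    return result


print(cycle_times(LOG_LINES))
```

Work that is still in flight, such as "def" here, is left out until its final event has been logged.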

Seeing a bottleneck in the cumulative flow diagram in Grafana is not the end of the story. In the example presented here, the root cause of the bottleneck is a hidden dependency: pending I/O requests for an NFS share.

This illustrates the need to look beyond the place where a problem presents itself to find the cause.

A great example of this is told in Steve Maguire’s classic book ‘Debugging the Development Process’, a must-read for anyone involved in software, in my opinion.

It tells the story of how ‘Bad Intelligence’ held up the development of Word for Windows: a performance problem blamed on an external library turned out to be down to a local optimisation in Word itself. Both sides were instrumenting and testing but neither could isolate the problem.

You need to look beyond the data, work together and be open to the conversations that will help resolve problems.

This is what allowed the four guys round the table to build what they did, and allowed the metrics that came from the instrumentation to be used to inform what the business built next, which, as Mary Poppendieck points out, is one of the biggest constraints on IT performance.

By building Zone Control into both software development and the software itself, through instrumentation and visualisation, we are able to inform good business decisions. The data alone is not enough: we also need to be on the lookout for gaps in the data and mindful of bad intelligence.

References:

Turn Your Organisation Into A Laboratory with Strategy Deployment – Karl Scotland at LKCE15 - https://vimeo.com/146075458

Getting the Right Things Done – Pascal Dennis - http://www.lean.org/Bookstore/ProductDetails.cfm?SelectedProductId=156

Enterprise Service Planning - Scaling the Benefits of Kanban (Keynote) – David Anderson at LKCE15 - https://vimeo.com/146524871

Jez Humble - What I learned from three years of sciencing the crap out of Continuous Delivery - PIPELINE Conference 2016 - https://vimeo.com/160945085

2015 State of DevOps Report - https://puppet.com/resources/white-paper/2015-state-of-devops-report

Skelton Thatcher Consulting Ltd - https://skeltonthatcher.com/

Resque - https://github.com/resque/resque

Collectd - https://collectd.org/

collectd-resque - https://github.com/worldofchris/collectd-resque

Hosted Graphite - https://www.hostedgraphite.com/

Grafana - http://grafana.org/

LogEntries - https://logentries.com/

Stephen Godwin - Video Factory: Powering BBC iPlayer from the cloud – QCon London - https://qconlondon.com/2016-video-schedule - Available from 30 May 2016-04-19

Steve Maguire – Debugging the Development Process - http://c2.com/cgi/wiki?DebuggingTheDevelopmentProcess

The aware organization – Mary Poppendieck at LKCE14 - https://vimeo.com/115963280