Developer Productivity Engineering
What’s in it for me?
Trisha Gee
Slide 2
Slide 2 text
⬢ Lead Developer Advocate
⬢ Java Champion
⬢ 20+ years Java experience
⬢ …and author
Trisha Gee
Slide 3
Slide 3 text
https://trishagee.com/books/
Slide 6
Slide 6 text
But Bottlenecks to Productivity are Everywhere
Code
Code
Wait Time for Local Build
Debug Build Failure
Lunch
Code
Wait Time for Local Build
Investigate/Fix Flaky Tests
Sprint
Waiting time for CI Build
Slide 7
Slide 7 text
How developers spend their time
Source: The 2019 Tidelift managed open source survey results https://bit.ly/3MOEpK3
Slide 8
Slide 8 text
“Bottlenecks in the toolchain are holding back the
rockstar 10x developers”
Pete Smoot, Software Architect, Dell Technologies
Slide 11
Slide 11 text
The “best” programmers outperformed
the worst by roughly a 10:1 ratio
Slide 12
Slide 12 text
What Mattered?
Slide 15
Slide 15 text
⬢ Paired programmers performed at roughly the
same level
⬢ They didn’t work together on the task, but they
came from the same organization
⬢ The best organization performed 11.1x better
than the worst
What Mattered?
Slide 16
Slide 16 text
“While this productivity differential among
programmers is understandable, there is also a 10 to 1
difference in productivity among software
organizations.”
Software Productivity in the Enterprise
Harlan (HD) Mills
https://trace.tennessee.edu/cgi/viewcontent.cgi?article=1010&context=utk_harlan
Slide 17
Slide 17 text
Though the phrase had not yet been coined, increased
productivity came down to developer experience.
Slide 18
Slide 18 text
“The bald fact is that many companies provide
developers with a workplace that is so crowded, noisy,
and interruptive as to fill their days with frustration.
That alone could explain reduced efficiency as well as a
tendency for good people to migrate elsewhere.”
Peopleware: Productive Projects and Teams, Third Edition
Tom DeMarco, Tim Lister
Slide 19
Slide 19 text
Gradle is Pioneering DPE
DPE is a software development
practice used by leading software
development organizations to
maximize developer productivity
and happiness.
Slide 20
Slide 20 text
What Problems Does DPE Solve?
Slide 22
Slide 22 text
DevOps, 12-Factor, Agile, etc., have still not
captured all bottlenecks, friction, and obstacles
to throughput
Many are hiding in plain sight, in the developer
experience itself
Slide 23
Slide 23 text
A 10x organization should be reducing
build and test feedback times, helping
developers troubleshoot problems and
improving the consistency and
reliability of builds
Slide 24
Slide 24 text
Pain Point:
Waiting for Builds &
Tests to Complete
Slide 25
Slide 25 text
Are you tracking local build and test
times?
Slide 28
Slide 28 text
The only initiatives that will positively
impact performance are ones which
increase throughput while
simultaneously decreasing cost
Slide 29
Slide 29 text
Faster Builds Improve Creative Flow
                      Team 1    Team 2
No. of Devs           11        6
Build Time            4 mins    1 min
No. of local builds   850       1010
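The table's figures can be turned into per-developer numbers with a little arithmetic. The sketch below is an illustration only: it assumes the build time is an average per local build, and it makes no assumption about the time period the counts cover, since the slide does not state one.

```python
# Figures copied from the slide's table.
teams = {
    "Team 1": {"devs": 11, "build_minutes": 4, "local_builds": 850},
    "Team 2": {"devs": 6, "build_minutes": 1, "local_builds": 1010},
}


def summarize(team):
    """Return (builds per developer, total minutes spent waiting on builds)."""
    builds_per_dev = team["local_builds"] / team["devs"]
    total_wait_minutes = team["local_builds"] * team["build_minutes"]
    return builds_per_dev, total_wait_minutes


for name, team in teams.items():
    per_dev, wait = summarize(team)
    print(f"{name}: {per_dev:.1f} builds per developer, {wait} minutes waiting")
```

Under that assumption, the smaller team with the faster build runs roughly twice as many local builds per developer while spending less than a third of the total time waiting.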
Slide 30
Slide 30 text
Very Fast Feedback Is Important
Slide 31
Slide 31 text
Solution: Acceleration Technologies
Slide 32
Slide 32 text
Build Caching Speeds up Builds and Tests
Slide 33
Slide 33 text
⬢ Introduced to the Java world by Gradle in 2017
⬢ Used by leading technology companies like Google and Facebook
⬢ Supports both local (per-user) and remote caching for distributed
teams
Build Caching
Slide 34
Slide 34 text
Build Caching
When the inputs have not changed, the outputs can be reused from a previous run.
Slide 35
Slide 35 text
Demo: Build Cache for Maven
Slide 36
Slide 36 text
Remote Build Cache
⬢ Shared among different machines
⬢ Speeds up development for the whole team
⬢ Reuses build results among CI agents/jobs and individual developers
Slide 37
Slide 37 text
Test Distribution Parallelizes Test Execution
Slide 38
Slide 38 text
Existing solutions: Single machine parallelism
Parallelism in Gradle is controlled by these flags:
--parallel / org.gradle.parallel
Controls project parallelism, defaults to false
--max-workers / org.gradle.workers.max
Controls the maximum number of workers, defaults to the number of processors/cores
test.maxParallelForks
Controls how many VMs are forked by an individual test task, defaults to 1
See https://guides.gradle.org/performance/#parallel_execution for more information
Slide 39
Slide 39 text
Existing solutions: CI fanout
See https://builds.gradle.org/project/Gradle for an example of this strategy
Test execution is distributed by manually partitioning the test set and then running partitions in
parallel on several CI nodes.
pipeline {
    stage('compile') {
        ...
    }
    parallelStage('test') {
        step {
            sh './gradlew :testGroup1'
        }
        step {
            sh './gradlew :testGroup2'
        }
        step {
            sh './gradlew :testGroup3'
        }
    }
}
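The manual partitioning step behind a fanout like this can be sketched as follows. This is a minimal illustration, not how any particular CI system does it: it assumes per-test durations are known in advance, uses a greedy "longest test first" heuristic, and the test names and timings are hypothetical.

```python
import heapq


def partition_tests(durations, groups):
    """Split tests into `groups` buckets with roughly equal total duration."""
    # Min-heap of (total_seconds, group_index); the lightest group is on top.
    heap = [(0, index) for index in range(groups)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(groups)]
    # Place the longest tests first, always into the currently lightest group.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return buckets


# Hypothetical test classes and timings in seconds, for illustration only.
durations = {
    "CheckoutTest": 120, "SearchTest": 90, "LoginTest": 60,
    "CartTest": 45, "ProfileTest": 30, "HealthTest": 15,
}
for number, bucket in enumerate(partition_tests(durations, 3), start=1):
    print(f"testGroup{number}: {bucket}")
```

The partition has to be recomputed whenever tests are added or their timings drift, which is part of the manual upkeep this strategy requires.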
Slide 40
Slide 40 text
Assessment of existing solutions
⬢ Build Caching is great in many cases but
doesn’t help when test inputs have changed.
⬢ Single machine parallelism is limited by that
machine’s resources.
⬢ CI fanout does not help during local
development and requires manual setup, test
partitioning, and result collection/aggregation
Slide 41
Slide 41 text
Test Distributor
Slide 42
Slide 42 text
Test Distribution Results
Chart labels (shown three times): ‑ ~50%
Doubling the number of executors cuts build time in half
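The halving claim corresponds to an idealized model in which tests split evenly across executors with no overhead. The sketch below assumes exactly that; the 40-minute serial time is a hypothetical figure, and real runs will fall short of this once scheduling, file transfer, and uneven test durations are involved.

```python
def ideal_test_time(serial_minutes, executors):
    """Wall-clock time if tests split perfectly evenly with no overhead."""
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return serial_minutes / executors


serial = 40.0  # hypothetical serial test time in minutes
for executors in (1, 2, 4, 8):
    print(f"{executors} executors: {ideal_test_time(serial, executors):.1f} min")
```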
Slide 43
Slide 43 text
Netflix reduced a 62-minute test cycle time down to just under 5 minutes!
Slide 44
Slide 44 text
Machine learning leads to greater efficiencies
Slide 46
Slide 46 text
Predictive Test Selection
01 Instead of trying to analyze which tests could possibly be impacted by
developer changes, Predictive Test Selection looks at the history of changes
and what has happened to tests in the past
02 When tests complete, they can either FAIL, SUCCEED, or be FLAKY.
Predictive Test Selection will predict the outcome of the test based on the
history it is analyzing
03 PTS will recommend skipping tests that are predicted to succeed, and will only
run tests that are likely to provide valuable feedback
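The idea of predicting an outcome from history can be illustrated with a deliberately simple frequency count. This is a toy under stated assumptions, not the machine learning model that Predictive Test Selection uses: it ignores code changes entirely, the test names are hypothetical, and the "UNKNOWN" label for tests with no history is an invention of this sketch.

```python
from collections import Counter


def predict_outcome(history):
    """Predict a test's next outcome as the most common one in its history."""
    if not history:
        return "UNKNOWN"  # no data yet, so the test is treated as worth running
    return Counter(history).most_common(1)[0][0]


def select_tests(histories):
    """Keep every test that is not predicted to succeed."""
    return sorted(
        name for name, history in histories.items()
        if predict_outcome(history) != "SUCCEED"
    )


# Hypothetical outcome histories, oldest first.
histories = {
    "StableTest": ["SUCCEED", "SUCCEED", "SUCCEED", "SUCCEED"],
    "BrokenTest": ["FAIL", "FAIL", "SUCCEED"],
    "WobblyTest": ["FLAKY", "SUCCEED", "FLAKY"],
    "NewTest": [],
}
print(select_tests(histories))
```

Only the test predicted to succeed is skipped; tests predicted to fail or flake, and tests with no history, are kept.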
Slide 47
Slide 47 text
Force multiplier when used in combination
1. Build Cache. Avoid unnecessarily running
components of builds and tests whose inputs
have not changed.
2. Predictive Test Selection. Run only the
relevant subset of test tasks likely to provide
useful feedback.
3. Test Distribution. Speed up the execution
of the necessary and relevant remaining
tests by running them in parallel.
4. Performance Continuity. Sustain Test
Distribution and other performance
improvements over time with data analytics
and performance profiling capabilities.
Slide 48
Slide 48 text
Is the build and test cycle fast enough?
Slide 50
Slide 50 text
Is the build and test cycle as fast as it
can possibly be?
Slide 51
Slide 51 text
Pain Point:
Inefficient
troubleshooting of
broken builds
Slide 52
Slide 52 text
“You can observe a lot by just watching.”
Yogi Berra, Catcher and Philosopher
Slide 53
Slide 53 text
Build Scan: scans.gradle.com
Slide 54
Slide 54 text
DPE Organizations Track Failure Rates
Slide 55
Slide 55 text
Pain Point:
Flaky Tests & Other
Avoidable Failures
Slide 56
Slide 56 text
Flaky builds and tests are maddening
Slide 57
Slide 57 text
⬢ Try it again
⬢ Re-run it
⬢ Re-run it again
⬢ Ignore it and approve PR
⬢ All of the above
The test is flaky. What do you do now?
Slide 59
Slide 59 text
“…our analysis revealed that re-running the failing
build and attempting to repair the flaky test were the
most common actions. Our findings also suggested that
developers who experience flaky tests more often are
more likely to take no action in response to them.”
Surveying the Developer Experience of Flaky Tests
https://mcminn.info/publications/c72.pdf
Continuous Improvement: It doesn’t really matter what you
improve as long as you are constantly improving something,
because…
…entropy denotes that if you aren’t doing
anything, you’re always getting worse.
Slide 66
Slide 66 text
“The tools, services, and environments that developers
need to do their jobs should be treated with
production-level SLAs. The development platform is
the production environment for the job of creating
software”
Release It! Second Edition
Michael Nygard
Slide 67
Slide 67 text
Pain Point:
Inefficient use of CI
Resources
Slide 68
Slide 68 text
All Of This Will Improve CI
Slide 69
Slide 69 text
In Summary
Slide 75
Slide 75 text
⬢ 10x Developers might be a myth, but 10x Organisations are real
⬢ Developer Productivity is deeply linked to Developer Experience
⬢ If you do nothing about productivity, life will get worse
⬢ Fast feedback, efficient troubleshooting, and reliable cycles are key
⬢ Start with observation, and then take action on data
⬢ Proactively solve problems for the whole team
In Summary
Slide 77
Slide 77 text
DPE Transforms Every Business Layer
Slide 78
Slide 78 text
Next Steps
Slide 79
Slide 79 text
https://bit.ly/dpe-4me
Slide 80
Slide 80 text
Thank you!
Slide 81
Slide 81 text
How it works…
1. When a test run starts, the build tool
submits a test input snapshot and test
set to a machine learning model.
2. PTS automatically develops a test
selection strategy by learning from
historical code changes and test
outcomes from your Build Scan data to
predict a subset of relevant tests, which
are then executed by your build.
3. Code change and test result data are
processed immediately after a Build
Scan is uploaded to PTS, and the test
selection strategy is updated based on
the new results.
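The three steps above form a feedback loop: select, run, then learn from the results. A minimal sketch of that loop follows. It is an assumption-laden toy rather than the actual PTS model: `ToySelector`, its methods, and the file and test names are all hypothetical, and simple failure counts per changed file stand in for the learned strategy.

```python
from collections import defaultdict


class ToySelector:
    """Counts how often each test failed when a given file changed."""

    def __init__(self):
        self.failures = defaultdict(lambda: defaultdict(int))

    def record(self, changed_files, failed_tests):
        # Step 3: fold a finished build's results into the strategy.
        for path in changed_files:
            counts = self.failures[path]  # marks the file as seen
            for test in failed_tests:
                counts[test] += 1

    def select(self, changed_files, test_set):
        # Steps 1-2: given the change and the candidate tests, pick a subset.
        relevant = set()
        for path in changed_files:
            if path not in self.failures:
                return sorted(test_set)  # unseen file: run everything
            relevant.update(self.failures[path])
        return sorted(relevant & set(test_set))


selector = ToySelector()
selector.record(["Cart.java"], ["CartTest"])
selector.record(["Cart.java", "Price.java"], ["PriceTest"])

tests = ["CartTest", "PriceTest", "LoginTest"]
print(selector.select(["Cart.java"], tests))
print(selector.select(["Login.java"], tests))
```

Note that in this toy a file that has been seen but never associated with a failure selects no tests at all, which a real strategy would treat far more cautiously.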
Slide 82
Slide 82 text
Cache Key/Value Calculation
The cacheKey for Gradle Tasks/Maven Goals is based on the Inputs:
cacheKey(javaCompile) = hash(sourceFiles, jdk version, classpath, compiler args)
The cacheEntry contains the output:
cacheEntry[cacheKey(javaCompile)] = fileTree(classFiles)
For more information, see:
https://docs.gradle.org/current/userguide/build_cache.html
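The key/value scheme above can be sketched in a few lines. This is an illustration under assumptions, not Gradle's or Maven's implementation: the function names are hypothetical, SHA-256 is chosen arbitrarily, a plain dictionary stands in for the local or remote cache, and classpath entries are hashed by name only for brevity.

```python
import hashlib


def cache_key(source_files, jdk_version, classpath, compiler_args):
    """Hash every declared input into a single hex digest."""
    digest = hashlib.sha256()
    for path in sorted(source_files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(source_files[path])  # file contents as bytes
        digest.update(b"\0")
    sections = (("jdk", [jdk_version]), ("classpath", classpath), ("args", compiler_args))
    for label, values in sections:
        digest.update(label.encode() + b"\0")
        for value in values:
            digest.update(value.encode() + b"\0")
    return digest.hexdigest()


cache = {}  # stands in for a local or remote build cache


def compile_with_cache(inputs, compile_fn):
    """Run compile_fn only when no entry exists for these exact inputs."""
    key = cache_key(**inputs)
    if key not in cache:
        cache[key] = compile_fn(inputs["source_files"])
    return cache[key]


calls = []


def fake_compile(sources):
    """Hypothetical compiler: records the call and returns fake class files."""
    calls.append(sorted(sources))
    return [name.replace(".java", ".class") for name in sorted(sources)]


inputs = {
    "source_files": {"App.java": b"class App {}"},
    "jdk_version": "17",
    "classpath": ["lib/util.jar"],
    "compiler_args": ["-parameters"],
}
print(compile_with_cache(inputs, fake_compile))
print(compile_with_cache(inputs, fake_compile))
print(f"compiler invoked {len(calls)} time(s)")
```

The second call reuses the stored output because none of the inputs changed; editing a source file, switching JDK version, or changing a compiler argument produces a different key and therefore a fresh compile.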