How people actually write Puppet code

An analysis of more than 7.5 million lines of Puppet code, looking at which features people use (or don't use) and how we might use this to design Puppet and help the Puppet community. Warning: contains lots of SQL queries.

Gareth Rushgrove

October 12, 2017

Transcript

  1. Gareth Rushgrove How people actually write Puppet Analyzing 7.5 million

    lines of Puppet code
  2. None
  3. @garethr

  4. What if we were to analyze all of the code

    written in the Puppet language?
  5. - But why? - GitHub and BigQuery - Libraries.io and

    the Forge - Caveats and limitations
  6. WARNING This presentation contains far too many SQL queries with

    embedded regex
  7. But why? Why analyzing code is useful

  8. Making design decisions

  9. What features could/should be removed?

  10. Identify common problems which can be addressed with additional tooling

  11. See how and where new features are adopted by users

  12. Identify contributors for more qualitative research

  13. GitHub and BigQuery Our main source of data

  14. Open source code on GitHub available in BigQuery

  15. BigQuery provides a SQL interface to all of the code

    on GitHub, along with the associated metadata
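As a rough sketch of the kind of query this makes possible, the Python snippet below only builds a SQL string that counts files whose path ends in `.pp`. It assumes the public dataset table `bigquery-public-data.github_repos.files` and its `path` column; the helper name `puppet_file_count_query` is hypothetical, and this is not one of the queries used in the talk. Note that `.pp` is also used by Pascal, so treating the extension as Puppet is itself an assumption.

```python
# Sketch only: builds a BigQuery Standard SQL string, it does not run it.
# Assumption: the public GitHub dataset exposes file paths in
# `bigquery-public-data.github_repos.files` (column `path`).
DEFAULT_TABLE = "bigquery-public-data.github_repos.files"


def puppet_file_count_query(table=DEFAULT_TABLE):
    """Return SQL that counts files whose path ends in .pp.

    The helper name is hypothetical; .pp is assumed to mean Puppet.
    """
    return (
        "SELECT COUNT(*) AS puppet_files\n"
        f"FROM `{table}`\n"
        "WHERE ENDS_WITH(path, '.pp')"
    )


if __name__ == "__main__":
    print(puppet_file_count_query())
```

The resulting string could be pasted into the BigQuery console or passed to a client library; running it requires a Google Cloud project and is left out here.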
  16. puppetlabs/puppet-bigquery

  17. Repository contains all the queries used in this talk

  18. How many bytes of Puppet code?

  19. How many lines of Puppet code?

  20. In how many files?

  21. From how many contributors?

  22. So we have lots of code from lots of people.

    What questions might we ask?
  23. How do people license their Puppet code?

  24. How do people license their Puppet code? mit (7345) apache-2.0 (6368)

    gpl-3.0 (1537) gpl-2.0 (1512) bsd-3-clause (805) bsd-2-clause (373) mpl-2.0 (287) agpl-3.0 (160) isc (121) unlicense (105)
  25. Mainly permissive licenses, but nearly 15% is GPL

  26. What do people name their classes? apache (153) mysql (131) php

    (129) base (118) main (118) nginx (110) git (84) python (82) mysql::params (69) ssh (64) mysql::server (59) apt (59) puppet::params (56) mysql::config (55) nodejs (53) apt_get_update (52)
  27. What do people name defined types? add_dotdeb (75) mysql_db (68)

    mysql_nginx_default_conf (57) mongodb_db (52) nginx_vhost (49) postgresql_db (40) mariadb_db (39) mariabdb_nginx_default_conf (38) safepackage (31) iptables_port (26)
  28. What packages do people manage with Puppet? git (552) curl

    (386) wget (239) nginx (246) apache2 (244) vim (243) unzip (238) build-essential (197) python-pip (185) mysql (184) ntp (176) openssh-server (156) nodejs (152) httpd (138)
  29. What services do people manage with Puppet? apache2 (314) nginx

    (280) mysql (239) httpd (165) puppet (125) iptables (119) ssh (105) postgresql (93) php5-fpm (88) neutron-server (81) ntp (76) postfix (74) ntpd (71) sshd (70) mysqld (69)
  30. These give a good indication of the most common stacks

    managed with Puppet
  31. - Apache and Nginx - PHP, Python and Node.js -

    MySQL and PostgreSQL
  32. It would be interesting to look at this over time

    too
  33. Who committed most of this code?

  34. Popular file names containing Puppet code

  35. Simple Puppet Module Structure in 2009

  36. Evidence that the install, config, service pattern became the de facto way

    of organising code
  37. What types are used the most?
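One way to approximate this kind of count on your own code is a regular expression over the manifest text. The sketch below is an illustration in Python, not the query used in the talk: the pattern, the helper `count_resource_types`, and the sample manifest are all assumptions. It only matches lowercase resource declarations at the start of a line, and it would count `class { 'name': }` declarations as a type called `class`.

```python
import re
from collections import Counter

# Assumed pattern: a lowercase type name at the start of a line, followed by
# "{" and the start of a title (quote, variable, or array).
# Resource defaults such as `File { ... }` are deliberately not matched.
RESOURCE_RE = re.compile(
    r"^\s*([a-z][a-z0-9_:]*)\s*\{\s*['\"$\[]",
    re.MULTILINE,
)


def count_resource_types(source):
    """Tally resource type names declared in a string of Puppet code."""
    return Counter(RESOURCE_RE.findall(source))


# Hypothetical sample manifest, used only to exercise the pattern.
SAMPLE = """
class ntp {
  package { 'ntp':
    ensure => installed,
  }
  file { '/etc/ntp.conf':
    ensure  => file,
    require => Package['ntp'],
  }
  service { 'ntpd':
    ensure    => running,
    subscribe => File['/etc/ntp.conf'],
  }
}
"""

if __name__ == "__main__":
    for name, count in count_resource_types(SAMPLE).most_common():
        print(name, count)
```

A regex like this is cheap to run over millions of lines but will miss or miscount unusual formatting, which is one reason parsing the code properly is attractive.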

  38. Which resource types are most used? file (30298) package (22162)

    exec (16825) service (11112) user (3951) host (2361) group (2181) notify (2151) yumrepo (1229) cron (1122) stage (429) resources (380) mount (373) ssh_authorized_key (271)
  39. Which resource types are least used? interface (260) zone (207)

    selboolean (108) router (103) tidy (76) sshkey (66) schedule (42) filebucket (42) mailalias (40) vlan (31) selmodule (25) zfs (11) mcx (6) scheduled_task (5) zpool (3) k5login (3) computer (2) maillist (1)
  40. More than 50% of resources are file and package

  41. exec accounts for ~18% of resources

  42. A very long tail. maillist was used once in 7.5

    million lines
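The percentages above can be checked against the counts on the two preceding slides. The Python sketch below assumes that the "most used" and "least used" lists together cover every resource type in the count, so that their sum is the right denominator; the names `percentage_shares` and `ALL_COUNTS` are illustrative.

```python
# Counts copied from the "most used" and "least used" slides.
MOST_USED = {
    "file": 30298, "package": 22162, "exec": 16825, "service": 11112,
    "user": 3951, "host": 2361, "group": 2181, "notify": 2151,
    "yumrepo": 1229, "cron": 1122, "stage": 429, "resources": 380,
    "mount": 373, "ssh_authorized_key": 271,
}
LEAST_USED = {
    "interface": 260, "zone": 207, "selboolean": 108, "router": 103,
    "tidy": 76, "sshkey": 66, "schedule": 42, "filebucket": 42,
    "mailalias": 40, "vlan": 31, "selmodule": 25, "zfs": 11, "mcx": 6,
    "scheduled_task": 5, "zpool": 3, "k5login": 3, "computer": 2,
    "maillist": 1,
}


def percentage_shares(counts):
    """Return each type's share of the total, as a percentage."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Assumption: the two lists together are the full set of counted types.
ALL_COUNTS = {**MOST_USED, **LEAST_USED}
SHARES = percentage_shares(ALL_COUNTS)

if __name__ == "__main__":
    print(f"file + package: {SHARES['file'] + SHARES['package']:.1f}%")
    print(f"exec: {SHARES['exec']:.1f}%")
```

Under that assumption, file and package together come out above 50% and exec at roughly 18%, consistent with the slides.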
  43. How popular are the nagios types? service (160) command (56)

    host (49) servicedependency (35) contact (18) servicegroup (10) hostextinfo (8) timeperiod (5) hostdependency (3) serviceescalation (2) hostescalation (1)
  44. Which new features are used?

  45. Which data types are being used? String (603) Boolean (411)

    Integer (174) Hash (132) Array (82) Default (14) Float (5) Numeric (3) Undef (1)
  46. Which abstract data types are in use? Optional (611) Enum (277)

    Data (162) Type (145) Variant (134) All (127) Pattern (73) Tuple (13) Collection (11) Struct (11) Scalar (2)
  47. How many repositories contain Gemfiles?

  48. What gems are popular in Puppet projects?

  49. What gems are popular in Puppet projects? puppet (1285) puppetlabs_spec_helper

    (1268) rake (1215) puppet-lint (837) metadata-json-lint (726) beaker-rspec (694) rspec-puppet (676) puppet-blacksmith (518) beaker (509) serverspec (444)
  50. Which puppet-lint plugins are most used? unquoted_string (366) leading_zero (344)

    absolute_classname (318) trailing_comma (310) version_comparison (248) variable_contains_uppercase (222) beginning_with_digits (191) empty_string (156) undef_in_function (147) spaceship_operator_without_tag (135)
  51. Other sources of data Libraries.io, the Forge API and more

  52. Different systems often have different views on the same object

  53. Libraries.io

  54. Puppet Forge API
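The Forge API describes each module as a JSON object. As a minimal sketch of working with such a record, the Python below parses a handful of top-level fields taken from the example record in this talk and prints a one-line summary. The truncated `owner` object is left out rather than guessed, and the helper `summarize_module` is hypothetical.

```python
import json

# A subset of the fields from the example Forge module record in this talk.
# The nested "owner" object is truncated in the source, so it is omitted.
RECORD_JSON = """
{
  "uri": "/v3/modules/arioch-keepalived",
  "slug": "arioch-keepalived",
  "name": "keepalived",
  "downloads": 5551609,
  "deprecated_at": null,
  "supported": false,
  "endorsement": "approved",
  "module_group": "base"
}
"""


def summarize_module(module):
    """Return a one-line summary of a Forge module record (a dict)."""
    return (
        f"{module['slug']}: {module['downloads']} downloads, "
        f"endorsement={module['endorsement']}"
    )


RECORD = json.loads(RECORD_JSON)

if __name__ == "__main__":
    print(summarize_module(RECORD))
```

Fields such as `downloads` and `endorsement` are not available from the GitHub data, which is why a second source is useful.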

  55. forgeapi.puppetlabs.com/v3/modules { "uri": "/v3/modules/arioch-keepalived", "slug": "arioch-keepalived", "name": "keepalived", "downloads": 5551609,

    "created_at": "2013-07-02 09:13:33 -0700", "updated_at": "2016-12-28 20:00:02 -0800", "deprecated_at": null, "deprecated_for": null, "superseded_by": null, "supported": false, "endorsement": "approved", "module_group": "base", "owner": { "uri": "/v3/users/arioch", "slug": "arioch", "username": "arioch",
  56. Puppet version dependencies

  57. Long tail of specific Puppet version requirements

  58. What licenses are popular for Forge modules?

  59. Interesting to note that while most Puppet repositories are MIT,

    most published modules are Apache licensed
  60. Caveats Limitations of the data

  61. Obviously this is a subset of all Puppet code

  62. Software has bugs. Including this software.

  63. How does private Puppet code vary from this dataset?

  64. Conclusions If all you remember is...

  65. The ability to ask questions of a large data set

    is a useful design tool
  66. For example, we could use this approach to see how

    people adopt the new Puppet Tasks
  67. So many more questions: - Analyzing hieradata - Parsing all of the Puppet code

    - Seeing changes over time
  68. Lots of ideas too: - Make it easy to run against your own code

    - Some way of submitting aggregates - Publish public BigQuery tables
  69. Any questions? And thanks for listening