How people actually write Puppet code

An analysis of more than 7.5 million lines of Puppet code, looking at which features people use (or don't use) and how we might use this to design Puppet and help the Puppet community. Warning: contains lots of SQL queries.

Gareth Rushgrove

October 12, 2017

Transcript

  1. Gareth Rushgrove
    How people actually
    write Puppet
    Analyzing 7.5 million lines of Puppet code

  2. View Slide

  3. @garethr

  4. What if we were to analyze
    all of the code written in the
    Puppet language?

  5. - But why?
    - GitHub and BigQuery
    - Libraries.io and the Forge
    - Caveats and limitations

  6. WARNING
    This presentation contains
    far too many SQL queries
    with embedded regex

  7. But why?
    Why analyzing code is useful

  8. Making design decisions

  9. What features could/should
    be removed?

  10. Identify common problems
    which can be addressed
    with additional tooling

  11. See how and where new
    features are adopted by users

  12. Identify contributors for more
    qualitative research

  13. GitHub and BigQuery
    Our main source of data

  14. Open source code on GitHub available in BigQuery

  15. BigQuery provides a SQL
    interface to all of the code
    on GitHub, along with the
    associated metadata

  16. puppetlabs/puppet-bigquery

  17. Repository contains all the
    queries used in this talk

  18. How many bytes of Puppet code?

  19. How many lines of Puppet code?

  20. In how many files?
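
The query behind this slide is not reproduced in the transcript. As a rough sketch only, a file count of this kind could be written against the public GitHub dataset in BigQuery. The table name `bigquery-public-data.github_repos.files`, the `.pp` extension filter, and the helper name `puppet_file_count_query` are assumptions made for illustration; they are not taken from puppetlabs/puppet-bigquery. The Python below only builds the SQL text and does not contact BigQuery.

```python
# Assumed table: the public GitHub file listing in BigQuery.
GITHUB_FILES_TABLE = "bigquery-public-data.github_repos.files"


def puppet_file_count_query(table=GITHUB_FILES_TABLE):
    """Return standard-SQL text that counts paths ending in .pp (hypothetical helper)."""
    return (
        "SELECT COUNT(*) AS puppet_files\n"
        f"FROM `{table}`\n"
        # Embedded regex, in the spirit of the talk's warning slide.
        r"WHERE REGEXP_CONTAINS(path, r'\.pp$')"
    )


print(puppet_file_count_query())
```

Matching on the `.pp` extension alone would likely also pick up non-Puppet files that share it, so a real query would probably need further filtering.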

  21. From how many contributors?

  22. So we have lots of code
    from lots of people. What
    questions might we ask?

  23. How do people license their Puppet code?

  24. How do people license their Puppet code?
    mit (7345) apache-2.0 (6368)
    gpl-3.0 (1537) gpl-2.0 (1512)
    bsd-3-clause (805) bsd-2-clause (373)
    mpl-2.0 (287) agpl-3.0 (160) isc (121)
    unlicense (105)

  25. Mainly permissive licenses,
    but nearly 15% is GPL

  26. What do people name their classes?
    apache (153) mysql (131) php (129)
    base (118) main (118) nginx (110) git
    (84) python (82) mysql::params (69)
    ssh (64) mysql::server (59) apt (59)
    puppet::params (56) mysql::config
    (55) nodejs (53) apt_get_update (52)
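
A tally like this comes down to matching class definitions with a regular expression and counting the captured names. The sketch below does that locally in Python rather than in SQL. The pattern, the function name `count_class_names`, and the sample manifests are all assumptions for illustration, not the query used in the talk.

```python
import re
from collections import Counter

# Matches definitions such as "class apache {" or "class mysql::server (".
# Resource-like declarations ("class { 'apache': }") are deliberately skipped.
CLASS_DEF = re.compile(
    r"^\s*class\s+([a-z][a-z0-9_]*(?:::[a-z][a-z0-9_]*)*)",
    re.MULTILINE,
)


def count_class_names(manifests):
    """Count how often each class name is defined across manifest sources."""
    counts = Counter()
    for source in manifests:
        counts.update(CLASS_DEF.findall(source))
    return counts


# Hypothetical sample manifests.
samples = [
    "class apache {\n}\n",
    "class mysql::server (\n  $port = 3306,\n) {\n}\nclass { 'apache': }\n",
    "class apache inherits base {\n}\n",
]

for name, count in count_class_names(samples).most_common():
    print(name, count)
```

A regex cannot tell a real definition from the same text inside a string or heredoc, which is one reason "Parsing all of the Puppet code" appears later as an open question.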

  27. What do people name defined types?
    add_dotdeb (75) mysql_db (68)
    mysql_nginx_default_conf (57)
    mongodb_db (52) nginx_vhost (49)
    postgresql_db (40) mariadb_db (39)
    mariabdb_nginx_default_conf (38)
    safepackage (31) iptables_port (26)

  28. What packages do people manage with Puppet?
    git (552) curl (386) wget (239)
    nginx (246) apache2 (244) vim (243)
    unzip (238) build-essential (197)
    python-pip (185) mysql (184)
    ntp (176) openssh-server (156) nodejs
    (152) httpd (138)

  29. What services do people manage with Puppet?
    apache2 (314) nginx (280) mysql (239)
    httpd (165) puppet (125) iptables (119)
    ssh (105) postgresql (93) php5-fpm
    (88) neutron-server (81) ntp (76)
    postfix (74) ntpd (71) sshd (70)
    mysqld (69)

  30. These give a good indication of
    the most common stacks
    managed with Puppet

  31. - Apache and Nginx
    - PHP, Python and Node.js
    - MySQL and PostgreSQL

  32. It would be interesting to look at
    this over time too

  33. Who committed most of this code?

  34. Popular file names containing Puppet code

  35. Simple Puppet Module Structure in 2009

  36. Evidence that the install, config,
    service pattern became the
    de facto way of organising code

  37. What types are used the most?

  38. Which resource types are most used?
    file (30298) package (22162)
    exec (16825) service (11112)
    user (3951) host (2361) group (2181)
    notify (2151) yumrepo (1229) cron
    (1122) stage (429) resources (380)
    mount (373) ssh_authorized_key (271)

  39. Which resource types are least used?
    interface (260) zone (207)
    selboolean (108) router (103) tidy
    (76) sshkey (66) schedule (42)
    filebucket (42) mailalias (40) vlan
    (31) selmodule (25) zfs (11) mcx (6)
    scheduled_task (5) zpool (3)
    k5login (3) computer (2) maillist (1)

  40. More than 50% of resources are
    file and package

  41. exec accounts for ~18% of
    resources
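
The two percentages quoted for file, package and exec can be roughly checked against the counts on the "most used" and "least used" slides. This sketch assumes the denominator is simply the sum of the counts listed on those two slides, which may not be the denominator used in the talk.

```python
# Counts copied from the "most used" and "least used" resource type slides.
most_used = {
    "file": 30298, "package": 22162, "exec": 16825, "service": 11112,
    "user": 3951, "host": 2361, "group": 2181, "notify": 2151,
    "yumrepo": 1229, "cron": 1122, "stage": 429, "resources": 380,
    "mount": 373, "ssh_authorized_key": 271,
}
least_used = {
    "interface": 260, "zone": 207, "selboolean": 108, "router": 103,
    "tidy": 76, "sshkey": 66, "schedule": 42, "filebucket": 42,
    "mailalias": 40, "vlan": 31, "selmodule": 25, "zfs": 11, "mcx": 6,
    "scheduled_task": 5, "zpool": 3, "k5login": 3, "computer": 2,
    "maillist": 1,
}

# Assumption: only the types listed on the two slides make up the total.
total = sum(most_used.values()) + sum(least_used.values())


def share(*types):
    """Percentage of all listed resources that belong to the given types."""
    matched = sum(most_used.get(t, 0) + least_used.get(t, 0) for t in types)
    return 100 * matched / total


print(f"file + package: {share('file', 'package'):.1f}%")
print(f"exec: {share('exec'):.1f}%")
```

Under that assumption, file plus package comes out at about 55% and exec at about 17.5%, in line with the figures on the slides.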

  42. A very long tail. maillist was
    used once in 7.5 million lines

  43. How popular are the nagios types?
    service (160) command (56)
    host (49) servicedependency (35)
    contact (18) servicegroup (10)
    hostextinfo (8) timeperiod (5)
    hostdependency (3) serviceescalation
    (2) hostescalation (1)

  44. Which new features are used?

  45. Which data types are being used?
    String (603) Boolean (411)
    Integer (174) Hash (132) Array (82)
    Default (14) Float (5) Numeric (3)
    Undef (1)

  46. Which abstract data types are in use?
    Optional (611) Enum (277)
    Data (162) Type (145) Variant (134)
    All (127) Pattern (73) Tuple (13)
    Collection (11) Struct (11) Scalar (2)

  47. How many repositories contain Gemfiles?

  48. What gems are popular in Puppet projects?

  49. What gems are popular in Puppet projects?
    puppet (1285) puppetlabs_spec_helper
    (1268) rake (1215) puppet-lint (837)
    metadata-json-lint (726) beaker-rspec
    (694) rspec-puppet (676)
    puppet-blacksmith (518) beaker (509)
    serverspec (444)
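
A count like this can be approximated by pulling gem names out of each Gemfile with a regular expression. The sketch below assumes each gem is counted once per Gemfile; the pattern, the function name `gem_popularity`, and the sample Gemfiles are illustrative assumptions, not the query from the talk.

```python
import re
from collections import Counter

# Matches lines such as: gem 'rake'  or  gem "puppet-lint", '~> 2.0'
# Commented-out lines ("# gem 'beaker'") do not match.
GEM_LINE = re.compile(r"""^\s*gem\s+['"]([^'"]+)['"]""", re.MULTILINE)


def gem_popularity(gemfiles):
    """Count, for each gem, how many Gemfiles declare it at least once."""
    counts = Counter()
    for text in gemfiles:
        counts.update(set(GEM_LINE.findall(text)))
    return counts


# Hypothetical Gemfile contents.
gemfiles = [
    "source 'https://rubygems.org'\ngem 'puppet', '>= 4.0'\ngem 'rake'\n# gem 'beaker'\n",
    'group :test do\n  gem "rake"\n  gem "puppet-lint"\nend\n',
]

for name, count in gem_popularity(gemfiles).most_common():
    print(name, count)
```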

  50. Which puppet-lint plugins are most used?
    unquoted_string (366) leading_zero (344)
    absolute_classname (318) trailing_comma (310)
    version_comparison (248)
    variable_contains_uppercase (222)
    beginning_with_digits (191) empty_string (156)
    undef_in_function (147)
    spaceship_operator_without_tag (135)

  51. Other sources of data
    Libraries.io, the Forge API and more

  52. Different systems often
    have different views on
    the same object

  53. Libraries.io

  54. Puppet Forge API

  55. forgeapi.puppetlabs.com/v3/modules
    {
    "uri": "/v3/modules/arioch-keepalived",
    "slug": "arioch-keepalived",
    "name": "keepalived",
    "downloads": 5551609,
    "created_at": "2013-07-02 09:13:33 -0700",
    "updated_at": "2016-12-28 20:00:02 -0800",
    "deprecated_at": null,
    "deprecated_for": null,
    "superseded_by": null,
    "supported": false,
    "endorsement": "approved",
    "module_group": "base",
    "owner": {
    "uri": "/v3/users/arioch",
    "slug": "arioch",
    "username": "arioch",
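
The response on the slide is cut off, so the following is only a sketch of how records with the fields shown (`slug`, `downloads`, `deprecated_at`) might be ranked once fetched. The records are supplied as a plain JSON list here. How the real API wraps and paginates results is not shown on the slide and is not modelled. The two `example-*` records and the function name `top_modules` are hypothetical.

```python
import json

# First record uses values from the slide; the "example-*" records are invented.
SAMPLE = """
[
  {"slug": "arioch-keepalived", "downloads": 5551609,
   "deprecated_at": null, "supported": false, "endorsement": "approved"},
  {"slug": "example-retired", "downloads": 9000000,
   "deprecated_at": "2016-01-01 00:00:00 -0800", "supported": false,
   "endorsement": null},
  {"slug": "example-small", "downloads": 42,
   "deprecated_at": null, "supported": false, "endorsement": null}
]
"""


def top_modules(records, limit=10):
    """Return non-deprecated module records, most downloaded first."""
    active = [r for r in records if r.get("deprecated_at") is None]
    return sorted(active, key=lambda r: r["downloads"], reverse=True)[:limit]


modules = json.loads(SAMPLE)
for module in top_modules(modules, limit=2):
    print(module["slug"], module["downloads"])
```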

  56. Puppet version dependencies

  57. Long tail of specific Puppet
    version requirements

  58. What licenses are popular for Forge modules?

  59. Interesting to note that while
    most Puppet repositories are
    MIT, most published modules
    are Apache licensed

  60. Caveats
    Limitations of the data

  61. Obviously this is a subset of
    all Puppet code

  62. Software has bugs.
    Including this software.

  63. How does private Puppet code
    vary from this dataset?

  64. Conclusions
    If all you remember is...

  65. The ability to ask questions
    of a large data set is a
    useful design tool

  66. For example, we could use
    this approach to see how
    people adopt the new
    Puppet Tasks

  67. So many more questions:
    - Analyzing hieradata
    - Parsing all of the Puppet code
    - Seeing changes over time

  68. Lots of ideas too:
    - Make it easy to run against your own code
    - Some way of submitting aggregates
    - Publish public BigQuery tables

  69. Any questions?
    And thanks for listening
