Getting along with YAML comments with Psych

psych-comments lets you manipulate YAML documents without discarding comments. This talk covers how we tried to automate YAML authoring, how we went wrong by (ab)using YAML tags for annotations, and how we solved the problem by creating this library. Attendees will get a feel for YAML's depths and see how a small library helps automation.

Masaki Hara

May 16, 2024

Transcript

  1. © 2024 Wantedly, Inc. Getting along with YAML comments with Psych

    May 16, 2024 - Masaki Hara @ RubyKaigi 2024
  2. Overview

    • Introducing the psych-comments gem
    • Intro to YAML
    • Implementation
    • Our use-case
    • Story on the library’s scope
  3. Example: You want this (simplified example)

    Before:
      env:
        # build
        - NPM_TOKEN
        # deploy
        - GH_TOKEN
    After:
      env:
        # build
        - NPM_TOKEN
        # deploy
        - GH_TOKEN
        - NEW_TOKEN
  6. Psych (a.k.a. YAML)

      obj = YAML.load(input)
      obj["env"] << "NEW_TOKEN"
      output = YAML.dump(obj)
  7. Psych (a.k.a. YAML)

      obj = Psych.load(input)
      obj["env"] << "NEW_TOKEN"
      output = Psych.dump(obj)
  8. Psych (a.k.a. YAML)

    Input:
      env:
        # build
        - NPM_TOKEN
        # deploy
        - GH_TOKEN
    Output (the comments are gone):
      ---
      env:
      - NPM_TOKEN
      - GH_TOKEN
      - NEW_TOKEN
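The comment loss on this slide can be reproduced with the standard library alone. A minimal sketch, assuming the `input` string is the simplified example from the talk:

```ruby
require "yaml"

input = <<~YAML
  env:
    # build
    - NPM_TOKEN
    # deploy
    - GH_TOKEN
YAML

# The high-level API converts YAML text straight into plain Ruby objects,
# so there is nowhere for the comments to live.
obj = YAML.load(input)
obj["env"] << "NEW_TOKEN"
output = YAML.dump(obj)

puts output
```

The dumped text contains the three entries but none of the `# build` / `# deploy` comments.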
  9. Psych-comments

      bundle add psych-comments

      require "psych/comments"

      s = Psych::Comments.parse_stream(input)
      env = s.children[0].children[0].children[1]
      env.children << Psych::Nodes::Scalar.new("NEW_TOKEN")
      output = Psych::Comments.emit_yaml(s)
  10. Psych-comments

    Input:
      env:
        # build
        - NPM_TOKEN
        # deploy
        - GH_TOKEN
    Output:
      env:
        # build
        - NPM_TOKEN
        # deploy
        - GH_TOKEN
        - NEW_TOKEN
  11. YAML family tree

    Diagram (timeline): XML (1998), a Perl marshaller (Data::Denter) and JSON (2001, launch of json.org) feed into YAML, which appears around 2001, reaches YAML 1.0 in 2004 and YAML 1.2 in 2009. Roles shown along the way: YAML as “simple XML”, YAML as “Portable Marshaller”, YAML as “JSON upper-compat”.
  13. YAML and Marshal

    YAML is aware of…
    • custom objects
    • aliases
    • cyclic references
    • streams
  14. YAML and Marshal: custom objects

      YAML.unsafe_load("!ruby/regexp /[a-z]/")
      Marshal.load("\x04\x08I/\x0A[a-z]\x00\x06:\x06EF")
      # both: /[a-z]/
  15. YAML and Marshal: aliases

      YAML.unsafe_load("[&x [], *x]")
      Marshal.load("\x04\x08[\x07[\x00@\x06")
      # both: [[]] * 2
  16. YAML and Marshal: cyclic references

      YAML.unsafe_load("&x [*x]")
      Marshal.load("\x04\x08[\x06@\x00")
      # both: [[...]]
  17. YAML and Marshal: streams

      "---\n1\n---\n2" # YAML
      "\x04\x08i\x06\x04\x08i\x07" # Marshal
      # both: 1 2
    NOTE: RGSS save data are one such example
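The alias, cycle, and stream behaviour on the last three slides can be checked by comparing object identity. A small sketch, assuming a Psych version that provides `YAML.unsafe_load` (as the slides do):

```ruby
require "yaml"

# Aliases: both elements are the very same Array object.
shared = YAML.unsafe_load("[&x [], *x]")
puts shared[0].equal?(shared[1])   # prints true

# Cyclic references: the array contains itself.
cyclic = YAML.unsafe_load("&x [*x]")
puts cyclic[0].equal?(cyclic)      # prints true

# Marshal preserves the same cyclic structure on a round trip.
copy = Marshal.load(Marshal.dump(cyclic))
puts copy[0].equal?(copy)          # prints true

# Streams: several documents in one input.
docs = YAML.load_stream("---\n1\n---\n2")
p docs                             # prints [1, 2]
```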
  18. Three-fold YAML processing

    Ruby Object (Native) ↔ Node Graph (Representation) ↔ Event Tree (Serialization) ↔ YAML (Presentation)
  19. Three-fold YAML processing

    Ruby Object (Native):        [1, true].tap { |a| a << a }
    Node Graph (Representation): !!seq [...] holding !!int "1", !!bool "true", and a link back to the sequence itself
    Event Tree (Serialization):  [...] holding "1", "true", *a
    YAML (Presentation):         "&a [1, true, *a]"
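The intermediate tree for this example can be inspected with Psych itself. A sketch using `Psych.parse`, which returns a tree where the alias is still an explicit node and scalars are still untyped text; converting with `to_ruby` then yields the native object:

```ruby
require "psych"

doc = Psych.parse("&a [1, true, *a]")   # Psych::Nodes::Document
seq = doc.root                          # Psych::Nodes::Sequence

puts seq.anchor                         # the anchor name
seq.children.each do |child|
  case child
  when Psych::Nodes::Scalar
    # Scalars are still plain text here; no tag has been resolved yet.
    puts "scalar #{child.value.inspect} tag=#{child.tag.inspect}"
  when Psych::Nodes::Alias
    puts "alias *#{child.anchor}"
  end
end

# Converting to a native object resolves tags and connects the alias.
obj = doc.to_ruby
puts obj[2].equal?(obj)
```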
  20. Three-fold YAML processing

    Ruby Object (Native) ↔ Node Graph (Representation) ↔ Event Tree (Serialization) ↔ YAML (Presentation)
    • Event Tree ↔ Node Graph: process aliases and anchors (cycles etc.)
    • Node Graph ↔ Ruby Object: process tags (custom objects)
  21. YAML Kind

    Sequence:
      [1, 2, 3]
      - 1
      - 2
      - 3
    Mapping:
      { a: b }
      a: b
      ? [1, 2]
      : [3, 4]
    Scalar:
      foo
      1
      "2"
      >
        foo
        bar
  22. YAML Kind and Tags

    Sequence: !!seq !!omap !!pairs
    Mapping:  !!map !!set !ruby/object
    Scalar:   !!str !!binary !!null !!bool !!int !!float !!timestamp !!yaml !ruby/symbol !!merge !!value
  23. Tag resolution

    Default for Sequences, Mappings, and quoted Scalars:
      [1, 2, 3]  →  !!seq [1, 2, 3]
      { a: b }   →  !!map { a: b }
      "foo"      →  !!str "foo"
  24. Plain scalar resolution

    Plain scalars, defined by a “Schema”:
      null    →  !!null "null"
      false   →  !!bool "false"
      123     →  !!int "123"
      123.45  →  !!float "123.45"
  25. Plain scalar resolution

    Schema = Pairs of (Regexp, tag)

      Pattern                                        Tag to be resolved
      /^(null|Null|NULL|~)?$/                        !!null
      /^(true|True|TRUE|false|False|FALSE)$/         !!bool
      /^([-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+)$/      !!int
      (omit)                                         !!float
      /^.*$/                                         !!str
  26. Plain scalar resolution

    Application-specific schema

      Pattern                                        Tag to be resolved
      /^(null|Null|NULL|~)?$/                        !!null
      /^(true|True|TRUE|false|False|FALSE)$/         !!bool
      /^([-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+)$/      !!int
      (omit)                                         !!float
      /^:.*$/                                        !ruby/symbol
      /^<<$/                                         !!merge
      /^.*$/                                         !!str
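The two tables can be read as an ordered lookup: try each pattern in turn and take the tag of the first match. A sketch of that idea, not Psych's actual resolver; `resolve_plain_scalar` and the constant names are hypothetical, the float pattern (omitted on the slide) is a simplified assumption, and `\A`/`\z` replace `^`/`$` because the latter are line anchors in Ruby:

```ruby
# Table from the first slide: the first matching pattern wins.
CORE_SCHEMA = [
  [/\A(null|Null|NULL|~)?\z/, "!!null"],
  [/\A(true|True|TRUE|false|False|FALSE)\z/, "!!bool"],
  [/\A([-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+)\z/, "!!int"],
  # Assumption: the slide omits this pattern; a simplified decimal float.
  [/\A[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?\z/, "!!float"],
  [/\A.*\z/m, "!!str"], # catch-all, must stay last
].freeze

# Application-specific entries are inserted before the catch-all.
RUBY_SCHEMA = [
  *CORE_SCHEMA[0...-1],
  [/\A:.*\z/, "!ruby/symbol"],
  [/\A<<\z/, "!!merge"],
  CORE_SCHEMA.last,
].freeze

def resolve_plain_scalar(text, schema)
  schema.find { |pattern, _tag| pattern.match?(text) }.last
end

%w[null false 123 123.45 :foo << foo].each do |text|
  puts "#{text.inspect} -> #{resolve_plain_scalar(text, RUBY_SCHEMA)}"
end
```

Order matters: `123` also matches the float pattern, so the integer row has to come first, and the `!!str` row only works as a fallback because it is last.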
  27. YAML is…

    • YAML is (in a way) a portable Marshal
    • Abstraction layers to sort out complexity
      ◦ Parsing: remove syntax details
      ◦ Deserializing: connect anchors and aliases
      ◦ Interpreting: resolve and realize tags
  28. Recap: Three-fold YAML processing

    Ruby Object (Native) ↔ Node Graph (Representation) ↔ Event Tree (Serialization) ↔ YAML (Presentation)
  29. Psych’s API levels

    Ruby Object (Native) ↔ Node Graph (Representation) ↔ Event Tree (Serialization) ↔ YAML (Presentation)
    • High-level API: Psych.load / Psych.dump
    • Mid-level API: Psych.parse / #to_yaml
    • Low-level API: Psych::Parser / Psych::Emitter
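For a feel of the low-level API, a handler can be attached to `Psych::Parser` to record the events it reports. A minimal sketch; `EventLogger` is a hypothetical name, and only the sequence, scalar, and alias callbacks are overridden:

```ruby
require "psych"

# Records a simplified form of every event we care about.
class EventLogger < Psych::Handler
  attr_reader :events

  def initialize
    super
    @events = []
  end

  def start_sequence(anchor, tag, implicit, style)
    @events << [:start_sequence, anchor]
  end

  def end_sequence
    @events << [:end_sequence]
  end

  def scalar(value, anchor, tag, plain, quoted, style)
    @events << [:scalar, value]
  end

  def alias(anchor)
    @events << [:alias, anchor]
  end
end

logger = EventLogger.new
Psych::Parser.new(logger).parse("&a [1, true, *a]")
logger.events.each { |event| p event }
```

Note that no comment-related callback exists at this level, which is why comments never reach the mid-level or high-level APIs.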
  30. Psych’s API levels and psych-comments

    Ruby Object (Native) ↔ Node Graph (Representation) ↔ Event Tree (Serialization) ↔ YAML (Presentation)
    • Mid-level API: Psych.parse / #to_yaml
    • psych-comments’ API
  31. Recap: Psych (a.k.a. YAML): high-level API

      obj = YAML.load(input)
      obj["env"] << "NEW_TOKEN"
      output = YAML.dump(obj)
  32. Recap: Psych (a.k.a. YAML): high-level API

      obj = Psych.load(input)
      obj["env"] << "NEW_TOKEN"
      output = Psych.dump(obj)
  33. Psych (a.k.a. YAML): mid-level API

      s = Psych.parse_stream(input)
      env = s.children[0].children[0].children[1]
      env.children << Psych::Nodes::Scalar.new("NEW_TOKEN")
      output = s.to_yaml
  34. Recap: Psych-comments

      s = Psych::Comments.parse_stream(input)
      env = s.children[0].children[0].children[1]
      env.children << Psych::Nodes::Scalar.new("NEW_TOKEN")
      output = Psych::Comments.emit_yaml(s)
  35. Psych (lack of) extensibility

    Psych uses libyaml (C library)
    Psych::Nodes::Node (extendable from Ruby) ↔ libyaml (not extendable from Ruby) ↔ YAML text
  36. Extending the parser

    2-pass parser
    • Parse nodes: YAML text → libyaml → Psych::Nodes::Node without comments
    • Parse comments: YAML text + source location → psych-comments → Psych::Nodes::Node with comments
  37. Comment scanning algorithm

    Recurse into all descendants and repeat
    - # egg # pork bar
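The second pass relies on the source locations that Psych records on every node (`start_line`, `start_column`, both zero-based). The following is a deliberately simplified sketch of the idea, not the library's actual algorithm: it only looks at whole-line comments directly above each item of one sequence, and the `leading` hash is a hypothetical stand-in for attaching comments to nodes.

```ruby
require "psych"

source = <<~YAML
  env:
    # build
    - NPM_TOKEN
    # deploy
    - GH_TOKEN
YAML

lines = source.lines
seq = Psych.parse(source).root.children[1]   # the value of "env"

# For each item, walk upward from its start line and collect the
# comment-only lines that sit directly above it.
leading = seq.children.to_h do |item|
  comments = []
  line_no = item.start_line - 1
  while line_no >= 0 && lines[line_no].strip.start_with?("#")
    comments.unshift(lines[line_no].strip)
    line_no -= 1
  end
  [item.value, comments]
end

p leading
```

A real scanner has to recurse through every descendant and handle the edge cases that follow; this version ignores all of them.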
  38. Comment scanning edge case 1

    Edge case 1: unwanted occurrence of #
    foo#bar: < # baz
  39. Comment scanning edge case 1

    Edge case 1 solution: skip over scalars
    foo#bar: < # baz
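Why a naive search for `#` is not enough can be seen with plain Psych. A short sketch (the inputs are illustrative, not the exact slide example): a `#` inside a plain scalar or a block scalar is content, and only a `#` preceded by whitespace outside a scalar starts a comment.

```ruby
require "yaml"

# No whitespace before "#", so it is part of the key.
key_with_hash = YAML.load("foo#bar: 1")
p key_with_hash

# Inside a block scalar, a line starting with "#" is content.
block_scalar = YAML.load("text: |\n  # baz\n")
p block_scalar

# A real comment, for contrast, never reaches the loaded data.
real_comment = YAML.load("foo: bar # baz")
p real_comment
```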
  40. Comment scanning edge case 2

    Edge case 2: comments before delimiters
      [
        1,
        2,
        # foo
      ]
  41. Comment scanning edge case 2

    Edge case 2 solution: attach as trailing comments
      [
        1,
        2,
        # foo
      ]
  42. Comment scanning edge case 3

    Edge case 3: comments on a key-value pair
      # foo
      foo: 1
      bar:
        # bar
        2
    NOTE: Psych lacks a node type for key-value pairs; instead, keys and values alternate in a flat array
  43. Comment scanning edge case 3

    Edge case 3: comments on a key-value pair
    Mapping: Key 0, Val 0, Key 1, Val 1, Key 2, Val 2, …
    Flat array! (in Psych)
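The flat layout is easy to observe with the mid-level API. A small sketch using only Psych:

```ruby
require "psych"

mapping = Psych.parse("foo: 1\nbar: 2\n").root   # Psych::Nodes::Mapping

# Keys and values alternate in one flat array.
flat = mapping.children.map(&:value)
p flat

# Pairing them up again is the caller's job.
mapping.children.each_slice(2) do |key, value|
  puts "#{key.value} => #{value.value}"
end
```

Since there is no pair node to hang a comment on, a comment above `foo: 1` has to be attached to either the key node or the value node.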
  44. Comment scanning edge case 3

    Edge case 3 solution: attach them to the key
      # foo
      foo: 1
      bar:
        # bar
        2
  45. Comment scanning edge case 4

    Edge case 4: comment on a bullet
      root:
        # foo
        - foo: 1
        - # bar
          bar: 2
  46. Comment scanning edge case 4

    Edge case 4 solution: attach to the whole element
      root:
        # foo
        - foo: 1
        - # bar
          bar: 2
    NOTE: “foo: 1” implicitly generates a Mapping node, which the comment attaches to.
  47. Psych-comments parser

    Those came down to only 100LOC!
    https://github.com/wantedly/psych-comments/blob/v0.1.1/lib/psych/comments/parsing.rb
  48. Recap: Psych (lack of) extensibility

    Psych uses libyaml (C library)
    Psych::Nodes::Node (extendable from Ruby) ↔ libyaml (not extendable from Ruby) ↔ YAML text
  49. Extending the generator

    Just reimplement the generator 💪 (except for scalars)
    Psych::Nodes::Node → psych-comments (collections; scalars are delegated to libyaml) → YAML text
  50. Internal commands for YAML formatting

    Prepared utilities for formatting
      print "foo"          Print after generating reserved spaces and indentation
      space! / newline!    Reserve spaces or indentation
      indented do … end    Bump indent level
    NOTE: there is one more “virtual indentation” util for bullets
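The "reserve now, write later" idea behind these utilities can be sketched in a few lines. `TinyPrinter` below is a hypothetical stand-in: the method names mirror the slide, but the internals are an assumption and not the library's emitter.

```ruby
# Hypothetical mini printer; names mirror the slide, internals are guesses.
class TinyPrinter
  def initialize
    @out = +""
    @indent = 0
    @pending = nil # :space or :newline, reserved but not yet written
  end

  # Write any reserved whitespace first, then the text itself.
  def print(text)
    case @pending
    when :space then @out << " "
    when :newline then @out << "\n" << " " * @indent
    end
    @pending = nil
    @out << text
  end

  # Reserve a single space unless a newline is already reserved.
  def space!
    @pending ||= :space
  end

  # Reserve a line break plus indentation at the current level.
  def newline!
    @pending = :newline
  end

  # Bump the indent level for the duration of the block.
  def indented
    @indent += 2
    yield
  ensure
    @indent -= 2
  end

  def result
    @out + "\n"
  end
end

printer = TinyPrinter.new
printer.print("foo:")
printer.indented do
  printer.newline!
  printer.print("bar:")
  printer.space!
  printer.print("baz")
end
printer.newline!
printer.print("qux:")
printer.space!
printer.print("1")
puts printer.result
```

Deferring whitespace until the next `print` means that a reserved space followed by a reserved newline never produces trailing blanks, and the indentation is computed from the level in effect when the text is finally written.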
  51. Internal commands for YAML formatting

    newline! reserves indentation
    - foo: bar - baz - foo: bar bar: baz
  52. Indentation Adjust

    Sequence in mapping is special
    - - baz - bar: baz foo: - baz foo: bar: baz 4 4 2 4
  53. Generating bullet comments

    Hoisting map comments above bullets
      root:
        # foo
        - foo: 1
        - # bar
          bar: 2
  54. Generating bullet comments

    Hoisting map comments above bullets →
    • Look ahead in the tree and generate comments
    • Then avoid duplication via a queue
      ◦ Note that we should not mutate the input
  55. Psych-comments generator

    Reimplementation took only 300LOC 💪💪💪
    https://github.com/wantedly/psych-comments/blob/v0.1.1/lib/psych/comments/emitter.rb
  56. config/locales

      en:
        our_new_service:
          try_if_out: "Try it out"
      ja:
        our_new_service:
          try_it_out: "試してみる"
  57. config/locales: translation mistakes

    Problem 1: mistakes and oversight
      en:
        our_new_service:
          try_if_out: "Try it out"
      ja:
        our_new_service:
          try_it_out: "試してみる"
  58. config/locales: translation mistakes

    Solution 1: check for correspondence
      Synchronizing locales...
      Generating en.our_new_service.try_it_out
      Generating ja.our_new_service.try_if_out
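The correspondence check amounts to comparing the sets of key paths per locale. A sketch of that comparison on the data from the slides; `key_paths` is a hypothetical helper and the real tool may work differently:

```ruby
require "yaml"

locales = YAML.load(<<~YAML)
  en:
    our_new_service:
      try_if_out: "Try it out"
  ja:
    our_new_service:
      try_it_out: "試してみる"
YAML

# Flatten nested hashes into dotted key paths such as "a.b.c".
def key_paths(tree, prefix = nil)
  tree.flat_map do |key, value|
    path = [prefix, key].compact.join(".")
    value.is_a?(Hash) ? key_paths(value, path) : [path]
  end
end

paths = locales.transform_values { |tree| key_paths(tree) }
all_paths = paths.values.flatten.uniq

# Every path that exists in some locale but not in this one needs an entry.
paths.each do |locale, present|
  (all_paths - present).each do |missing|
    puts "Generating #{locale}.#{missing}"
  end
end
```

With the typo `try_if_out` in `en`, each locale is reported as missing the other's key, which is how the mistake surfaces.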
  59. config/locales: translation mistakes

    Solution 1: and generate boilerplates
      en:
        our_new_service:
          try_it_out: !todo "試してみる"
      ja:
        our_new_service:
          try_it_out: "試してみる"
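Entries marked with a tag such as `!todo` remain visible at the mid-level API, because `Psych.parse` keeps the tag on the node instead of interpreting it. A sketch that lists such entries; `todo_paths` is a hypothetical helper:

```ruby
require "psych"

source = <<~YAML
  en:
    our_new_service:
      try_it_out: !todo "試してみる"
  ja:
    our_new_service:
      try_it_out: "試してみる"
YAML

# Collect the dotted paths of all scalars tagged !todo.
def todo_paths(node, path = [])
  case node
  when Psych::Nodes::Mapping
    node.children.each_slice(2).flat_map do |key, value|
      todo_paths(value, path + [key.value])
    end
  when Psych::Nodes::Scalar
    node.tag == "!todo" ? [path.join(".")] : []
  else
    []
  end
end

todos = todo_paths(Psych.parse(source).root)
p todos
```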
  60. config/locales: intentional absence

    Problem 2: intentionally limit languages to support
      en:
        # No data due to
        # translation
        # cost
      ja:
        japan_only:
          start: "開始"
  61. config/locales: intentional absence

    Solution 2: (ab)use YAML tags
      en:
        # No data due to
        # translation
        # cost
      ja:
        japan_only:
          start: !only:ja "開始"
    cf. https://github.com/creasty/i18n_flow
  62. config/locales: tag syntax error

    Problem 3: roundtrip failure
    !only:en,ja !<only:en,ja> YAML 1.1 / libyaml 0.2.4 YAML 1.2 / libyaml 0.2.5 ✅ ✅ ✅ ❌
    cf. https://github.com/yaml/libyaml/pull/179
  63. config/locales: tag syntax error

    Problem 3: roundtrip failure
    !only:en,ja !<only:en,ja> YAML 1.1 / libyaml 0.2.4 YAML 1.2 / libyaml 0.2.5 ✅ ✅ ✅ ❌
    Libyaml still generates this!
  64. config/locales: tag syntax error

    Problem 3’s root cause …… tag abuse
    Legitimate tagging examples:
      !ruby/symbol foo
      !!omap [foo: 1, bar: 2]
      !!set { foo, bar }
  65. config/locales: tag syntax error

    Solution 3: use comments for tags 👍
      en:
        # No data due to
        # translation
        # cost
      ja:
        japan_only:
          # i18n:only:ja
          start: "開始"
  67. config/locales: resolution

    Problem 4: Psych lacks comment support → Solution 4: implement it myself 💪💪💪💪
  68. Comment position

    Psych-comments (0.1.1) comment position
      # foo    ← leading comments
      foo
      # baz    ← trailing comments
  69. Comment position

    Then line-end comments
      # foo        ← leading comments
      foo # bar    ← line-end comments
      # baz        ← trailing comments
  70. Comment positioning problem

    - 1 # foo - 12 # bar - 1 # foo - 12 # bar - 1 # foo - 12 # bar
    Psych::Nodes::Node
  71. Scopes

    There is no end to spacing details
    foo: - 1 - 2 foo: - 1 - 2 - 1 - 12 - 1 - 12 # foo - a: b # foo - a: b foo: - 1 - 2
  72. Scopes

    There is no end to feature requests
    https://prettier.io/docs/en/option-philosophy
  73. Recap: Three-fold YAML processing

    Ruby Object (Native):        [1, true].tap { |a| a << a }
    Node Graph (Representation): !!seq [...] holding !!int "1", !!bool "true", and a link back to the sequence itself
    Event Tree (Serialization):  [...] holding "1", "true", *a
    YAML (Presentation):         "&a [1, true, *a]"
  74. Levels of abstraction

    Diagram: details placed across the Presentation, Serialization, and Representation layers.
    Items shown: Anchors & aliases, Non-specific tags, Scalar content formatting, Directives, Node style, Comments, Spacing, Key ordering, Tag style, Escapes, Tags, Node links, Scalar content
  75. Levels of abstraction: Psych

    Same diagram, highlighting the part covered by the Psych mid-level API.
  76. Levels of abstraction: Psych

    Same diagram, highlighting the part covered by Psych-comments.
  77. Recap: Psych-comments layer

    Same diagram, highlighting the part covered by Psych-comments.
  78. Wrap up

    • I made the psych-comments gem.
    • It processes YAML comments.
    • It neatly solves your problem, partly reusing Psych’s own algorithms.
    • Thankfully people are interested in expanding it, but as a responsible maintainer, I’m going to limit its scope.