Google Cloud Platform How to convert strings (and how not to) AKA: escaping is harder than it seems Tim Hockin Principal Software Engineer, Google @thockin
Problem statement: You have an input string that you need to store in another system, and that system has different rules about what a string may contain.
Where have I seen this before? Most people experience this in printf() style formatting.
“hello\n” => “hello” + newline
“hello\\n” => “hello\n”
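The same escape rule can be seen in most languages with C-style string literals. A minimal Python sketch (the variable names are only illustrative):

```python
# A backslash starts an escape sequence; doubling it yields a literal backslash.
newline = "hello\n"    # 6 characters: h, e, l, l, o, newline
literal = "hello\\n"   # 7 characters: h, e, l, l, o, backslash, n

print(repr(newline), len(newline))
print(literal, len(literal))
```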
Seriously? That’s gross. Yeah. And it gets worse. If you have multiple characters to handle or double-dash isn’t allowed, you need a different encoding.
foo.bar_baz => foo-dot-bar-usc-baz
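A minimal sketch of that word-marker encoding, assuming the substitution table implied by the example (`naive_encode` and `SUBSTITUTIONS` are hypothetical names, not from any library). It also shows the catch: an input that already contains a marker encodes to the same output as one that needed converting.

```python
# Assumed substitution table, taken from the slide's example.
SUBSTITUTIONS = {".": "-dot-", "_": "-usc-"}

def naive_encode(name: str) -> str:
    """Replace each disallowed character with its word marker."""
    for char, marker in SUBSTITUTIONS.items():
        name = name.replace(char, marker)
    return name

encoded = naive_encode("foo.bar_baz")
# An input that already spells out the markers passes through unchanged,
# so two different inputs end up with the same encoded form.
collision = naive_encode("foo-dot-bar-usc-baz")
print(encoded, collision, encoded == collision)
```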
Now you have to escape the escapes. You have to handle user inputs that use your escape.
foo-dot-bar-usc-baz => foo-esc-dot-bar-esc-usc-baz
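One way to sketch this in Python: first prefix any marker that already appears in the input with `esc`, then substitute the disallowed characters. The rule below is an assumption inferred from the slide's examples, and `RESERVED`, `SUBSTITUTIONS` and `encode` are hypothetical names. It covers only the encoding direction and does not try to prove that every input can be decoded unambiguously.

```python
import re

# Assumed substitution table and reserved markers, inferred from the examples.
SUBSTITUTIONS = {".": "-dot-", "_": "-usc-"}
RESERVED = re.compile(r"-(dot|usc|esc)-")

def encode(name: str) -> str:
    """Escape pre-existing markers, then replace disallowed characters."""
    # Step 1: "-dot-" in the raw input becomes "-esc-dot-", and so on.
    escaped = RESERVED.sub(lambda m: "-esc" + m.group(0), name)
    # Step 2: replace the characters the target system rejects.
    for char, marker in SUBSTITUTIONS.items():
        escaped = escaped.replace(char, marker)
    return escaped

converted = encode("foo.bar_baz")
pre_escaped = encode("foo-dot-bar-usc-baz")
print(converted)
print(pre_escaped)
```

With the escape step in place, the two inputs no longer encode to the same string.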
Naive solution: Drop characters over the limit.
foo-esc-dot-bar10 => foo-esc-esc-dot-bar1
foo-esc-dot-bar11 => foo-esc-esc-dot-bar1
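The collision is easy to reproduce. This sketch assumes a 20-character limit, inferred from the length of the outputs in the example; `LIMIT` and `truncate` are hypothetical names.

```python
LIMIT = 20  # assumed limit, inferred from the 20-character outputs above

def truncate(encoded: str, limit: int = LIMIT) -> str:
    """Naively drop everything past the length limit."""
    return encoded[:limit]

# The already-escaped forms of the two inputs are 21 characters each.
a = truncate("foo-esc-esc-dot-bar10")
b = truncate("foo-esc-esc-dot-bar11")
print(a, b, a == b)
```

Two distinct inputs now share one stored name, and the original strings cannot be recovered from it.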
Where have I seen this before? Remember Windows 95? FAT (via the VFAT long-filename extension) gave long filenames short 8.3 aliases like this.
“long_file_name.txt” => “long_f~1.txt”
“long_for_home.txt” => “long_f~2.txt”
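A much-simplified sketch of that numeric-tail idea. The real short-name algorithm does more (uppercasing, stripping illegal characters, and so on), so treat `short_alias` as a hypothetical illustration only. The key point is that the result depends on which names are already in the directory.

```python
def short_alias(long_name: str, taken: set) -> str:
    """Return a simplified 8.3-style alias that is not yet in `taken`."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:  # no extension: the whole name is the stem
        stem, ext = long_name, ""
    n = 1
    while True:
        tail = f"~{n}"
        alias = stem[: 8 - len(tail)] + tail  # keep the base at 8 characters
        if ext:
            alias += "." + ext[:3]
        if alias not in taken:
            taken.add(alias)
            return alias
        n += 1

directory = set()
first = short_alias("long_file_name.txt", directory)
second = short_alias("long_for_home.txt", directory)
print(first, second)
```

In an empty directory, `long_for_home.txt` would get the `~1` alias instead, so the mapping is only meaningful within one set of names.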
Can “within set” apply to the other cases? Sure, if deterministic values don’t matter. The naive solutions are OK if you KNOW there isn’t a conflict.
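As a rough illustration, the same uniqueness-within-a-set trick could be applied to length-limited names. Everything here is an assumption for the sketch: the 20-character limit, the `~N` tail (which the target system might not even allow), and the name `unique_within`. The result depends on the order in which names arrive, which is the loss of determinism mentioned above.

```python
def unique_within(names, limit=20):
    """Map each name to a truncated form that is unique among names seen so far."""
    assigned = {}
    used = set()
    for name in names:
        candidate = name[:limit]
        n = 1
        while candidate in used:
            tail = f"~{n}"
            # Make room for the numeric tail inside the length limit.
            candidate = name[: limit - len(tail)] + tail
            n += 1
        used.add(candidate)
        assigned[name] = candidate
    return assigned

a, b = "foo-esc-esc-dot-bar10", "foo-esc-esc-dot-bar11"
forward = unique_within([a, b])
backward = unique_within([b, a])
print(forward[a], backward[a])
```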
Take-aways:
1. Transcoding strings seems simple but it isn’t.
2. Always consider the encoding when crossing APIs.
3. Consider how strange input might break your code. Handle it.