How to hack with pack and unpack

Digging deeply into one of perl's dusty corners.

David Lowe

May 09, 2012

Transcript

  1. what’s pack?
     • like sprintf()
     • only for bytes, not for presentation
     • template rules very complex
     • DWIM: packs empty strings for missing arguments
     Tuesday, February 28, 2012
  2. what’s unpack?
     • like sscanf() for bytes
     • (not really: we’ll come back to it)
     • (mostly) identical template rules to pack
     • dies if it runs out of input bytes
  3. • perldoc -f pack
     • perldoc -f unpack
     • perldoc perlpacktut
         my $fourbytes = pack 'L', 12;
         my $twelve = unpack 'L', $fourbytes;
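A minimal sketch of the round trip on the slide above, extended slightly to make the bytes visible. The `'N'` (big-endian) line and the variable names `$network` and `$hex` are additions for illustration, not part of the talk.

```perl
use strict;
use warnings;

# 'L' is an unsigned 32-bit integer in native byte order: always four bytes.
my $fourbytes = pack 'L', 12;
my $twelve    = unpack 'L', $fourbytes;

# 'N' has the same width but is always big-endian, so its bytes are predictable.
my $network = pack 'N', 12;
my $hex     = unpack 'H*', $network;

printf "%d bytes, value %d, big-endian hex %s\n",
    length($fourbytes), $twelve, $hex;
```

The native-order bytes in `$fourbytes` depend on the machine, which is why the hex dump uses `'N'` instead.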
  4. but seriously...
     • struct alignment issues will ruin your day
     • change the XS and/or C if you can
     • Convert::Binary::C to the rescue!
  5. syscall
     • I have never had to do this...
     • perldoc -f syscall
  6. “close to the metal”
     • network protocols
     • binary file formats
     • bytes are language neutral
  7. • no sscanf() in perl
     • substr or regexes...
     • unpack is a bit nicer (not much)
  8. example: contrived pie
         chess       8
         pecan       7
         shaker lemon4
         shoo fly    10

         $pie = substr $_, 0, 12; $deliciousness = substr $_, 12;
         ($pie, $deliciousness) = m/(.{12})(.*)/;
         ($pie, $deliciousness) = unpack 'A12 A*', $_;
     • not quite identical...
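One way the three approaches differ, sketched under the assumption that the records are 12 columns of space-padded pie name followed by the score (the layout `substr $_, 0, 12` implies). The `@pies` data and the `@rows` collection are illustrative; the lines are built with `sprintf` so the padding is exact.

```perl
use strict;
use warnings;

# Build fixed-width lines: pie name padded to 12 columns, then the score.
my @pies  = (['chess', 8], ['shaker lemon', 4], ['shoo fly', 10]);
my @lines = map { sprintf '%-12s%s', @$_ } @pies;

my @rows;
for (@lines) {
    my $by_substr = substr $_, 0, 12;                 # keeps the padding
    my ($by_regex) = m/(.{12})/;                      # keeps the padding
    my ($by_unpack, $score) = unpack 'A12 A*', $_;    # 'A' strips trailing spaces
    push @rows, [$by_substr, $by_regex, $by_unpack, $score];
    print "[$by_substr] [$by_regex] [$by_unpack] $score\n";
}
```

The `'A'` format removes trailing whitespace on unpack, so only the `unpack` version yields a name that can be compared directly with `'chess'`.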
  9. • vec(): treat a scalar as an arbitrary length bit vector
     • (you’re not using numbers, are you?)
     • pack and unpack ‘b’ template is perfect for working with the vector as a whole
     • convert vectors to and from strings “011100” or lists (0,1,1,1,0,0)
     • count bits with unpack checksum
     • perldoc -f vec
  10. example: one million bits!
          ## create a 125,001 byte vector
          my $bit_vector = '';
          (vec $bit_vector, 1_000_000, 1) = 1;
          ## stringify: “00000...1”
          my $bits = unpack 'b*', $bit_vector;
          ## listify: (0,0,0,...,1)
          my @bits = split //, unpack 'b*', $bit_vector;
          ## how many bits are on?
          my $on_bits = unpack '%32b*', $bit_vector;
      • the 1000001st through 1000008th bits are free!
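A small sketch of the string-to-vector direction, which the slides mention but do not show, alongside the million-bit vector for comparison. Variable names such as `$vector` and `$big` are illustrative.

```perl
use strict;
use warnings;

# Build a vector from a bit string: 'b' puts the first character in bit 0.
my $vector = pack 'b*', '011100';

# vec() sees the same bits, one at a time.
my @via_vec = map { vec $vector, $_, 1 } 0 .. 5;

# Back to a string; pack padded the vector to a whole byte, so 8 bits come back.
my $string = unpack 'b*', $vector;

# A '%32' checksum over 'b*' sums the bits, i.e. counts the ones.
my $on_bits = unpack '%32b*', $vector;

# The slide's million-bit vector.
my $big = '';
vec($big, 1_000_000, 1) = 1;
my $big_bytes = length $big;
my $big_on    = unpack '%32b*', $big;

print "@via_vec | $string | $on_bits | $big_bytes bytes, $big_on on\n";
# prints: 0 1 1 1 0 0 | 01110000 | 3 | 125001 bytes, 1 on
```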
  11. • (or 4-argument substr)
      • magic: no realloc iff replacement length == original length
      • sprintf also might work, depending...
  12. example: Sys::Mmap
          mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
          $shared = meaning_of_life();
          munmap($shared);

          mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
          (substr $shared, 0, 4) = pack 'L', meaning_of_life();
          munmap($shared);
      • 7.5 million years’ work down the tubes!
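The same overwrite-in-place idea can be tried without Sys::Mmap. In this sketch an ordinary 8-byte string stands in for the mapped region (an assumption for illustration; nothing here is shared memory), and both the lvalue and 4-argument forms of `substr` replace four bytes with four bytes.

```perl
use strict;
use warnings;

# Stand-in for a mapped region: eight zero bytes.
my $buffer = "\0" x 8;

# lvalue substr: replace bytes 0..3 with a packed 32-bit value.
substr($buffer, 0, 4) = pack 'L', 42;

# 4-argument substr: replace bytes 4..7 the same way.
substr($buffer, 4, 4, pack('L', 7));

# The buffer is still eight bytes and holds both values.
my @values = unpack 'L2', $buffer;
print length($buffer), " bytes: @values\n";
```

Because each replacement has the same length as the bytes it replaces, the buffer never changes size, which is the condition the slides give for avoiding a realloc.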
  13. use bytes
      • binary data + DWIM + unicode
      • ouch!
      • pragma to the rescue: “No matter what you think might be in this PV, do not cleverly switch to character semantics when I’m not looking.”
      • pack/unpack themselves don’t care, it’s things like length and substr
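A short sketch of the `length` difference the slide points at. The test string is illustrative, and the byte count assumes Perl stores this character internally as UTF-8, which is three bytes for U+263A.

```perl
use strict;
use warnings;

# One character that Perl holds internally as three UTF-8 bytes.
my $smiley = "\x{263A}";

# Character semantics: one character.
my $chars = length $smiley;

# Byte semantics, limited to this block by the lexically scoped pragma.
my $bytes;
{
    use bytes;
    $bytes = length $smiley;
}

print "$chars character, $bytes bytes\n";
```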
  14. horrific abuses
      • think like a C programmer
      • serialization tricks
      • lazy perlification
  15. • where is things.a? things.
      • where is things.b? *(&things + 1).
      • where is lots_of_things[2].b? lots_of_things + (2 * sizeof (two_things)) + 1.
      • where is the point? next slide.
          typedef struct TWO_THINGS {
              char a;
              char b;
          } two_things;
          two_things things;
          two_things lots_of_things[1000];
  16. • where is $things.a? unpack 'cx', $things;
      • where is $things.b? unpack 'xc', $things;
      • where is $lots_of_things[2].b? unpack '(xx)2xc', $lots_of_things
          Readonly my $FORMAT => 'cc';
          my $things = pack $FORMAT;
          my $lots_of_things = pack "($FORMAT)1000";
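To see the offsets resolve to actual values, here is a sketch that packs known numbers instead of empty structs. It uses a plain scalar rather than the Readonly module and only four array elements; both are simplifications for illustration.

```perl
use strict;
use warnings;

my $FORMAT = 'cc';    # two signed chars, like struct TWO_THINGS

# One struct with a = 10, b = 20.
my $things = pack $FORMAT, 10, 20;

# Four structs: (1,2), (3,4), (5,6), (7,8).
my $lots_of_things = pack "($FORMAT)4", 1 .. 8;

my $thing_a = unpack 'cx', $things;                 # read a, skip b
my $thing_b = unpack 'xc', $things;                 # skip a, read b
my $third_b = unpack '(xx)2xc', $lots_of_things;    # skip 2 structs, skip a, read b

print "a=$thing_a b=$thing_b lots_of_things[2].b=$third_b\n";
# prints: a=10 b=20 lots_of_things[2].b=6
```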
  17. • bytes, bytes, bytes on the brain
      • byte offsets a natural way of thinking about working with data
      • “language neutral” is just a cute way of saying “C”
  18. • “strong typing” the roundabout way
      • unpack() == C cast: “I, programmer, assure you, language, that these bytes contain precisely data of this type, and I will live with the consequences if I’m wrong.”
  19. example: SEGV!
      • god, I miss pointers sometimes
      • (but not right now)
          my $bar = unpack 'P', 'asdf';
  20. but...
      • we are not writing C
      • because down that road lies madness
      • still, its siren song is hard to resist...
  21. space efficiency
      • Storable: general-purpose
      • what does that mean?
      • if you’re thinking like a C programmer, maybe you can do better...
  22. example: array of shorts
          @shorts = map {int((rand 256)-128)} (1..10000);
          ## 20,000 bytes: 2 bytes per element
          $packed = pack 's*', @shorts;
          ## 20,016 bytes: 2 bytes per element
          $stored = Storable::freeze(\@shorts);
          ## harmlessly examine contents of @shorts...
          print "$_\n" for @shorts;
          ## roughly 46,000 bytes: ???
          $stored = Storable::freeze(\@shorts);
      • Extra credit: deserialize just $shorts[2113]...
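One possible answer to the extra-credit question, as a sketch: because every element is exactly two bytes, element N starts at byte 2 * N, and an `x` skip gets there without unpacking anything else. The data here is generated deterministically instead of with `rand` so the result can be checked; that substitution is for illustration only.

```perl
use strict;
use warnings;

# 10,000 values in the signed 8-bit range, as on the slide, but deterministic.
my @shorts = map { ($_ * 37) % 256 - 128 } 0 .. 9_999;

# Two bytes per element.
my $packed = pack 's*', @shorts;

# Skip 2 * index bytes, then read a single short.
my $index = 2113;
my $one   = unpack 'x' . (2 * $index) . ' s', $packed;

print length($packed), " bytes; element $index is $one\n";
```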
  23. fixed width
      • depending on what you’re serializing
      • interesting properties
      • more in a bit
  24. keyless hashes
      • when a hash is really a struct/record
      • thinking like a C programmer again!
      • serialize bags of them without bags of redundant copies of their keys
  25. idiom
          ## shape of the “structure” and format are
          ## passed or encoded separately
          Readonly my $TEMPLATE => 'VVC';
          Readonly my @FIELDS => qw(thing1 thing2 kite);
          ## get the bytes
          my $bytes = get_from_somewhere();
          ## unpack via hash slice FTW!
          my %thing;
          @thing{@FIELDS} = unpack $TEMPLATE, $bytes;
  26. example: keyless hash
          my @records = map { {
              thing1 => int rand 4294967296,
              thing2 => int rand 4294967296,
              kite => int rand 255,
          } } (1 .. 10000);
          ## 90,000 bytes: 9 bytes per record
          my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;
          ## roughly 544,000 bytes: 54 bytes per record
          my $stored = Storable::freeze(\@records);
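A sketch of the full round trip for the keyless-hash idiom: pack the records without their keys, then rebuild any single record as a hash from its byte offset. The helper `record_at`, the constant `$RECORD_BYTES`, and the deterministic test data are hypothetical names and values introduced for illustration; plain lexicals replace the Readonly module.

```perl
use strict;
use warnings;

my $TEMPLATE = 'VVC';                        # two 32-bit LE ints and one byte
my @FIELDS   = qw(thing1 thing2 kite);
my $RECORD_BYTES = length(pack($TEMPLATE, 0, 0, 0));    # 4 + 4 + 1 = 9

# Deterministic records so the result can be checked.
my @records = map { +{ thing1 => $_, thing2 => $_ * 1000, kite => $_ % 256 } }
    1 .. 100;

# Serialize values only, in field order.
my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;

# Rebuild one record from its fixed-width slot via a hash slice.
sub record_at {
    my ($bytes, $index) = @_;
    my %thing;
    @thing{@FIELDS} = unpack $TEMPLATE,
        substr($bytes, $index * $RECORD_BYTES, $RECORD_BYTES);
    return \%thing;
}

my $forty_second = record_at($packed, 41);
print join(', ', map { "$_=$forty_second->{$_}" } @FIELDS), "\n";
# prints: thing1=42, thing2=42000, kite=42
```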
  27. • for transient bytes e.g. from key-value storage
      • for sparse algorithms e.g. binary search
      • otherwise, don’t do this!
      • or at least, don’t blame me
  28. example: filtering
      • problem scale: 100k x 20k x 100
      • idea 1: regular expressions!
      • idea 2: binary search, of course!
      • idea 3: binary search + lazy perlification
  29. lazy binary search
          pack('Ca*', $size, pack("(Z$size)*", @sorted_haystack));

          $size = unpack('C', ${$frozen_haystack_ref});
          $format = 'Z' . $size;
          ...
          $element = unpack('x' . ($size * $mid + 1) . $format, ${$frozen_haystack_ref});
          $cmp = $element cmp $needle;
          ...
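The slide elides the loop around those fragments. The following is one way to fill it in, offered as a sketch and not as the author's code: the function names `freeze_haystack` and `lazy_search`, the choice of slot size, and the return convention (index, or -1 when absent) are all assumptions. It also assumes byte strings without embedded NULs that fit a slot size of at most 255.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Pack sorted strings into fixed-width, NUL-padded slots behind a 1-byte size.
sub freeze_haystack {
    my @sorted = sort @_;
    my $size = 1 + max(0, map { length $_ } @sorted);   # room for the NUL
    die "slot size $size does not fit in one byte" if $size > 255;
    return pack 'Ca*', $size, pack("(Z$size)*", @sorted);
}

# Binary search that unpacks only the elements it actually compares.
sub lazy_search {
    my ($frozen_haystack_ref, $needle) = @_;
    my $size   = unpack 'C', ${$frozen_haystack_ref};
    my $format = 'Z' . $size;
    my $count  = (length(${$frozen_haystack_ref}) - 1) / $size;
    my ($low, $high) = (0, $count - 1);
    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        # Skip the size byte plus $mid slots, then read one slot.
        my $element = unpack 'x' . ($size * $mid + 1) . $format,
            ${$frozen_haystack_ref};
        my $cmp = $element cmp $needle;
        return $mid if $cmp == 0;
        if ($cmp < 0) { $low = $mid + 1 } else { $high = $mid - 1 }
    }
    return -1;
}

my $frozen = freeze_haystack(qw(pecan chess shoofly apple));
print lazy_search(\$frozen, 'chess'), "\n";    # prints 1
print lazy_search(\$frozen, 'zebra'), "\n";    # prints -1
```

Each probe costs one `unpack` of a single slot, so a lookup touches only about log2(N) elements of the frozen haystack and never builds the full Perl array.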
  30. summary
      • bytes, bytes, bytes
      • “Premature optimization is the root of all evil.” -- Donald Knuth