
Working with binary data in Ruby


Grzesiek Kolodziejczyk

September 01, 2019


Transcript

  1.
    - How we got into this?
    - Binary data 101
    - What's available in the STDLIB
    - A gem that makes it all easier
  2. How we got into this?
    - Finally not a cookie-cutter project
    - Data from meters and sensors
    - Collected by gateways, sent over MQTT
    - MQTT -> Sidekiq
    - OMS protocol, raw payloads
  3. How we got into this?
    - In production for over 2 years
    - ~2000 "requests" per minute
    - Industry leaders in the number of supported devices
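  4. Binary data 101
    A byte is a group of 8 bits

        power  | 2^7 | 2^6 | 2^5 | 2^4 | 2^3 | 2^2 | 2^1 | 2^0
        weight | 128 |  64 |  32 |  16 |   8 |   4 |   2 |   1
        bit    |   0 |   1 |   0 |   1 |   1 |   0 |   0 |   1
        value  |   0 |  64 |   0 |  16 |   8 |   0 |   0 |   1

        0 + 64 + 0 + 16 + 8 + 0 + 0 + 1 = 89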
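The place-value table above can be checked directly in Ruby. This is a minimal sketch using only core `Integer` methods; the variable names are illustrative and not taken from the talk.

```ruby
# The bit pattern from the slide, written as a binary literal.
byte = 0b01011001

# Integer#[] returns the bit at a given position (0 = least significant).
# Collect the weight 2**position of every bit that is set.
weights = (0..7).select { |pos| byte[pos] == 1 }.map { |pos| 2**pos }

puts byte                        # decimal value of the pattern
puts weights.sum                 # same value, summed from the set bits
puts byte.to_s(2).rjust(8, "0")  # back to an 8-character bit string
```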
  5. What's available in the STDLIB

        Integer   |         |
        Directive | Returns | Meaning
        -----------------------------------------------------------------
        C         | Integer | 8-bit unsigned (unsigned char)
        S         | Integer | 16-bit unsigned, native endian (uint16_t)
        L         | Integer | 32-bit unsigned, native endian (uint32_t)
        Q         | Integer | 64-bit unsigned, native endian (uint64_t)
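As a quick illustration of the integer directives, the sketch below round-trips a few values through `Array#pack` and `String#unpack`. The `<` and `>` modifiers (force little- or big-endian) are not on the slide; they are used here only so the results do not depend on the machine's native byte order.

```ruby
# pack turns Integers into bytes; unpack1 reads a single value back.
one_byte = [89].pack("C")      # 8-bit unsigned
value    = one_byte.unpack1("C")

# S, L and Q use the native byte order. Appending "<" or ">" forces
# little- or big-endian, which keeps the result portable.
little = [1].pack("S<")        # low byte first
big    = [1].pack("S>")        # high byte first

# Several directives can be combined in one format string:
# a 16-bit value followed by a 32-bit value, both little-endian.
fields = "\x01\x00\x02\x00\x00\x00".unpack("S<L<")

p value, little, big, fields
```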
  6. What's available in the STDLIB

        String    |         |
        Directive | Returns | Meaning
        -----------------------------------------------------------------
        A         | String  | arbitrary binary string (remove trailing nulls and ASCII spaces)
        a         | String  | arbitrary binary string
        Z         | String  | null-terminated string
        B         | String  | bit string (MSB first)
        b         | String  | bit string (LSB first)
        H         | String  | hex string (high nibble first)
        h         | String  | hex string (low nibble first)
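The differences between the string directives are easiest to see side by side. A small sketch with made-up sample inputs:

```ruby
padded = "abc \0\0"

stripped   = padded.unpack1("A*")      # trailing spaces and nulls removed
untouched  = padded.unpack1("a*")      # bytes returned as they are
terminated = "abc\0def".unpack1("Z*")  # stops at the first null byte

# The same byte (decimal 89) read as a bit string in both orders.
byte      = "\x59".b
msb_first = byte.unpack1("B8")
lsb_first = byte.unpack1("b8")

# The two gzip magic bytes read as hex in both nibble orders.
magic             = "\x1f\x8b".b
high_nibble_first = magic.unpack1("H4")
low_nibble_first  = magic.unpack1("h4")

p stripped, untouched, terminated, msb_first, lsb_first,
  high_nibble_first, low_nibble_first
```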
  7. gzip - show original filename

        +---+---+---+---+---+---+---+---+---+---+
        |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
        +---+---+---+---+---+---+---+---+---+---+

        +---+---+=================================+
        | XLEN  |...XLEN bytes of "extra field"...| (more-->)
        +---+---+=================================+

        +=========================================+
        |...original file name, zero-terminated...| (more-->)
        +=========================================+
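The layout above maps onto a single pack/unpack format string. The sketch below builds a minimal header in memory and reads it back, testing the FNAME flag with a bit mask. The mask value 0x08 comes from the gzip specification (RFC 1952), not from the slide, and the sample file name and variable names are made up.

```ruby
# FNAME is bit 3 of the FLG byte (per RFC 1952).
fname_flag = 0x08

# ID1, ID2, CM, FLG as single bytes, MTIME as 32-bit little-endian,
# then XFL and OS as single bytes: 10 bytes in total.
header = [0x1f, 0x8b, 8, fname_flag, 0, 0, 255].pack("CCCCL<CC")

# No extra field here, so the zero-terminated name follows directly.
data = header + "notes.txt\0".b

id1, id2, cm, flg, _mtime, _xfl, _os = data.unpack("CCCCL<CC")

name = nil
if (flg & fname_flag) != 0
  # Everything after the 10 header bytes, up to the first null byte.
  name = data.byteslice(10, data.bytesize).unpack1("Z*")
end

p [id1, id2, cm, name]
```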
  8.
        def original_filename(file_path)
          File.open(file_path, "rb") do |f|
            header = f.read(10)
            # gzip stores multi-byte fields little-endian, hence "L<" rather than native "L"
            magic, cm, flg, mtime, xfl, os = header.unpack("H4H2B8L<H2C")

            (magic == "1f8b") || raise("Invalid gzip header")
            (cm == "08") || raise("Unknown compression method")

            # flg is an 8-character bit string, most significant bit first
            _, _, _, fcomment, fname, fextra, fhcrc, ftext = flg.split("")

            if fextra == "1"
              # skip the optional extra field: 2-byte length, then that many bytes
              xlen = f.read(2).unpack1("S<")
              f.seek(xlen, IO::SEEK_CUR)
            end

            # the file name, if present, runs up to the next null byte
            f.gets("\x00").unpack1("Z*") if fname == "1"
          end
        end
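One way to sanity-check this kind of header parsing is to compare it against Ruby's bundled Zlib, which exposes the same field as `orig_name`. The sketch below writes a small gzip file, reads the name through Zlib and then through raw bytes. The temp file and the name "report.txt" are made up, and the raw-byte path assumes no extra field is present, which holds for this generated file.

```ruby
require "zlib"
require "tempfile"

file = Tempfile.new(["sample", ".gz"])
path = file.path
file.close

# Write a gzip file whose header carries an original file name.
# orig_name must be set before any data is written.
Zlib::GzipWriter.open(path) do |gz|
  gz.orig_name = "report.txt"
  gz.write("hello")
end

# Zlib parses the header field for us.
from_zlib = nil
Zlib::GzipReader.open(path) { |gz| from_zlib = gz.orig_name }

# The same information from the raw bytes: FLG is the fourth byte,
# and FNAME is the fifth character of its MSB-first bit string.
raw   = File.binread(path)
flags = raw.unpack("H4H2B8").last
from_bytes = nil
from_bytes = raw.byteslice(10, raw.bytesize).unpack1("Z*") if flags[4] == "1"

puts from_zlib, from_bytes
file.unlink
```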
  10.
        ['audio/x-ms-asx', [[0, 'ASF '], [0..64, '<ASX'], [0..64, '<asx'], [0..64, '<Asx']]],
        ['application/annodex', [[0, 'OggS', [[28, "fishead\000", [[56..512, "CMML\000\000\000\000"]]]]]]],
        ['application/dicom', [[128, 'DICM']]],
        ['application/gnunet-directory', [[0, "\211GND\r\n\032\n"]]],
        ['application/gzip', [[0, "\037\213"]]],
        ['application/javascript', [[0, '#!/bin/gjs'], [0, '#! /bin/gjs'], [0, "eval \"exec /bin/gjs"], [0, '#!/usr/bin/gjs'], [0, '#! /usr/bin/gjs'], [0, "eval \"exec /usr/bin/gjs"], [0, '#!/usr/local/bin/gjs'], [0, '#! /usr/local/bin/gjs'], [0, "eval \"exec /usr/local/bin/gjs"], [2..16, '/bin/env gjs']]],
        ['application/mac-binhex40', [[11, 'must be converted with BinHex']]],
        ['application/mathematica', [[0, '(************** Content-type: application/mathematica'], [100..256, 'This notebook can be used on any computer system with Mathematica'], [10..256, 'This is a Mathematica Notebook file. It contains ASCII text']]],
        ['application/metalink+xml', [[0..256, "<metalink version=\"3.0\""]]],

        "\037\213"
        # => "\u001F\x8B"
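To show how a table like this can be used, here is a deliberately simplified sketch of magic-byte detection. The flat `[mime type, offset, magic]` layout and the `detect` lambda are hypothetical; the table on the slide also supports offset ranges and nested rules, which this sketch ignores.

```ruby
# A tiny, hypothetical subset: [mime type, fixed offset, magic bytes].
# Everything is kept binary (.b) so comparisons are byte-for-byte.
signatures = [
  ["application/gzip",  0,   "\037\213".b],
  ["application/dicom", 128, "DICM".b],
]

# Return the first MIME type whose magic bytes sit at its offset,
# or nil when nothing matches.
detect = lambda do |data|
  bytes = data.b
  match = signatures.find do |_mime, offset, magic|
    bytes.byteslice(offset, magic.bytesize) == magic
  end
  match&.first
end

p detect.call("\x1F\x8B\x08\x00rest")   # octal \037\213 is hex 1F 8B
p detect.call(("\0" * 128) + "DICM")
p detect.call("plain text")
```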
  11. BinData - a better way
    - https://github.com/dmendel/bindata
    - "BinData provides a declarative way to read and write structured binary data."
    - used by ~1300 repos
  12.
        class IPAddr < BinData::Primitive
          array :octets, type: :uint8, initial_length: 4

          def set(val)
            self.octets = val.split(/\./).map(&:to_i)
          end

          def get
            self.octets.map(&:to_s).join(".")
          end
        end

        IPAddr.new("192.168.1.1").to_binary_s
        # => "\xC0\xA8\x01\x01"
  13.
        class IPAddr < BinData::Primitive
          array :octets, type: :uint8, initial_length: 4

          def set(val)
            self.octets = val.split(/\./).map(&:to_i)
          end

          def get
            self.octets.map(&:to_s).join(".")
          end
        end

        IPAddr.read("\xC0\xA8\x01\x01")
        # => "192.168.1.1"
  14.
        class MacAddr < BinData::Primitive
          array :octets, type: :uint8, initial_length: 6

          def set(val)
            # octets are written in hex, so parse them as hex (to_i would misread "ff")
            self.octets = val.split(/:/).collect(&:hex)
          end

          def get
            octets.collect { |octet| "%02x" % octet }.join(":")
          end
        end

        MacAddr.new("00:01:02:03:04:05").to_binary_s
        # => "\x00\x01\x02\x03\x04\x05"
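For comparison, the same two conversions can be written with `pack` and `unpack` alone. This is a stdlib-only sketch, not BinData code; the sample MAC address is made up to include hex digits above 9.

```ruby
# IPv4: four decimal octets <-> four bytes.
ip_bytes = "192.168.1.1".split(".").map(&:to_i).pack("C4")
ip_text  = ip_bytes.unpack("C4").join(".")

# MAC: six hex octets <-> six bytes. String#hex parses base 16.
mac_bytes = "00:1a:2b:3c:4d:ff".split(":").map(&:hex).pack("C6")
mac_text  = mac_bytes.unpack("C6").map { |octet| format("%02x", octet) }.join(":")

p ip_bytes, ip_text, mac_bytes, mac_text
```

The BinData versions bundle both directions into one class; here they are two separate expressions that have to be kept in sync by hand.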
  15.
        class Gzip < BinData::Record
          # Known compression methods
          DEFLATE = 8

          endian :little

          uint16 :ident, asserted_value: 0x8b1f
          uint8 :compression_method, asserted_value: DEFLATE

          bit3 :freserved, asserted_value: 0
          bit1 :fcomment
          bit1 :ffile_name
          bit1 :fextra
          bit1 :fcrc16
          bit1 :ftext

          uint32 :mtime
          uint8 :extra_flags
          uint8 :os

          struct :extra, onlyif: -> { fextra.nonzero? } do
            uint16 :len
            string :data, read_length: :len
          end
          stringz :file_name, onlyif: -> { ffile_name.nonzero? }
          stringz :comment, onlyif: -> { fcomment.nonzero? }
          uint16 :crc16, onlyif: -> { fcrc16.nonzero? }

          # ignore rest so that we don't load everything into memory
        end

        def original_filename(file_path)
          File.open(file_path, "rb") do |f|
            header = f.read(10)
            # gzip stores multi-byte fields little-endian, hence "L<" rather than native "L"
            magic, cm, flg, mtime, xfl, os = header.unpack("H4H2B8L<H2C")

            (magic == "1f8b") || raise("Invalid gzip header")
            (cm == "08") || raise("Unknown compression method")

            _, _, _, fcomment, fname, fextra, fhcrc, ftext = flg.split("")

            if fextra == "1"
              xlen = f.read(2).unpack1("S<")
              f.seek(xlen, IO::SEEK_CUR)
            end

            f.gets("\x00").unpack1("Z*") if fname == "1"
          end
        end
  16.
        class Mbus::BCD4 < BinData::Primitive
          bit4 :d3
          bit4 :d4
          bit4 :d1
          bit4 :d2

          def set(value)
            self.d1 = (value / 10**3) % 10
            self.d2 = (value / 10**2) % 10
            self.d3 = (value / 10**1) % 10
            self.d4 = (value / 10**0) % 10
          end

          def get
            d1 * 10**3 +
              d2 * 10**2 +
              d3 * 10**1 +
              d4 * 10**0
          end
        end
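The nibble order in this class (d3, d4 in the first byte, d1, d2 in the second) means the two BCD bytes are stored low byte first. A stdlib-only sketch of the same encoding follows; it assumes BinData's plain `bit4` fields are read most significant bits first, and the lambda names are made up.

```ruby
# Encode: write the four decimal digits as hex nibbles ("1234" -> 0x12 0x34),
# then swap the two bytes so the low-order digits come first.
encode_bcd4 = ->(value) { [format("%04d", value)].pack("H4").reverse }

# Decode: swap the bytes back and read the nibbles as decimal digits.
decode_bcd4 = ->(bytes) { bytes.b.reverse.unpack1("H4").to_i }

encoded = encode_bcd4.call(1234)
decoded = decode_bcd4.call(encoded)

p encoded.bytes.map { |b| format("%02x", b) }, decoded
```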
  17.
        class Mbus::DataInformationField < BinData::Record
          bit1 :extension_bit
          bit1 :storage_number_lsb
          bit2 :function_field_code
          bit4 :data_field_code
          data_information_field_extension :data_information_field_extension, onlyif: -> { extension? }

          # ...
        end

        class Mbus::DataInformationFieldExtension < BinData::Record
          bit1 :extension_bit
          bit1 :device_unit
          bit2 :tariff
          bit4 :storage_number
          data_information_field_extension :data_information_field_extension, onlyif: -> { extension? }

          # ...
        end

        data_information_field.data_information_field_extension.data_information_field_extension....
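The recursive records above describe a chain of bytes in which the top bit of each byte says whether another extension byte follows. Without BinData, the same chain can be read with a loop. A minimal sketch, assuming the extension bit is the most significant bit as the field order suggests; the sample bytes and the `read_chain` name are made up.

```ruby
require "stringio"

# Read bytes until one has its top (extension) bit cleared.
read_chain = lambda do |io|
  chain = []
  loop do
    byte = io.readbyte
    chain << byte
    break if (byte & 0x80).zero?
  end
  chain
end

# Two bytes with the extension bit set, one without, then unrelated data.
io    = StringIO.new("\x84\x81\x01\x13".b)
chain = read_chain.call(io)
rest  = io.read

p chain, rest
```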
  18.
        class Mbus::DataTypeI < BinData::Primitive
          bit1 :leap_year
          bit1 :dst
          bit6 :second

          bit1 :invalid
          bit1 :dst_direction
          bit6 :minute

          bit3 :day_of_week
          bit5 :hour

          bit3 :year_lower
          bit5 :day

          bit4 :year_upper
          bit4 :month

          bit2 :dst_offset
          bit6 :week_number

          def get
            Time.new(year, month, day, hour, minute, second)
          end

          def year
            2000 + year_upper * 2**3 + year_lower
          end
        end
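To make the bit layout concrete, the sketch below decodes the same six bytes with shifts and masks instead of BinData. It follows only the field order shown on the slide, reading each byte most significant bits first; the sample bytes are made up to encode 2019-09-01 12:34:56, and the DST, validity, weekday and week-number fields are ignored.

```ruby
bytes = [0x38, 0x22, 0x0C, 0x61, 0x29, 0x00]
b0, b1, b2, b3, b4, _b5 = bytes

second = b0 & 0x3F          # low 6 bits of byte 0
minute = b1 & 0x3F          # low 6 bits of byte 1
hour   = b2 & 0x1F          # low 5 bits of byte 2
day    = b3 & 0x1F          # low 5 bits of byte 3
month  = b4 & 0x0F          # low 4 bits of byte 4

# The year is split: upper 4 bits in byte 4, lower 3 bits in byte 3.
year_upper = b4 >> 4
year_lower = b3 >> 5
year = 2000 + (year_upper << 3) + year_lower

time = Time.new(year, month, day, hour, minute, second)
puts time.strftime("%Y-%m-%d %H:%M:%S")
```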
  19.
        class Mbus::AuthenticationAndFragmentationLayer < BinData::Record
          include Mbus::CI

          # AFL-Length [OMSvol2:4.1.2 6.2.1]
          uint8 :afll
          # Fragmentation Control Field [OMSvol2:4.1.2 6.2.1]
          fragmentation_control_field :fcl
          # Message Control Field [OMSvol2:4.1.2 6.2.1]
          uint8 :mcl, onlyif: -> { fcl.mclp == 1 }
          # Key Information Field [OMSvol2:4.1.2 6.2.1]
          int16le :ki, onlyif: -> { fcl.kip == 1 }
          # Message Counter Field [OMSvol2:4.1.2 6.2.1]
          int32le :message_counter_c, onlyif: -> { fcl.mcrp == 1 }
          # Message Authentication Code [OMSvol2:4.1.2 6.2.1]
          string :cmac, length: 8, onlyif: -> { fcl.macp == 1 }
          # Message Length Field [OMSvol2:4.1.2 6.2.1]
          int16le :ml, onlyif: -> { fcl.mlp == 1 }

          uint8 :next_control_info
          count_bytes_remaining :bytes_remaining
          transport_layer :transport_layer, read_length: :bytes_remaining, onlyif: :ci_transport_layer?

          # ...
        end