Decoding the Secrets of Binary Data (Droidcon NYC 2016)

Jesse Wilson
September 30, 2016

Video: https://www.youtube.com/watch?v=T_p22jMZSrk
Code: https://github.com/swankjesse/encoding

Opaque blobs of data have hexed Android programmers for too long. It’s time to byte the bullet and learn how data is transmitted and persisted.

In this talk we’ll:

💾 Learn a bit about base64, little-endian, and EOF.
💾 See how inefficient encodings nibble away resources.
💾 Hash out the differences between ASCII, UTF-8, and other charsets.
💾 Zip through examples of compression, crypto, and protocol buffers.
💾 Load up on APIs and discover what Square’s Okio has in store.

This talk offers a short introduction to an array of topics. You’ll learn enough to encode & decode whatever data you select!

Transcript

  1. 3.

    0 1

  2. 4.
  3. 5.
  4. 8.
  5. 9.

    Date: Aprilis 11, 1095 • From: Emperor Alexios Komnenos • To: Robert II, Count of Flanders

    Bob, Looks like we’re gonna war with the Turks. Could you see if the Pope could help us crusade against ’em? Thanks! Alex

    Letters! • Human readable • Signed
  6. 10.

    Letters! • Slow, dangerous to transmit • Awkward to store • Requires literacy!
  7. 11.
  8. 14.

    Morse Code

    A • ▬        G ▬ ▬ •      M ▬ ▬        S • • •      Y ▬ • ▬ ▬      4 • • • • ▬
    B ▬ • • •    H • • • •    N ▬ •        T ▬          Z ▬ ▬ • •      5 • • • • •
    C ▬ • ▬ •    I • •        O ▬ ▬ ▬      U • • ▬      0 ▬ ▬ ▬ ▬ ▬    6 ▬ • • • •
    D ▬ • •      J • ▬ ▬ ▬    P • ▬ ▬ •    V • • • ▬    1 • ▬ ▬ ▬ ▬    7 ▬ ▬ • • •
    E •          K ▬ • ▬      Q ▬ ▬ • ▬    W • ▬ ▬      2 • • ▬ ▬ ▬    8 ▬ ▬ ▬ • •
    F • • ▬ •    L • ▬ • •    R • ▬ •      X ▬ • • ▬    3 • • • ▬ ▬    9 ▬ ▬ ▬ ▬ •
  9. 15.

    Morse Code • Message so far: ▬ • • (D)
  10. 16.

    Morse Code • Message so far: ▬ • • ▬ ▬ ▬ (DO)
  11. 17.

    Morse Code • Message so far: ▬ • • ▬ ▬ ▬ ▬ • (DON)
  12. 18.

    Morse Code • Message so far: ▬ • • ▬ ▬ ▬ ▬ • • • ▬ (DONU)
  13. 19.

    Morse Code • Message so far: ▬ • • ▬ ▬ ▬ ▬ • • • ▬ ▬ (DONUT)
  14. 20.

    Morse Code • Message: ▬ • • ▬ ▬ ▬ ▬ • • • ▬ ▬ (DONUT)
  15. 23.

    Morse Code • Very limited characters • No lowercase • Limited punctuation* • Average message: 12 words

    * For example, Morse code doesn’t have an asterisk symbol.
  16. 25.

    205

  17. 30.

    205

  18. 31.

    205

  19. 38.

    Binary Refresher! • Decimal lets you represent any integer with a sequence of digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • Binary lets you represent any integer with a sequence of bits: 0, 1
  20. 40.

    Binary Refresher! • N bits can represent 2^N values • 8 bits: 0..255 • 16 bits: 0..65,535 • 32 bits: 0..4,294,967,295 • 64 bits: 0..18,446,744,073,709,551,615
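
The 2^N rule on this slide can be checked with a few lines of Java. This is only a sketch: the names `BitRanges` and `maxUnsigned` are made up for illustration, and `BigInteger` is used because 2^64 - 1 does not fit in Java's signed `long`.

```java
import java.math.BigInteger;

final class BitRanges {
  /** Returns the largest unsigned value that fits in {@code bits} bits, which is 2^bits - 1. */
  static BigInteger maxUnsigned(int bits) {
    return BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE);
  }

  public static void main(String[] args) {
    // Prints the range for each width listed on the slide.
    for (int bits : new int[] {8, 16, 32, 64}) {
      System.out.printf("%d bits: 0..%s%n", bits, maxUnsigned(bits));
    }
  }
}
```

Each extra bit doubles the number of values, which is why the 64-bit maximum is so much larger than the 32-bit one.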
  21. 42.

    Layering • Given a wire that transmits a single bit, we can use binary to encode any integer! • This works because the sender and recipient agree on how to interpret the sequence • That interpretation is called an encoding
  22. 43.

    Bytes • Though binary supports any number of bits, we like 8-bit integers • An 8-bit integer is called a byte • 256 values from 0 to 255
  23. 44.
  24. 45.

    ASCII • American Standard Code for Information Interchange • A table of characters • Interpret a sequence of bytes as a string of characters!
  25. 46.

    ASCII • Work started in 1960 • Only uses 7 bits: in 1967 bits were very expensive! • That means there are 2^7 = 128 characters
  26. 47.

    ASCII

     0 NULL   16 DLE   32 SP   48 0   64 @   80 P    96 `   112 p
     1 SOH    17 DC1   33 !    49 1   65 A   81 Q    97 a   113 q
     2 STX    18 DC2   34 "    50 2   66 B   82 R    98 b   114 r
     3 ETX    19 DC3   35 #    51 3   67 C   83 S    99 c   115 s
     4 EOT    20 DC4   36 $    52 4   68 D   84 T   100 d   116 t
     5 ENQ    21 NAK   37 %    53 5   69 E   85 U   101 e   117 u
     6 ACK    22 SYN   38 &    54 6   70 F   86 V   102 f   118 v
     7 BEL    23 ETB   39 '    55 7   71 G   87 W   103 g   119 w
     8 BS     24 CAN   40 (    56 8   72 H   88 X   104 h   120 x
     9 HT     25 EM    41 )    57 9   73 I   89 Y   105 i   121 y
    10 LF     26 SUB   42 *    58 :   74 J   90 Z   106 j   122 z
    11 VT     27 ESC   43 +    59 ;   75 K   91 [   107 k   123 {
    12 FF     28 FS    44 ,    60 <   76 L   92 \   108 l   124 |
    13 CR     29 GS    45 -    61 =   77 M   93 ]   109 m   125 }
    14 SO     30 RS    46 .    62 >   78 N   94 ^   110 n   126 ~
    15 SI     31 US    47 /    63 ?   79 O   95 _   111 o   127 DEL
  27. 48.

    ASCII table • Bytes: 68
  28. 49.

    ASCII table • Bytes: 68 111
  29. 50.

    ASCII table • Bytes: 68 111 110
  30. 51.

    ASCII table • Bytes: 68 111 110 117
  31. 52.

    ASCII table • Bytes: 68 111 110 117 116
  32. 53.

    ASCII table • Bytes: 68 111 110 117 116
  33. 54.

    ASCII table • Bytes: 68 111 110 117 116
  34. 55.

    ASCII table • Bits: 01000100 • Bytes: 111 110 117 116
  35. 56.

    ASCII table • Bits: 01000100 01101111 • Bytes: 110 117 116
  36. 57.

    ASCII table • Bits: 01000100 01101111 01101110 • Bytes: 117 116
  37. 58.

    ASCII table • Bits: 01000100 01101111 01101110 01110101 • Bytes: 116
  38. 59.

    ASCII table • Bits: 01000100 01101111 01101110 01110101 01110100
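
The table lookup walked through on these slides (characters to byte values to bits) can be reproduced with the JDK's built-in US-ASCII charset. A minimal sketch; the class name `AsciiBits` and method `toBits` are invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class AsciiBits {
  /** Encodes {@code text} as ASCII and renders each byte as 8 binary digits. */
  static String toBits(String text) {
    byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
    StringBuilder result = new StringBuilder();
    for (byte b : bytes) {
      if (result.length() > 0) result.append(' ');
      String bits = Integer.toBinaryString(b & 0xff);
      // Left-pad to a full byte, because toBinaryString drops leading zeros.
      for (int i = bits.length(); i < 8; i++) result.append('0');
      result.append(bits);
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(toBits("Donut"));
  }
}
```

Every ASCII value is below 128, so the leading bit of each byte is always 0.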
  39. 60.
  40. 66.

    Charset Hell • Operating systems were installed for a specific character set and wouldn’t work with any others • Documents couldn’t mix Greek, French, and Russian characters • If you see ISO-8859-1, run away!
  41. 67.
  42. 68.

    Unicode • Support all languages in a single system • A code point is a universal ID for a character
  43. 69.

    UTF-16 • 16-bit Unicode Transformation Format • 2 bytes per code point • This is Java’s char type
  44. 71.

    ASCII:  01000100 (D)
    UTF-16: 00000000 01000100 (D)
  45. 72.

    ASCII:  01000100 01101111 (Do)
    UTF-16: 00000000 01000100 00000000 01101111 (Do)
  46. 73.

    ASCII:  01000100 01101111 01101110 (Don)
    UTF-16: 00000000 01000100 00000000 01101111 00000000 01101110 (Don)
  47. 74.

    ASCII:  01000100 01101111 01101110 01110101 (Donu)
    UTF-16: 00000000 01000100 00000000 01101111 00000000 01101110 00000000 01110101 (Donu)
  48. 75.

    ASCII:  01000100 01101111 01101110 01110101 01110100 (Donut)
    UTF-16: 00000000 01000100 00000000 01101111 00000000 01101110 00000000 01110101 00000000 01110100 (Donut)
  49. 76.

    ASCII:  01000100 01101111 01101110 01110101 01110100 (Donut)
    UTF-16: 00000000 01000100 00000000 01101111 00000000 01101110 00000000 01110101 00000000 01110100 (Donut)
  50. 77.

    UTF-16 characters • Max code point in a single 16-bit unit is 65,535 • Code point for 🍩 is 127,849 • 127,849 > 65,535
  51. 78.

    Java’s char is broken! • There’s a system called “surrogate pairs” which is like multidex for code points • It splits a single code point across 2 chars • It’s an incredible pain
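
To see the split for yourself, here is a small sketch that builds a string from code point 127,849 and inspects its chars. The class and method names are invented for illustration; `Character.toChars` and `String.codePointCount` are standard JDK methods.

```java
final class SurrogatePairs {
  /** The donut emoji's code point, U+1F369. It is too large for one 16-bit char. */
  static final int DONUT = 127_849;

  /** Builds a String holding a single code point, using two chars when one isn't enough. */
  static String fromCodePoint(int codePoint) {
    return new String(Character.toChars(codePoint));
  }

  public static void main(String[] args) {
    String donut = fromCodePoint(DONUT);
    System.out.println(donut.length());                          // Number of chars.
    System.out.println(donut.codePointCount(0, donut.length())); // Number of code points.
    // The two halves of the pair, in hex.
    System.out.printf("%04x %04x%n", (int) donut.charAt(0), (int) donut.charAt(1));
  }
}
```

Neither half is a character on its own, which is why code that walks a string one `char` at a time mangles anything above 65,535.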
  52. 80.

    String s = "Café 🍩";

    // Each char is one 16-bit UTF-16 unit, so the donut shows up as two broken halves.
    for (int i = 0, size = s.length(); i < size; i++) {
      char c = s.charAt(i);
      System.out.printf("The character at %d is '%c'%n", i, c);
    }
  53. 81.

    The character at 0 is 'C'
    The character at 1 is 'a'
    The character at 2 is 'f'
    The character at 3 is 'é'
    The character at 4 is ' '
    The character at 5 is '?'
    The character at 6 is '?'
  54. 82.

    String s = "Café 🍩";

    // Step by whole code points; charCount() is 2 when the code point needs a surrogate pair.
    for (int i = 0, size = s.length(); i < size; ) {
      int c = s.codePointAt(i);
      System.out.printf("The code point at %d is '%c'%n", i, c);
      i += Character.charCount(c);
    }
  55. 83.

    The code point at 0 is 'C'
    The code point at 1 is 'a'
    The code point at 2 is 'f'
    The code point at 3 is 'é'
    The code point at 4 is ' '
    The code point at 5 is '🍩'
  56. 84.
  57. 85.

    UTF-8 • 8-bit Unicode Transformation Format • Variable number of bytes per code point • This is how modern apps transmit & store text
  58. 87.

    UTF-8 • Many common characters are 1 byte • Some are 2 or 3 bytes • ‘🍩’ is 4 bytes • Self-delimiting • Self-aligning
  59. 88.

    “How many bits do you need?”

    ≤7 bits:  0xxxxxxx
    ≤11 bits: 110xxxxx 10xxxxxx
    ≤16 bits: 1110xxxx 10xxxxxx 10xxxxxx
    ≤21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  60. 91.

    “How many bits do you need?” • C: 1000011 • a • f • é • sp
  61. 92.

    “How many bits do you need?” • C: 1000011 • a: 97 • f • é • sp
  62. 93.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f • é • sp
  63. 94.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 102 • é • sp
  64. 95.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 1100110 • é • sp
  65. 96.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 1100110 • é: 233 • sp
  66. 97.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 1100110 • é: 11101001 • sp
  67. 98.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 1100110 • é: 11101001 • sp: 32
  68. 99.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 1100110 • é: 11101001 • sp: 100000
  69. 100.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 1100110 • é: 11101001 • sp: 100000 • 🍩: 127,849
  70. 101.

    “How many bits do you need?” • C: 1000011 • a: 1100001 • f: 1100110 • é: 11101001 • sp: 100000 • 🍩: 11111001101101001
  71. 102.

    “How many bits do you need?” • Packing each code point’s bits into UTF-8 bytes
  72. 103.

    “How many bits do you need?” • Packing each code point’s bits into UTF-8 bytes
  73. 104.

    UTF-8 bytes: 01000011 (C) 01100001 (a) 01100110 (f) 11000011 10101001 (é) 00100000 (sp) 11110000 10011111 10001101 10101001 (🍩)
  74. 105.

    UTF-8 • Basically the best thing ever • Superset of ASCII • Great for JSON and HTML because delimiter characters <, > and " are 1 byte
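
The byte counts from the “Café 🍩” walkthrough are easy to confirm with the JDK's UTF-8 encoder. A sketch only; `Utf8Bytes` and `utf8Hex` are names invented for this example, and the string is written with `\u` escapes so the source file's own encoding doesn't matter.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Bytes {
  /** Encodes {@code text} as UTF-8 and renders each byte as two hex digits. */
  static String utf8Hex(String text) {
    StringBuilder result = new StringBuilder();
    for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
      if (result.length() > 0) result.append(' ');
      result.append(String.format("%02x", b & 0xff));
    }
    return result.toString();
  }

  public static void main(String[] args) {
    // "Café 🍩": é is U+00E9, and the donut is the surrogate pair D83C DF69.
    System.out.println(utf8Hex("Caf\u00e9 \ud83c\udf69"));
  }
}
```

The three ASCII letters and the space take 1 byte each, é takes 2, and the donut takes 4, for 10 bytes in total.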
  75. 106.
  76. 107.
  77. 108.
  78. 109.
  79. 110.
  80. 115.

    Hexadecimal Refresher! • Decimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • Binary bits: 0, 1 • Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f
  81. 116.

    205

  82. 122.

    Hexadecimal Refresher! • Prefixed with “0x”, like 0xcd • Hex bytes are always two digits: 00, 01, 02 … ff • Sequences of bytes are okay: 0bb634
  83. 123.

    Hexadecimal Refresher! • Colors: #ffffff • URL escaping: http://example.com/?q=hello%20world • Unicode code points: U+2020 • IPv6 addresses: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
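
As a quick illustration of the refresher, the 205 from the earlier slides is 0xcd in hex and 11001101 in binary. The helper names below (`HexDemo`, `toHexByte`, `fromHex`) are hypothetical; the JDK calls they wrap are standard.

```java
final class HexDemo {
  /** Formats the low 8 bits of {@code value} as exactly two hex digits. */
  static String toHexByte(int value) {
    return String.format("%02x", value & 0xff);
  }

  /** Parses hex digits (without the "0x" prefix) back into an int. */
  static int fromHex(String hex) {
    return Integer.parseInt(hex, 16);
  }

  public static void main(String[] args) {
    System.out.println(toHexByte(205));
    System.out.println(fromHex("cd"));
    System.out.println(Integer.toBinaryString(0xcd));
  }
}
```

Each hex digit stands for exactly 4 bits, so a byte is always two digits: c is 1100 and d is 1101.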
  84. 125.
  85. 126.

    Pictures • Just a 2D array of colors • Given 3 bytes per pixel: • 64 × 64 icon is 12,288 bytes • 1080 × 1920 picture is 5.9 MiB • Compression is important!
  86. 127.

    Pictures • Android can use fewer bits per pixel • ARGB_8888: alpha, red, green, and blue get 8 bits each • RGB_565: red gets 5 bits, green gets 6 bits, blue gets 5 bits
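
The arithmetic on these two slides can be sketched as follows. The names are made up for this example, and the 565 layout shown (red in the top 5 bits, then green, then blue) is the conventional one and is stated here as an assumption rather than taken from the slides.

```java
final class PixelFormats {
  /** Returns the uncompressed size in bytes of a width × height image. */
  static long rawByteCount(int width, int height, int bytesPerPixel) {
    return (long) width * height * bytesPerPixel;
  }

  /**
   * Packs 8-bit red, green, and blue values into 16 bits by dropping the low bits of each
   * channel. Assumed layout: rrrrrggg gggbbbbb.
   */
  static int toRgb565(int red, int green, int blue) {
    return ((red >> 3) << 11) | ((green >> 2) << 5) | (blue >> 3);
  }

  public static void main(String[] args) {
    System.out.println(rawByteCount(64, 64, 3));
    System.out.println(rawByteCount(1080, 1920, 3) / (1024.0 * 1024.0)); // Size in MiB.
    System.out.println(Integer.toHexString(toRgb565(255, 255, 255)));
  }
}
```

Dropping from 3 or 4 bytes per pixel to 2 shrinks the picture, at the cost of color precision.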
  87. 131.

    /** https://en.wikipedia.org/wiki/BMP_file_format */
    public void encode(BufferedSink sink) throws IOException {
      int height = pixels.length;
      int width = pixels[0].length;

      int bytesPerPixel = 3;
      int rowByteCountWithoutPadding = (bytesPerPixel * width);
      int rowByteCount = ((rowByteCountWithoutPadding + 3) / 4) * 4; // Round up to a multiple of 4.
      int pixelDataSize = rowByteCount * height;
      int bmpHeaderSize = 14;
      int dibHeaderSize = 40;

      // BMP Header
      sink.writeUtf8("BM"); // ID.
      sink.writeIntLe(bmpHeaderSize + dibHeaderSize + pixelDataSize); // File size.
      sink.writeShortLe(0); // Unused.
      sink.writeShortLe(0); // Unused.
      sink.writeIntLe(bmpHeaderSize + dibHeaderSize); // Offset of pixel data.

      // DIB Header
      sink.writeIntLe(dibHeaderSize);

  88. 132.

      int dibHeaderSize = 40;

      // BMP Header
      sink.writeUtf8("BM"); // ID.
      sink.writeIntLe(bmpHeaderSize + dibHeaderSize + pixelDataSize); // File size.
      sink.writeShortLe(0); // Unused.
      sink.writeShortLe(0); // Unused.
      sink.writeIntLe(bmpHeaderSize + dibHeaderSize); // Offset of pixel data.

      // DIB Header
      sink.writeIntLe(dibHeaderSize);
      sink.writeIntLe(width);
      sink.writeIntLe(height);
      sink.writeShortLe(1); // Color plane count.
      sink.writeShortLe(bytesPerPixel * Byte.SIZE); // Bits per pixel.
      sink.writeIntLe(0); // No compression.
      sink.writeIntLe(pixelDataSize); // Size of bitmap data including padding.
      sink.writeIntLe(2835); // Horizontal print resolution in pixels/meter. (72 dpi).
      sink.writeIntLe(2835); // Vertical print resolution in pixels/meter. (72 dpi).
      sink.writeIntLe(0); // Palette color count.
      sink.writeIntLe(0); // 0 important colors.

      // Pixel data.

  89. 133.

      sink.writeIntLe(0); // Palette color count.
      sink.writeIntLe(0); // 0 important colors.

      // Pixel data. BMP stores rows bottom-up, so start with the last row.
      for (int y = height - 1; y >= 0; y--) {
        int[] row = pixels[y];
        for (int x = 0; x < width; x++) {
          int pixel = row[x];
          sink.writeByte((pixel & 0x0000ff)); // Blue.
          sink.writeByte((pixel & 0x00ff00) >>> 8); // Green.
          sink.writeByte((pixel & 0xff0000) >>> 16); // Red.
        }

        // Padding for 4-byte alignment.
        for (int p = rowByteCountWithoutPadding; p < rowByteCount; p++) {
          sink.writeByte(0);
        }
      }
    }
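
The row padding in the encoder comes down to rounding each row up to a multiple of 4 bytes. Here is that one calculation pulled out on its own; `BmpRows` and `paddedRowByteCount` are names invented for this sketch, and the formula is the same one the encoder uses for `rowByteCount`.

```java
final class BmpRows {
  /** Returns the byte length of one pixel row after padding it to a multiple of 4. */
  static int paddedRowByteCount(int width, int bytesPerPixel) {
    int rowByteCountWithoutPadding = bytesPerPixel * width;
    // Adding 3 before the integer division rounds up instead of down.
    return ((rowByteCountWithoutPadding + 3) / 4) * 4;
  }

  public static void main(String[] args) {
    for (int width = 1; width <= 5; width++) {
      System.out.printf("width %d: %d bytes per row%n", width, paddedRowByteCount(width, 3));
    }
  }
}
```

A 2-pixel row holds 6 bytes of color and 2 bytes of padding, while a 4-pixel row is already 12 bytes and needs none.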

  90. 134.

    Pictures on Bytes • This bitmap writer is 50 lines of code • Decoders are more difficult! • Good specs make it easy
  91. 135.

  92. 136.

  93. 137.

    (pixel & 0xff0000) >>> 16 • Shifting and masking let you access the bits within an integer • & and | operators treat each int like a 32-element boolean array! • <<, >> and >>> operators slide bits left and right
  94. 138.

    0x00ffab40 & 0x0000ff00

    00000000 11111111 10101011 01000000   00ffab40
    00000000 00000000 11111111 00000000   0000ff00
    00000000 00000000 10101011 00000000   0000ab00
  95. 139.
  96. 140.

    0x00ffab40 & 0x0000ff00 = 0x0000ab00
  97. 141.

    0x0000ab00 >> 8

    00000000 00000000 10101011 00000000   0000ab00
  98. 142.

    0x0000ab00 >> 8 • Every bit slides 8 places to the right
  99. 143.

    0x0000ab00 >> 8

    00000000 00000000 00000000 10101011   000000ab
  100. 144.

    0x0000ab00 >> 8 = 0x000000ab
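
The mask-then-shift steps above, wrapped into small helpers. This is a sketch with invented names (`ColorChannels` and its methods); it also shows why the encoder uses `>>>` rather than `>>`, which only matters once the top bit of the int is set.

```java
final class ColorChannels {
  /** Masks out the red byte, then slides it down to the low 8 bits. */
  static int red(int argb) {
    return (argb & 0xff0000) >>> 16;
  }

  static int green(int argb) {
    return (argb & 0x00ff00) >>> 8;
  }

  /** Blue is already in the low 8 bits, so masking is enough. */
  static int blue(int argb) {
    return argb & 0x0000ff;
  }

  /** Uses >>> so that zeros, not copies of the sign bit, are shifted in from the left. */
  static int alpha(int argb) {
    return argb >>> 24;
  }

  public static void main(String[] args) {
    int pixel = 0x00ffab40;
    System.out.printf("%02x %02x %02x%n", red(pixel), green(pixel), blue(pixel));
    System.out.println(0xff123456 >> 24);  // Signed shift keeps the value negative.
    System.out.println(0xff123456 >>> 24); // Unsigned shift yields the plain byte value.
  }
}
```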
  101. 145.

  102. 147.

  103. 148.

    Big Endian • Value: 00000000 11111111 10101011 01000000
  104. 149.

    Big Endian • Bytes written: 00000000
  105. 150.

    Big Endian • Bytes written: 00000000 11111111
  106. 151.

    Big Endian • Bytes written: 00000000 11111111 10101011
  107. 152.

    Big Endian • Bytes written: 00000000 11111111 10101011 01000000
  108. 153.

    Big Endian • Bytes written: 00000000 11111111 10101011 01000000
  109. 154.

    Little Endian • Value: 00000000 11111111 10101011 01000000
  110. 155.

    Little Endian • Bytes written: 01000000
  111. 156.

    Little Endian • Bytes written: 01000000 10101011
  112. 157.

    Little Endian • Bytes written: 01000000 10101011 11111111
  113. 158.

    Little Endian • Bytes written: 01000000 10101011 11111111 00000000
  114. 159.

    Little Endian • Bytes written: 01000000 10101011 11111111 00000000
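
Both byte orders can be tried out with `java.nio.ByteBuffer`. The bitmap encoder in this talk uses Okio's `writeIntLe` for the little-endian case; this sketch swaps in the JDK's `ByteBuffer` so that it runs without extra dependencies, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Endianness {
  /** Encodes a 32-bit int as 4 bytes in the requested byte order. */
  static byte[] encodeInt(int value, ByteOrder order) {
    return ByteBuffer.allocate(4).order(order).putInt(value).array();
  }

  /** Renders bytes as space-separated two-digit hex values. */
  static String hex(byte[] bytes) {
    StringBuilder result = new StringBuilder();
    for (byte b : bytes) {
      if (result.length() > 0) result.append(' ');
      result.append(String.format("%02x", b & 0xff));
    }
    return result.toString();
  }

  public static void main(String[] args) {
    int value = 0x00ffab40; // The same 32 bits shown on the slides.
    System.out.println(hex(encodeInt(value, ByteOrder.BIG_ENDIAN)));
    System.out.println(hex(encodeInt(value, ByteOrder.LITTLE_ENDIAN)));
  }
}
```

The bits inside each byte never change; only the order in which the four bytes go out does.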
  115. 162.

    JSON is Good Stuff

    {
      "id": 72017,
      "date": "2016-09-30T18:30:00Z",
      "room": "RIGHT",
      "title": "Decoding the Secrets of Binary Data",
      "speaker": "Jesse Wilson"
    }
  116. 163.

    JSON is Good Stuff • A nice format that builds on UTF-8 • Easy to read & write
  117. 164.

    JSON is Self-Delimiting • A JSON document has both structure and data • Uses escape sequences like \" to be completely unambiguous
  118. 165.

    "2016-09-30T18:30:00Z" • System.currentTimeMillis() returns milliseconds since January 1, 1970 at

    00:00:00 UTC • A date that’s 8 bytes in memory is 22 bytes in JSON!
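
The 8-versus-22 comparison can be checked directly. A sketch with invented names (`DateSizes`, `jsonByteCount`, `toEpochMillis`); it uses `java.time.Instant` to convert the ISO 8601 text into the milliseconds-since-1970 form that fits in a `long`.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;

final class DateSizes {
  /** Returns how many UTF-8 bytes the date takes as a quoted JSON string. */
  static int jsonByteCount(String isoDate) {
    String json = "\"" + isoDate + "\"";
    return json.getBytes(StandardCharsets.UTF_8).length;
  }

  /** Converts an ISO 8601 instant into milliseconds since January 1, 1970 at 00:00:00 UTC. */
  static long toEpochMillis(String isoDate) {
    return Instant.parse(isoDate).toEpochMilli();
  }

  public static void main(String[] args) {
    String date = "2016-09-30T18:30:00Z";
    System.out.println(jsonByteCount(date)); // Bytes as JSON text, quotes included.
    System.out.println(Long.BYTES);          // Bytes as a long in memory.
    System.out.println(toEpochMillis(date));
  }
}
```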
  119. 166.

    JSON Space & Time • Space: a simple message like this one is ~128 bytes • Time: bigger sequences take longer to decode
  120. 167.

    Protocol Buffers • Google’s “small, fast, simple” structured data format • Upon closer inspection, it’s not that different from JSON! • But it has a schema
  121. 168.

    message Talk {
      optional fixed32 id = 1;
      optional fixed64 date = 2;
      optional Room room = 3;
      optional string title = 4;
      optional string speaker = 5;

      enum Room {
        UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
      }
    }

    {
      "id": 72017,
      "date": "2016-09-30T18:30:00Z",
      "room": "RIGHT",
      "title": "Decoding the Secrets of Binary Data",
      "speaker": "Jesse Wilson"
    }

    Keys: "id", "date", "room", "title", "speaker" • Values: 72017, "2016-09-30T18:30:00Z", "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson"
  122. 169.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64

    date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 }
  123. 170.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 }
    Length Mode
    000 enum, int32, int64...
    001 fixed64
    010 string, message
    101 fixed32
  124. 171.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 000 enum, int32, int64... 001 fixed64 010 string, message 101 fixed32 Length Mode 1 0 1
  125. 172.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 000 enum, int32, int64... 001 fixed64 010 string, message 101 fixed32 Length Mode 1 0 1 0 0 1
  126. 173.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 000 enum, int32, int64... 001 fixed64 010 string, message 101 fixed32 Length Mode 0 0 0 1 0 1 0 0 1
  127. 174.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 000 enum, int32, int64... 001 fixed64 010 string, message 101 fixed32 Length Mode 0 0 0 1 0 1 0 1 0 0 0 1
  128. 175.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 000 enum, int32, int64... 001 fixed64 010 string, message 101 fixed32 Length Mode 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1
  129. 176.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1
  130. 177.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 1
  131. 178.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 1
  132. 179.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 1
  133. 180.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1
  134. 181.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1
  136. 183.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a
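Each of the five bytes above (0d 11 18 22 2a) packs a field number from the schema together with a three-bit wire type. A minimal sketch of pulling them apart with a shift and a mask follows. `TagDecoder` is a hypothetical name, and the code assumes the tag fits in a single byte, which holds for field numbers up to 15.

```java
// Hypothetical helper: splits a one-byte protocol buffers tag into its parts.
class TagDecoder {
  /** Upper bits: the field number declared in the schema (for example "= 4"). */
  static int fieldNumber(int tag) {
    return tag >>> 3;
  }

  /** Lower three bits: 0 varint, 1 fixed64, 2 length-delimited, 5 fixed32. */
  static int wireType(int tag) {
    return tag & 0x7;
  }

  public static void main(String[] args) {
    int[] tags = {0x0d, 0x11, 0x18, 0x22, 0x2a};
    for (int tag : tags) {
      System.out.printf("tag %02x: field %d, wire type %d%n",
          tag, fieldNumber(tag), wireType(tag));
    }
  }
}
```

Going the other way, a tag is built as (fieldNumber << 3) | wireType, so field 1 with wire type 5 (fixed32) gives 0x0d.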
  137. 184.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a { "id": 72017, "date": "2016-09-30T18:30:00Z", "room": "RIGHT", "title": "Decoding the Secrets of Binary Data", "speaker": "Jesse Wilson" } "id" "date" "room" "title" "speaker" 72017 "2016-09-30T18:30:00Z", "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson",
  138. 185.

    message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a 72017 "2016-09-30T18:30:00Z", "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson",
  140. 187.

    51 19 01 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a "2016-09-30T18:30:00Z", "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson", 00
  141. 188.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a "2016-09-30T18:30:00Z", "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson", 00
  143. 190.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson", 00 000001577D37EE40
  144. 191.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson", 00 00 00 01 57 7D 37 EE 40
  146. 193.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a "Decoding the Secrets of Binary Data", "Jesse Wilson", 00 00 00 01 57 7D 37 EE 40 02
  147. 194.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a "Decoding the Secrets of Binary Data" "Jesse Wilson", 00 00 01 57 7D 37 EE 40 02 00
  148. 195.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a 4465636f64696e67207468652053656372657473206f662042696e6172792044617461 "Jesse Wilson", 00 00 00 01 57 7D 37 EE 40 02
  149. 196.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a 4465636f64696e67207468652053656372657473206f662042696e6172792044617461 "Jesse Wilson", 00 00 00 01 57 7D 37 EE 40 02 23
  150. 197.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a 4465636f64696e67207468652053656372657473206f662042696e6172792044617461 "Jesse Wilson" 00 00 00 01 57 7D 37 EE 40 02 23
  151. 198.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a 4465636f64696e67207468652053656372657473206f662042696e6172792044617461 4a657373652057696c736f6e 00 00 00 01 57 7D 37 EE 40 02 23
  152. 199.

    511901 message Talk {
 optional fixed32 id = 1;
 optional fixed64 date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a 4465636f64696e67207468652053656372657473206f662042696e6172792044617461 4a657373652057696c736f6e 00 00 00 01 57 7D 37 EE 40 02 23 0c
  153. 200.

    511901 0d 11 18 22 2a 4a657373652057696c736f6e 00 00 00

    01 57 7D 37 EE 40 02 23 0c 4465636f64696e67207468652053656372657473206f662042696e6172792044617461
  154. 201.

    511901 0d 11 18 22 2a 4465636f64696e6720746865205365637265747 4a657373652057696c736f6e 00 00

    00 01 57 7D 37 EE 40 02 23 0c 3206f662042696e6172792044617461
  156. 203.

    511901 0d 11 18 22 2a 4465636f64696e6720746865205365637265747 4a657373652057696c736f6e 00 00

    00 01 57 7D 37 EE 40 02 23 0c 3206f662042696e6172792044617461
    • Protocol Buffers are small and fast
    • ~1 byte for each field name
    • Compact encoding for numbers and enums
    • Strings stay the same length!
    • This message is 67 bytes in protocol buffers, vs. 128 for JSON
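The 67-byte figure can be reproduced by writing the fields out by hand. The sketch below is not how protocol buffers are normally used (generated classes do this for you), and `TalkEncoder` and its helpers are hypothetical names. It assumes every tag, enum value and string length fits in one varint byte, and it uses the epoch-millisecond date from the talk's Parcelable test.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled sketch of the Talk message's wire format, for illustration only.
class TalkEncoder {
  static byte[] encode(int id, long date, int room, String title, String speaker) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(tag(1, 5));            // field 1, fixed32
    writeLittleEndian(out, id, 4);
    out.write(tag(2, 1));            // field 2, fixed64
    writeLittleEndian(out, date, 8);
    out.write(tag(3, 0));            // field 3, enum encoded as a varint
    out.write(room);                 // assumes room < 128: one varint byte
    writeString(out, 4, title);
    writeString(out, 5, speaker);
    return out.toByteArray();
  }

  static int tag(int fieldNumber, int wireType) {
    return (fieldNumber << 3) | wireType;
  }

  /** Writes the low byteCount bytes of value, least significant byte first. */
  static void writeLittleEndian(ByteArrayOutputStream out, long value, int byteCount) {
    for (int i = 0; i < byteCount; i++) {
      out.write((int) (value >>> (8 * i)) & 0xff);
    }
  }

  static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    out.write(tag(fieldNumber, 2));  // length-delimited
    out.write(utf8.length);          // assumes length < 128: one varint byte
    out.write(utf8, 0, utf8.length);
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] encoded = encode(72017, 1475260200000L, 2,
        "Decoding the Secrets of Binary Data", "Jesse Wilson");
    System.out.println(encoded.length + " bytes: " + hex(encoded));
  }
}
```

The total breaks down as 5 bytes for id, 9 for date, 2 for room, 37 for title and 14 for speaker. Only the two strings cost as much as they do in JSON.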
  157. 204.
  158. 205.

    byte[] is bad
    • Mutable!
    • equals() doesn’t work like it should: can’t be a map key!
    • Doesn’t implement Comparable
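The equals() complaint is easy to demonstrate with the JDK alone. In this sketch `ByteArrayPitfalls` and its method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: shows why a raw byte[] makes a poor value type.
class ByteArrayPitfalls {
  /** Arrays inherit Object.equals, so this is true only for the same instance. */
  static boolean identityEquals(byte[] a, byte[] b) {
    return a.equals(b);
  }

  /** Arrays.equals compares lengths and contents instead. */
  static boolean contentEquals(byte[] a, byte[] b) {
    return Arrays.equals(a, b);
  }

  /** Stores a value under one array, then looks it up with another array. */
  static boolean lookupSucceeds(byte[] storedKey, byte[] lookupKey) {
    Map<byte[], String> map = new HashMap<>();
    map.put(storedKey, "value");
    return map.containsKey(lookupKey);
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3};
    byte[] b = {1, 2, 3};
    System.out.println("a.equals(b): " + identityEquals(a, b));
    System.out.println("Arrays.equals(a, b): " + contentEquals(a, b));
    System.out.println("map lookup with b: " + lookupSucceeds(a, b));
  }
}
```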
  159. 206.

    byte is signed
    • Range is -128..127 but you usually want 0..255
    • But unsigned when you call InputStream.read()
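A short sketch of both behaviours, using only the JDK. `SignedBytes` is a hypothetical name. Masking with 0xff is the usual way to recover the 0..255 value from a signed byte.

```java
import java.io.ByteArrayInputStream;

// Illustrative only: the same byte seen as signed and as unsigned.
class SignedBytes {
  /** Widens to int and masks off the sign extension, giving 0..255. */
  static int toUnsigned(byte b) {
    return b & 0xff;
  }

  /** InputStream.read() returns 0..255, or -1 once the stream is exhausted. */
  static int firstByteViaStream(byte[] data) {
    return new ByteArrayInputStream(data).read();
  }

  public static void main(String[] args) {
    byte b = (byte) 0xc3;
    System.out.println("as a byte: " + b);
    System.out.println("masked: " + toUnsigned(b));
    System.out.println("via read(): " + firstByteViaStream(new byte[] {b}));
  }
}
```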
  161. 208.
  162. 211.

    ByteString makes data easy

    ByteString byteString = ByteString.decodeHex("436166c3a920f09f8da9");
    assertThat(byteString.size()).isEqualTo(10);
    assertThat(byteString.getByte(0)).isEqualTo((byte) 0x43);
    assertThat(byteString.getByte(0)).isEqualTo((byte) 67);

    String cafeDonuts = byteString.utf8();
    assertThat(cafeDonuts).isEqualTo("Café 🍩");

    ByteString cafe = ByteString.encodeUtf8("Café");
    assertThat(byteString.startsWith(cafe)).isTrue();
  163. 213.

    class Talk implements Parcelable {
      public static final Parcelable.Creator<Talk> CREATOR = new Parcelable.Creator<Talk>() {
        @Override public Talk createFromParcel(Parcel in) {
          int id = in.readInt();
          long date = in.readLong();
          Room room = Room.values()[in.readInt()];
          String title = in.readString();
          String speaker = in.readString();
          return new Talk(id, date, room, title, speaker);
        }
      };
      …

      @Override public void writeToParcel(Parcel out, int flags) {
        out.writeInt(id);
        out.writeLong(date);
        out.writeInt(room.ordinal());
        out.writeString(title);
        out.writeString(speaker);
      }
    }
  164. 214.

    @Test public void decodeGolden() {
      Talk talk = new Talk(72017, 1475260200000L, Room.RIGHT,
          "Decoding the Secrets of Binary Data", "Jesse Wilson");
      ByteString goldenData = ByteString.decodeHex("01000000511901004034…");
      assertThat(parcelDecode(goldenData, Talk.CREATOR)).isEqualTo(talk);
    }

    private <T extends Parcelable> T parcelDecode(
        ByteString byteString, Parcelable.Creator<T> creator) {
      Parcel parcel = Parcel.obtain();
      try {
        parcel.unmarshall(byteString.toByteArray(), 0, byteString.size());
        parcel.setDataPosition(0);
        return parcel.readTypedObject(creator);
      } finally {
        parcel.recycle();
      }
    }
  166. 216.
  167. 217.

    6 tips
    • Everything is bytes
    • Java Strings are UTF-16. Encoded text is usually UTF-8
    • Hex is handy
    • Don’t be afraid of shifting and masking
    • Integers are big or little-endian
    • Java’s I/O APIs are trouble
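Several of these tips show up together in one small JDK-only sketch: decode a hex string by shifting nibbles, then compare the UTF-8 byte count with the UTF-16 length of the resulting String. `TipsDemo` and `decodeHex` are hypothetical names, and the decoder assumes well-formed, even-length hex input. The hex value is the one from the ByteString slide.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: hex, shifting, and UTF-8 versus UTF-16 in one place.
class TipsDemo {
  /** Decodes hex into bytes by shifting the high nibble and OR-ing in the low one. */
  static byte[] decodeHex(String hex) {
    byte[] result = new byte[hex.length() / 2];
    for (int i = 0; i < result.length; i++) {
      int high = Character.digit(hex.charAt(i * 2), 16);
      int low = Character.digit(hex.charAt(i * 2 + 1), 16);
      result[i] = (byte) ((high << 4) | low);
    }
    return result;
  }

  public static void main(String[] args) {
    byte[] bytes = decodeHex("436166c3a920f09f8da9");
    String text = new String(bytes, StandardCharsets.UTF_8);
    System.out.println(bytes.length + " UTF-8 bytes");
    System.out.println(text.length() + " UTF-16 chars");
  }
}
```

The two counts differ because é takes two bytes in UTF-8 but one char in UTF-16, while the doughnut emoji takes four bytes in UTF-8 and a two-char surrogate pair in UTF-16.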