Decoding the Secrets of Binary Data (Droidcon NYC 2016)

Jesse Wilson
September 30, 2016

Video: https://www.youtube.com/watch?v=T_p22jMZSrk
Code: https://github.com/swankjesse/encoding

Opaque blobs of data have hexed Android programmers for too long. It’s time to byte the bullet and learn how data is transmitted and persisted.

In this talk we’ll:

💾 Learn a bit about base64, little-endian, and EOF.
💾 See how inefficient encodings nibble away resources.
💾 Hash out the differences between ASCII, UTF-8, and other charsets.
💾 Zip through examples of compression, crypto, and protocol buffers.
💾 Load up on APIs and discover what Square’s Okio has in store.

This talk offers a short introduction to an array of topics. You’ll learn enough to encode & decode whatever data you select!

Transcript

  1. 0 1

  2. Date: Aprilis 11, 1095 From: Emperor Alexios Komnenos To: Robert

    II, Count of Flanders Bob, Looks like we’re gonna war with the Turks. Could you see if the Pope could help us crusade against ’em? Thanks! Alex Letters! • Human readable • Signed
  3. Date: Aprilis 11, 1095 From: Emperor Alexios Komnenos To: Robert

    II, Count of Flanders Bob, Looks like we’re gonna war with the Turks. Could you see if the Pope could help us crusade against ’em? Thanks! Alex Letters! • Slow, dangerous to transmit • Awkward to store • Requires Literacy!
  4. Morse Code
      A • ▬      G ▬ ▬ •    M ▬ ▬      S • • •    Y ▬ • ▬ ▬    4 • • • • ▬
      B ▬ • • •  H • • • •  N ▬ •      T ▬        Z ▬ ▬ • •    5 • • • • •
      C ▬ • ▬ •  I • •      O ▬ ▬ ▬    U • • ▬    0 ▬ ▬ ▬ ▬ ▬  6 ▬ • • • •
      D ▬ • •    J • ▬ ▬ ▬  P • ▬ ▬ •  V • • • ▬  1 • ▬ ▬ ▬ ▬  7 ▬ ▬ • • •
      E •        K ▬ • ▬    Q ▬ ▬ • ▬  W • ▬ ▬    2 • • ▬ ▬ ▬  8 ▬ ▬ ▬ • •
      F • • ▬ •  L • ▬ • •  R • ▬ •    X ▬ • • ▬  3 • • • ▬ ▬  9 ▬ ▬ ▬ ▬ •
  5. (Morse code table, as on slide 4) ▬ • •
  6. (Morse code table, as on slide 4) ▬ • •   ▬ ▬ ▬
  7. (Morse code table, as on slide 4) ▬ • •   ▬ ▬ ▬   ▬ •
  8. (Morse code table, as on slide 4) ▬ • •   ▬ ▬ ▬   ▬ •   • • ▬
  9. (Morse code table, as on slide 4) ▬ • •   ▬ ▬ ▬   ▬ •   • • ▬   ▬
  10. (Morse code table, as on slide 4) ▬ • •   ▬ ▬ ▬   ▬ •   • • ▬   ▬
  11. Morse Code • Very limited characters • No lowercase •

    Limited punctuation* • Average message: 12 words * For example, morse code doesn’t have an asterisk symbol.
  12. 205

  13. 205

  14. 205

  15. Binary Refresher! • Decimal lets you represent any integer with

    a sequence of digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • Binary lets you represent any integer with a sequence of bits: 0, 1
  16. Binary Refresher! • N bits can represent 2^N values •

    8 bits: 0..255 • 16 bits: 0..65,535 • 32 bits: 0..4,294,967,295 • 64 bits: 0..18,446,744,073,709,551,615
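The ranges on the two Binary Refresher slides can be checked with a few lines of plain Java. This is a minimal sketch; the class and method names (`BinaryRefresher`, `toBits`, `maxUnsigned`) are hypothetical and not taken from the talk's code.

```java
class BinaryRefresher {
  /** Renders the low bitCount bits of value, most significant bit first. */
  static String toBits(long value, int bitCount) {
    StringBuilder result = new StringBuilder();
    for (int i = bitCount - 1; i >= 0; i--) {
      result.append((value >>> i) & 1L);
    }
    return result.toString();
  }

  /** Largest unsigned value that fits in bitCount bits (1..63): 2^bitCount - 1. */
  static long maxUnsigned(int bitCount) {
    return (1L << bitCount) - 1;
  }

  public static void main(String[] args) {
    System.out.println(toBits(205, 8));             // 11001101
    System.out.println(maxUnsigned(8));             // 255
    System.out.println(maxUnsigned(16));            // 65535
    System.out.println(maxUnsigned(32));            // 4294967295
    // 64 bits does not fit in a signed long, so print all-ones as unsigned.
    System.out.println(Long.toUnsignedString(-1L)); // 18446744073709551615
  }
}
```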
  17. Layering • Given a wire that transmits a single bit,

    we can use binary to encode any integer! • This works because the sender and recipient agree on how to interpret the sequence • That interpretation is called an encoding
  18. Bytes • Though binary supports any number of bits, we

    like 8-bit integers • An 8-bit integer is called a byte • 256 values from 0 to 255
  19. ASCII • American Standard Code for Information Interchange • A

    table of characters • Interpret a sequence of bytes as a string of characters!
  20. ASCII • Work started in 1960 • Only uses 7

    bits: in 1967 bits were very expensive! • That means there are 2^7 = 128 characters
  21. ASCII
        0 NULL   16 DLE    32 SP     48 0      64 @      80 P      96 `     112 p
        1 SOH    17 DC1    33 !      49 1      65 A      81 Q      97 a     113 q
        2 STX    18 DC2    34 "      50 2      66 B      82 R      98 b     114 r
        3 ETX    19 DC3    35 #      51 3      67 C      83 S      99 c     115 s
        4 EOT    20 DC4    36 $      52 4      68 D      84 T     100 d     116 t
        5 ENQ    21 NAK    37 %      53 5      69 E      85 U     101 e     117 u
        6 ACK    22 SYN    38 &      54 6      70 F      86 V     102 f     118 v
        7 BEL    23 ETB    39 '      55 7      71 G      87 W     103 g     119 w
        8 BS     24 CAN    40 (      56 8      72 H      88 X     104 h     120 x
        9 HT     25 EM     41 )      57 9      73 I      89 Y     105 i     121 y
       10 LF     26 SUB    42 *      58 :      74 J      90 Z     106 j     122 z
       11 VT     27 ESC    43 +      59 ;      75 K      91 [     107 k     123 {
       12 FF     28 FS     44 ,      60 <      76 L      92 \     108 l     124 |
       13 CR     29 GS     45 -      61 =      77 M      93 ]     109 m     125 }
       14 SO     30 RS     46 .      62 >      78 N      94 ^     110 n     126 ~
       15 SI     31 US     47 /      63 ?      79 O      95 _     111 o     127 DEL
  22. (ASCII table, as on slide 21) 68
  23. (ASCII table, as on slide 21) 68 111
  24. (ASCII table, as on slide 21) 68 111 110
  25. (ASCII table, as on slide 21) 68 111 110 117
  26. (ASCII table, as on slide 21) 68 111 110 117 116
  27. (ASCII table, as on slide 21) 68 111 110 117 116
  28. (ASCII table, as on slide 21) 68 111 110 117 116
  29. (ASCII table, as on slide 21) 01000100 111 110 117 116
  30. (ASCII table, as on slide 21) 01000100 01101111 110 117 116
  31. (ASCII table, as on slide 21) 01000100 01101111 01101110 117 116
  32. (ASCII table, as on slide 21) 01000100 01101111 01101110 01110101 116
  33. (ASCII table, as on slide 21) 01000100 01101111 01101110 01110101 01110100
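The table lookup in the preceding slides (characters to numbers, numbers to bits) can be reproduced with the JDK's built-in ASCII charset. A minimal sketch, assuming the word being spelled is "Donut" as the codes 68 111 110 117 116 suggest; `AsciiDonut` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiDonut {
  /** Formats one byte as 8 bits, most significant bit first. */
  static String toBits(byte b) {
    String bits = Integer.toBinaryString(b & 0xff);
    return String.format("%8s", bits).replace(' ', '0');
  }

  /** Encodes a string as ASCII and renders each byte as a group of 8 bits. */
  static String encodeToBits(String s) {
    StringBuilder result = new StringBuilder();
    for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
      if (result.length() > 0) result.append(' ');
      result.append(toBits(b));
    }
    return result.toString();
  }

  public static void main(String[] args) {
    byte[] bytes = "Donut".getBytes(StandardCharsets.US_ASCII);
    System.out.println(Arrays.toString(bytes)); // [68, 111, 110, 117, 116]
    System.out.println(encodeToBits("Donut"));
    // 01000100 01101111 01101110 01110101 01110100
  }
}
```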
  34. Charset Hell • Operating systems were installed for a specific

    character set and wouldn’t work with any others • Documents couldn’t mix Greek, French, and Russian characters • If you see ISO-8859-1, run away!
  35. Unicode • Support all languages in a single system •

    A code point is a universal ID for a character
  36. UTF-16 • 16-bit Unicode Transformation Format • 2 bytes per

    code point • This is Java’s char type
  37. ASCII 0 1 0 0 0 1 0 0 0

    1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 D UTF-16
  38. ASCII 0 1 0 0 0 1 0 0 0

    1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D UTF-16 o
  39. ASCII 0 1 0 0 0 1 0 0 0

    1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D UTF-16 on
  40. ASCII 0 1 0 0 0 1 0 0 0

    1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D UTF-16 onu
  41. ASCII 0 1 0 0 0 1 0 0 0

    1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D UTF-16 onut
  42. ASCII 0 1 0 0 0 1 0 0 0

    1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D UTF-16 onut
  43. UTF-16 characters • Max code point is 65,535 • Code

    point for 🍩 is 127,849 • 127,849 > 65,535
  44. Java’s char is broken! • There’s a system called “surrogate

    pairs” which is like multidex for code points • It splits a single code point across 2 chars • It’s an incredible pain
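How a code point gets split across two chars can be sketched by hand and compared with the JDK's `Character.toChars`. This is an illustration under stated assumptions: the code point 127,849 is taken from the previous slide, and `SurrogatePairs.split` is a hypothetical helper that only handles code points above 0xFFFF.

```java
import java.util.Arrays;

class SurrogatePairs {
  static final int DONUT = 127_849; // U+1F369, the code point from the slide.

  /** Splits a code point above 0xFFFF into its two UTF-16 chars by hand. */
  static char[] split(int codePoint) {
    int offset = codePoint - 0x10000;               // 20 bits remain.
    char high = (char) (0xD800 + (offset >>> 10));  // Top 10 bits.
    char low = (char) (0xDC00 + (offset & 0x3FF));  // Bottom 10 bits.
    return new char[] {high, low};
  }

  public static void main(String[] args) {
    char[] manual = split(DONUT);
    char[] builtIn = Character.toChars(DONUT);
    System.out.printf("%04x %04x%n", (int) manual[0], (int) manual[1]); // d83c df69
    System.out.println(Arrays.equals(manual, builtIn));                 // true

    String s = "Caf\u00e9 " + new String(builtIn);
    System.out.println(s.length());                      // 7 chars
    System.out.println(s.codePointCount(0, s.length())); // 6 code points
  }
}
```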
  45. String s = "Café 🍩";
      for (int i = 0, size = s.length(); i < size; i++) {
        char c = s.charAt(i);
        System.out.printf("The character at %d is '%c'%n", i, c);
      }
  46. String s = "Café 🍩";
      for (int i = 0, size = s.length(); i < size; i++) {
        char c = s.charAt(i);
        System.out.printf("The character at %d is '%c'%n", i, c);
      }
      The character at 0 is 'C'
      The character at 1 is 'a'
      The character at 2 is 'f'
      The character at 3 is 'é'
      The character at 4 is ' '
      The character at 5 is '?'
      The character at 6 is '?'
  47. String s = "Café 🍩";
      for (int i = 0, size = s.length(); i < size; ) {
        int c = s.codePointAt(i);
        System.out.printf("The code point at %d is '%c'%n", i, c);
        i += Character.charCount(c);
      }
  48. String s = "Café 🍩";
      for (int i = 0, size = s.length(); i < size; ) {
        int c = s.codePointAt(i);
        System.out.printf("The code point at %d is '%c'%n", i, c);
        i += Character.charCount(c);
      }
      The code point at 0 is 'C'
      The code point at 1 is 'a'
      The code point at 2 is 'f'
      The code point at 3 is 'é'
      The code point at 4 is ' '
      The code point at 5 is '🍩'
  49. UTF-8 • 8-bit Unicode Transformation Format • Variable number of

    bytes per code point • This is how modern apps transmit & store text
  50. UTF-8 • Many common characters are 1-byte • Some are

    2 and 3 bytes. ‘🍩’ is 4 bytes. • Self-delimiting • Self-aligning
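The variable-length rule can be sketched as a tiny encoder that picks 1, 2, 3, or 4 bytes depending on how many bits the code point needs (up to 7, 11, 16, or 21). This is a hand-rolled illustration, not the talk's code or Okio's implementation; `Utf8Sketch` is a hypothetical name, and the sketch does not reject surrogate code points the way a production encoder should.

```java
import java.io.ByteArrayOutputStream;

class Utf8Sketch {
  /** Encodes a single code point as UTF-8. Assumes 0 <= codePoint <= 0x10FFFF. */
  static byte[] encode(int codePoint) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (codePoint < 0x80) {                 // Up to 7 bits: 0xxxxxxx
      out.write(codePoint);
    } else if (codePoint < 0x800) {         // Up to 11 bits: 110xxxxx 10xxxxxx
      out.write(0xC0 | (codePoint >>> 6));
      out.write(0x80 | (codePoint & 0x3F));
    } else if (codePoint < 0x10000) {       // Up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
      out.write(0xE0 | (codePoint >>> 12));
      out.write(0x80 | ((codePoint >>> 6) & 0x3F));
      out.write(0x80 | (codePoint & 0x3F));
    } else {                                // Up to 21 bits: 11110xxx plus three 10xxxxxx
      out.write(0xF0 | (codePoint >>> 18));
      out.write(0x80 | ((codePoint >>> 12) & 0x3F));
      out.write(0x80 | ((codePoint >>> 6) & 0x3F));
      out.write(0x80 | (codePoint & 0x3F));
    }
    return out.toByteArray();
  }

  static String hex(byte[] bytes) {
    StringBuilder result = new StringBuilder();
    for (byte b : bytes) result.append(String.format("%02x", b & 0xff));
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(hex(encode('C')));     // 43
    System.out.println(hex(encode(0xE9)));    // c3a9 (é)
    System.out.println(hex(encode(127_849))); // f09f8da9
  }
}
```

Continuation bytes always start with the bits 10, which is what lets a decoder find the start of the next character after landing in the middle of one.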
  51. 0 1 1 0 1 0 1 1 1 0

    1 1 1 1 0 1 0 1 0 1 0 1 0 1 0 ≤7 ≤11 ≤16 ≤21 “How many bits do you need?”
  52. 1 0 0 0 0 1 1 C “How many

    bits do you need?” a f é sp
  53. 1 0 0 0 0 1 1 C “How many

    bits do you need?” a f é sp 97
  54. 1 0 0 0 0 1 1 C “How many

    bits do you need?” 1 1 0 0 0 0 1 a f é sp
  55. 1 0 0 0 0 1 1 C “How many

    bits do you need?” 1 1 0 0 0 0 1 a f é sp 102
  56. 1 0 0 0 0 1 1 C “How many

    bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp
  57. 233 1 0 0 0 0 1 1 C “How

    many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp
  58. 1 1 1 0 1 0 0 1 1 0

    0 0 0 1 1 C “How many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp
  59. 1 1 1 0 1 0 0 1 1 0

    0 0 0 1 1 C “How many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp 32
  60. 1 0 0 0 0 0 1 1 1 0

    1 0 0 1 1 0 0 0 0 1 1 C “How many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp
  61. 1 0 0 0 0 0 1 1 1 0

    1 0 0 1 1 0 0 0 0 1 1 C “How many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp 127,849
  62. 1 0 1 0 0 1 0 0 1 1

    0 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0 1 0 0 1 1 0 0 0 0 1 1 C “How many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp
  63. 0 1 1 0 1 1 1 1 0 1

    0 0 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 0 0 1 C “How many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp 1 1 1 0 0 0 0 1 1
  64. 0 1 1 0 0 0 0 1 1 1

    1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 0 0 1 C “How many bits do you need?” 1 1 0 0 0 0 1 a 1 1 0 0 1 1 0 f é sp 1 1 1 0 0 0 0 1 1
  65. 0 1 0 0 0 0 1 1 1 1

    0 0 0 0 1 1 1 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0
  66. UTF-8 • Basically the best thing ever • Superset of

    ASCII • Great for JSON and HTML because delimiter characters <, > and " are 1-byte
  67. Hexadecimal Refresher! • Decimal digits: 0, 1, 2, 3, 4,

    5, 6, 7, 8, 9 • Binary bits: 0, 1 • Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f
  68. 205

  69. Hexadecimal Refresher! • Prefixed with “0x”, like 0xcd • Hex

    bytes are always two digits: 00, 01, 02 … ff • Sequences of bytes are okay: 0bb634
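A short sketch of the hex conventions above, using only the JDK: two digits per byte, and byte sequences written back to back. `HexRefresher`, `toHex`, and `fromHex` are hypothetical names chosen for this example.

```java
class HexRefresher {
  /** Renders each byte as exactly two lowercase hex digits. */
  static String toHex(byte[] bytes) {
    StringBuilder result = new StringBuilder();
    for (byte b : bytes) {
      result.append(String.format("%02x", b & 0xff));
    }
    return result.toString();
  }

  /** Parses a string of hex digit pairs back into bytes. */
  static byte[] fromHex(String hex) {
    if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd length: " + hex);
    byte[] result = new byte[hex.length() / 2];
    for (int i = 0; i < result.length; i++) {
      result[i] = (byte) Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(0xcd);                     // 205
    System.out.println(Integer.toHexString(205)); // cd
    System.out.println(toHex(new byte[] {0x0b, (byte) 0xb6, 0x34})); // 0bb634
    System.out.println(fromHex("cd")[0] & 0xff);  // 205
  }
}
```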
  70. Hexadecimal Refresher! • Colors: #ffffff • URL escaping: http://example.com/?q=hello%20world •

    Unicode code points U+2020 • IPv6 addresses: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
  71. Pictures • Just a 2D array of colors • Given

    3 bytes per pixel: • 64 × 64 icon is 12,288 bytes • 1080 × 1920 picture is 5.9 MiB • Compression is important!
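The sizes quoted on this slide follow from width × height × bytes per pixel. A minimal sketch of that arithmetic; `PictureSizes` and `uncompressedBytes` are hypothetical names.

```java
import java.util.Locale;

class PictureSizes {
  /** Size of an uncompressed image with no row padding or header. */
  static long uncompressedBytes(int width, int height, int bytesPerPixel) {
    return (long) width * height * bytesPerPixel;
  }

  public static void main(String[] args) {
    System.out.println(uncompressedBytes(64, 64, 3)); // 12288
    long big = uncompressedBytes(1080, 1920, 3);
    System.out.println(big);                          // 6220800
    // 1 MiB is 1024 * 1024 bytes.
    System.out.println(String.format(Locale.ROOT, "%.1f MiB", big / (1024.0 * 1024.0))); // 5.9 MiB
  }
}
```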
  72. Pictures • Android can use fewer bits per pixel •

    ARGB_8888: alpha, red, green, and blue get 8 bits each • RGB_565: red gets 5 bits, green gets 6, blue gets 5 bits
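Squeezing a 32-bit color into 16 bits means dropping alpha and the low bits of each channel. The sketch below assumes the common layout with red in the highest 5 bits, green in the middle 6, and blue in the lowest 5; it is an illustration, not Android's own conversion code, and `Rgb565.pack` is a hypothetical helper.

```java
class Rgb565 {
  /** Packs a 0xAARRGGBB color into 16 bits: 5 red, 6 green, 5 blue. Alpha is dropped. */
  static int pack(int argb) {
    int r = (argb >>> 16) & 0xff;
    int g = (argb >>> 8) & 0xff;
    int b = argb & 0xff;
    // Keep only the most significant bits of each 8-bit channel.
    return ((r >>> 3) << 11) | ((g >>> 2) << 5) | (b >>> 3);
  }

  public static void main(String[] args) {
    System.out.printf("%04x%n", pack(0xffffffff)); // ffff (white)
    System.out.printf("%04x%n", pack(0xffff0000)); // f800 (red)
    System.out.printf("%04x%n", pack(0xff00ff00)); // 07e0 (green)
    System.out.printf("%04x%n", pack(0xff0000ff)); // 001f (blue)
  }
}
```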
  73. /** https://en.wikipedia.org/wiki/BMP_file_format */
 public void encode(BufferedSink sink) throws IOException {


    int height = pixels.length;
 int width = pixels[0].length;
 
 int bytesPerPixel = 3;
 int rowByteCountWithoutPadding = (bytesPerPixel * width);
 int rowByteCount = ((rowByteCountWithoutPadding + 3) / 4) * 4;
 int pixelDataSize = rowByteCount * height;
 int bmpHeaderSize = 14;
 int dibHeaderSize = 40;
 
 // BMP Header
 sink.writeUtf8("BM"); // ID.
 sink.writeIntLe(bmpHeaderSize + dibHeaderSize + pixelDataSize); // File size.
 sink.writeShortLe(0); // Unused.
 sink.writeShortLe(0); // Unused.
 sink.writeIntLe(bmpHeaderSize + dibHeaderSize); // Offset of pixel data.
 
 // DIB Header
 sink.writeIntLe(dibHeaderSize);

  74. int dibHeaderSize = 40;
 
 // BMP Header
 sink.writeUtf8("BM"); // ID.
 sink.writeIntLe(bmpHeaderSize + dibHeaderSize + pixelDataSize); // File size.
 sink.writeShortLe(0); // Unused.
 sink.writeShortLe(0); // Unused.
 sink.writeIntLe(bmpHeaderSize + dibHeaderSize); // Offset of pixel data.
 
 // DIB Header
 sink.writeIntLe(dibHeaderSize);
 sink.writeIntLe(width);
 sink.writeIntLe(height);
 sink.writeShortLe(1); // Color plane count.
 sink.writeShortLe(bytesPerPixel * Byte.SIZE);
 sink.writeIntLe(0); // No compression.
 sink.writeIntLe(pixelDataSize); // Size of bitmap data including padding.
 sink.writeIntLe(2835); // Horizontal print resolution in pixels/meter. (72 dpi).
 sink.writeIntLe(2835); // Vertical print resolution in pixels/meter. (72 dpi).
 sink.writeIntLe(0); // Palette color count.
 sink.writeIntLe(0); // 0 important colors.
 
 // Pixel data.

  75. sink.writeIntLe(0); // Palette color count.
 sink.writeIntLe(0); // 0 important colors.


    
 // Pixel data.
 for (int y = height - 1; y >= 0; y--) {
 int[] row = pixels[y];
 for (int x = 0; x < width; x++) {
 int pixel = row[x];
 sink.writeByte((pixel & 0x0000ff)); // Blue.
 sink.writeByte((pixel & 0x00ff00) >>> 8); // Green.
 sink.writeByte((pixel & 0xff0000) >>> 16); // Red.
 }
 
 // Padding for 4-byte alignment.
 for (int p = rowByteCountWithoutPadding; p < rowByteCount; p++) {
 sink.writeByte(0);
 }
 }
 }

  76. Pictures on Bytes • This bitmap writer is 50 lines

    of code • Decoders are more difficult! • Good specs make it easy
  77. sink.writeIntLe(0); // Palette color count.
 sink.writeIntLe(0); // 0 important colors.


    
 // Pixel data.
 for (int y = height - 1; y >= 0; y--) {
 int[] row = pixels[y];
 for (int x = 0; x < width; x++) {
 int pixel = row[x];
 sink.writeByte((pixel & 0x0000ff)); // Blue.
 sink.writeByte((pixel & 0x00ff00) >>> 8); // Green.
 sink.writeByte((pixel & 0xff0000) >>> 16); // Red.
 }
 
 // Padding for 4-byte alignment.
 for (int p = rowByteCountWithoutPadding; p < rowByteCount; p++) {
 sink.writeByte(0);
 }
 }
 }

  78. sink.writeIntLe(0); // Palette color count.
 sink.writeIntLe(0); // 0 important colors.


    
 // Pixel data.
 for (int y = height - 1; y >= 0; y--) {
 int[] row = pixels[y];
 for (int x = 0; x < width; x++) {
 int pixel = row[x];
 sink.writeByte((pixel & 0x0000ff)); // Blue.
 sink.writeByte((pixel & 0x00ff00) >>> 8); // Green.
 sink.writeByte((pixel & 0xff0000) >>> 16); // Red.
 }
 
 // Padding for 4-byte alignment.
 for (int p = rowByteCountWithoutPadding; p < rowByteCount; p++) {
 sink.writeByte(0);
 }
 }
 }

  79. (pixel & 0xff0000) >>> 16 • Shifting and masking lets

    you access the bits within an integer • & and | operators treat each int like a 32-element boolean array! • <<, >> and >>> operators slide bits left and right
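The masking and shifting described here can be tried on the sample value 0x00ffab40 that the following slides use. A minimal sketch; `ShiftAndMask` and its accessors are hypothetical names that mirror the expressions in the bitmap writer.

```java
class ShiftAndMask {
  static int red(int pixel) {
    return (pixel & 0xff0000) >>> 16;
  }

  static int green(int pixel) {
    return (pixel & 0x00ff00) >>> 8;
  }

  static int blue(int pixel) {
    return pixel & 0x0000ff;
  }

  public static void main(String[] args) {
    int pixel = 0x00ffab40;
    // Masking keeps only the bits that are 1 in both operands.
    System.out.printf("%08x%n", pixel & 0x0000ff00); // 0000ab00
    // Shifting slides the surviving bits down to the low byte.
    System.out.printf("%02x %02x %02x%n", red(pixel), green(pixel), blue(pixel)); // ff ab 40

    // >> copies the sign bit into the vacated positions; >>> fills them with zeros.
    System.out.println(0x80000000 >> 28);  // -8
    System.out.println(0x80000000 >>> 28); // 8
  }
}
```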
  80. 1 0 1 0 1 0 1 1 0 0

    0 0 0 0 0 0 0000ab00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 00ffab40 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 00ffab40 0000ff00 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0x00ffab40 & 0x0000ff00
  81. 1 0 1 0 1 0 1 1 0 0

    0 0 0 0 0 0 0000ab00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 00ffab40 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 00ffab40 0000ff00 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0x00ffab40 & 0x0000ff00
  82. 1 0 1 0 1 0 1 1 0 0

    0 0 0 0 0 0 0000ab00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 00ffab40 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 00ffab40 0000ff00 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0x00ffab40 & 0x0000ff00 & =
  83. 1 0 1 0 1 0 1 1 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00000ab00 0x0000ab00 >> 8
  84. 1 0 1 0 1 0 1 1 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00000ab00 0x0000ab00 >> 8 { 8
  85. 1 0 1 0 1 0 1 1 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 000000ab 0x0000ab00 >> 8
  86. 1 0 1 0 1 0 1 1 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 000000ab 0x0000ab00 >> 8 = 0x000000ab
  87. sink.writeIntLe(0); // Palette color count.
 sink.writeIntLe(0); // 0 important colors.


    
 // Pixel data.
 for (int y = height - 1; y >= 0; y--) {
 int[] row = pixels[y];
 for (int x = 0; x < width; x++) {
 int pixel = row[x];
 sink.writeByte((pixel & 0x0000ff)); // Blue.
 sink.writeByte((pixel & 0x00ff00) >>> 8); // Green.
 sink.writeByte((pixel & 0xff0000) >>> 16); // Red.
 }
 
 // Padding for 4-byte alignment.
 for (int p = rowByteCountWithoutPadding; p < rowByteCount; p++) {
 sink.writeByte(0);
 }
 }
 }

  88. int dibHeaderSize = 40;
 
 // BMP Header
 sink.writeUtf8("BM"); // ID.
 sink.writeIntLe(bmpHeaderSize + dibHeaderSize + pixelDataSize); // File size.
 sink.writeShortLe(0); // Unused.
 sink.writeShortLe(0); // Unused.
 sink.writeIntLe(bmpHeaderSize + dibHeaderSize); // Offset of pixel data.
 
 // DIB Header
 sink.writeIntLe(dibHeaderSize);
 sink.writeIntLe(width);
 sink.writeIntLe(height);
 sink.writeShortLe(1); // Color plane count.
 sink.writeShortLe(bytesPerPixel * Byte.SIZE);
 sink.writeIntLe(0); // No compression.
 sink.writeIntLe(pixelDataSize); // Size of bitmap data including padding.
 sink.writeIntLe(2835); // Horizontal print resolution in pixels/meter. (72 dpi).
 sink.writeIntLe(2835); // Vertical print resolution in pixels/meter. (72 dpi).
 sink.writeIntLe(0); // Palette color count.
 sink.writeIntLe(0); // 0 important colors.
 
 // Pixel data.

  89. 0 0 0 0 0 0 0 0 1 1

    1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 Big Endian
  90. 0 0 0 0 0 0 0 0 1 1

    1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Big Endian
  91. 0 0 0 0 0 0 0 0 1 1

    1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 Big Endian
  92. 0 0 0 0 0 0 0 0 1 1

    1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 Big Endian
  93. 0 0 0 0 0 0 0 0 1 1

    1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 Big Endian
  94. 0 0 0 0 0 0 0 0 1 1

    1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 Big Endian
  95. 0 0 0 0 0 0 0 0 1 1

    1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 Little Endian
  96. 0 1 0 0 0 0 0 0 0 0

    0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 Little Endian
  97. 1 0 1 0 1 0 1 1 0 1

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 Little Endian
  98. 1 0 1 0 1 0 1 1 1 1

    1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 Little Endian
  99. 1 0 1 0 1 0 1 1 1 1

    1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 Little Endian
  100. 1 0 1 0 1 0 1 1 1 1

    1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 Little Endian
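The two byte orders can be compared with `java.nio.ByteBuffer` from the JDK rather than Okio's `writeInt` and `writeIntLe`. This sketch assumes the 32-bit value in the diagrams is 0x00ffab40, which is what the bit pattern on the slides spells out; `Endian` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Endian {
  /** Encodes a 32-bit int as 4 bytes in the requested byte order. */
  static byte[] encode(int value, ByteOrder order) {
    return ByteBuffer.allocate(4).order(order).putInt(value).array();
  }

  /** The same little-endian encoding done by hand: least significant byte first. */
  static byte[] littleEndianByHand(int value) {
    return new byte[] {
        (byte) value, (byte) (value >>> 8), (byte) (value >>> 16), (byte) (value >>> 24)
    };
  }

  static String hex(byte[] bytes) {
    StringBuilder result = new StringBuilder();
    for (byte b : bytes) result.append(String.format("%02x ", b & 0xff));
    return result.toString().trim();
  }

  public static void main(String[] args) {
    int value = 0x00ffab40;
    System.out.println(hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 00 ff ab 40
    System.out.println(hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 40 ab ff 00
    System.out.println(hex(littleEndianByHand(value)));              // 40 ab ff 00
  }
}
```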
  101. JSON is Good Stuff
       {
         "id": 72017,
         "date": "2016-09-30T18:30:00Z",
         "room": "RIGHT",
         "title": "Decoding the Secrets of Binary Data",
         "speaker": "Jesse Wilson"
       }
  102. JSON is Good Stuff (JSON document, as on slide 101) • A nice format that builds on UTF-8 • Easy to read & write
  103. JSON is Self-Delimiting (JSON document, as on slide 101) • A JSON document has both structure and data • Uses escape sequences like \" to be completely unambiguous
  104. "2016-09-30T18:30:00Z" • System.currentTimeMillis() returns milliseconds since January 1, 1970 at

    00:00:00 UTC • A date that’s 8 bytes in memory is 22 bytes in JSON!
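The 8 versus 22 byte comparison can be checked with `java.time`. A minimal sketch; `DateSizes` and its methods are hypothetical names, and the 22 bytes count the two quote characters that JSON puts around the string.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;

class DateSizes {
  /** Milliseconds since January 1, 1970 at 00:00:00 UTC for an ISO-8601 instant. */
  static long toEpochMillis(String iso8601) {
    return Instant.parse(iso8601).toEpochMilli();
  }

  /** Bytes needed for the date as a quoted JSON string in UTF-8. */
  static int jsonByteCount(String iso8601) {
    String json = "\"" + iso8601 + "\"";
    return json.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    String date = "2016-09-30T18:30:00Z";
    System.out.println(toEpochMillis(date)); // The same instant as a long.
    System.out.println(Long.BYTES);          // 8
    System.out.println(jsonByteCount(date)); // 22
  }
}
```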
  105. JSON Space & Time • Space: a simple message like this one is ~128 bytes • Time: bigger sequences take longer to decode
  106. Protocol Buffers • Google’s “small, fast, simple” structured data format • Upon closer inspection, it’s not that different from JSON! • But it has a schema
  107. message Talk {
         optional fixed32 id = 1;
         optional fixed64 date = 2;
         optional Room room = 3;
         optional string title = 4;
         optional string speaker = 5;

         enum Room {
           UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
         }
       }
       {
         "id": 72017,
         "date": "2016-09-30T18:30:00Z",
         "room": "RIGHT",
         "title": "Decoding the Secrets of Binary Data",
         "speaker": "Jesse Wilson"
       }
       "id" "date" "room" "title" "speaker"
       72017 "2016-09-30T18:30:00Z", "RIGHT", "Decoding the Secrets of Binary Data", "Jesse Wilson",
  108. message Talk {
         optional fixed32 id = 1;
         optional fixed64 date = 2;
         optional Room room = 3;
         optional string title = 4;
         optional string speaker = 5;

         enum Room {
           UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
         }
       }
  109. (Talk message, as on slide 108) Length Mode • 000 enum, int32, int64... • 001 fixed64 • 010 string, message • 101 fixed32
  110. (Talk message and Length Mode table, as on slide 109) 1 0 1
  111. (Talk message and Length Mode table, as on slide 109) 1 0 1 0 0 1
  112. (Talk message and Length Mode table, as on slide 109) 0 0 0 1 0 1 0 0 1
  113. (Talk message and Length Mode table, as on slide 109) 0 0 0 1 0 1 0 1 0 0 0 1
  114. (Talk message and Length Mode table, as on slide 109) 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1
  115. (Talk message, as on slide 108) 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1
  116. (Talk message, as on slide 108) 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 1
  117. (Talk message, as on slide 108) 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 1
  118. (Talk message, as on slide 108) 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 1
  119. message Talk {
 optional fixed32 id = 1;
 optional fixed64

    date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1
  120. message Talk {
 optional fixed32 id = 1;
 optional fixed64

    date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1
  121. message Talk {
 optional fixed32 id = 1;
 optional fixed64

    date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1
  122. message Talk {
 optional fixed32 id = 1;
 optional fixed64

    date = 2;
 optional Room room = 3;
 optional string title = 4;
 optional string speaker = 5;
 
 enum Room {
 UP = 1; RIGHT = 2; DOWN = 3; LEFT = 4;
 }
 } 0d 11 18 22 2a
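The tag bytes on these slides can be reproduced with one shift and one OR: the field number moves left by three bits and the 3-bit length mode (protobuf calls it the wire type) fills the low bits. The following is a minimal sketch; the class name `WireTags` and its constants are illustrative and do not come from the deck.

```java
final class WireTags {
  // Length modes from the slide's table.
  static final int VARINT = 0;           // 000: enum, int32, int64...
  static final int FIXED64 = 1;          // 001: fixed64
  static final int LENGTH_DELIMITED = 2; // 010: string, message
  static final int FIXED32 = 5;          // 101: fixed32

  /** Combines a field number and a length mode into a tag value. */
  static int tag(int fieldNumber, int wireType) {
    return (fieldNumber << 3) | wireType;
  }

  public static void main(String[] args) {
    // Fields of the Talk message: id, date, room, title, speaker.
    System.out.printf("%02x %02x %02x %02x %02x%n",
        tag(1, FIXED32),
        tag(2, FIXED64),
        tag(3, VARINT),
        tag(4, LENGTH_DELIMITED),
        tag(5, LENGTH_DELIMITED));
    // prints "0d 11 18 22 2a"
  }
}
```

A tag fits in a single byte only for field numbers 1 through 15; larger field numbers need a multi-byte varint, which this sketch does not handle.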
  123–138. Tags: 0d 11 18 22 2a
 { "id": 72017, "date": "2016-09-30T18:30:00Z", "room": "RIGHT", "title": "Decoding the Secrets of Binary Data", "speaker": "Jesse Wilson" }
 id: 72017 → 51 19 01 00
 date: "2016-09-30T18:30:00Z" → 00 00 01 57 7D 37 EE 40
 room: "RIGHT" → 02
 title: "Decoding the Secrets of Binary Data" → length 23, then 4465636f64696e67207468652053656372657473206f662042696e6172792044617461
 speaker: "Jesse Wilson" → length 0c, then 4a657373652057696c736f6e
  139–142. Tags: 0d 11 18 22 2a • id: 51 19 01 00 • date: 00 00 01 57 7D 37 EE 40 • room: 02 • title: 23, then 4465636f64696e67207468652053656372657473206f662042696e6172792044617461 • speaker: 0c, then 4a657373652057696c736f6e
 • Protocol Buffers are small and fast • ~1 byte for each field name • Compact encoding for numbers and enums • Strings stay the same length! • This message is 67 bytes in protocol buffers, vs. 128 for JSON
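The 67 versus 128 byte comparison can be checked by hand-encoding the message. This is a rough sketch using only the JDK, not a real protobuf library. The class `TalkEncoder` and its helpers are hypothetical, the timestamp is borrowed from the deck's later Parcel test, and the JSON is assumed to be written without whitespace. It only handles lengths and enum values below 128, which fit in one varint byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class TalkEncoder {
  /** Writes a 32-bit value as four little-endian bytes. */
  static void writeFixed32(ByteArrayOutputStream out, int v) {
    for (int i = 0; i < 4; i++) out.write((v >>> (8 * i)) & 0xff);
  }

  /** Writes a 64-bit value as eight little-endian bytes. */
  static void writeFixed64(ByteArrayOutputStream out, long v) {
    for (int i = 0; i < 8; i++) out.write((int) (v >>> (8 * i)) & 0xff);
  }

  /** Writes a one-byte length followed by the UTF-8 bytes. */
  static void writeString(ByteArrayOutputStream out, String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    if (utf8.length > 127) {
      throw new IllegalArgumentException("sketch supports one-byte lengths only");
    }
    out.write(utf8.length);
    out.write(utf8, 0, utf8.length);
  }

  /** Encodes the five Talk fields, each preceded by its tag byte. */
  static byte[] encode(int id, long date, int room, String title, String speaker) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(0x0d); writeFixed32(out, id);
    out.write(0x11); writeFixed64(out, date);
    out.write(0x18); out.write(room); // enum value below 128: one varint byte
    out.write(0x22); writeString(out, title);
    out.write(0x2a); writeString(out, speaker);
    return out.toByteArray();
  }

  static final String JSON = "{\"id\":72017,\"date\":\"2016-09-30T18:30:00Z\","
      + "\"room\":\"RIGHT\",\"title\":\"Decoding the Secrets of Binary Data\","
      + "\"speaker\":\"Jesse Wilson\"}";

  public static void main(String[] args) {
    byte[] proto = encode(72017, 1475260200000L, 2,
        "Decoding the Secrets of Binary Data", "Jesse Wilson");
    int jsonSize = JSON.getBytes(StandardCharsets.UTF_8).length;
    System.out.println("protobuf: " + proto.length + " bytes, JSON: " + jsonSize + " bytes");
    // prints "protobuf: 67 bytes, JSON: 128 bytes"
  }
}
```

The savings come from the field names (one tag byte each instead of a quoted key) and from the numbers and enum; the two strings cost the same 35 and 12 bytes in both formats.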
  143. byte[] is bad • Mutable! • equals() doesn’t work like it should: can’t be a map key! • Doesn’t implement Comparable
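The equals() complaint is easy to reproduce with plain JDK classes. In this small sketch (the class name `ByteArrayPitfalls` is made up for illustration), two arrays with identical contents are not equal to each other and cannot find each other in a HashMap, because arrays inherit identity-based equals() and hashCode() from Object.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ByteArrayPitfalls {
  /** Calls the array's own equals(), which compares identity, not contents. */
  static boolean arrayEquals(byte[] a, byte[] b) {
    return a.equals(b);
  }

  /** Stores a value under one array, then looks it up with another array. */
  static boolean foundInMap(byte[] key, byte[] lookup) {
    Map<byte[], String> map = new HashMap<>();
    map.put(key, "value");
    return map.containsKey(lookup);
  }

  public static void main(String[] args) {
    byte[] a = {0x43, 0x61};
    byte[] b = a.clone(); // same contents, different object

    System.out.println(arrayEquals(a, b));   // false
    System.out.println(Arrays.equals(a, b)); // true
    System.out.println(foundInMap(a, b));    // false
  }
}
```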
  144–145. byte is signed • Range is -128..127 but you usually want 0..255 • But unsigned when you call InputStream.read()
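The usual way to get the 0..255 value out of a signed byte is to mask it with 0xff. The sketch below, with an illustrative class name `SignedBytes`, shows the same byte printed as signed, masked to unsigned, and read back through an InputStream.

```java
import java.io.ByteArrayInputStream;

final class SignedBytes {
  /** Widens a signed byte (-128..127) to its unsigned value (0..255). */
  static int toUnsigned(byte b) {
    return b & 0xff;
  }

  /** Returns the first byte the way InputStream.read() reports it. */
  static int firstViaStream(byte[] data) {
    return new ByteArrayInputStream(data).read();
  }

  public static void main(String[] args) {
    byte b = (byte) 0xc3;
    System.out.println(b);                              // -61
    System.out.println(toUnsigned(b));                  // 195
    System.out.println(firstViaStream(new byte[] {b})); // 195
    System.out.println(firstViaStream(new byte[0]));    // -1, end of stream
  }
}
```

read() returns an int so that it can report every byte value as 0..255 and still have -1 left over to signal the end of the stream.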
  146. ByteString makes data easy
 ByteString byteString = ByteString.decodeHex("436166c3a920f09f8da9");
 assertThat(byteString.size()).isEqualTo(10);
 assertThat(byteString.getByte(0)).isEqualTo((byte) 0x43);
 assertThat(byteString.getByte(0)).isEqualTo((byte) 67);
 
 String cafeDonuts = byteString.utf8();
 assertThat(cafeDonuts).isEqualTo("Café 🍩");
 
 ByteString cafe = ByteString.encodeUtf8("Café");
 assertThat(byteString.startsWith(cafe)).isTrue();
  147. class Talk implements Parcelable {
 public static final Parcelable.Creator<Talk> CREATOR = new Parcelable.Creator<Talk>() {
 @Override public Talk createFromParcel(Parcel in) {
 int id = in.readInt();
 long date = in.readLong();
 Room room = Room.values()[in.readInt()];
 String title = in.readString();
 String speaker = in.readString();
 return new Talk(id, date, room, title, speaker);
 }
 };
 …
 
 @Override public void writeToParcel(Parcel out, int flags) {
 out.writeInt(id);
 out.writeLong(date);
 out.writeInt(room.ordinal());
 out.writeString(title);
 out.writeString(speaker);
 }
 }
  148–149. @Test public void decodeGolden() {
 Talk talk = new Talk(72017, 1475260200000L, Room.RIGHT,
 "Decoding the Secrets of Binary Data", "Jesse Wilson");
 ByteString goldenData = ByteString.decodeHex("01000000511901004034…");
 assertThat(parcelDecode(goldenData, Talk.CREATOR)).isEqualTo(talk);
 }
 
 private <T extends Parcelable> T parcelDecode(
 ByteString byteString, Parcelable.Creator<T> creator) {
 Parcel parcel = Parcel.obtain();
 try {
 parcel.unmarshall(byteString.toByteArray(), 0, byteString.size());
 parcel.setDataPosition(0);
 return parcel.readTypedObject(creator);
 } finally {
 parcel.recycle();
 }
 }
  150. 6 tips • Everything is bytes • Java Strings are UTF-16. Encoded text is usually UTF-8 • Hex is handy • Don’t be afraid of shifting and masking • Integers are big or little-endian • Java’s I/O APIs are trouble
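Several of these tips fit in a few lines of plain Java. The sketch below is illustrative only: `TipsDemo` and its helper names are not from the deck. It reuses the "Café 🍩" string and the id 72017 from earlier slides to show that UTF-16 length and UTF-8 length differ, how hex makes bytes readable, and how the same integer looks in big-endian versus little-endian order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class TipsDemo {
  /** Formats bytes as lowercase hex, masking each byte to 0..255 first. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
    return sb.toString();
  }

  /** Returns the four bytes of an int in the requested byte order. */
  static byte[] intBytes(int value, ByteOrder order) {
    return ByteBuffer.allocate(4).order(order).putInt(value).array();
  }

  public static void main(String[] args) {
    // "Café" plus a space and the doughnut emoji, written with escapes.
    String text = "Caf\u00e9 \uD83C\uDF69";
    byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

    System.out.println(text.length()); // 7 UTF-16 code units
    System.out.println(utf8.length);   // 10 UTF-8 bytes
    System.out.println(hex(utf8));     // 436166c3a920f09f8da9

    System.out.println(hex(intBytes(72017, ByteOrder.BIG_ENDIAN)));    // 00011951
    System.out.println(hex(intBytes(72017, ByteOrder.LITTLE_ENDIAN))); // 51190100
  }
}
```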