BINARY PROCESSING
NTUST INFORMATION SECURITY RESEARCH CLUB
Slide 2
OUTLINE
▸ Data Type
▸ Text Encoding
▸ Binary Encoding
▸ Integer Representation and Endian
▸ Memory Model
▸ Data Structure
▸ Practice in Python
▸ xor-tool
Slide 3
TEXT / BINARY, DATA / STREAM
▸ Text: Decoded Binary
▸ Stream: Series of Data
Slide 4
SAMPLE OF DATA TYPES
▸ Text Data
  ▸ Markdown Document, Hypertext Markup Language (HTML)
▸ Text Stream
  ▸ Hypertext Transfer Protocol (HTTP)
▸ Binary Data
  ▸ Portable Network Graphics (PNG)
▸ Binary Stream
  ▸ Secure Shell (SSH), Secure Sockets Layer (SSL)
Slide 5
TEXT ENCODING
▸ Common Encodings
  ▸ ASCII, UTF-8, UCS-2, Latin-1
▸ Common Encodings in Asia
  ▸ Big5, HKSCS, Shift JIS, GBK
Slide 6
TEXT ENCODING — ASCII
▸ An early standardized 7-bit encoding (0x00 ~ 0x7F)
▸ Printable characters: 0x20 (' ') ~ 0x7E ('~')
Slide 7
TEXT ENCODING — UTF-8
▸ A modern encoding of Unicode
▸ Variable-length encoding (1 to 4 bytes per character)
▸ Emoji!
▸ ASCII is a subset of UTF-8
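The variable-length property can be checked directly in Python. This is a minimal sketch; the sample characters are chosen here for illustration and are not from the slides.

```python
# Sample characters (illustrative choice), written as escapes so the
# script does not depend on the editor's or terminal's encoding.
samples = [
    'A',           # U+0041, ASCII letter
    '\u00e9',      # U+00E9, Latin small letter e with acute
    '\u4e2d',      # U+4E2D, a CJK character
    '\U0001F600',  # U+1F600, an emoji
]

for ch in samples:
    encoded = ch.encode('utf-8')
    # Print the code point, the number of UTF-8 bytes, and the bytes in hex.
    print(hex(ord(ch)), len(encoded), encoded.hex())

# ASCII text produces identical bytes under both encodings.
ascii_text = 'Hello, Hacker'
same_bytes = ascii_text.encode('ascii') == ascii_text.encode('utf-8')
print(same_bytes)
```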
Slide 8
TEXT ENCODING — BIG5 FAMILY
▸ Traditional Chinese characters
▸ Includes CP950, Big5, and HKSCS; Python and iconv treat them as
different encodings
▸ A legacy encoding, often considered an obstacle
Slide 9
TEXT ENCODING — LATIN1
▸ Every value from 0x00 to 0xFF maps to a character, so any byte sequence
can be decoded with this encoding (a useful technique in Python)
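A short sketch of that technique: every possible byte value survives a round trip through Latin-1, while a stricter encoding such as UTF-8 rejects the same input.

```python
# All 256 possible byte values.
raw = bytes(range(256))

# Latin-1 maps byte N to code point N, so decoding never fails
# and encoding gives back the original bytes.
text = raw.decode('latin1')
restored = text.encode('latin1')
print(restored == raw)

# UTF-8 is stricter: bytes such as 0x80 are not valid on their own.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```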
INTEGER REPRESENTATION
▸ What is the basic memory unit?
  ▸ Byte
▸ How does a computer store integers?
  ▸ Storing from the lowest byte to the highest byte is called little-endian
  ▸ Ex: uint32_t n = 0xaabbccdd;
  ▸ In memory (little-endian): dd cc bb aa
▸ Let's try!
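One way to try this in Python, as a minimal sketch: `int.to_bytes` and the `struct` module both let you pick the byte order explicitly.

```python
import struct
import sys

n = 0xaabbccdd

# Same integer, two byte orders.
little = n.to_bytes(4, 'little')
big = n.to_bytes(4, 'big')
print(little.hex())  # lowest byte first
print(big.hex())     # highest byte first

# struct uses '<' for little-endian and '>' for big-endian;
# 'I' is an unsigned 32-bit integer.
packed = struct.pack('<I', n)
print(struct.unpack('<I', packed)[0] == n)

# Byte order of the machine running this script.
print(sys.byteorder)
```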
Slide 14
MEMORY MODEL
Slide 15
MEMORY MODEL
▸ Address (integer) and Data (byte cells)
▸ The minimal addressable unit is a byte
▸ Use HxD / Cheat Engine / gdb to inspect process memory
▸ Use HxD / xxd / hexdump to inspect a binary file
▸ Let’s try!
▸ Pointer: it is actually just an integer; it stores an address as data
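If none of those tools is at hand, an address/data view can be sketched in a few lines of Python. The `hexdump` function below is a hypothetical helper written for this illustration; its layout only loosely imitates xxd-style output.

```python
def hexdump(data, width=16):
    """Return an offset / hex / text view of a bytes-like object."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Each byte as two hex digits.
        hex_part = ' '.join('{:02x}'.format(b) for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'.
        text_part = ''.join(
            chr(b) if 0x20 <= b <= 0x7e else '.' for b in chunk
        )
        lines.append('{:08x}  {:<{w}}  {}'.format(
            offset, hex_part, text_part, w=width * 3 - 1))
    return '\n'.join(lines)


print(hexdump(b'Hello, Hacker\x00\x01\x02\xff'))
```

The left column plays the role of the address, and each row shows the byte cells stored from that offset onward.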
DATA STRUCTURE
BMP FILE
Practice:
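A possible starting point for this practice, as a sketch only: it assumes the common layout of a 14-byte file header followed by a 40-byte BITMAPINFOHEADER, with all fields little-endian. The function name and the hand-built sample data are hypothetical.

```python
import struct


def parse_bmp_header(data):
    """Read a few fields from the start of a BMP file (sketch)."""
    # File header: signature, file size, two reserved fields, pixel offset.
    signature, file_size, _res1, _res2, pixel_offset = struct.unpack_from(
        '<2sIHHI', data, 0)
    if signature != b'BM':
        raise ValueError('not a BMP file')
    # Start of the info header (assumed to be BITMAPINFOHEADER).
    header_size, width, height, planes, bits_per_pixel = struct.unpack_from(
        '<IiiHH', data, 14)
    return {
        'file_size': file_size,
        'pixel_offset': pixel_offset,
        'header_size': header_size,
        'width': width,
        'height': height,
        'bits_per_pixel': bits_per_pixel,
    }


# Hand-built headers for a 2x2, 24-bit image (illustrative data).
sample = (
    struct.pack('<2sIHHI', b'BM', 70, 0, 0, 54)
    + struct.pack('<IiiHHIIiiII', 40, 2, 2, 1, 24, 0, 16, 2835, 2835, 0, 0)
    + bytes(16)  # pixel rows, each padded to a multiple of 4 bytes
)
info = parse_bmp_header(sample)
print(info)
```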
Slide 18
DATA STRUCTURE
GIF FILE
Practice:
Slide 19
DATA STRUCTURE
PNG FILE
Practice:
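A possible starting point for this practice, as a sketch: a PNG file begins with an 8-byte signature followed by chunks, each made of a big-endian length, a 4-byte type, the data, and a CRC-32 over type and data. The function name and the hand-built sample are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def read_png_ihdr(data):
    """Return (width, height, bit_depth, color_type) from the IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError('not a PNG file')
    # Chunk layout: 4-byte big-endian length, then 4-byte chunk type.
    length, chunk_type = struct.unpack_from('>I4s', data, 8)
    if chunk_type != b'IHDR' or length != 13:
        raise ValueError('first chunk is not a valid IHDR')
    body = data[16:16 + length]
    (crc,) = struct.unpack_from('>I', data, 16 + length)
    # The CRC covers the chunk type and the chunk data.
    if zlib.crc32(chunk_type + body) != crc:
        raise ValueError('IHDR checksum mismatch')
    return struct.unpack_from('>IIBB', body, 0)


# Hand-built signature + IHDR for a 640x480 RGB image (illustrative data).
ihdr_body = struct.pack('>IIBBBBB', 640, 480, 8, 2, 0, 0, 0)
sample = (
    PNG_SIGNATURE
    + struct.pack('>I', len(ihdr_body))
    + b'IHDR'
    + ihdr_body
    + struct.pack('>I', zlib.crc32(b'IHDR' + ihdr_body))
)
print(read_png_ihdr(sample))
```

Unlike BMP, PNG stores its integers in big-endian order, which is why the format strings start with `>`.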
Slide 20
PRACTICE IN PYTHON - BASIC TYPES
3 BASIC TYPES
▸ str: 'This is str'
▸ bytes: b'rawbytedata'
▸ bytearray: bytearray(b'converting')
▸ str to bytes: str.encode("encoding")
▸ bytes to str: bytes.decode("encoding")
Slide 21
PRACTICE IN PYTHON - BASIC TYPES
s = 'Hello, Hacker'
b = s.encode('ascii')
a = bytearray(b)
print(type(s), type(b), type(a))
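A small follow-up sketch on how `bytes` and `bytearray` differ in practice: indexing either one yields an integer, but only `bytearray` can be modified in place.

```python
b = 'Hello, Hacker'.encode('ascii')
a = bytearray(b)

# Indexing bytes or bytearray gives an int, not a 1-character string.
print(b[0], a[0])

# bytearray is mutable.
a[0] = ord('J')
print(a)

# bytes is immutable: item assignment raises TypeError.
try:
    b[0] = ord('J')
    bytes_mutable = True
except TypeError:
    bytes_mutable = False
print(bytes_mutable)
```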
Slide 22
PRACTICE IN PYTHON - FILE OPERATION
# generate a binary file
with open('out.dat', 'wb') as fout:
    fout.write(bytes([
        i for i in range(256)
    ]))
Slide 23
PRACTICE IN PYTHON - FILE OPERATION
# read a binary file
with open('out.dat', 'rb') as fin:
    data = fin.read()
print(data.hex())  # Python 3.5+
# Python 3.4
# import binascii
# print(binascii.hexlify(data))
# Python 2.7
# print(data.encode('hex'))
Slide 24
PRACTICE IN PYTHON - XOR A FILE
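# read the whole file into a mutable buffer
with open('data.bin', 'rb') as fin:
    content = bytearray(fin.read())
# XOR every byte with a fixed one-byte key
for i in range(len(content)):
    content[i] ^= 0x9c
with open('out.bin', 'wb') as fout:
    fout.write(content)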
Slide 25
PRACTICE IN PYTHON - BASE_XX ENCODING
import base64
data = bytes.fromhex('61626364')
print(base64.b85encode(data))
print(base64.b64encode(data))
print(base64.b32encode(data))
print(data.hex())
# decoding: base64.b85decode / b64decode / b32decode (b'encoded data')
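The matching decode functions can be checked with a round trip. This is a minimal sketch using the same four bytes as above; the `results` dictionary is just a name chosen for this example.

```python
import base64

data = bytes.fromhex('61626364')  # b'abcd'

codecs = {
    'b85': (base64.b85encode, base64.b85decode),
    'b64': (base64.b64encode, base64.b64decode),
    'b32': (base64.b32encode, base64.b32decode),
}

results = {}
for name, (encode, decode) in codecs.items():
    encoded = encode(data)
    # Decoding the encoded form must give back the original bytes.
    assert decode(encoded) == data
    results[name] = encoded
    print(name, encoded)
```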
Slide 26
PRACTICE IN PYTHON - XOR ENCRYPT AND BASE64 ENCODE
import base64
data = input('Data to encrypt: ')
data = bytearray(data.encode('utf-8'))
key = input('Key: ').encode('utf-8')
len_key = len(key)
# XOR each byte with the key, repeating the key as needed
for i in range(len(data)):
    data[i] ^= key[i % len_key]
data = base64.b64encode(data)
print(data.decode('ascii'))
Slide 27
PRACTICE IN PYTHON - BASE64 DECODE AND XOR DECRYPT
import base64
data = input('Data to decrypt: ')
data = data.encode('ascii')
data = bytearray(base64.b64decode(data))
key = input('Key: ').encode('utf-8')
len_key = len(key)
# XOR with the same repeating key restores the plaintext
for i in range(len(data)):
    data[i] ^= key[i % len_key]
data = bytes(data).decode('utf-8')
print(data)
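The two scripts can also be folded into reusable functions, which makes the round trip easy to verify without typing input. This is a sketch; the names `xor_bytes`, `encrypt`, and `decrypt` are hypothetical, and the empty-key check is an addition not present in the scripts above.

```python
import base64


def xor_bytes(data, key):
    """XOR data with a repeating key; applying it twice restores data."""
    if not key:
        raise ValueError('key must not be empty')
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt(text, key):
    """UTF-8 encode, XOR with the key, then Base64 encode."""
    raw = xor_bytes(text.encode('utf-8'), key.encode('utf-8'))
    return base64.b64encode(raw).decode('ascii')


def decrypt(token, key):
    """Base64 decode, XOR with the key, then UTF-8 decode."""
    raw = base64.b64decode(token.encode('ascii'))
    return xor_bytes(raw, key.encode('utf-8')).decode('utf-8')


token = encrypt('Hello, Hacker', 'k3y')
print(token)
print(decrypt(token, 'k3y'))
```

Because XOR is its own inverse, encryption and decryption share the same `xor_bytes` step; only the order of the encoding steps is reversed.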
Slide 28
XOR TOOL
inndy@inndy-mac ~$ pip2 install xortool
Collecting xortool
Installing collected packages: xortool
Successfully installed xortool-0.95
inndy@inndy-mac ~$ xortool data -c ' '
The most probable key lengths:
1: 33.0%
19: 13.5%
21: 9.8%
23: 9.0%
25: 7.9%
28: 6.9%
32: 6.3%
36: 4.8%
38: 5.3%
40: 3.7%
Key-length can be 4*n
1 possible key(s) of length 1:
\xa5
Found 1 plaintexts with 95.0%+ printable characters
See files filename-key.csv, filename-char_used-perc_printable.csv
inndy@inndy-mac ~$ xortool data -c ' ' -l 1
1 possible key(s) of length 1:
\xa5
Found 1 plaintexts with 95.0%+ printable characters
See files filename-key.csv, filename-char_used-perc_printable.csv
inndy@inndy-mac ~$ cat xortool_out/0.out
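The idea behind passing `-c ' '` can be sketched for the single-byte case: if the space character is assumed to be the most frequent byte in the plaintext, then the most frequent ciphertext byte XOR 0x20 is a candidate key. The code below illustrates that idea only and is not xortool's actual implementation; the function name and sample plaintext are hypothetical.

```python
from collections import Counter


def guess_single_byte_key(ciphertext, most_common_char=b' '):
    """Guess a one-byte XOR key from the most frequent ciphertext byte."""
    most_common_byte = Counter(ciphertext).most_common(1)[0][0]
    return most_common_byte ^ most_common_char[0]


# Illustrative data: English-like text encrypted with the key 0xa5.
plaintext = b'the quick brown fox jumps over the lazy dog and runs far away'
ciphertext = bytes(b ^ 0xa5 for b in plaintext)

key = guess_single_byte_key(ciphertext)
print(hex(key))
print(bytes(b ^ key for b in ciphertext))
```

The guess only works when the assumption about the most frequent character holds, which is why it is given as an explicit parameter.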