Slide 1

Slide 1 text

Bits, Bytes and Characters
Shaikhul Islam Chowdhury
dev.to/shaikhul
github.com/shaikhul

Slide 2

Slide 2 text

Bit
● Smallest unit of storage
● A bit is either 0 or 1
● 8 bits = 1 byte

Slide 3

Slide 3 text

Byte
● Group of 8 bits
● 1 bit: patterns 0, 1 (2 values)
● 2 bits: patterns 00, 01, 10, 11 (4 values)
● n bits: 2^n possible patterns
● 1 byte
  ○ 8 bits: 2^8 = 256 patterns
  ○ Can hold the numbers 0 - 255
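The counting rule above can be checked directly. The following is a minimal sketch; the helper name `bit_patterns` is made up for illustration and is not part of the original slides.

```python
from itertools import product


def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    # product("01", repeat=n) yields all length-n tuples of '0' and '1'.
    return ["".join(bits) for bits in product("01", repeat=n)]


print(bit_patterns(1))            # ['0', '1']
print(bit_patterns(2))            # ['00', '01', '10', '11']
print(len(bit_patterns(8)))       # 256 patterns in one byte
print(int(bit_patterns(8)[-1], 2))  # 255, the largest value a byte can hold
```

One byte has 256 distinct patterns; because counting starts at 0, the largest value is 255.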

Slide 4

Slide 4 text

Bytes
● How many bytes?
● All storage is measured in bytes
● Bigger units
  ○ KB (1000 B)
  ○ MB (1000 KB)
  ○ GB (1000 MB)
  ○ TB (1000 GB), etc.
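As a rough illustration of the unit ladder above, the sketch below converts a raw byte count into the largest fitting unit. It assumes the decimal (1000-based) steps listed on the slide; `human_readable` and `UNITS` are hypothetical names chosen for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]


def human_readable(num_bytes):
    """Format a byte count using decimal (1000-based) units."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit we know about.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


print(human_readable(512))            # 512.0 B
print(human_readable(1_500))          # 1.5 KB
print(human_readable(3_200_000_000))  # 3.2 GB
```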

Slide 5

Slide 5 text

Character and Unicode
● Characters are represented as code points in the range 0 - 0x10FFFF (about 1.1 million)

Character              Unicode Code Point   Glyph
Latin small letter a   0x61                 a
Black chess knight     0x265E               ♞
Euro currency          0x20AC               €
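The table above can be reproduced from Python's standard library. This is a small sketch: `ord` returns a character's code point and `unicodedata.name` returns its official Unicode name (which may differ slightly from the informal names in the table). The helper `describe` is an illustrative name, not something from the slides.

```python
import sys
import unicodedata


def describe(ch):
    """Return (code point in U+XXXX form, official Unicode name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


for cp in (0x61, 0x265E, 0x20AC):
    ch = chr(cp)
    print(describe(ch), ch)

# The highest valid code point, and how many code points the range covers.
print(hex(sys.maxunicode))   # 0x10ffff
print(sys.maxunicode + 1)    # 1114112
```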

Slide 6

Slide 6 text

Character and Unicode (Code Point)

Python

In [22]: chr(0x0041)
Out[22]: 'A'
In [23]: chr(0x00df)
Out[23]: 'ß'
In [24]: chr(0x6771)
Out[24]: '東'
In [25]: chr(0x10400)
Out[25]: '𐐀'

Java

jshell> new String(Character.toChars(0x0041))
$13 ==> "A"
jshell> new String(Character.toChars(0x00df))
$14 ==> "ß"
jshell> new String(Character.toChars(0x6771))
$15 ==> "東"
jshell> new String(Character.toChars(0x10400))
$16 ==> "𐐀"

Slide 7

Slide 7 text

(Character) Encoding
● A Unicode string is a sequence of code points (range 0 - 0x10FFFF)
● A character encoding translates a sequence of code points into bytes so it can be stored in memory
  ○ ASCII: 7 bits (0 - 127), English letters
  ○ UTF-8: most common, default in Python
  ○ UTF-16, etc.
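To make the difference between these encodings concrete, the sketch below counts how many bytes a single code point occupies in UTF-8 and UTF-16, and checks whether text fits in ASCII. The helper names are hypothetical, and `utf-16-le` is used here (an assumption of this example, not of the slides) so that Python does not prepend a byte order mark to the output.

```python
def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # 'utf-16-le' fixes the byte order, so no byte order mark is added.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


def fits_ascii(text):
    """Return True if every character in text is in the ASCII range 0 - 127."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for cp in (0x41, 0xDF, 0x20AC, 0x10400):
    print(f"U+{cp:04X}", encoded_lengths(chr(cp)))
# U+0041 (1, 2)
# U+00DF (2, 2)
# U+20AC (3, 2)
# U+10400 (4, 4)

print(fits_ascii("hello"))       # True
print(fits_ascii(chr(0x20AC)))   # False
```

UTF-8 is a variable-length encoding: ASCII characters take one byte, while higher code points take two to four.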

Slide 8

Slide 8 text

(Character) Encoding - String to Bytes

Python

In [40]: c = chr(0x20ac)
In [41]: c
Out[41]: '€'
In [42]: c.encode('utf-8')
Out[42]: b'\xe2\x82\xac'

Java

jshell> String str = new String(Character.toChars(0x20ac))
str ==> "€"
jshell> import java.nio.charset.*
jshell> byte bytes[] = str.getBytes(StandardCharsets.UTF_8)
bytes ==> byte[3] { -30, -126, -84 }
jshell> for (byte b: bytes) { System.out.printf("%x ", b); }
e2 82 ac
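The reverse direction, bytes back to a string, is not shown on the slide. The following is a minimal sketch of it using `bytes.decode`; the variable names are chosen for this example. It also shows that decoding with a different encoding than the one used to encode produces different text.

```python
# The three UTF-8 bytes of the euro sign, as produced by the session above.
data = b"\xe2\x82\xac"

# Decoding with the same encoding recovers the original single character.
text = data.decode("utf-8")
print(text)             # €
print(hex(ord(text)))   # 0x20ac
print(data.hex())       # e282ac

# Decoding with a different encoding (Latin-1 maps each byte to one
# character) gives three characters instead of one.
mis_decoded = data.decode("latin-1")
print(len(text), len(mis_decoded))  # 1 3
```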

Slide 9

Slide 9 text

References
● Stanford CS 101 on Bits and Bytes
● Unicode HOWTO — Python 3.9.1 documentation
● Unicode (The Java™ Tutorials > Internationalization > Working with Text)

Slide 10

Slide 10 text

Thank You