Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Bits, Bytes and Characters

Bits, Bytes and Characters

Computer Science 101, what is bit, bytes, character and unicode

Avatar for Shaikhul Islam

Shaikhul Islam

January 29, 2021
Tweet

Other Decks in Education

Transcript

  1. Byte • Group of 8 bit • 1 bit pattern

    - 0, 1 - 2 entry • 2 bit pattern - 00, 01, 10, 11 - 4 entry • n bit - 2^n entry possible • 1 Byte ◦ 8 bit - 2^8 - 255 entry ◦ Can hold 0 - 255 numbers
  2. Bytes • How many bytes? • All storage are measured

    in Bytes • Bigger units ◦ KB (1000 B), ◦ MB (1000 KB), ◦ GB (1000 MB), ◦ TB (1000 GB) etc
  3. Character and Unicode • Characters are represented as code point

    - range 0 - 0x10FFFF ( 1 million) Character Unicode Code Point Glyph Latin small letter a 0x61 a Black chess knight 0x265E ♞ Euro currency 0x20AC €
  4. Character and Unicode (Code Point) Python In [22]: chr(0x0041) Out[22]:

    'A' In [23]: chr(0x00df) Out[23]: 'ß' In [24]: chr(0x6771) Out[24]: '東' In [25]: chr(0x10400) Out[25]: '' Java jshell> new String(Character.toChars(0x0041)) $13 ==> "A" jshell> new String(Character.toChars(0x00df)) $14 ==> "ß" jshell> new String(Character.toChars(0x6771)) $15 ==> "東" jshell> new String(Character.toChars(0x10400)) $16 ==> ""
  5. (Character) Encoding • Unicode string is a sequence of code

    points (limit 0 - 0x10FFFF) • character encoding - translate sequence of code points into Bytes to store into memory ◦ ASCII: 7 bit (0 - 127), english letters ◦ UTF-8: most common, default in python ◦ UTF-16 etc
  6. (Character) Encoding - String to Bytes Python In [40]: c

    = chr(0x20ac) In [41]: c Out[41]: '€' In [42]: c.encode('utf-8') Out[42]: b'\xe2\x82\xac' Java jshell> String str = new String(Character.toChars(0x20ac)) str ==> "€" jshell> import java.nio.charset.* jshell> byte bytes[] = str.getBytes(StandardCharsets.UTF_8) bytes ==> byte[3] { -30, -126, -84 } jshell> for (byte b: bytes) { System.out.printf("%x ", b); } e2 82 ac
  7. References • Stanford CS 101 on Bits and Bytes •

    Unicode HOWTO — Python 3.9.1 documentation • Unicode (The Java™ Tutorials > Internationalization > Working with Text)