
Error Correction Over Noisy Channels


A primer on Information Theory and its fundamental theorem which I gave in the spring of 2012 as part of MAT 490 at Berry College. There are a few significant typos in this presentation, so be warned.

Charles Julian Knight

March 20, 2012

Transcript

  1. Error Correction Over Noisy Channels
     and the Noisy Channel Coding Theorem
     Charles Julian Knight
     Berry College
     March 20, 2012

  2. Introduction
     How do you...
     • Talk to space shuttles?
     • Avoid cross-talk over telephone lines?
     • Make a CD that still plays if you scratch it?

  3. History
     Claude Shannon, “The Father of Information Theory”
     • 1916 - 2001
     • MIT
     • Differential Analyzer
     • Bell Labs
     • ENIGMA
     Papers
     • A Symbolic Analysis of Relay and Switching Circuits
       “Possibly the most important, and also the most famous, master’s thesis of the century.” -Howard Gardner
     • A Mathematical Theory of Communication
     Photo: http://www.bell-labs.com/news/2001/february/26/1.html, copyright Bell Labs

  4-7. Definitions and Notation
     • Shannon bit – “The amount of information gained (or entropy removed) upon learning the answer to a question whose two possible answers were equally likely, a priori.”
     • Target space – {x_1, x_2, ..., x_n}
     • P(X = x_1), or simply P(x_1)
     • Expected value – E(X) = Σ_{i=1}^{n} x_i P(x_i)
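As a quick sanity check on the expected-value formula, here is a minimal Python sketch, not from the original deck; the fair-die example and variable names are illustrative only.

```python
# Expected value E(X) = sum_i x_i * P(x_i), illustrated with a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

expected_value = sum(x * p for x, p in zip(outcomes, probabilities))
print(expected_value)  # 3.5
```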

  8-10. Bayes’ Theorem
     Conditional probability: P(X | Y)
     P(U | V) = P(V | U) P(U) / P(V) = P(U and V) / P(V)
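A small numerical check of Bayes’ theorem; the two-event numbers below are illustrative assumptions of mine, not from the slides.

```python
# Bayes' theorem: P(U|V) = P(V|U) P(U) / P(V) = P(U and V) / P(V).
# Illustrative numbers only: P(U) = 0.3, P(V|U) = 0.5, P(V|not U) = 0.2.
p_u = 0.3
p_v_given_u = 0.5
p_v_given_not_u = 0.2

p_v = p_v_given_u * p_u + p_v_given_not_u * (1 - p_u)  # total probability
p_u_given_v = p_v_given_u * p_u / p_v                  # Bayes' theorem
p_u_and_v = p_v_given_u * p_u                          # joint probability

print(p_u_given_v)      # ~0.517
print(p_u_and_v / p_v)  # same value, via P(U and V) / P(V)
```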

  11-13. Shannon Entropy
     A measure of the amount of uncertainty in the value of a random variable, measured in Shannon bits.
     Example: coin toss
     Two notes:
     • Entropy in thermodynamics
     • Shannon bits vs. digital bits

  14-16. Entropy Function
     For a random variable X with target space {x_1, x_2, ..., x_n},
     H(X) = Σ_{i=1}^{n} P(x_i) log2(1 / P(x_i))
     or, if we pop a sign out of the log,
     H(X) = −Σ_{i=1}^{n} P(x_i) log2(P(x_i))
     Compare this to statistical mechanics:
     S = −k Σ_i p_i log(p_i)
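A minimal Python sketch of the entropy function above; the helper name entropy is my own, not from the deck.

```python
import math

def entropy(probabilities):
    """Shannon entropy H(X) = sum_i P(x_i) * log2(1 / P(x_i)), in Shannon bits."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # fair coin toss: 1.0 bit
```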

  17-21. Example
     Weighted coin, 2/3 heads, 1/3 tails:
     H(heads) = (2/3) log2(3/2) ≈ .39 bits     H(tails) = (1/3) log2(3) ≈ .53 bits
     H(coin toss) = H(heads) + H(tails) = (2/3) log2(3/2) + (1/3) log2(3) ≈ .92 bits
     Fair 6-sided die:
     (1/6) log2(6) + (1/6) log2(6) + ... + (1/6) log2(6) = log2(6) ≈ 2.58 bits
     Weighted 6-sided die {1/2, 1/10, 1/10, 1/10, 1/10, 1/10}:
     (1/2) log2(2) + 5 · (1/10) log2(10) = 1/2 + (1/2) log2(10) ≈ 2.16 bits
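The numbers above can be reproduced directly; a short sketch, assuming base-2 logarithms throughout.

```python
import math

def entropy(probabilities):
    # H(X) = sum_i P(x_i) * log2(1 / P(x_i))
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy([2 / 3, 1 / 3]))          # weighted coin: ~0.92 bits
print(entropy([1 / 6] * 6))             # fair die: ~2.58 bits
print(entropy([1 / 2] + [1 / 10] * 5))  # weighted die: ~2.16 bits
```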

  22. Binary Entropy Function
      [Plot of the binary entropy function H(p) against p.]
      http://en.wikipedia.org/wiki/File:Binary_entropy_plot.svg, Creative Commons 3.0 BY-SA

  23-24. Binomial Experiments
     A fixed number n of trials of a system with two outcomes, with probability of success p. Then the probability of k successes is
     P(k) = C(n, k) p^k (1 − p)^(n−k)
     Binomial coefficient: C(n, k) = n! / (k!(n − k)!)
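A one-function sketch of the binomial probability, using Python's math.comb for the binomial coefficient; the function name is mine.

```python
import math

def binomial_probability(n, k, p):
    """P(k successes in n trials) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_probability(3, 2, 0.25))  # ~0.141: exactly 2 of 3 bits flipped at f = 0.25
```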

  25-26. Noisy Channels
     • Channel – A medium (physical or logical) over which information is sent.
     • Noisy channel – A channel which has some probability (the noise parameter f) that a bit will swap values (be “flipped”).
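To make the noise parameter f concrete, a hedged sketch of a binary symmetric channel simulation; nothing here is from the deck beyond the definition itself.

```python
import random

def noisy_channel(bits, f, rng=random):
    """Binary symmetric channel: each bit is flipped independently with probability f."""
    return [bit ^ 1 if rng.random() < f else bit for bit in bits]

sent = [1, 1, 0, 1, 0]
received = noisy_channel(sent, f=0.25)
print(sent, received)
```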

  27. 0 1 0 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 0 0 1 0 1 1 1 1 1 1 0 0 0 0
    Noise Simulation
    By Randall Monroe, xkcd.com/171, Creative Commons 2.5 BY-NC

    View Slide

  28. Hamming Distance
      Simply the number of bits by which two messages differ.

      X        Y        Hamming distance
      0000     1011     3
      1000     1110     2
      11010    11010    0
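A direct implementation of Hamming distance for the bit strings in the table; the helper is my own, not shown on the slide.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length strings differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

print(hamming_distance("0000", "1011"))    # 3
print(hamming_distance("1000", "1110"))    # 2
print(hamming_distance("11010", "11010"))  # 0
```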

  29-34. Repetition Codes
     Let’s send a message! Noise frequency f = 25%. The source block s is transmitted three times (t), the channel flips some bits (r), and the receiver decodes by a bitwise majority vote (s’).

     s        t                       r                       s’
     11010    11010 11010 11010       10010 11011 10000       10010

     Hamming distance from s to s’ = 1: an error rate of 20%.
     How much, in general, do we reduce the error by?
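The decoding step is a per-position majority vote over the received copies; a sketch reproducing the table’s values (the received blocks r are taken verbatim from the slide, the helper name is mine).

```python
def majority_vote_decode(copies):
    """Decode a repetition code by per-position majority vote over the received copies."""
    decoded = []
    for position_bits in zip(*copies):
        ones = position_bits.count("1")
        decoded.append("1" if ones > len(copies) // 2 else "0")
    return "".join(decoded)

s = "11010"
r = ["10010", "11011", "10000"]                  # received copies, as on the slide
s_prime = majority_vote_decode(r)
print(s_prime)                                   # 10010
print(sum(a != b for a, b in zip(s, s_prime)))   # Hamming distance 1 (20% of 5 bits)
```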

  35-38. Repetition Codes
     Error correction in the general case:
     P(error_RN) = Σ_{n=(N+1)/2}^{N} C(N, n) f^n (1 − f)^(N−n)
     This is the sum of binomial probabilities of getting more than half the bits flipped. Error probabilities for f = .25:
     • P(error_R3) ≈ .156
     • P(error_R7) ≈ .071
     Rate reduction. Is there a better way?
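A sketch that evaluates the error formula for R3 and R7 at f = 0.25, reproducing the probabilities quoted above.

```python
import math

def repetition_error_probability(N, f):
    """P(error) for an N-fold repetition code: more than half of the N copies flipped."""
    return sum(
        math.comb(N, n) * f**n * (1 - f)**(N - n)
        for n in range((N + 1) // 2, N + 1)
    )

print(repetition_error_probability(3, 0.25))  # ~0.156
print(repetition_error_probability(7, 0.25))  # ~0.071
```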

  39-43. Joint Entropy and Mutual Information
     For random variables X = {x_1, ..., x_n} and Y = {y_1, ..., y_m}:
     • Joint entropy – H(X, Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} H(X = x_i and Y = y_j)
     • Mutual information – I(X : Y) = H(X) + H(Y) − H(X, Y)
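A small sketch computing joint entropy and mutual information from a joint probability table; the dictionary-of-pairs representation and the independent-bits check are my own choices, not from the slides.

```python
import math

def entropy(probabilities):
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

def mutual_information(joint):
    """joint maps (x, y) pairs to probabilities. Returns I(X:Y) = H(X) + H(Y) - H(X,Y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p  # marginal P(X = x)
        py[y] = py.get(y, 0) + p  # marginal P(Y = y)
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())

# Two independent fair bits: mutual information should be 0.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(mutual_information(independent))  # 0.0
```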

  44-46. −0 log 0?
     What is lim_{x→0+} −x log2(x)?
     L’Hôpital’s Rule. If:
     1. lim_{x→c} f(x) = lim_{x→c} g(x) = 0 or ±∞
     2. lim_{x→c} f'(x)/g'(x) exists
     3. g'(x) ≠ 0
     then:
     lim_{x→c} f(x)/g(x) = lim_{x→c} f'(x)/g'(x)
     Write −x log2(x) = log2(x) / (−1/x), so that:
     • f(x) = log2(x) → f'(x) = 1/(x ln 2)
     • g(x) = −1/x → g'(x) = 1/x²
     • lim_{x→0+} f(x)/g(x) = lim_{x→0+} f'(x)/g'(x) = lim_{x→0+} x²/(x ln 2) = lim_{x→0+} x/ln 2 = 0
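A quick numerical check of the limit, evaluating −x log2(x) at progressively smaller x.

```python
import math

for x in (1e-1, 1e-3, 1e-6, 1e-9):
    print(x, -x * math.log2(x))  # tends to 0 as x -> 0+
```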

  47-51. Example
     Take two coins, one fair, one two-headed. Choose one randomly (X) and flip it twice. Random variable Y is the number of heads. X = {0, 1} (unfair, fair); Y = {0, 1, 2} (0 heads, 1 head, 2 heads).

     [Tree diagram: the unfair coin (X = 0) gives HH every time, so Y = 2 with probability 1; the fair coin (X = 1) gives HH (Y = 2), HT (Y = 1), TH (Y = 1), or TT (Y = 0), each with probability 1/4.]

     Compute H(X), H(Y), H(X, Y), and I(X : Y).
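The requested quantities can be computed from the joint distribution read off the tree; a sketch (the approximate values in the comments are my own arithmetic, not from the slides).

```python
import math

def entropy(probabilities):
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Joint distribution P(X, Y): X = 0 unfair coin, X = 1 fair coin; Y = number of heads.
joint = {
    (0, 2): 1 / 2,                                 # unfair coin: always two heads
    (1, 0): 1 / 8, (1, 1): 1 / 4, (1, 2): 1 / 8,   # fair coin flipped twice
}

p_x = [1 / 2, 1 / 2]
p_y = [1 / 8, 1 / 4, 1 / 2 + 1 / 8]

h_x = entropy(p_x)              # 1.0 bit
h_y = entropy(p_y)              # ~1.30 bits
h_xy = entropy(joint.values())  # 1.75 bits
print(h_x, h_y, h_xy, h_x + h_y - h_xy)  # I(X:Y) ~ 0.55 bits
```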

  52. As Related to Sets

  53-59. More Definitions
     Assume channel Λ is a discrete, memoryless, binary symmetric channel (BSC) with error parameter f.
     • Discrete – Messages can be divided into separate symbols. Furthermore, X and Y have finite target spaces.
     • Memoryless – Probabilities are independent and don’t change.
     • Binary symmetric channel – A channel with binary input, binary output, and error parameter f < .5.
     • Redundancy – Extra information added to a message to reduce error.
     • Capacity – The maximum concentration of information for a given channel:
       Γ = max_{P(X)} I(X : Y)
       which for a BSC simplifies to Γ = 1 − H(f).
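A sketch of the BSC capacity formula Γ = 1 − H(f), using the binary entropy of the noise parameter; the value quoted for f = 0.25 is my own calculation.

```python
import math

def binary_entropy(f):
    """H(f) = -f log2 f - (1 - f) log2 (1 - f), with H(0) = H(1) = 0."""
    if f in (0, 1):
        return 0.0
    return -f * math.log2(f) - (1 - f) * math.log2(1 - f)

def bsc_capacity(f):
    """Capacity of a binary symmetric channel with flip probability f."""
    return 1 - binary_entropy(f)

print(bsc_capacity(0.25))  # ~0.189 bits per channel use
```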

  60-63. Fundamental Theorem
     Goal: to minimize error probability while also minimizing redundancy (and thereby maximizing rate).
     Let Λ be a BSC with parameter f < 1/2 and resulting capacity Γ = 1 − H(f). Let R be any information rate with R < Γ. Let ε > 0 be an arbitrarily small positive quantity. Then there exists a code C of length N with rate ≥ R, and a decoding algorithm, such that the maximum probability of error is ≤ ε.
     • Baby analogy
     • R > Γ

  64-68. Hamming Code and Parity
     • Richard Hamming – the (7,4) Hamming code
     • SECDED (single error correction, double error detection)
     • General (n,k) Hamming codes: n is the message length, r the number of parity bits, and k the number of data bits, where n = 2^r − 1, k = 2^r − r − 1 = n − r, and r ≥ 2.

     r     n        k        rate = k/n
     2     3        1        .3333
     3     7        4        .5714
     4     15       11       .7333
     5     31       26       .8387
     ...
     10    1023     1013     .9902
     ...
     16    65535    65519    .9998
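A compact sketch of the (7,4) Hamming code with single-error correction; the bit ordering (parity bits in positions 1, 2, and 4) is the standard textbook layout, and the helper names are mine.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits, parity at positions 1, 2, 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]          # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]          # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]          # checks positions 4, 5, 6, 7
    error_position = s1 + 2 * s2 + 4 * s3   # 0 means no detected error
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[5] ^= 1                    # flip one bit in transit
print(hamming74_decode(codeword))   # [1, 0, 1, 1] recovered
```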

  69. Application 1 – QR Codes
      • Use Reed-Solomon error correction with 8-bit codewords.
      • Block errors

      Level   Approx. codewords restored
      L       7%
      M       15%
      Q       25%
      H       30%

  70. [Image-only slide.]

  71-74. Application 2 – DNA
     • DNA as code: 4 bases (A, C, G, T)
     • A quaternary system – easy to map to binary
     • About 3.2 billion base pairs: 6.4 billion bits, or about 800 megabytes. Testament?
     • Competing goals: prokaryotes vs. eukaryotes
     • The field of bioinformatics
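The back-of-the-envelope arithmetic behind the 800-megabyte figure, as a quick sketch.

```python
base_pairs = 3.2e9      # approximate size of the human genome
bits = base_pairs * 2   # 4 possible bases -> 2 bits per position
megabytes = bits / 8 / 1e6
print(bits, megabytes)  # 6.4e9 bits, ~800 MB
```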

  75. Sources
      Primary Sources
      1. Aiden A. Bruen and Mario A. Forcinito. Cryptography, Information Theory, and Error-Correction. John Wiley & Sons, 2005.
      2. David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
      Papers
      1. Claude E. Shannon and Warren Weaver. The Mathematical Theory of Communication. University of Illinois Press, 1949.
      2. Thomas A. Kunkel. DNA Replication Fidelity. Journal of Biological Chemistry, 2004.