Data Compression

AllenHeard
January 11, 2017

Year 13 Lesson

Transcript

  1. Data Compression

    ▪ Data compression shrinks a file so that it takes up less space. This is desirable for both data storage and data communication. Disk storage costs money, so a file that occupies less disk space is "cheaper" to keep than an uncompressed one.
    ▪ Smaller files are also desirable for data communication, because the smaller a file is, the faster it can be transferred. Sending a compressed file therefore appears to increase the speed of data transfer compared with sending the uncompressed file.
  2. Data Compression Defined

    ▪ All data is encoded.
    ▪ This means that the data is originally a combination of elements, e, from some alphabet, A.
    ▪ This combination of elements is a message, M.
    ▪ The message is encoded from the alphabet A into the binary alphabet, B.
    ▪ The resulting string of bits (binary digits, 0s and 1s) is the encoded data.
    ▪ So, essentially, encoding is just translating a message, M, from the alphabet A into the alphabet B.
  3. Data Compression Defined

    ▪ Here is an example:
      – The message is: a b c d
      – The encoded message is: 00 01 10 11
      – This example simply translates the elements a, b, c, d from the English alphabet, A, into the binary alphabet, B. The encoded elements can also be decoded from the binary alphabet back into the original message in the English alphabet.
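The a b c d example can be sketched in a few lines of Python. This is only an illustration: the names `CODE`, `encode` and `decode` are made up here, and the table holds just the four codewords shown on the slide.

```python
# Code table from the slide's example: a b c d -> 00 01 10 11.
CODE = {"a": "00", "b": "01", "c": "10", "d": "11"}
# Reverse table, used to translate bits back into letters.
DECODE = {bits: symbol for symbol, bits in CODE.items()}


def encode(message):
    """Translate each element of the message into its 2-bit codeword."""
    return "".join(CODE[symbol] for symbol in message)


def decode(bits, width=2):
    """Split the bit string into fixed-width chunks and look each one up."""
    chunks = (bits[i:i + width] for i in range(0, len(bits), width))
    return "".join(DECODE[chunk] for chunk in chunks)


encoded = encode("abcd")
print(encoded)          # 00011011
print(decode(encoded))  # abcd
```

Because every codeword has the same width, the decoder always knows where one element ends and the next begins.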
  4. Types of Data Compression

    ▪ There are two main types of data compression: lossy and lossless.
    ▪ Lossy data compression is named for what it does. After lossy data compression is applied to a message, the message can never be recovered exactly as it was before it was compressed.
    ▪ When the compressed message is decoded, it does not give back the original message. Data has been lost.
    ▪ Because a lossy-compressed message cannot be decoded to yield the exact original, lossy compression is not a good method for critical data, such as textual data. It is most useful for Digitally Sampled Analog Data (DSAD). DSAD consists mostly of sound, video, graphics, or picture files.
  5. Lossy Compression

    ▪ Algorithms for lossy compression of DSAD vary, but many use threshold level truncation. This means that a level is chosen beyond which all data is discarded. In a sound file, for example, the very high and very low frequencies, which the human ear cannot hear, may be removed from the file.
    ▪ Well-known examples of lossy compression are JPEG (for images) and MPEG (for video and audio).
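A rough sketch of the truncation idea for the sound example, in Python. It assumes the sound has already been broken down into frequency components, and it uses the commonly cited hearing range of roughly 20 Hz to 20,000 Hz as the threshold levels. The function name and the sample data are hypothetical, and real audio codecs are far more elaborate than this.

```python
# Commonly cited approximate limits of human hearing, in hertz (assumed here).
LOW_HZ, HIGH_HZ = 20, 20_000


def truncate_inaudible(components):
    """Lossy step: keep only (frequency, amplitude) pairs inside the audible band."""
    return [(f, a) for f, a in components if LOW_HZ <= f <= HIGH_HZ]


# Hypothetical frequency components of a sound: (frequency in Hz, amplitude).
sound = [(5, 0.2), (440, 0.9), (1000, 0.5), (25_000, 0.1)]
kept = truncate_inaudible(sound)

print(kept)                    # [(440, 0.9), (1000, 0.5)]
print(len(sound) - len(kept))  # 2 components discarded for good
```

Nothing in `kept` records what the two discarded components were, which is exactly why the original message cannot be rebuilt.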
  6. Lossless Compression

    ▪ Lossless data compression is also named for what it does. With lossless data compression, the original message can be decoded exactly.
    ▪ Lossless data compression works by finding repeated patterns in a message and encoding those patterns in an efficient manner. For this reason, lossless data compression is also referred to as redundancy reduction.
    ▪ Because redundancy reduction depends on patterns in the message, it does not work well on random messages. Lossless data compression is ideal for text.
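Both claims, exact recovery and poor results on random data, can be checked with Python's standard `zlib` module. `zlib` is just one convenient lossless compressor (it implements DEFLATE) and is not mentioned in the slides. The variable names are illustrative, and the exact sizes printed will vary from run to run because the random bytes differ each time.

```python
import os
import zlib

# A highly repetitive text message and a random message of the same length.
repetitive = b"The rain in Spain falls mainly on the plain. " * 200
random_bytes = os.urandom(len(repetitive))

packed_text = zlib.compress(repetitive)
packed_random = zlib.compress(random_bytes)

# Lossless: decoding gives back exactly the original message.
assert zlib.decompress(packed_text) == repetitive
assert zlib.decompress(packed_random) == random_bytes

# Original size, compressed size of the text, compressed size of the random data.
print(len(repetitive), len(packed_text), len(packed_random))
```

The repetitive text shrinks to a small fraction of its size, while the random bytes barely shrink at all, since there are no patterns to exploit.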
  7. Lossless Compression

    ▪ One type of encoding which is very effective for files with long strings of repeating bits is RLE. RLE stands for Run Length Encoding. RLE replaces each run of a repeated element with a single copy of that element and a count of how many times it repeats.
    ▪ A different lossless approach is the sliding dictionary method (used by the LZ77 family of algorithms). The sliding dictionary method utilises pointers within the compressed file that point to previously represented strings within the file.
    ▪ Here is an example of a message which could be effectively encoded with a sliding dictionary: The rain in Spain falls mainly on the plain.
    ▪ The string "ain" could be represented only once and could be pointed to by all later occurrences of that string.
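Run-length encoding in its usual sense stores each run of a repeated symbol as a count plus the symbol. A minimal Python sketch of that idea follows. The function names and the (count, symbol) pair format are choices made for this illustration, not a standard file format.

```python
from itertools import groupby


def rle_encode(message):
    """Collapse each run of identical symbols into a (count, symbol) pair."""
    return [(len(list(run)), symbol) for symbol, run in groupby(message)]


def rle_decode(pairs):
    """Expand every (count, symbol) pair back into the run it stands for."""
    return "".join(symbol * count for count, symbol in pairs)


bits = "0" * 8 + "1" * 3 + "0" * 5   # long runs of repeating bits
pairs = rle_encode(bits)

print(pairs)                      # [(8, '0'), (3, '1'), (5, '0')]
print(rle_decode(pairs) == bits)  # True
```

Sixteen bits become three pairs. On a message with no long runs, such as ordinary English text, the pairs would take more room than the original.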
  8. Lossless Compression - Huffman coding

    ▪ Huffman coding works by analysing the frequency, F, of each element, e, in a message, M.
    ▪ The elements with the highest frequency are assigned the shortest encodings (with the fewest bits). Elements with lower frequencies are assigned longer encodings (with more bits).
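The slides describe the result of Huffman coding but not how the codes are built. The sketch below uses the standard construction, which repeatedly merges the two least frequent groups of elements. The function name and the sample message are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(message):
    """Return a {symbol: bit string} table built from symbol frequencies."""
    freq = Counter(message)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing 0 to one and 1 to the other.
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


message = "aaaabbc"
codes = huffman_codes(message)
total_bits = sum(len(codes[s]) for s in message)

print(codes)       # 'a' (most frequent) gets a 1-bit code; 'b' and 'c' get 2 bits
print(total_bits)  # 10, versus 14 bits with a fixed 2-bit code per element
```

No codeword is the start of another, so the bit stream can be decoded without separators between elements.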