Data Compression - Speaker Deck

Slide 1

Slide 1 text

DATA COMPRESSION Lossy vs Lossless

Slide 2

Slide 2 text

Data Compression ■ Data compression shrinks a file so that it takes up less space. This is desirable for both data storage and data communication. Disk storage costs money, so a file which occupies less disk space is "cheaper" to keep than an uncompressed file. ■ Smaller files are also desirable for data communication, because the smaller a file, the faster it can be transferred. Sending a compressed file therefore has the effect of increasing the transfer speed compared with sending the uncompressed file.

Slide 3

Slide 3 text

Data Compression Defined ■ All data is encoded. ■ This means that the data is originally a combination of elements, e, from some alphabet, A. ■ This combination of elements is a message, M. ■ This message from the alphabet, A, is encoded into the binary alphabet, B. ■ The string of bits, binary digits (0's and 1's), is the encoded data. ■ So essentially encoding is just transferring a message, M, from the alphabet A into the alphabet B.

Slide 4

Slide 4 text

Data Compression Defined ■ Here is an example: – The message is: a b c d – The encoded message is: 00 01 10 11 – The above example just translates the elements a, b, c, d from the English alphabet, A, into the binary alphabet, B. These elements can also be decoded from the binary alphabet back into the original message, which was in the English alphabet.
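
The mapping in this example can be written out as a small lookup table. The following is a minimal sketch, not part of the original slides; the names `CODE`, `DECODE`, `encode` and `decode` are chosen here for illustration only.

```python
# Fixed-length code from the slide: every symbol maps to exactly two bits.
CODE = {"a": "00", "b": "01", "c": "10", "d": "11"}
# Reverse table used for decoding.
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(message):
    """Translate a message over {a, b, c, d} into a string of bits."""
    return "".join(CODE[sym] for sym in message)

def decode(bits, width=2):
    """Read the bit string back in fixed-width chunks."""
    return "".join(DECODE[bits[i:i + width]] for i in range(0, len(bits), width))

encoded = encode("abcd")
print(encoded)          # 00011011
print(decode(encoded))  # abcd
```

Because every code word has the same width, the decoder only needs to know that width to split the bit string correctly.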

Slide 5

Slide 5 text

Types of Data Compression ■ There are two main types of data compression: lossy and lossless. ■ Lossy data compression is named for what it does. After lossy data compression is applied to a message, the message can never be recovered exactly as it was before it was compressed. ■ When the compressed message is decoded, it does not give back the original message. Data has been lost. ■ Because lossy compression cannot be decoded to yield the exact original message, it is not a good method of compression for critical data, such as textual data. It is most useful for Digitally Sampled Analog Data (DSAD). DSAD consists mostly of sound, video, graphics, or picture files.

Slide 6

Slide 6 text

Lossy compression ■ Algorithms for lossy compression of DSAD vary, but many use threshold-level truncation. This means that a level is chosen past which all data is truncated. In a sound file, for example, the very high and very low frequencies, which the human ear cannot hear, may be truncated from the file. ■ Some examples of lossy data compression algorithms are JPEG and MPEG.
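
The idea of threshold truncation can be illustrated with a toy example. This is only a sketch under stated assumptions: the sound is represented as a hypothetical list of (frequency, amplitude) pairs, and the 20 Hz and 20 kHz cutoffs are a commonly quoted approximation of the audible range, not values taken from the slides. Real codecs such as JPEG and MPEG are far more involved.

```python
# Assumed cutoffs (approximate limits of human hearing, in hertz).
AUDIBLE_LOW_HZ = 20
AUDIBLE_HIGH_HZ = 20_000

def truncate_inaudible(components, low=AUDIBLE_LOW_HZ, high=AUDIBLE_HIGH_HZ):
    """Keep only the components whose frequency lies within [low, high]."""
    return [(freq, amp) for freq, amp in components if low <= freq <= high]

# Hypothetical sound: (frequency in Hz, amplitude) pairs.
sound = [(5, 0.2), (440, 1.0), (880, 0.5), (25_000, 0.1)]
kept = truncate_inaudible(sound)
print(kept)  # [(440, 1.0), (880, 0.5)]
```

The two discarded components cannot be reconstructed from `kept`, which is what makes the scheme lossy.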

Slide 7

Slide 7 text

Lossless Compression ■ Lossless data compression is also named for what it does. With lossless data compression, the original message can be decoded exactly. ■ Lossless data compression works by finding repeated patterns in a message and encoding those patterns in an efficient manner. For this reason, lossless data compression is also referred to as redundancy reduction. ■ Because redundancy reduction is dependent on patterns in the message, it does not work well on random messages. Lossless data compression is ideal for text.
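
The contrast between patterned and random messages can be observed with an off-the-shelf lossless compressor. The sketch below uses Python's standard `zlib` module purely as an illustration; the slides do not name a particular tool, and the sample inputs are made up for this example.

```python
import random
import zlib

# A highly repetitive message and a pseudo-random one of the same length.
repetitive = b"The rain in Spain falls mainly on the plain. " * 100
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

packed_text = zlib.compress(repetitive)
packed_noise = zlib.compress(noise)

# Print original size and compressed size for each input.
print(len(repetitive), len(packed_text))
print(len(noise), len(packed_noise))

# Lossless: decoding gives back the original bytes exactly.
assert zlib.decompress(packed_text) == repetitive
assert zlib.decompress(packed_noise) == noise
```

The repetitive input should shrink to a small fraction of its size, while the random input stays at roughly its original length.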

Slide 8

Slide 8 text

Lossless Compression ■ One type of encoding which is very effective for files with long runs of repeating bits is RLE. RLE stands for Run Length Encoding. RLE replaces each run of a repeated symbol with a single copy of the symbol and a count of how many times it repeats. ■ A different lossless technique is the sliding dictionary method (used by the LZ77 family of algorithms). The sliding dictionary method utilises pointers within the compressed file that point to previously represented strings within the file. ■ Here is an example of a message which could be effectively encoded with a sliding dictionary: The rain in Spain falls mainly on the plain. ■ The string "ain" could be represented only once and could be pointed to by all later occurrences of that string.
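
Both ideas on this slide can be sketched in a few lines. In its usual form, run-length encoding stores each run as a symbol plus a count, while back-pointers to earlier text belong to sliding dictionary schemes in the LZ77 style. The code below is a simplified illustration of each, not a production format; the function names, the 32-character window and the minimum match length of 3 are assumptions made for this example.

```python
from itertools import groupby

def rle_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]

def rle_decode(pairs):
    return "".join(sym * count for sym, count in pairs)

def lz_encode(text, window=32, min_len=3):
    """Emit literals, or (offset, length) pointers back into the last `window` characters."""
    tokens, i = [], 0
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))  # pointer to an earlier string
            i += best_len
        else:
            tokens.append(text[i])               # literal character
            i += 1
    return tokens

def lz_decode(tokens):
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            for _ in range(length):
                out.append(out[-offset])  # copy one character at a time
        else:
            out.append(tok)
    return "".join(out)

bits = "0" * 8 + "1" * 5 + "0" * 3
print(rle_encode(bits))  # [('0', 8), ('1', 5), ('0', 3)]

sentence = "The rain in Spain falls mainly on the plain."
tokens = lz_encode(sentence)
print(tokens)
assert rle_decode(rle_encode(bits)) == bits
assert lz_decode(tokens) == sentence
```

In the token list for the sentence, the tuples are the pointers: each one replaces a repeated string such as "ain" with a reference to an earlier occurrence.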

Slide 9

Slide 9 text

Lossless Compression - Huffman coding ■ Huffman coding works by analysing the frequency, F, of elements, e, in a message, M. ■ The elements with the highest frequency, F:e, get assigned the shortest encoding (with the fewest bits). Elements with lower frequencies get assigned longer encodings (with more bits).
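
The frequency-based assignment described above can be sketched with a priority queue. This is one common way to build a Huffman code, written here as an illustration under assumptions rather than as the definitive construction; the function name `huffman_codes` and the sample message are made up for this example.

```python
import heapq
from collections import Counter

def huffman_codes(message):
    """Build a prefix code in which more frequent symbols get shorter bit strings."""
    freq = Counter(message)
    # Heap entries: (frequency, unique tie-breaker, {symbol: code built so far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Only one distinct symbol: give it a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

message = "aaaaaaaabbbbccd"  # a: 8, b: 4, c: 2, d: 1
codes = huffman_codes(message)
encoded = "".join(codes[sym] for sym in message)

# Code lengths: a gets 1 bit, b gets 2 bits, c and d get 3 bits each.
print({sym: len(code) for sym, code in sorted(codes.items())})
print(len(encoded))  # 25 bits, versus 30 bits with a fixed 2-bit code
```

The most frequent element, a, ends up with the shortest code, so the 15-symbol message takes 25 bits instead of the 30 needed by a fixed two-bit encoding.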