Finding Synchronization Codes to Boost Compression by Substring Enumeration

Dany Vohl, Claude-Guy Quimper, Danny Dubé

Introduction
• Synchronization codes are frequently used in digital data transmission and storage, that is, when data reception may be ill-synchronized
• Recent work on data compression gives synchronization codes a new and unusual purpose
• This work aims to find such synchronization codes

Structure of presentation
1. An application of synchronization codes
2. The new application: CSE
3. Characteristics of such codes
4. Constraint model
5. Pseudo-Boolean model
6. Experimental results

1. An application of synchronization codes: the hard drive

Overview: hard drive
• Spinning disk
• Read/write head (RW)

Where does a track (or sector, or byte) start?

2. Compression by Substring Enumeration: A Brief Description

• CSE compresses by transmitting the number of occurrences of every possible substring of bits

Example bit string: 0 1 0 0 0 0 0 1
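To make "number of occurrences of every substring" concrete, here is a minimal sketch (ours, not the CSE implementation) that counts every substring of length 1 to `max_len` in the example bit string. It treats the string as linear, which may differ from how CSE handles the ends of the data, and it only computes the counts: the encoding of those counts, where CSE actually gains compression, is not shown. `substring_counts` is a hypothetical helper name.

```python
from collections import Counter


def substring_counts(bits: str, max_len: int) -> Counter:
    """Count occurrences of every substring of length 1..max_len (linear, no wrap-around)."""
    counts = Counter()
    for length in range(1, max_len + 1):
        for start in range(len(bits) - length + 1):
            counts[bits[start:start + length]] += 1
    return counts


counts = substring_counts("01000001", 3)
for length in range(1, 4):
    # Show the non-zero counts for each substring length, sorted by substring.
    row = {s: c for s, c in sorted(counts.items()) if len(s) == length}
    print(length, row)
# 1 {'0': 6, '1': 2}
# 2 {'00': 4, '01': 2, '10': 1}
# 3 {'000': 3, '001': 1, '010': 1, '100': 1}
```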

• Problem:
  • CSE is bit-oriented, while benchmark files are byte-oriented
  • CSE is unaware of the phase of the bits within each byte

• Solution suggested (Dubé, ISITA 2010):
  • Add control bits before compression to achieve strong synchronization
  • Substrings inside byte boundaries
• Side effects:
  • The pre-compressed file is larger than the original file
  • But it is highly compressible

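The size side effect is simple arithmetic: if every block of d data bits receives k control bits, the pre-compressed file is (d + k)/d times as large as the original. A small sketch of that calculation, with helper names of our own choosing and assuming the data splits into whole d-bit blocks:

```python
from fractions import Fraction


def expansion_factor(d: int, k: int) -> Fraction:
    """Size ratio of the synchronized file to the original when d data bits become d + k bits."""
    return Fraction(d + k, d)


def synchronized_size_bits(n_bytes: int, d: int, k: int) -> int:
    """Bits after inserting k control bits per d data bits (whole blocks assumed)."""
    data_bits = 8 * n_bytes
    if data_bits % d:
        raise ValueError("this sketch assumes the data splits into whole d-bit blocks")
    return data_bits // d * (d + k)


print(expansion_factor(8, 20))              # 7/2, i.e. 3.5 times larger
print(synchronized_size_bits(1000, 8, 20))  # 28000
```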

3. Characteristics of our synchronization codes
Phase, Synchronization and Reliability

Given this binary word, with each bit's phase:
Bits:  0 1 0 1 0 1 0 1
Phase: 0 1 2 3 4 5 6 7
Now consider a sequence of such words (… 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 …) whose phases are unknown (? ? ? ? ? ? ? ?). If one lands on a random bit inside a word, would it be possible to determine the phase of this bit? Say one reads a "1"…

This is where synchronization codes come in handy:
Original block:       0 1 0 1 0 1 0 1
Synchronization code: _ _ _ _ _ _ 0 _ _ 0 1 1 1
Synchronized block:   0 1 0 1 0 1 0 0 1 0 1 1 1

Sequence of several synchronized blocks:
Original block:     0 1 0 1 0 1 0 1
Synchronized block: 0 1 0 1 0 1 0 0 1 0 1 1 1 (phases 0 to 12)
Original block:     1 1 1 0 0 0 1 1
Synchronized block: 1 1 1 0 0 0 0 1 1 0 1 1 1 (phases 0 to 12)
…
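The construction above can be sketched as follows: each `_` of the synchronization code is filled, left to right, with the next data bit, and the control bits are copied unchanged. This is our reading of the slides rather than the authors' code; `synchronize`, `synchronize_stream` and `SYNC_CODE` are illustrative names.

```python
SYNC_CODE = "______0__0111"  # the slides' synchronization code; '_' marks a data-bit slot


def synchronize(data: str, code: str = SYNC_CODE) -> str:
    """Interleave one block of data bits with the control bits of `code`."""
    slots = code.count("_")
    if len(data) != slots:
        raise ValueError(f"expected {slots} data bits, got {len(data)}")
    data_iter = iter(data)
    # Replace each '_' with the next data bit; keep control bits as they are.
    return "".join(next(data_iter) if ch == "_" else ch for ch in code)


def synchronize_stream(data: str, code: str = SYNC_CODE) -> str:
    """Synchronize a stream made of whole blocks of data bits, block after block."""
    d = code.count("_")
    if len(data) % d:
        raise ValueError("data length must be a multiple of the number of data slots")
    return "".join(synchronize(data[i:i + d], code) for i in range(0, len(data), d))


print(synchronize("01010101"))  # 0101010010111
print(synchronize("11100011"))  # 1110000110111
```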




Given this (d,k,n)-synchronization code over the alphabet {0, 1, _}, where each _ marks a data-bit slot:
1 1 0 1 _ 1 _ 1 1 0 0 _ 0 _ 0 1 0 0 _ 0 _ 1 1 0 0 _ 1 _
where
• d is the number of data bits (here, d = 8)
• k is the number of control bits (here, k = 20)
• n is the reliability (here, n = 7)
we obtain an (8,20,7)-synchronization code.

(8,20,7)-synchronization code: 7-reliable
Synchronization code (phases 0 to 27): 1 1 0 1 _ 1 _ 1 1 0 0 _ 0 _ 0 1 0 0 _ 0 _ 1 1 0 0 _ 1 _
Synchronized data, read from an unknown phase: … 1 1 1 0 0 0 1 1 1 1 0 1 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 0 …

Where did we start in the synchronization code?
• Bit(s) read: 1
• Bit(s) read: 2
• Bit(s) read: 4
• Bit(s) read: 6
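The question on these slides can be answered mechanically by keeping every phase that is still consistent with the bits read so far: a phase survives if, wherever the code has a control bit, the bit read matches it, while a data slot `_` accepts either bit. This is a minimal sketch under our reading of the slides; `candidate_phases`, `CODE` and `OBSERVED` are our own names, and `OBSERVED` is the synchronized data shown above.

```python
CODE = "1101_1_1100_0_0100_0_1100_1_"      # the (8,20,7)-synchronization code from the slides
OBSERVED = "1110001111010111100001010000"  # synchronized data read from an unknown phase


def candidate_phases(code: str, observed: str) -> set[int]:
    """Phases p such that the bits read are consistent with the code started at phase p."""
    length = len(code)
    return {
        p for p in range(length)
        # a data slot '_' accepts any bit; a control bit must equal the bit read
        if all(code[(p + t) % length] in ("_", bit) for t, bit in enumerate(observed))
    }


for m in range(1, 8):
    phases = candidate_phases(CODE, OBSERVED[:m])
    print(m, len(phases), sorted(phases) if len(phases) <= 4 else "...")
# 1 18 ...
# 2 11 ...
# 3 8 ...
# 4 4 [3, 6, 20, 27]
# 5 2 [6, 20]
# 6 2 [6, 20]
# 7 1 [20]
```

On this particular data, six bits still leave two candidate phases and the seventh bit settles it, which is consistent with the code being 7-reliable (a single run does not prove the worst case, of course).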

The considered (d,k,n)-Synchronization Codes

(8,10,9)-sync. code
• The phases range over {0, …, d+k-1}
• Its rotations to the left

• 9-reliability ensures that any 2 lines contain two conflicting control bits (same column, different values) within the first 9 columns


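The rotations and the reliability condition can be checked mechanically. In this sketch (helper names are ours), line i is the code rotated i positions to the left, and a code is n-reliable when every pair of distinct lines has, within its first n columns, a column where both lines hold control bits with different values. `reliability` returns the smallest such n, or `None` when some pair of lines never conflicts. The (8,10,9) code itself appears only as an image in the slides, so the checks use small hand-made codes.

```python
from itertools import combinations
from typing import Optional


def params(code: str) -> tuple[int, int]:
    """Return (d, k): the number of data slots '_' and the number of control bits."""
    d = code.count("_")
    return d, len(code) - d


def rotations(code: str) -> list[str]:
    """Line i is the code rotated i positions to the left (phases 0..d+k-1)."""
    return [code[i:] + code[:i] for i in range(len(code))]


def conflict(a: str, b: str) -> bool:
    """Two symbols conflict when both are control bits with different values."""
    return a != "_" and b != "_" and a != b


def reliability(code: str) -> Optional[int]:
    """Smallest n such that any two lines conflict within their first n columns."""
    worst = 0
    for x, y in combinations(rotations(code), 2):
        first = next((t for t in range(len(code)) if conflict(x[t], y[t])), None)
        if first is None:
            return None  # these two phases can never be told apart
        worst = max(worst, first + 1)
    return worst


print(rotations("0_1"))    # ['0_1', '_10', '10_']
print(reliability("0_1"))  # 3
```

Running `params` and `reliability` on the (8,20,7) code string from the earlier slides is a quick way to compare this reading of the definition with the slides.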

4. Finding Synchronization Codes: A Constraint Model

Variables
• The 1st model is built around 2k variables:
  • the position P of each control bit, and
  • its value V, in a sequence C

• Given 2 control bits in sequence C
• Let A and B be these 2 control bits


• We consider the positions of bits A and B in sequence C at rotations i and j

• We consider the positions of bits A and B inside lines i and j

• Similarly, we have the values of A and B inside lines i and j
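In the constraint model, a code is described by the position and the value of each control bit rather than by a string. Under our reading of these slides, the control bit stored at position p of C appears in line i at column (p − i) mod (d + k), and lines i and j conflict when two control bits land in the same column t < n with different values. The sketch below only checks this condition for given positions and values; it is neither a solver nor Gecode code, and all names are ours.

```python
from itertools import combinations


def column_of(position: int, line: int, length: int) -> int:
    """Column at which the control bit stored at `position` of C appears in line `line`."""
    return (position - line) % length


def lines_conflict(positions, values, i, j, n, length) -> bool:
    """True if lines i and j hold control bits with different values in one of the first n columns."""
    for pa, va in zip(positions, values):
        t = column_of(pa, i, length)
        if t >= n:
            continue  # bit A falls outside the first n columns of line i
        for pb, vb in zip(positions, values):
            if column_of(pb, j, length) == t and va != vb:
                return True
    return False


def pv_is_reliable(positions, values, d, n) -> bool:
    """Check n-reliability of the code described by control-bit positions and values."""
    length = d + len(positions)
    return all(lines_conflict(positions, values, i, j, n, length)
               for i, j in combinations(range(length), 2))


# Control bits 0 at position 0 and 1 at position 2, one data slot: the code 0_1.
print(pv_is_reliable([0, 2], [0, 1], d=1, n=3))  # True
print(pv_is_reliable([0, 2], [0, 1], d=1, n=2))  # False
```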

5. Finding Synchronization Codes: A Pseudo-Boolean Model

• 2 binary variables for each of the d+k characters in C:
  • one indicates whether the i-th bit in C is a control bit; if so,
  • the other indicates whether it is a 0 or a 1


• 3rd variable: when true (= 1), the control bits i and g are 2 distinct bits at the same (rotated) position, with different values



• Finally, we ensure that the sum of all these conflict variables is greater than or equal to 1
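The slides do not spell out every constraint, so the following is our own sketch of a pseudo-Boolean encoding in the same spirit, not necessarily the authors' exact model. Here `c_i` marks control bits, `v_i` gives their values, and `y_i_g` may be 1 only when positions i and g hold control bits with different values (linearized with four inequalities); for every pair of lines, the `y` variables met in the first n columns must sum to at least 1. `to_opb` writes the constraints in the OPB text format that, to our understanding, PB solvers such as Minisat+ read. Variable names, helpers and the exact encoding are assumptions.

```python
from itertools import combinations


def pb_model(d: int, k: int, n: int):
    """Hypothetical PB encoding: a list of (terms, op, rhs) with terms = [(coef, var)]."""
    L = d + k

    def c(i):  # 1 iff position i of C is a control bit
        return f"c{i}"

    def v(i):  # value of position i (free when it holds data)
        return f"v{i}"

    def y(i, g):  # may be 1 only if i and g are conflicting control bits
        return f"y{min(i, g)}_{max(i, g)}"

    cons = [([(1, c(i)) for i in range(L)], "=", k)]  # exactly k control bits
    for i, g in combinations(range(L), 2):
        cons.append(([(1, c(i)), (-1, y(i, g))], ">=", 0))                 # y <= c_i
        cons.append(([(1, c(g)), (-1, y(i, g))], ">=", 0))                 # y <= c_g
        cons.append(([(1, v(i)), (1, v(g)), (-1, y(i, g))], ">=", 0))      # y <= v_i + v_g
        cons.append(([(-1, v(i)), (-1, v(g)), (-1, y(i, g))], ">=", -2))   # y <= 2 - v_i - v_g
    for r, s in combinations(range(L), 2):
        # lines r and s must conflict in at least one of their first n columns
        terms = [(1, y((r + t) % L, (s + t) % L)) for t in range(min(n, L))]
        cons.append((terms, ">=", 1))
    return cons


def satisfied(cons, assignment) -> bool:
    """Evaluate every constraint under a 0/1 assignment of the variables."""
    for terms, op, rhs in cons:
        total = sum(coef * assignment[var] for coef, var in terms)
        if (op == "=" and total != rhs) or (op == ">=" and total < rhs):
            return False
    return True


def assignment_from_code(code: str) -> dict:
    """Derive c, v and y values from a code string; data slots get value 0."""
    a = {}
    for i, ch in enumerate(code):
        a[f"c{i}"] = int(ch != "_")
        a[f"v{i}"] = int(ch == "1")
    for i, g in combinations(range(len(code)), 2):
        a[f"y{i}_{g}"] = int(code[i] != "_" and code[g] != "_" and code[i] != code[g])
    return a


def to_opb(cons) -> str:
    """Write the constraints as OPB text, renaming variables to x1, x2, ..."""
    names = {}
    for terms, _, _ in cons:
        for _, var in terms:
            names.setdefault(var, f"x{len(names) + 1}")
    lines = [f"* #variable= {len(names)} #constraint= {len(cons)}"]
    for terms, op, rhs in cons:
        lhs = " ".join(f"{coef:+d} {names[var]}" for coef, var in terms)
        lines.append(f"{lhs} {op} {rhs} ;")
    return "\n".join(lines)


cons = pb_model(1, 2, 3)
print(satisfied(cons, assignment_from_code("0_1")))  # True
print(to_opb(cons).splitlines()[0])                  # * #variable= 9 #constraint= 16
```

Checking a known code against the generated constraints, as above, is a cheap way to catch encoding mistakes before handing the OPB file to a solver.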

6. Finding Synchronization Codes: Experimental Results

• Experiments executed in 2 parts:
  • 1st: optimal k for cases where d = n
  • 2nd: smallest k for cases where d ≠ n

• The combination (PB model, Minisat+) is faster than (CP model, Gecode)
• (8,15,8)-sync. code:
  – CP (Gecode): 5 min 48 s, satisfiable
  – PB (Minisat+): 0.31 s, satisfiable
• (8,14,8)-sync. code:
  – CP (Gecode): 1 month, ???
  – PB (Minisat+): 3.77 s, unsatisfiable


PB results for d = n
[Table: unsatisfiable and satisfiable instances]


PB results for d ≠ n
[Table. Legend: one marker means no such code exists; blank means no answer was found within 18000 seconds (5 hours); an integer is the smallest value of k]
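For intuition on what the solvers are searching for, here is an exhaustive search that, for tiny d and n, tries k = 1, 2, … and every placement and value of the control bits. It is exponential and only usable on toy instances, which is why a CP or PB solver is needed for d = 8; it is not the method behind the results above, and all names are ours.

```python
from itertools import combinations, product
from typing import Optional


def reliable_at(code: str, n: int) -> bool:
    """Every pair of distinct left rotations conflicts within the first n columns."""
    L = len(code)
    lines = [code[i:] + code[:i] for i in range(L)]

    def conflict(a: str, b: str) -> bool:
        return a != "_" and b != "_" and a != b

    return all(any(conflict(x[t], y[t]) for t in range(min(n, L)))
               for x, y in combinations(lines, 2))


def smallest_k(d: int, n: int, max_k: int) -> Optional[tuple[int, str]]:
    """Return (k, code) for the smallest k in 1..max_k admitting an n-reliable code, else None."""
    for k in range(1, max_k + 1):
        L = d + k
        for data_slots in combinations(range(L), d):
            control = [p for p in range(L) if p not in data_slots]
            for values in product("01", repeat=k):
                symbols = ["_"] * L
                for p, bit in zip(control, values):
                    symbols[p] = bit
                code = "".join(symbols)
                if reliable_at(code, n):
                    return k, code
    return None


print(smallest_k(1, 3, max_k=4))  # (2, '_01')
print(smallest_k(1, 2, max_k=4))  # None
```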

Conclusion

• New application of synchronization codes:
  • Compression by Substring Enumeration
• Characteristics of synchronization codes:
  • d data bits
  • k control bits
  • n-reliable: maximum number of bits read before synchronization

• 2 models:
  • CP model: O(k² + d²) variables and constraints
    – Domains: max(n,k)
  • PB model: O(k² + d²) variables and constraints
    – All domains have only 2 values (binary)

• Found synchronization codes for words of up to 64 bits when d = n
• Found the minimal number of control bits when d ≠ n

Questions?
