Slide 1

Slide 1 text

Block Palindromes: A New Generalization of Palindromes
Keisuke Goto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga
SPIRE 2018 in LIMA

Slide 2

Slide 2 text

Palindromes
- Standard palindromes: a b c b a (the same string in both directions)
- Gapped palindromes: a b c d e b a = a b X b a, where X = cde is the gap (the same string in both directions around the gap)

Slide 3

Slide 3 text

Why Palindromes?
- Palindromes represent characteristic structures of strings. There are several lines of research on properties of palindromes
  - maximal palindromes, palindrome factorization, ...
- Gapped palindromes model hairpin structures of DNA and RNA sequences, where G pairs with C and U pairs with A (written G = C and U = A)
Figure: stem-loop (hairpin) structure with its gap, https://en.wikipedia.org/wiki/Stem-loop

Slide 4

Slide 4 text

Block Palindromes (BPs)
- A factorization f = f_{-n} … f_{-1} f_0 f_1 … f_n of a string T is a block palindrome if f_{-i} = f_i for all 0 ≤ i ≤ n
  * f_0 may be the empty string, but f_{-i} and f_i for 0 < i ≤ n must be nonempty
- We call a factor a block
- Example: LIMAisn'tMALI = LI | MA | isn't | MA | LI, with f_{-2} = LI, f_{-1} = MA, f_0 = isn't, f_1 = MA, f_2 = LI (f_{-i} and f_i are the same string)
- BPs are a generalization of standard and gapped palindromes
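The definition above can be checked mechanically. The following is a minimal sketch, assuming a factorization is passed as an odd-length Python list whose middle element plays the role of f_0; the function name `is_block_palindrome` is hypothetical and not taken from the slides.

```python
def is_block_palindrome(blocks):
    """Check f_{-i} == f_i for a factorization given as an odd-length list.

    blocks[n] plays the role of f_0 and may be empty; every other block
    must be nonempty (assumed list encoding, not from the slides).
    """
    if len(blocks) % 2 == 0:
        return False  # a center block f_0 (possibly "") is required
    n = len(blocks) // 2
    for i in range(1, n + 1):
        left, right = blocks[n - i], blocks[n + i]
        if not left or left != right:
            return False
    return True


print(is_block_palindrome(["LI", "MA", "isn't", "MA", "LI"]))  # slide example
print(is_block_palindrome(["a", "b", "cde", "b", "a"]))        # gapped palindrome as a BP
print(is_block_palindrome(["ab", "cde", "ba"]))                # "ab" != "ba"
```

The second call shows how the gapped palindrome a b X b a fits the definition: single characters are the arm blocks and the gap is the center block.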

Slide 5

Slide 5 text

Contributions
- We study basic properties of BPs, introducing representatives of BPs:
  - Largest BPs (of a string)
  - Maximal BPs (in a string)
- We propose an algorithm that enumerates all maximal BPs in a string T in optimal O(|T| + ||MBP(T)||) time, where ||MBP(T)|| is the output size (i.e., the total # of factors in the output)

Slide 6

Slide 6 text

# of BPs of T
- For a string T of length N, there are O(2^{N/2}) BPs of T
- A unary string has 2^{N/2} BPs
Figure: the unary string T = a a a ... a and its 2^{N/2} BPs
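The count for unary strings can be reproduced by brute force. This is a small sketch, assuming a BP is counted once per distinct factorization and that T alone (a single center block) counts as a BP; `count_bps` is a hypothetical helper, and the recursion tries every non-overlapping border as the outermost block.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_bps(t):
    """Count the block palindromes of t by trying every outermost block."""
    total = 1  # t itself as a single center block (also covers t == "")
    for k in range(1, len(t) // 2 + 1):
        if t[:k] == t[-k:]:  # non-overlapping border of length k
            total += count_bps(t[k:-k])
    return total


for n in (2, 4, 6, 8, 10):
    print(n, count_bps("a" * n), 2 ** (n // 2))
```

For even N the two printed counts agree, matching the 2^{N/2} figure on the slide.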

Slide 7

Slide 7 text

BPs and Borders
- A string that is a (nonempty and proper) prefix and a suffix of T is called a border of T
- The outermost block of a BP of T (other than T itself as a single block) is a border of T
- BPs can be obtained by stripping a border iteratively
Figure: T = o n i o n i n o n i o n and the BPs of T
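Stripping borders iteratively can be written as a short recursion. The sketch below is a naive enumeration (exponential in the worst case) under the assumption that only non-overlapping borders can serve as outermost blocks; `block_palindromes` is a hypothetical name.

```python
def block_palindromes(t):
    """Yield every BP of t as a list of blocks, left to right."""
    yield [t]  # t alone as the center block
    for k in range(1, len(t) // 2 + 1):
        border = t[:k]
        if border == t[-k:]:  # strip this border, recurse on the rest
            for inner in block_palindromes(t[k:-k]):
                yield [border] + inner + [border]


for bp in block_palindromes("onioninonion"):
    print("|".join(bp))
```

For T = onioninonion this prints five factorizations, one per row of the figure; the borders of T used at the outermost level are "on" and "onion".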

Slide 8

Slide 8 text

The largest BP of T
Figure: T = o n i o n i n o n i o n and the BPs of T; the largest BP is the one with the largest # of blocks
Properties
- The largest BP is unique (obtained by stripping the shortest border iteratively)
- Each block is an unbordered string
- Any BP is represented by a factorization of the largest BP (i.e., the largest BP is finer than any other BP)
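The construction in the first property can be sketched directly. This is a naive quadratic scan rather than an efficient border computation, and the names `largest_bp` and `is_unbordered` are hypothetical; it relies on the fact that the shortest border of a string never overlaps itself.

```python
def is_unbordered(s):
    """True if s has no nonempty proper prefix that is also a suffix."""
    return all(s[:k] != s[-k:] for k in range(1, len(s)))


def largest_bp(t):
    """Strip the shortest border repeatedly until none is left."""
    left, right = [], []
    while True:
        k = next((k for k in range(1, len(t) // 2 + 1) if t[:k] == t[-k:]), None)
        if k is None:
            break
        left.append(t[:k])
        right.append(t[-k:])
        t = t[k:-k]
    return left + [t] + right[::-1]


blocks = largest_bp("onioninonion")
print("|".join(blocks))
print(all(is_unbordered(b) for b in blocks))
```

On the slide's example the result has seven blocks, and the second line confirms that each block is unbordered.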

Slide 9

Slide 9 text

Factorization of Largest BPs
- Let f = f_{-n}, ..., f_n and g = g_{-m}, ..., g_m be BPs, with f the largest
- In the inductive step on n > 0, we have 3 cases:
  - (1) |f_n| = |g_m|
  - (2) |f_n| > |g_m|
  - (3) |f_n| < |g_m|

Slide 10

Slide 10 text

Factorization of Largest BPs
- Let f = f_{-n}, ..., f_n and g = g_{-m}, ..., g_m be BPs, with f the largest
- In the inductive step on n > 0, we have 3 cases:
  - (1) |f_n| = |g_m|
  - (2) |f_n| > |g_m|
  - (3) |f_n| < |g_m|
- Case (1): the outermost blocks coincide (f_{-n} = g_{-m} and f_n = g_m). By the inductive hypothesis, f_{-n+1}, ..., f_{n-1} is finer than g_{-m+1}, ..., g_{m-1}, which implies that f is finer than g
Figure: f_{-n} | f_{-n+1}, ..., f_{n-1} | f_n aligned with g_{-m} | g_{-m+1}, ..., g_{m-1} | g_m

Slide 11

Slide 11 text

Factorization of Largest BPs
- Let f = f_{-n}, ..., f_n and g = g_{-m}, ..., g_m be BPs, with f the largest
- In the inductive step on n > 0, we have 3 cases:
  - (1) |f_n| = |g_m|
  - (2) |f_n| > |g_m| (cannot happen)
  - (3) |f_n| < |g_m|
- Case (2): as drawn in the figure, f_{-n} = f_n splits into g_m, s, g_m for some string s. Then g_{-m}, s, g_m, f_{-n+1}, ..., f_{n-1}, g_m, s, g_m has a larger number of factors than f, a contradiction
Figure: f_{-n} | f_{-n+1}, ..., f_{n-1} | f_n, with each outermost block of f subdivided as g_m | s | g_m

Slide 12

Slide 12 text

Factorization of Largest BPs
- Let f = f_{-n}, ..., f_n and g = g_{-m}, ..., g_m be BPs, with f the largest
- In the inductive step on n > 0, we have 3 cases:
  - (1) |f_n| = |g_m|
  - (2) |f_n| > |g_m| (cannot happen)
  - (3) |f_n| < |g_m|
- Case (3): as drawn in the figure, g_{-m} = g_m splits into f_n, s, f_n for some string s. By the inductive hypothesis, f_{-n+1}, ..., f_{n-1} is finer than s, f_n, g_{-m+1}, ..., g_{m-1}, f_n, s, which implies that f is finer than g
Figure: g_{-m} | g_{-m+1}, ..., g_{m-1} | g_m, with each outermost block of g subdivided as f_n | s | f_n

Slide 13

Slide 13 text

Largest BPs in T
- We study largest BPs that occur as substrings in T
- When we fix a center block (which must be an unbordered string), there can be many largest BPs expanded from it
Figure: T with several largest BPs expanded from the same center block

Slide 14

Slide 14 text

Maximal BPs
- When we fix a center block (which must be an unbordered string), there can be many largest BPs expanded from it
- We choose the maximal one as a representative of them
Figure: T with three largest BPs sharing a center block, labeled "maximal BP", "not maximal BP", "not maximal BP"

Slide 15

Slide 15 text

Maximal BPs
- We choose the maximal one as a representative of them
Figure: T with three largest BPs sharing a center block, labeled "maximal BP", "not maximal BP", "not maximal BP"
Properties
- The maximal BP whose f_0 occurs at T[b ... e] is unique
- Any largest BP is represented by a substring of the maximal BP
- ||MBP(T)|| ≤ N(2N-1), where ||MBP(T)|| is the sum of # of factors of maximal BPs in T
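One way to make the uniqueness property concrete is to grow a BP outward from a fixed center occurrence. The sketch below assumes that the maximal BP is the largest BP around that center that cannot be extended further, and that it can be reached by repeatedly attaching the shortest string that occurs immediately to the left and to the right of the current span. The name `maximal_bp` and the half-open indexing `t[b:e]` are assumptions made here (the slides write T[b ... e]), and the center is assumed to be unbordered or empty.

```python
def maximal_bp(t, b, e):
    """Expand from the center block t[b:e] by the shortest matching pair.

    Returns (start, end, blocks) with t[start:end] == "".join(blocks).
    Half-open indices are an assumption of this sketch.
    """
    left, right = [], []
    lo, hi = b, e
    while True:
        limit = min(lo, len(t) - hi)  # room left on either side
        k = next((k for k in range(1, limit + 1)
                  if t[lo - k:lo] == t[hi:hi + k]), None)
        if k is None:
            break
        left.append(t[lo - k:lo])
        right.append(t[hi:hi + k])
        lo, hi = lo - k, hi + k
    return lo, hi, left[::-1] + [t[b:e]] + right


start, end, blocks = maximal_bp("zonioninonionq", 6, 8)  # center "in"
print(start, end, "|".join(blocks))
```

Stopping the loop early yields the shorter largest BPs around the same center, which is why each of them appears as a substring of the maximal one.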

Slide 16

Slide 16 text

Enumeration of Maximal BPs
- A naïve approach would be to compute the maximal BP for every center block
  - Using a data structure for constant-time longest common extension queries, this can be done in O(N^3) time in total
- We propose an algorithm running in O(N + ||MBP(T)||) time, which is optimal unless the maximal BPs can be represented more compactly

Slide 17

Slide 17 text

Enumeration of Maximal BPs
- Our algorithm consists of two steps:
  - Enumerate all pairs of occurrences of unbordered strings and sort them by the center position and the beginning position of the right arm
Figure: T with pairs of occurrences of unbordered strings

Slide 18

Slide 18 text

Enumeration of Maximal BPs
- Our algorithm consists of two steps:
  - Enumerate all pairs of occurrences of unbordered strings and sort them by the center position and the beginning position of the right arm
  - For each center position, build maximal BPs by concatenating the enumerated pairs that are adjacent
Figure: T with adjacent pairs concatenated around a common center
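The two steps can be imitated naively to see how the pieces fit together. The slides do not give the data structures, so everything below is an assumption made for illustration: a pair is a triple (i, j, k) with t[i:i+k] == t[j:j+k] unbordered and i + k <= j, the center position is represented by the doubled value i + j + k, and two pairs are adjacent when the outer right arm starts where the inner one ends. This brute-force version is far from the O(N + ||MBP(T)||) bound and only mirrors the structure of the algorithm.

```python
from collections import defaultdict


def is_unbordered(s):
    return all(s[:k] != s[-k:] for k in range(1, len(s)))


def unbordered_pairs(t):
    """Step 1 (naive): all pairs (i, j, k) of occurrences of unbordered strings."""
    n = len(t)
    pairs = []
    for i in range(n):
        for k in range(1, (n - i) // 2 + 1):
            u = t[i:i + k]
            if not is_unbordered(u):
                continue
            for j in range(i + k, n - k + 1):
                if t[j:j + k] == u:
                    pairs.append((i, j, k))
    # Sort by doubled center i + j + k, then by the start j of the right arm.
    pairs.sort(key=lambda p: (p[0] + p[1] + p[2], p[1]))
    return pairs


def maximal_bps(t):
    """Step 2 (naive): chain adjacent pairs outward from every center block."""
    by_center = defaultdict(dict)
    for i, j, k in unbordered_pairs(t):
        by_center[i + j + k][j] = k  # right arm starts at j and has length k
    result = []
    n = len(t)
    for b in range(n + 1):
        for e in range(b, n + 1):
            if not is_unbordered(t[b:e]):  # center must be unbordered or empty
                continue
            arms, lo, hi = by_center.get(b + e, {}), b, e
            blocks = [t[b:e]]
            while hi in arms:  # the next adjacent pair around the same center
                k = arms[hi]
                blocks = [t[lo - k:lo]] + blocks + [t[hi:hi + k]]
                lo, hi = lo - k, hi + k
            result.append((lo, hi, blocks))
    return result


for lo, hi, blocks in maximal_bps("abcab"):
    if len(blocks) > 1:
        print(lo, hi, "|".join(blocks))
```

Grouping by center and keying on the start of the right arm means that following a chain is a sequence of dictionary lookups, which is the role the sorted order plays in the description above.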

Slide 19

Slide 19 text

Conclusions and Future Work
- We define block palindromes (BPs), which are a new generalization of standard and gapped palindromes
- We introduce representatives of BPs
  - Largest BPs (of strings)
  - Maximal BPs (in strings)
- We study basic properties of these representatives and give efficient algorithms to compute/enumerate them
- Open problems:
  - Is it possible to represent the maximal BPs compactly? (what if we only consider BPs with empty center blocks?)
  - Can we utilize such a representation to design faster enumeration algorithms?