kgoto
October 11, 2018

# 2018-SPIRE-Block-Palindromes

We propose a new generalization of palindromes and gapped palindromes called block palindromes. A block palindrome is a string that becomes a palindrome when identical substrings are replaced with a distinct character. We investigate several properties of block palindromes and, in particular, study substrings of a string that are block palindromes.


## Transcript

1. Block Palindromes
A New Generalization of Palindromes
SPIRE 2018 in LIMA
Keisuke Goto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga

2. Palindromes
- Standard palindromes: a b c b a (the arms a b and b a are the same string, read in opposite directions)
- Gapped palindromes: a b c d e b a (the same arms around a gap), i.e. a b X b a with gap X = cde

3. Why Palindromes?
- Palindromes represent characteristic structures of strings. There are several lines of research on properties of palindromes: maximal palindromes, palindrome factorization, ...
- Gapped palindromes model hairpin structures of DNA and RNA sequences, where G pairs with C and U pairs with A.
(Figure: a stem-loop with its gap labeled; source: https://en.wikipedia.org/wiki/Stem-loop)

4. Block Palindromes (BPs)
- A factorization $f = f_{-n} \cdots f_{-1} f_0 f_1 \cdots f_n$ of a string $T$ is a block palindrome if $f_{-i} = f_i$ for all $0 \le i \le n$.
  - Note: $f_0$ may be the empty string, while $f_{-i}$ and $f_i$ for $0 < i \le n$ must not be empty.
- We call each factor a block.
- BPs are a generalization of standard and gapped palindromes.
- Example: LIMAisn'tMALI is a BP with $f_{-2} = f_2 =$ LI, $f_{-1} = f_1 =$ MA, and $f_0 =$ isn't.
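The definition can be checked mechanically. The following is a minimal Python sketch, not taken from the slides: the function name `is_block_palindrome` and the representation of a factorization as a list of strings are assumptions made for illustration.

```python
def is_block_palindrome(blocks):
    """Check the BP condition on blocks = [f_{-n}, ..., f_0, ..., f_n].

    Hypothetical helper: the factorization is passed as a list of strings
    whose middle element is the center block f_0.
    """
    if len(blocks) % 2 == 0:
        return False  # there must be exactly one center block f_0
    n = len(blocks) // 2
    for i in range(1, n + 1):
        left, right = blocks[n - i], blocks[n + i]
        # Arms must be nonempty and pairwise identical (not reversed).
        if left == "" or left != right:
            return False
    return True


print(is_block_palindrome(["LI", "MA", "isn't", "MA", "LI"]))  # the slide's example string
print(is_block_palindrome(["a", "b", "cde", "b", "a"]))        # a gapped palindrome, one character per arm block
print(is_block_palindrome(["ab", "c", "ba"]))                  # arms differ as strings
```

A standard palindrome corresponds to the case where every arm block is a single character, which is why BPs generalize both standard and gapped palindromes.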

5. Contributions
- We study basic properties of BPs, introducing representatives of BPs:
  - Largest BPs (of a string)
  - Maximal BPs (in a string)
- We propose an algorithm to enumerate all maximal BPs in a string $T$ that runs in optimal $O(|T| + \|MBP(T)\|)$ time, where $\|MBP(T)\|$ is the output size (i.e., the total number of factors in the output).

6. # of BPs of T
- For a string $T$ of length $N$, there are $O(2^{N/2})$ BPs of $T$.
- A unary string has $2^{N/2}$ BPs.
(Figure: several block factorizations of the unary string T = a a a a a a a a a, out of $2^{N/2}$ in total.)
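The count for unary strings can be reproduced with a small brute-force counter. This is an illustrative Python sketch under stated assumptions: `count_bps` is a hypothetical name, and the recursion simply tries every non-overlapping border as the outermost block. For odd N the exponent is read as N/2 rounded down.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_bps(t):
    """Number of BPs of t (brute force, exponential output in the worst case)."""
    # The single-block factorization f_0 = t always counts
    # (for the empty string this is the empty center block).
    total = 1
    # Every other BP starts by choosing a border of length k <= |t| / 2
    # as its outermost pair of blocks, then recursing on the middle.
    for k in range(1, len(t) // 2 + 1):
        if t[:k] == t[-k:]:
            total += count_bps(t[k:-k])
    return total


for n in range(1, 10):
    print(n, count_bps("a" * n), 2 ** (n // 2))
```

Each printed row shows the same value twice, matching the slide's bound for unary strings.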

7. BPs and Borders
- A string that is a (nonempty and proper) prefix and a suffix of T is called a border of T.
- The outermost block of a BP of T is a border of T.
- BPs can be obtained by stripping a border iteratively.
(Figure: the BPs of T = o n i o n i n o n i o n.)
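Borders can be listed with the classic prefix function (KMP failure function). The sketch below is a standard textbook technique, not something stated on the slide; the names `prefix_function` and `borders` are chosen here for illustration.

```python
def prefix_function(t):
    """pi[i] = length of the longest border of t[:i + 1]."""
    pi = [0] * len(t)
    k = 0
    for i in range(1, len(t)):
        while k > 0 and t[i] != t[k]:
            k = pi[k - 1]
        if t[i] == t[k]:
            k += 1
        pi[i] = k
    return pi


def borders(t):
    """All borders of t, shortest first."""
    if not t:
        return []
    pi = prefix_function(t)
    out = []
    k = pi[-1]
    while k > 0:          # follow the chain longest border -> next longest -> ...
        out.append(t[:k])
        k = pi[k - 1]
    return out[::-1]


print(borders("onioninonion"))
```

For the slide's string this lists `on` and `onion`. Note that a border may overlap itself (for example `aa` in `aaa`); only borders of length at most half the string can serve as the outermost pair of blocks of a BP.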

8. The largest BP of T
(Figure: the BPs of T = o n i o n i n o n i o n; the largest BP is the one with the largest number of blocks.)
Properties
- The largest BP is unique (obtained by stripping the shortest border iteratively).
- Each block is an unbordered string.
- Any BP is represented by a factorization of the largest BP.
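Stripping the shortest border iteratively can be sketched directly. This is a simple quadratic-time Python illustration, not the authors' implementation; `largest_bp` is a hypothetical name, and an empty center block is represented by the empty string.

```python
def largest_bp(t):
    """Largest BP of t as a list of blocks [f_{-n}, ..., f_0, ..., f_n]."""
    left, right = [], []
    while True:
        # Shortest border of the remaining string; a shortest border never
        # overlaps itself, so lengths up to half the string suffice.
        k = next((k for k in range(1, len(t) // 2 + 1) if t[:k] == t[-k:]), None)
        if k is None:
            break
        left.append(t[:k])
        right.append(t[-k:])
        t = t[k:-k]
    # What remains is the center block f_0 (possibly empty).
    return left + [t] + right[::-1]


print(largest_bp("onioninonion"))
print(largest_bp("LIMAisn'tMALI"))
```

On the slide's string this yields the seven blocks on, i, on, in, on, i, on, each of which is unbordered.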

9. Factorization of Largest BPs
- Let $f = f_{-n}, \ldots, f_n$ and $g = g_{-m}, \ldots, g_m$ be BPs of $T$ with $f$ the largest.
- In inductive steps on $n > 0$, we have 3 cases:
  - (1) $|f_n| = |g_m|$
  - (2) $|f_n| > |g_m|$
  - (3) $|f_n| < |g_m|$

10. Factorization of Largest BPs
- Let $f = f_{-n}, \ldots, f_n$ and $g = g_{-m}, \ldots, g_m$ be BPs of $T$ with $f$ the largest.
- In inductive steps on $n > 0$, we have 3 cases:
  - (1) $|f_n| = |g_m|$: the outermost blocks coincide ($f_{-n} = g_{-m}$ and $f_n = g_m$). By the inductive hypothesis, $f_{-n+1}, \ldots, f_{n-1}$ is finer than $g_{-m+1}, \ldots, g_{m-1}$, which implies that $f$ is finer than $g$.
  - (2) $|f_n| > |g_m|$
  - (3) $|f_n| < |g_m|$

11. Factorization of Largest BPs
- Let $f = f_{-n}, \ldots, f_n$ and $g = g_{-m}, \ldots, g_m$ be BPs of $T$ with $f$ the largest.
- In inductive steps on $n > 0$, we have 3 cases:
  - (1) $|f_n| = |g_m|$
  - (2) $|f_n| > |g_m|$ (cannot happen): here $g_m$ is a border of $f_n$, say $f_{-n} = f_n = g_m\, s\, g_m$ as drawn in the figure. Then $g_{-m}, s, g_m, f_{-n+1}, \ldots, f_{n-1}, g_m, s, g_m$ has a larger number of factors than $f$, a contradiction.
  - (3) $|f_n| < |g_m|$

12. Factorization of Largest BPs
- Let $f = f_{-n}, \ldots, f_n$ and $g = g_{-m}, \ldots, g_m$ be BPs of $T$ with $f$ the largest.
- In inductive steps on $n > 0$, we have 3 cases:
  - (1) $|f_n| = |g_m|$
  - (2) $|f_n| > |g_m|$ (cannot happen)
  - (3) $|f_n| < |g_m|$: here $f_n$ is a border of $g_m$, say $g_{-m} = g_m = f_n\, s\, f_n$ as drawn in the figure. By the inductive hypothesis, $f_{-n+1}, \ldots, f_{n-1}$ is finer than $s, f_n, g_{-m+1}, \ldots, g_{m-1}, f_n, s$, which implies that $f$ is finer than $g$.
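The claim that the largest BP is finer than every other BP of the same string can be sanity-checked by brute force on small inputs. The sketch below is illustrative only. It assumes "f is finer than g" means that every block boundary of g is also a block boundary of f, which is an interpretation rather than a definition quoted from the slides; `all_bps`, `cuts`, and `is_finer` are hypothetical names.

```python
def all_bps(t):
    """All BPs of t as lists of blocks (exponentially many in the worst case)."""
    out = [[t]]
    for k in range(1, len(t) // 2 + 1):
        if t[:k] == t[-k:]:
            out += [[t[:k]] + inner + [t[-k:]] for inner in all_bps(t[k:-k])]
    return out


def cuts(blocks):
    """Set of end positions of the blocks, i.e. the block boundaries."""
    pos, out = 0, set()
    for b in blocks:
        pos += len(b)
        out.add(pos)
    return out


def is_finer(f, g):
    """Assumed meaning: every boundary of g is also a boundary of f."""
    return cuts(g) <= cuts(f)


t = "onioninonion"
bps = all_bps(t)
largest = max(bps, key=len)   # the BP with the largest number of blocks
print(largest)
print(all(is_finer(largest, g) for g in bps))
```

A check like this does not replace the inductive proof, but it makes the three cases easy to explore on concrete strings.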

13. Largest BPs in T
- We study largest BPs that occur as substrings in T.
- When we fix a center block (which must be an unbordered string), there can be many largest BPs expanded from it.

14. Maximal BPs
- When we fix a center block (which must be an unbordered string), there can be many largest BPs expanded from it.
- We choose the maximal one as a representative of them.
(Figure: three largest BPs in T sharing one center block; the longest is labeled "maximal BP", the other two "not maximal BP".)

15. Maximal BPs
- We choose the maximal one as a representative of them.
Properties
- The maximal BP whose $f_0$ occurs at $T[b..e]$ is unique.
- Any largest BP is represented by a substring of the maximal BP.
- $\|MBP(T)\| \le N(2N-1)$, where $\|MBP(T)\|$ is the sum of the numbers of factors of the maximal BPs in T.

16. Enumeration of Maximal BPs
- A naïve approach would be to compute the maximal BP for every center block.
- Using a data structure for constant-time longest common extension queries, it can be done in $O(N^3)$ time in total.
- We propose an algorithm running in $O(N + \|MBP(T)\|)$ time, which is optimal unless the maximal BPs can be represented more compactly.
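The naïve approach can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' code: it compares substrings directly instead of using a longest common extension structure, so it is slower than the bound on the slide. The names `is_unbordered`, `maximal_bp`, and `enumerate_maximal_bps`, the half-open index convention `t[b:e]`, and the choice to treat every empty position as a center are all assumptions.

```python
def is_unbordered(s):
    """True if no proper nonempty prefix of s is also a suffix of s."""
    return all(s[:k] != s[-k:] for k in range(1, len(s)))


def maximal_bp(t, b, e):
    """Expand the center block t[b:e] (assumed unbordered or empty) outward.

    Returns (start, blocks): the start position of the maximal BP in t
    and its list of blocks.
    """
    blocks = [t[b:e]]
    lo, hi = b, e
    while True:
        # Shortest pair of equal blocks directly left and right of t[lo:hi].
        k = next((k for k in range(1, min(lo, len(t) - hi) + 1)
                  if t[lo - k:lo] == t[hi:hi + k]), None)
        if k is None:
            return lo, blocks
        blocks = [t[lo - k:lo]] + blocks + [t[hi:hi + k]]
        lo, hi = lo - k, hi + k


def enumerate_maximal_bps(t):
    """Maximal BP for every unbordered (or empty) center block of t."""
    n = len(t)
    return [maximal_bp(t, b, e)
            for b in range(n + 1)
            for e in range(b, n + 1)
            if is_unbordered(t[b:e])]


print(maximal_bp("onioninonion", 5, 7))   # center block "in"
print(maximal_bp("xabcbay", 3, 4))        # center block "c"
```

Taking the shortest matching pair at each step keeps every added block unbordered, which is what makes the result a largest BP of the substring it spans.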

17. Enumeration of Maximal BPs
- Our algorithm consists of two steps:
  1. Enumerate all pairs of occurrences of unbordered strings and sort them by the center position and the beginning positions of the right arms.

18. Enumeration of Maximal BPs
- Our algorithm consists of two steps:
  1. Enumerate all pairs of occurrences of unbordered strings and sort them by the center position and the beginning positions of the right arms.
  2. For each center position, build maximal BPs by concatenating the enumerated pairs that are adjacent.
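The two steps can be mimicked with a brute-force sketch. This is only one reading of the slide, written in Python for illustration: the pair enumeration here is naïve (it does not achieve the stated running time), the function names are hypothetical, a pair is represented as `(i, j, k)` for equal unbordered blocks `t[i:i+k]` and `t[j:j+k]` with `i + k <= j`, and the center position is stored doubled as `i + j + k` to stay in integers. Maximal BPs consisting of a single block are not produced by this sketch.

```python
def is_unbordered(s):
    """True if no proper nonempty prefix of s is also a suffix of s."""
    return all(s[:k] != s[-k:] for k in range(1, len(s)))


def unbordered_pairs(t):
    """Step 1 (naive): pairs (i, j, k) of non-overlapping equal unbordered blocks."""
    n = len(t)
    pairs = []
    for k in range(1, n // 2 + 1):
        for i in range(n - 2 * k + 1):
            x = t[i:i + k]
            if not is_unbordered(x):
                continue
            for j in range(i + k, n - k + 1):
                if t[j:j + k] == x:
                    pairs.append((i, j, k))
    # Sort by (doubled) center position, then by start of the right arm.
    pairs.sort(key=lambda p: (p[0] + p[1] + p[2], p[1]))
    return pairs


def maximal_bps_with_arms(t):
    """Step 2: per center position, chain adjacent pairs from the inside out."""
    pairs = unbordered_pairs(t)
    by_right_start = {(i + j + k, j): (i, j, k) for i, j, k in pairs}
    # Positions where the right arm of an outer neighbour would have to start.
    outer_starts = {(i + j + k, j + k) for i, j, k in pairs}
    result = []
    for i, j, k in pairs:
        c = i + j + k
        if (c, j) in outer_starts:
            continue  # another pair with the same center sits directly inside
        blocks = [t[i + k:j]]  # center block between the innermost arms
        start, cur = i, (i, j, k)
        while cur is not None:
            ci, cj, ck = cur
            blocks = [t[ci:ci + ck]] + blocks + [t[cj:cj + ck]]
            start = ci
            cur = by_right_start.get((c, cj + ck))  # adjacent outer pair, if any
        result.append((start, blocks))
    return result


for start, blocks in maximal_bps_with_arms("abcab"):
    print(start, blocks)
```

Two pairs with the same center are adjacent when the outer right arm starts exactly where the inner right arm ends; by symmetry the left arms then touch as well, so chaining only needs a lookup on the right-arm start.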

19. Conclusions and Future Work
- We define block palindromes (BPs), which are a new generalization of standard and gapped palindromes.
- We introduce representatives of BPs:
  - Largest BPs (of strings)
  - Maximal BPs (in strings)
- We study basic properties of these representatives and give efficient algorithms to compute/enumerate them.
- Open problems:
  - Is it possible to represent the maximal BPs compactly? (What if we only consider BPs with empty center blocks?)
  - Can we utilize it to design faster enumeration algorithms?