BANKS - Keyword Searching and Browsing in Database

BANKS: Implementation & Test
Keyword Searching and Browsing in Database
Luca Rossi, Marco Gomiero
Databases course, Master's Degree in Computer Engineering, Academic Year 2015/2016

1. Main features

MAIN FEATURES
◦ Data and schema browsing together with keyword-based search
◦ Get information by typing a few keywords

◦ Graph based
  • Each tuple is a node
  • Each foreign-key to primary-key link is a directed edge
  • For each edge (u,v) there is a backward edge (v,u)
  • Nodes, edges and backward edges have a weight
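The graph model above can be sketched in a few lines. This is a minimal Python illustration, not the Java code used in the project. The node ids of the form "table:key", the forward weight of 1 and the backward weight equal to the in-degree of the referenced tuple are all assumptions made for the example.

```python
from collections import defaultdict

def build_graph(fk_links):
    """Build a weighted graph from (referencing, referenced) tuple pairs.

    Assumptions of this sketch: forward edges weigh 1, a backward edge
    (v, u) weighs the in-degree of v, and a node's weight is its in-degree.
    """
    in_degree = defaultdict(int)
    nodes = set()
    for u, v in fk_links:
        in_degree[v] += 1
        nodes.update((u, v))

    edges = defaultdict(dict)  # edges[u][v] = weight of the edge u -> v

    def add_edge(a, b, weight):
        # If the same pair is linked twice, keep the cheaper edge.
        edges[a][b] = min(weight, edges[a].get(b, weight))

    for u, v in fk_links:
        add_edge(u, v, 1.0)                  # forward edge (u, v)
        add_edge(v, u, float(in_degree[v]))  # backward edge (v, u)

    node_weight = {n: in_degree[n] for n in nodes}
    return edges, node_weight
```

With this weighting, walking backward out of a heavily referenced tuple is more expensive than following a foreign key forward.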

SEARCHING FOR THE BEST ANSWERS
◦ For each keyword, find the set of relevant nodes
◦ For each keyword node in the set, run Dijkstra's single-source shortest-path algorithm

◦ Each instance of the algorithm traverses the graph edges in reverse direction
  • Find a common vertex from which a forward path exists to at least one node in each set
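The search described above can be sketched as follows. This Python version is a simplification under stated assumptions: it runs each Dijkstra instance to completion and then intersects the results, while BANKS interleaves the instances and emits answers incrementally. The `edges[u][v] = weight` layout and the function names are hypothetical.

```python
import heapq
from collections import defaultdict

def reverse_dijkstra(edges, source):
    """Cost of the cheapest forward path from every node to `source`,
    computed by walking the edges in reverse direction."""
    incoming = defaultdict(list)
    for u, targets in edges.items():
        for v, w in targets.items():
            incoming[v].append((u, w))

    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for prev, w in incoming[node]:
            nd = d + w
            if nd < dist.get(prev, float("inf")):
                dist[prev] = nd
                heapq.heappush(heap, (nd, prev))
    return dist

def candidate_roots(edges, keyword_node_sets):
    """Vertices with a forward path to at least one node of every keyword
    set, as (total path cost, root) pairs sorted by increasing cost."""
    per_set = []
    for nodes in keyword_node_sets:
        best = {}
        for n in nodes:
            for root, d in reverse_dijkstra(edges, n).items():
                if d < best.get(root, float("inf")):
                    best[root] = d
        per_set.append(best)
    if not per_set:
        return []
    common = set(per_set[0]).intersection(*per_set[1:])
    return sorted((sum(b[r] for b in per_set), r) for r in common)
```

Each returned root, together with the shortest paths to one node per keyword, forms one answer tree.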

RESULTS OF BACKWARD EXPANDING SEARCH
◦ Each iteration of the algorithm creates a weighted tree
  • Each tree is stored in a heap ordered on the relevance of the trees
◦ After the computation the most relevant trees are returned
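A heap ordered on relevance can be sketched with Python's `heapq`. The class name `ResultHeap` and the representation of a tree as an arbitrary object are assumptions of this example, not the project's data structures.

```python
import heapq
import itertools

class ResultHeap:
    """Keeps answer trees ordered by relevance, most relevant first."""

    def __init__(self):
        self._heap = []
        # Unique counter used as a tie-breaker, so two trees with the same
        # relevance are never compared with each other.
        self._counter = itertools.count()

    def add(self, relevance, tree):
        # heapq is a min-heap, so the relevance is stored negated.
        heapq.heappush(self._heap, (-relevance, next(self._counter), tree))

    def top(self, k):
        """Return the k most relevant (relevance, tree) pairs."""
        best = heapq.nsmallest(k, self._heap)
        return [(-neg, tree) for neg, _, tree in best]
```

For example, after adding trees with relevance 0.4, 0.9 and 0.7, `top(2)` returns the trees scored 0.9 and 0.7, in that order.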

2. Problems & Implementation

MEMORY USAGE
◦ Graph creation: connecting all the nodes is a waste of time and space
  • We select only the keyword nodes and create a starting partial graph
  • We navigate the database starting from these nodes, for at most 3 steps
  • This choice may compromise some queries, but most of them complete successfully
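The partial graph can be pictured as a breadth-first expansion that stops after 3 steps. In this Python sketch the `neighbors` callback is hypothetical: it stands in for the database lookups that follow foreign keys in both directions from a tuple.

```python
from collections import deque

def build_partial_graph(keyword_nodes, neighbors, max_steps=3):
    """Expand the graph from the keyword nodes, at most `max_steps` links away.

    `neighbors(node)` is an assumed callback returning the tuples linked to
    `node`; in a real system it would query the database.
    """
    depth = {n: 0 for n in keyword_nodes}
    links = set()  # undirected links, stored as frozensets of two node ids
    queue = deque(keyword_nodes)
    while queue:
        node = queue.popleft()
        if depth[node] == max_steps:
            continue  # step limit reached: do not expand this node
        for other in neighbors(node):
            links.add(frozenset((node, other)))
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return set(depth), links
```

A query whose answer needs a connecting tuple more than 3 steps away from every keyword node is lost, which is the trade-off mentioned above.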

TABLE SELECTION FROM KEYWORD
◦ Results computation
  • If a keyword is a table name, it is useless to keep trees whose root doesn't belong to that table
    ▪ This saves memory and improves precision
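The filter above can be sketched as follows. The tree layout (a dict with a "root" entry) and the `table_of` callback that maps a node to its table are assumptions made for this Python example.

```python
def filter_by_table(trees, keywords, table_names, table_of):
    """Drop trees whose root is outside the tables named in the query.

    `table_of(node)` is an assumed callback returning the table a node
    belongs to. If no keyword is a table name, all trees are kept.
    """
    wanted = {k.lower() for k in keywords} & {t.lower() for t in table_names}
    if not wanted:
        return list(trees)
    return [t for t in trees if table_of(t["root"]).lower() in wanted]
```

With node ids of the form "table:key", `table_of` can be as simple as `lambda node: node.split(":")[0]`.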

RESULTS HANDLING
◦ The algorithm returns the 10 most relevant results; however, some trees may have the same score
  • In this case we return all the trees with the same score
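One way to read the rule above is: cut the ranking at the 10th score and keep every tree tied with it. The Python sketch below implements that interpretation, which is an assumption; the `(score, tree)` pair layout is also hypothetical.

```python
def top_k_with_ties(scored_trees, k=10):
    """Return the k best (score, tree) pairs, plus any tree tied with the k-th."""
    ranked = sorted(scored_trees, key=lambda pair: pair[0], reverse=True)
    if len(ranked) <= k:
        return ranked
    cutoff = ranked[k - 1][0]  # score of the k-th result
    return [pair for pair in ranked if pair[0] >= cutoff]
```

With scores 0.9, 0.8, 0.8 and 0.5 and k=2, three trees are returned, because the second and third are tied.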

3. Test

TEST MACHINE
◦ OS: Arch Linux
◦ Processor: Intel Core i7 4700HQ @ 2.40 GHz
◦ Memory: 8 GB DDR3 1600 MHz
◦ Storage: SSD SANDISK DDR3 SDRAM
◦ DBMS: PostgreSQL
◦ Java version: openjdk version "1.8.0_112"

TEST ENVIRONMENT
◦ Max execution time: 1 hour
◦ JVM max memory: 8 GB
◦ Swap: 16 GB
◦ Databases: Mondial & IMDB

TEST PARAMETERS
◦ Scale: Logarithmic
◦ Combination: Additive
◦ Lambda: 0.2
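The three parameters above select how edge and node weights are combined into a tree's relevance. The Python sketch below follows the scoring scheme of the BANKS paper [1] as we read it; the exact formulas, the base of the logarithm and the function names are assumptions and may differ from the tested implementation.

```python
import math

def edge_score(edge_weights, w_min, log_scale=True):
    """Score in (0, 1]: trees with fewer and lighter edges score higher."""
    total = sum(
        math.log2(1 + w / w_min) if log_scale else w / w_min
        for w in edge_weights
    )
    return 1.0 / (1.0 + total)

def node_score(node_weights, w_max, log_scale=True):
    """Average scaled weight of the root and leaf nodes of a tree."""
    scaled = [
        math.log2(1 + w / w_max) if log_scale else w / w_max
        for w in node_weights
    ]
    return sum(scaled) / len(scaled)

def relevance(escore, nscore, lam=0.2, additive=True):
    """Combine the two scores; `lam` is the weight given to the node score."""
    if additive:
        return (1 - lam) * escore + lam * nscore
    return escore * nscore ** lam
```

With lambda = 0.2 and additive combination, the edge score contributes 80% of the relevance and the node score 20%.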

USER INTERFACE

OUTPUT EXAMPLE

QUERIES BEHAVIOUR

◦ Our Implementation:

             Completed   Correct   Wrong   OutOfTime   OutOfMemory
  Mondial           47        29      21           0             3
  IMDB              30        24      26          19             1

◦ Original Implementation:

             Completed   Correct   Wrong   OutOfTime   OutOfMemory
  Mondial           29      N.A.    N.A.          21             0
  IMDB               7      N.A.    N.A.          41             2

EXECUTION TIME WITH MONDIAL DATASET
  • With OutOfTime: 241.4 seconds
  • Without OutOfTime: 27.02 seconds

EXECUTION TIME WITH IMDB DATASET
  • With OutOfTime: 1681.62 seconds
  • Without OutOfTime: 522.7 seconds

COMPARISON
◦ Our Implementation:
  • Mondial: 241.4 seconds
  • IMDB: 1681.62 seconds
◦ Original Implementation:
  • Mondial: 1910.9 seconds
  • IMDB: 3239.7 seconds
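The ratio between the two sets of timings makes the comparison easier to read. The short Python snippet below only divides the figures reported above; the variable names are ours.

```python
# Execution times in seconds, copied from the comparison above.
ours = {"Mondial": 241.4, "IMDB": 1681.62}
original = {"Mondial": 1910.9, "IMDB": 3239.7}

# How many times faster our implementation is on each dataset.
speedup = {db: original[db] / ours[db] for db in ours}

for db, factor in speedup.items():
    print(f"{db}: {factor:.1f}x faster")
```

This gives a factor of about 7.9 on Mondial and about 1.9 on IMDB.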

PRECISION [charts: Precision @1, Precision @10]

MAP & RECALL [charts: MAP, Recall]
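The metrics on these slides are standard information-retrieval measures. The Python sketch below shows their usual textbook definitions for a single query; it is not the evaluation code used for the charts, and the function names are hypothetical. MAP is the mean of the average precision over all queries.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k positions that hold a relevant result."""
    return sum(1 for r in ranked[:k] if r in relevant) / k

def recall(ranked, relevant):
    """Fraction of the relevant results that were returned."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant result appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, result in enumerate(ranked, start=1):
        if result in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)
```

Note that precision @10 divides by 10 even when fewer than 10 results are returned, so a short result list is penalized.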

EXECUTION TIME VS QUERY LENGTH [charts: Mondial, IMDB]

4. Conclusions

◦ From the original test we can deduce that the algorithm performs badly on large datasets
◦ We tried to reduce the data by selecting only the necessary tuples through a partial graph
◦ As a result we obtained:
  • Lower memory usage
  • Shorter execution time
  • More correct results

5. References

1. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan, “Keyword Searching and Browsing in Databases Using BANKS”, Proc. 18th Int’l Conf. Data Eng. (ICDE ’02), pp. 431-440, Feb. 2002.
2. J. Coffman and A. C. Weaver, “A framework for evaluating database keyword search strategies”, Proc. 19th ACM Conf. Information and Knowledge Management (CIKM ’10), pp. 729-738, Oct. 2010.

THANKS!
Luca Rossi, Marco Gomiero
We wish to thank Matteo Favaro for implementing some parts of the algorithm
