
BANKS - Keyword Searching and Browsing in Database

BANKS combines data and schema browsing with keyword-based search: you can retrieve information from a relational database by typing a few keywords.

References:

G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan,
“Keyword Searching and Browsing in Databases Using BANKS”,
Proc. 18th Int’l Conf. Data Eng. (ICDE ’02), pp. 431-440, Feb. 2002.

Marco Gomiero

January 25, 2017

Transcript

  1. BANKS: Implementation & Test
    Keyword Searching and Browsing in Database
    Luca Rossi, Marco Gomiero
    Databases course, Master's Degree in Computer Engineering, Academic Year 2015/2016
  2. MAIN FEATURES
    ◦ Data and schema browsing together with keyword-based search
    ◦ Get information by typing a few keywords
  3. ◦ Graph Based
      • Each tuple is a node
      • Each foreign-key to primary-key link is a directed edge
      • For each edge (u,v) there is a backward edge (v,u)
      • Nodes, edges and backward edges have a weight
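The graph model on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `build_graph`, the tuple identifiers, and both weighting rules (node weight equal to in-degree, backward-edge weight growing with the logarithm of the target's in-degree) are assumptions chosen for the example.

```python
import math
from collections import defaultdict


def build_graph(fk_links, forward_weight=1.0):
    """Build a weighted directed graph from (u, v) foreign-key links.

    Each pair means: tuple u holds a foreign key referencing tuple v.
    """
    fk_links = list(fk_links)

    # Node weight (assumption): number of tuples that reference the node.
    node_weight = defaultdict(int)
    for u, v in fk_links:
        node_weight[u] += 0  # make sure every tuple appears as a node
        node_weight[v] += 1

    graph = defaultdict(dict)  # graph[a][b] = weight of edge a -> b

    def add_edge(a, b, w):
        # If two links produce the same edge, keep the cheaper one.
        graph[a][b] = min(graph[a].get(b, math.inf), w)

    for u, v in fk_links:
        add_edge(u, v, forward_weight)
        # Backward-edge weight (assumption): grows with v's in-degree.
        add_edge(v, u, forward_weight * math.log2(1 + node_weight[v]))

    return graph, dict(node_weight)


# Hypothetical tuples: two "writes" rows referencing an author and a paper.
links = [
    ("writes:1", "author:A"),
    ("writes:1", "paper:P"),
    ("writes:2", "author:A"),
]
graph, node_weight = build_graph(links)
```

Making backward edges more expensive for heavily referenced tuples keeps "hub" tuples from creating cheap shortcuts between otherwise unrelated nodes.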
  4. SEARCHING FOR THE BEST ANSWERS
    ◦ For each keyword, find the set of relevant nodes
    ◦ For each keyword node in the set, run Dijkstra’s single-source shortest-path algorithm
  5. ◦ Each instance of the algorithm traverses the graph edges in the reverse direction
      • Find a common vertex from which a forward path exists to at least one node in each set
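The search described on the two slides above can be sketched as follows. It is a simplified illustration under stated assumptions: the real algorithm interleaves the Dijkstra instances and emits trees incrementally, while this version runs each instance to completion and only ranks candidate roots by the sum of their distances to the nearest node of each keyword set. The names `reverse_graph`, `dijkstra` and `candidate_roots`, and the tiny example graph, are hypothetical.

```python
import heapq


def reverse_graph(graph):
    """Return the graph with every edge u -> v turned into v -> u."""
    rev = {}
    for u, nbrs in graph.items():
        rev.setdefault(u, {})
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    return rev


def dijkstra(adj, source):
    """Single-source shortest paths over adj[u][v] = weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def candidate_roots(graph, keyword_sets):
    """Rank vertices that have a forward path to every keyword set."""
    if not keyword_sets:
        return []
    rev = reverse_graph(graph)
    per_set = []
    for nodes in keyword_sets:
        best = {}  # vertex -> distance to the closest node of this set
        for k in nodes:
            for v, d in dijkstra(rev, k).items():
                if d < best.get(v, float("inf")):
                    best[v] = d
        per_set.append(best)
    common = set(per_set[0]).intersection(*per_set[1:])
    return sorted((sum(b[v] for b in per_set), v) for v in common)


# Forward edges cost 1, backward edges cost 2 (assumed weights).
example = {
    "writes1": {"authorA": 1.0, "paperP": 1.0},
    "authorA": {"writes1": 2.0},
    "paperP": {"writes1": 2.0},
}
roots = candidate_roots(example, [{"authorA"}, {"paperP"}])
```

In this example the cheapest root is `writes1`, the tuple that links the author to the paper, which is the kind of connecting tuple a keyword query is meant to surface.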
  6. RESULTS OF BACKWARD EXPANDING SEARCH
    ◦ Each iteration of the algorithm creates a weighted tree
      • Each tree is stored in a heap ordered by the relevance of the trees
    ◦ After the computation, the most relevant trees are returned
  7. MEMORY USAGE
    ◦ Graph creation: connecting all the nodes is a waste of time and space
      • We select only the keyword nodes and create an initial partial graph
      • We navigate the database starting from these nodes, for at most 3 steps
      • This choice may compromise some queries, but most of them complete successfully
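One way to read the partial-graph idea above is as a breadth-first expansion from the keyword nodes that stops after three hops. The sketch below assumes that reading; `partial_graph` and the `neighbors` callback are hypothetical names, and in a real implementation `neighbors` would issue database queries that follow foreign keys in both directions rather than look up an in-memory dictionary.

```python
from collections import deque


def partial_graph(keyword_nodes, neighbors, max_steps=3):
    """Collect the nodes and edges within max_steps hops of the keyword nodes."""
    depth = {n: 0 for n in keyword_nodes}
    edges = set()
    queue = deque(depth)
    while queue:
        u = queue.popleft()
        if depth[u] == max_steps:
            continue  # do not expand beyond the step limit
        for v in neighbors(u):
            edges.add((u, v))
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return set(depth), edges


# Hypothetical chain of linked tuples: a - b - c - d - e - f
chain = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
nodes, edges = partial_graph(["a"], lambda n: chain[n])
```

Here `e` and `f` are never loaded, which illustrates both the memory saving and the risk the slide mentions: an answer that needs a tuple more than three steps away cannot be found.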
  8. TABLE SELECTION FROM KEYWORD
    ◦ Results computation
      • If a keyword is a table name, it is useless to keep trees whose root does not belong to that table
        ▪ This saves memory and improves precision
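The pruning rule above amounts to a filter on the root of each candidate tree. A minimal sketch, assuming trees are represented as `(score, root)` pairs and each root is a `(table, primary_key)` pair; this representation and the name `filter_by_root_table` are illustrative, not taken from the slides.

```python
def filter_by_root_table(trees, table_keywords):
    """Keep only trees whose root tuple belongs to one of the named tables."""
    if not table_keywords:
        return list(trees)  # no table-name keyword: nothing to prune
    wanted = {name.lower() for name in table_keywords}
    return [tree for tree in trees if tree[1][0].lower() in wanted]


# Hypothetical candidate trees for a query that contains the word "country".
candidates = [
    (0.9, ("country", "I")),
    (0.7, ("city", "Rome")),
]
kept = filter_by_root_table(candidates, ["Country"])
```

Applying the filter as trees are generated, rather than after ranking, is what would give the memory saving the slide refers to.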
  9. RESULTS HANDLING
    ◦ The algorithm returns the 10 most relevant results; however, some trees may have the same score
      • In this case we return all the trees with the same score
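The tie rule above can be sketched with Python's `heapq`. This assumes one particular reading of the slide: every tree whose score equals that of the 10th-ranked tree is also returned, so the result may contain more than 10 trees. The `(score, tree)` pair representation and the name `top_k_with_ties` are hypothetical.

```python
import heapq


def top_k_with_ties(scored_trees, k=10):
    """Return the k best (score, tree) pairs plus any pairs tied with the k-th."""
    scored_trees = list(scored_trees)
    ranked = heapq.nlargest(k, scored_trees, key=lambda t: t[0])
    if len(ranked) < k:
        return ranked  # fewer than k trees in total: return them all
    cutoff = ranked[-1][0]
    tied = [t for t in scored_trees if t[0] >= cutoff]
    return sorted(tied, key=lambda t: t[0], reverse=True)


# With k=2, the two trees tied at 0.8 are both kept.
trees = [(0.9, "t1"), (0.8, "t2"), (0.8, "t3"), (0.5, "t4")]
best = top_k_with_ties(trees, k=2)
```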
  10. TEST MACHINE
    ◦ OS: Arch Linux
    ◦ Processor: Intel Core i7 4700HQ @ 2.40 GHz
    ◦ Memory: 8 GB DDR3 1600 MHz
    ◦ Storage: SSD SANDISK DDR3 SDRAM
    ◦ DBMS: PostgreSQL
    ◦ Java version: openjdk version "1.8.0_112"
  11. TEST ENVIRONMENT
    ◦ Max execution time: 1 hour
    ◦ JVM Max Memory: 8 GB
    ◦ SWAP: 16 GB
    ◦ Databases: Mondial & IMDB
  12. QUERIES BEHAVIOUR
    ◦ Our Implementation:
                 Completed  Correct  Wrong  OutOfTime  OutOfMemory
        Mondial  47         29       21     0          3
        IMDB     30         24       26     19         1
    ◦ Original Implementation:
                 Completed  Correct  Wrong  OutOfTime  OutOfMemory
        Mondial  29         N.A.     N.A.   21         0
        IMDB     7          N.A.     N.A.   41         2
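As a quick check on the figures in this slide, the completed, out-of-time and out-of-memory counts can be summed and turned into completion rates. The snippet only restates the slide's numbers; that the first group of figures belongs to "Our Implementation" and the second to "Original Implementation" is an assumption based on the order of the labels.

```python
# (completed, out_of_time, out_of_memory) per implementation and database
outcomes = {
    ("ours", "Mondial"): (47, 0, 3),
    ("ours", "IMDB"): (30, 19, 1),
    ("original", "Mondial"): (29, 21, 0),
    ("original", "IMDB"): (7, 41, 2),
}

totals = {key: sum(counts) for key, counts in outcomes.items()}
completion_rate = {
    key: round(counts[0] / totals[key], 2) for key, counts in outcomes.items()
}

for key in outcomes:
    print(key, "queries:", totals[key], "completed:", completion_rate[key])
```

Every row adds up to 50 queries, and the completion rate goes from 58% to 94% on Mondial and from 14% to 60% on IMDB.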
  13. COMPARISON
    ◦ Our Implementation:
      • Mondial: 241.4 seconds
      • IMDB: 1681.62 seconds
    ◦ Original Implementation:
      • Mondial: 1910.9 seconds
      • IMDB: 3239.7 seconds
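The timings above translate into speedup factors with a single division per database. The slide does not say whether the times are totals or averages, nor over which queries they were measured, so the ratios below are only indicative.

```python
# Execution times in seconds, as reported on the slide.
ours = {"Mondial": 241.4, "IMDB": 1681.62}
original = {"Mondial": 1910.9, "IMDB": 3239.7}

speedup = {db: original[db] / ours[db] for db in ours}

for db, factor in speedup.items():
    print(f"{db}: {factor:.1f}x faster")
# Mondial: 7.9x faster
# IMDB: 1.9x faster
```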
  14. ◦ From the original test we can deduce that the algorithm performs badly on large databases
    ◦ We tried to reduce the amount of data by selecting only the necessary tuples through a partial graph
    ◦ As a result we obtained:
      • Lower memory usage
      • Shorter execution time
      • More correct results
  15. 1. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan, “Keyword Searching and Browsing in Databases Using BANKS”, Proc. 18th Int’l Conf. Data Eng. (ICDE ’02), pp. 431-440, Feb. 2002.
    2. J. Coffman and A. C. Weaver, “A Framework for Evaluating Database Keyword Search Strategies”, Proc. 19th ACM Conf. Information and Knowledge Management (CIKM ’10), pp. 729-738, Oct. 2010.
  16. THANKS!
    Luca Rossi, Marco Gomiero
    We wish to thank Matteo Favaro for implementing some parts of the algorithm