FishMark: A Linked Data Application Benchmark @ SSWS 2012, Boston, US

spbail
November 11, 2012

Slides for my presentation of our paper on FishMark, a linked data application benchmark that can be used to compare the performance of linked data stores against a classic relational DB.

Transcript

  1. FishMark: A Linked Data Application Benchmark

    Samantha Bail, Sandra Alkiviadous, Bijan Parsia, David Workman, Mark Van Harmelen, Rafael S. Goncalves, and Cristina Garilao
    SSWS+HPCSW, 9th November 2012
  2. Application benchmarks: Desiderata
     (Samantha Bail, FishMark: A Linked Data Application Benchmark)

    • Use real(istic) data, queries, and query mixes
    • (Realistic) scalability of the data
      ‣ Scale the data down so the system can handle it
      ‣ Test how the system scales
    • Compare alternative technologies
      ‣ Linked data vs. classic relational DB
    • Transparency (what is measured, and how?)
  3. FishMark

    • FishMark is an application benchmark for
      • an SQL application vs.
      • an equivalent SPARQL application, built e.g. via
        ‣ ETL (extract, transform, load)
        ‣ database-to-RDF mapping
        ‣ OBDA (ontology-based data access)
        ‣ ...
  4. Background: FishBase & FishDelish

    • FishBase: database about the world’s finned fish species
      ‣ Contains information about ~32,000 species
      ‣ fishbase.org: web front-end to the FishBase DB
      ‣ Provides an interface for (canned) queries
      ‣ Backed by a MySQL DB
      ‣ DB: 195 tables (3 GB)
    • FishDelish: RDF graph of fishbase.org
      ‣ Result of D2R conversion
      ‣ 1.38bn triples (250 GB)
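
FishDelish was produced by a D2R conversion of the FishBase tables. As a rough illustration of how such a conversion turns rows into triples, the sketch below follows the property naming visible in the SPARQL template on slide 17 (`fd:<table>_<column>`, e.g. `fd:comnames_ComName`). As that template implies (`?nameID fd:comnames_SpecCode ?code . ?code fd:species_Species ?species`), foreign-key columns become links between resources rather than literals. The instance base URI, the key columns and the function names are hypothetical; this is not the actual D2R mapping used for FishDelish.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Vocabulary namespace taken from the PREFIX in the slide 17 template.
VOCAB = "http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/"
# Hypothetical base URI for instance resources (not from the slides).
BASE = "http://example.org/fishdelish/"

Triple = Tuple[str, str, str]


def resource(table: str, key: object) -> str:
    """URI for the row of `table` whose key value is `key`."""
    return f"<{BASE}{table}/{key}>"


def literal(value: object) -> str:
    """Plain N-Triples literal with backslashes and double quotes escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def rows_to_triples(table: str, key: str, rows: List[Dict[str, object]],
                    foreign_keys: Optional[Dict[str, str]] = None) -> Iterator[Triple]:
    """Map each row to triples using <table>_<column> properties.

    `foreign_keys` maps a column name to the table it references; such
    columns become links to that table's resources instead of literals.
    """
    foreign_keys = foreign_keys or {}
    for row in rows:
        subject = resource(table, row[key])
        for column, value in row.items():
            if value is None:  # NULL cells produce no triple
                continue
            predicate = f"<{VOCAB}{table}_{column}>"
            if column in foreign_keys:
                obj = resource(foreign_keys[column], value)
            else:
                obj = literal(value)
            yield (subject, predicate, obj)
```

For example, a `comnames` row with `SpecCode` declared as a foreign key to `species` yields a `comnames_SpecCode` triple whose object is the species resource, which is what lets the SPARQL template hop from a common name to its genus and species.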
  5. fishbase.org home page

  6. (no text on this slide)
  7. Common name search results page

      SELECT species.Species, comnames.NameType, species.Genus, countref.C_Code
      FROM comnames, species, countref
      WHERE comnames.SpecCode = species.SpecCode
        AND comnames.C_Code = countref.C_Code
        AND comnames.ComName = "Zebrafish"
  8. Species page

      SELECT species.SpecCode, species.Author, species.FBname,
             refrens.Author, refrens.Year, species.Comments,
             families.Family, families.Order, families.Class,
             morphdat.AddChars, species.DemersPelag, species.SpeciesRefNo,
             species.AnaCat, species.PicPreferredName,
             picturesmain.autoctr, picturesmain.PicName,
             picturesmain.Entered, picturesmain.AuthName
      FROM species, refrens, morphdat, families, picturesmain
      WHERE species.SpeciesRefNo = refrens.RefNo
        AND species.SpecCode = morphdat.SpecCode
        AND species.FamCode = families.FamCode
        AND species.SpecCode = picturesmain.SpecCode
        AND (species.Genus = "Danio" AND species.Species = "rerio")
  9. FishMark: Architecture

    [Diagram: the FishBase / FishDelish data and the FishMark query templates feed into the multi-platform benchmarking framework, which loads the data, then executes and measures the queries.]
  10. FishMark: Data

    • Several sets of FishBase / FishDelish data
      ‣ Full fishbase.org DB dump (195 tables / 1.38bn triples)
      ‣ Reduced version (10 tables / 20 million triples): only the information needed for the queries
      ‣ 3 self-contained scaled sets: 10,000 / 20,000 / 30,000 species
    • OBDA components
      ‣ OWL ontology (10 classes, 10 properties, 206 axioms)
      ‣ OBDA model (20 mappings)
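
The scaled data sets are described as self-contained. One plausible way to build such a subset, sketched below under the assumption that every dependent table references species through a `SpecCode` column, is to sample N species and keep only the rows that refer to them, so that no row points at a species outside the subset. The function and table names are illustrative; the slides do not say how the FishMark subsets were actually selected or whether the three sizes are nested.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def scaled_subset(species: List[Row], dependents: Dict[str, List[Row]],
                  n_species: int, seed: int = 0) -> Dict[str, List[Row]]:
    """Sample `n_species` species and keep only dependent rows that refer to them.

    Assumption: each table in `dependents` links to species via a SpecCode column.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible subsets
    chosen = rng.sample(species, n_species)
    codes = {row["SpecCode"] for row in chosen}
    subset: Dict[str, List[Row]] = {"species": chosen}
    for table, rows in dependents.items():
        # Drop rows about species outside the sample, keeping the subset closed.
        subset[table] = [row for row in rows if row["SpecCode"] in codes]
    return subset
```

Lookup tables reached through other keys (for instance families via `FamCode`, or references via `SpeciesRefNo`) would need the same closure applied to their own keys; the sketch leaves that out.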
  11. FishMark: Queries

    • 22 query templates based on typical activities on fishbase.org:
      ‣ Generate search results page for a common name search
      ‣ Generate species page for a given species and genus
      ‣ Generate pictures page for a given species
      ‣ ...
    • SQL queries ported to SPARQL
  12. Query generation

    • Using fixed queries may introduce bias, e.g.
      ‣ common name search for ‘salmon’: 96 results
      ‣ common name search for ‘borna snakehead’: 1 result
    • We want to measure the performance of the same query type (‘search for common name’, ‘generate species page’, ...) with different parameter values
    • Therefore: use query templates to generate queries with random parameter values
  13. Common name search results page (the query from slide 7 again, with the fixed value "Zebrafish")
  14. Common name search results page (as a template)

      SELECT species.Species, comnames.NameType, species.Genus, countref.C_Code
      FROM comnames, species, countref
      WHERE comnames.SpecCode = species.SpecCode
        AND comnames.C_Code = countref.C_Code
        AND comnames.ComName = "%ComName%"
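
Slide 12's point about bias (96 results for ‘salmon’ versus 1 for ‘borna snakehead’) can be reproduced on a toy scale. The sketch below turns the `%ComName%` placeholder of this template into a bound SQL parameter and runs it against a tiny in-memory SQLite database; the rows are made up for illustration and are not FishBase data. Binding the value instead of pasting it into the string avoids quoting problems with names containing quotes, and also sidesteps the template's double-quoted string literal, which MySQL accepts by default but standard SQL reserves for identifiers.

```python
import re
import sqlite3

# Common name template from slide 14 (placeholder quoted as on the slide).
COMMON_NAME_TEMPLATE = """
SELECT species.Species, comnames.NameType, species.Genus, countref.C_Code
FROM comnames, species, countref
WHERE comnames.SpecCode = species.SpecCode
  AND comnames.C_Code = countref.C_Code
  AND comnames.ComName = "%ComName%"
"""


def to_bound_sql(template: str) -> str:
    """Replace each quoted %Name% placeholder with a named parameter :Name."""
    return re.sub(r'"%(\w+)%"', r":\1", template)


def toy_db() -> sqlite3.Connection:
    """Tiny made-up database with only the columns the template touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE species (SpecCode INTEGER PRIMARY KEY, Genus TEXT, Species TEXT);
        CREATE TABLE countref (C_Code TEXT PRIMARY KEY);
        CREATE TABLE comnames (ComName TEXT, NameType TEXT, SpecCode INTEGER, C_Code TEXT);
    """)
    conn.executemany("INSERT INTO species VALUES (?, ?, ?)",
                     [(1, "Danio", "rerio"), (2, "Salmo", "salar"), (3, "Oncorhynchus", "keta")])
    conn.executemany("INSERT INTO countref VALUES (?)", [("001",), ("002",)])
    conn.executemany("INSERT INTO comnames VALUES (?, ?, ?, ?)", [
        ("Zebrafish", "Vernacular", 1, "001"),
        ("Salmon", "Vernacular", 2, "001"),
        ("Salmon", "Vernacular", 2, "002"),
        ("Salmon", "Vernacular", 3, "002"),
    ])
    return conn


def run_template(conn: sqlite3.Connection, template: str, **params: str) -> list:
    """Instantiate the template with bound parameters and fetch all rows."""
    return conn.execute(to_bound_sql(template), params).fetchall()


if __name__ == "__main__":
    conn = toy_db()
    for name in ("Zebrafish", "Salmon"):
        print(name, len(run_template(conn, COMMON_NAME_TEMPLATE, ComName=name)))
```

Run as a script, this prints `Zebrafish 1` and `Salmon 3`: the same query type returns result sets of different sizes depending on the parameter value, which is why a single fixed value can skew the measurement.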
  15. Species page (the query from slide 8 again, with the fixed values Genus = "Danio" and Species = "rerio")
  16. Species page (as a template)

      SELECT species.SpecCode, species.Author, species.FBname,
             refrens.Author, refrens.Year, species.Comments,
             families.Family, families.Order, families.Class,
             morphdat.AddChars, species.DemersPelag, species.SpeciesRefNo,
             species.AnaCat, species.PicPreferredName,
             picturesmain.autoctr, picturesmain.PicName,
             picturesmain.Entered, picturesmain.AuthName
      FROM species, refrens, morphdat, families, picturesmain
      WHERE species.SpeciesRefNo = refrens.RefNo
        AND species.SpecCode = morphdat.SpecCode
        AND species.FamCode = families.FamCode
        AND species.SpecCode = picturesmain.SpecCode
        AND (species.Genus = "%Genus%" AND species.Species = "%Species%")
  17. FishMark: Sample query template

      <query id="commonname">
        <paramquery>
          <![CDATA[
            PREFIX fd: <http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/>
            SELECT ?type ?species ?genus ?country ?language
            WHERE {
              ?nameID fd:comnames_ComName "%comname%" .
              ?nameID fd:comnames_NameType ?type .
              ?nameID fd:comnames_SpecCode ?code .
              ?nameID fd:comnames_C_Code ?ccode .
              ?code fd:species_Species ?species .
              ?code fd:species_Genus ?genus .
              ?ccode fd:countref_PAESE ?country .
            }
          ]]>
        </paramquery>
        <parameter>
          <paramname>comname</paramname>
          <paramvaluesquery>
            <![CDATA[
              PREFIX fd: <http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/>
              SELECT ?comname
              WHERE {
                ?nameID fd:comnames_ComName ?comname .
                ?nameID fd:comnames_Language "English" .
              }
            ]]>
          </paramvaluesquery>
        </parameter>
      </query>

    Annotation: the <paramvaluesquery> is the seed query.
  18. FishMark: Sample query template (the template from slide 17, with one more annotation)

    Annotation: select a random value from the seed query’s result set; this value fills the %comname% placeholder.
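
The template on slide 17 pairs a parameterised query with a seed query that supplies candidate values. A minimal instantiation step might look like the sketch below: it parses the XML with the standard library, runs each seed query through a caller-supplied function (a stand-in for whichever store is being benchmarked), picks one value at random and substitutes it for the `%name%` placeholder. The `run_seed_query` callable and the escaping rule are assumptions for illustration, not FishMark's own implementation. The embedded template is a condensed copy in which the `?language` variable, which the original selects but never binds, is left out.

```python
import random
import xml.etree.ElementTree as ET
from typing import Callable, List

# Condensed copy of the slide 17 template (parameterised query + seed query).
TEMPLATE_XML = """<query id="commonname">
  <paramquery><![CDATA[
PREFIX fd: <http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/>
SELECT ?type ?species ?genus ?country
WHERE {
  ?nameID fd:comnames_ComName "%comname%" .
  ?nameID fd:comnames_NameType ?type .
  ?nameID fd:comnames_SpecCode ?code .
  ?nameID fd:comnames_C_Code ?ccode .
  ?code fd:species_Species ?species .
  ?code fd:species_Genus ?genus .
  ?ccode fd:countref_PAESE ?country .
}
]]></paramquery>
  <parameter>
    <paramname>comname</paramname>
    <paramvaluesquery><![CDATA[
PREFIX fd: <http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/>
SELECT ?comname
WHERE {
  ?nameID fd:comnames_ComName ?comname .
  ?nameID fd:comnames_Language "English" .
}
]]></paramvaluesquery>
  </parameter>
</query>"""


def escape_sparql_string(value: str) -> str:
    """Escape backslashes and double quotes for a "..." SPARQL literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def instantiate(template_xml: str, run_seed_query: Callable[[str], List[str]],
                rng: random.Random) -> str:
    """Fill every %name% placeholder with a random value from its seed query."""
    root = ET.fromstring(template_xml)
    query = root.findtext("paramquery")  # CDATA content is returned as text
    for param in root.findall("parameter"):
        name = param.findtext("paramname").strip()
        candidates = run_seed_query(param.findtext("paramvaluesquery"))
        if not candidates:
            raise ValueError(f"seed query for {name!r} returned no values")
        value = rng.choice(candidates)
        query = query.replace(f"%{name}%", escape_sparql_string(value))
    return query.strip()
```

Passing a seeded `random.Random` makes a benchmark run repeatable: the same seed yields the same sequence of query instances on every system under test.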
  19. Multi-platform benchmarking framework

    • Based on the Berlin SPARQL Benchmark (BSBM) [1] framework
    • Extensions:
      ‣ Query generation from query templates
      ‣ Connection for OBDA systems
      ‣ Support for different query scenarios

    [1] Christian Bizer, Andreas Schultz: The Berlin SPARQL Benchmark. International Journal on Semantic Web & Information Systems, Vol. 5, Issue 2, pp. 1-24, 2009.
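
One of the extensions listed above is a connection for OBDA systems. A common way to keep such a framework system-agnostic is a small connection interface with one implementation per kind of store. The sketch below shows that pattern with a SQL implementation (SQLite standing in for MySQL) and a SPARQL implementation that uses the standard SPARQL 1.1 Protocol over HTTP and reads JSON results, assuming the triple store or OBDA system is exposed as such an endpoint. The class names are hypothetical; this is not the BSBM or FishMark code.

```python
import json
import sqlite3
import urllib.parse
import urllib.request
from typing import List, Protocol, Tuple


class QueryConnection(Protocol):
    """What the benchmark driver needs from any system under test."""

    def execute(self, query: str) -> List[Tuple]: ...


class SqlConnection:
    """SQL store; SQLite stands in here for a MySQL connection."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)

    def execute(self, query: str) -> List[Tuple]:
        return self._conn.execute(query).fetchall()


class SparqlHttpConnection:
    """Store reachable through a SPARQL 1.1 Protocol endpoint (query via GET)."""

    def __init__(self, endpoint: str, timeout: float = 60.0) -> None:
        self._endpoint = endpoint
        self._timeout = timeout

    def execute(self, query: str) -> List[Tuple]:
        url = self._endpoint + "?" + urllib.parse.urlencode({"query": query})
        request = urllib.request.Request(
            url, headers={"Accept": "application/sparql-results+json"})
        with urllib.request.urlopen(request, timeout=self._timeout) as response:
            data = json.load(response)
        names = data["head"]["vars"]
        # Unbound variables are absent from a binding; map them to None.
        return [tuple(b.get(v, {}).get("value") for v in names)
                for b in data["results"]["bindings"]]


def fetch_all(conn: QueryConnection, query: str) -> int:
    """Driver-side code sees only the interface; returns the number of rows."""
    return len(conn.execute(query))
```

Fetching the complete result set inside `execute` matters for fairness: a driver that only waits for the first row would time different amounts of work on different systems.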
  20. FishMark: Benchmarking scenarios

    1. Individual query performance
    2. Weighted query mix
    3. Typical use case query mix
  21. Scenario 1: Individual query performance

    • Each query is measured independently
    • Touches every query in the set
    • Useful for isolating problematic queries
  22. Scenario 2: Weighted query mix

    • Obtain the frequency of each query from the fishbase.org server logs
    • Execute queries to simulate a ‘realistic’ load on the server
    • Problem: does not touch some of the rare queries

    Query                      Total / month  Avg. / day  Avg. / hour
    SpeciesPage                        96154     3205.13       133.55
    CommonName                         31008     1033.60        43.07
    By Genus                           13331      444.63        18.53
    CountrySpeciesInformation           4429      147.63         6.15
    CollaboratorPage                    4138      137.93         5.75
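
Drawing a weighted query mix from these log counts can be as simple as sampling query types in proportion to their monthly frequency, as sketched below using only the five types listed in the table. This is an illustrative sketch, not FishMark's own mix generator. It also makes the stated problem concrete: with a short mix, the rarest types may not be drawn at all.

```python
import random
from collections import Counter
from typing import Dict, List

# Monthly request counts per query type, from the server-log table above.
MONTHLY_COUNTS: Dict[str, int] = {
    "SpeciesPage": 96154,
    "CommonName": 31008,
    "By Genus": 13331,
    "CountrySpeciesInformation": 4429,
    "CollaboratorPage": 4138,
}


def weighted_mix(counts: Dict[str, int], size: int, seed: int = 0) -> List[str]:
    """Draw `size` query types with probability proportional to their log counts."""
    rng = random.Random(seed)
    names = list(counts)
    return rng.choices(names, weights=[counts[n] for n in names], k=size)


if __name__ == "__main__":
    mix = weighted_mix(MONTHLY_COUNTS, 20)
    print(Counter(mix))  # rare types may be missing from a short mix
```

With these counts SpeciesPage accounts for roughly 65% of the requests, while CollaboratorPage accounts for under 3%, so a 20-query mix will often contain no CollaboratorPage query at all.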
  23. Scenario 3: Typical use case query mix

    • Query mix based on a typical ‘session’ of a fishbase.org visitor
      ‣ Cf. BSBM ‘Explore’ use case
    • Requires chaining of query results, e.g.:
      ‣ Search for common name ‘Zebrafish’
      ‣ Generate species page for ‘Danio rerio’
      ‣ Request picture page for ‘Danio rerio’
      ‣ ...
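
Chaining means that each step's parameters come from the previous step's results. A hypothetical driver for the example session might look like the sketch below; `run` stands in for instantiating and executing a named query template, and the template names are taken from the results table on slide 28. The sketch simply follows the first hit of the common name search; how FishMark would choose among several hits is not specified on the slides.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, str]
Row = Dict[str, str]
# run(template_name, params) -> result rows; a stand-in for the real executor.
Runner = Callable[[str, Params], List[Row]]


def explore_session(run: Runner, common_name: str) -> List[Tuple[str, Params]]:
    """Simulate one visitor session; return the (template, params) calls made."""
    calls: List[Tuple[str, Params]] = []

    def step(template: str, params: Params) -> List[Row]:
        calls.append((template, params))
        return run(template, params)

    hits = step("CommonName", {"ComName": common_name})
    if not hits:
        return calls  # nothing to follow up on
    first = hits[0]
    # Parameters for the next pages come from the search result, e.g. Danio rerio.
    species = {"Genus": first["Genus"], "Species": first["Species"]}
    step("SpeciesPage", species)
    step("PicturePage", species)
    return calls
```

Because later steps depend on earlier results, a session like this cannot be pre-generated as a flat list of queries the way scenario 1 instances can.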
  24. Preliminary evaluation

    • Tested systems:
      ‣ Virtuoso Open Source 6.1.5 triple store
      ‣ ontop Quest 1.7 OBDA system (using a MySQL database)
      ‣ MySQL 5.5 relational DBMS
    • Benchmark parameters for scenario 1:
      ‣ 50 warm-up runs
      ‣ 100 timed runs per query instance
      ‣ 20* instantiations of each query template
      ‣ i.e. 2,000 timed runs per query type
  25. Preliminary evaluation (slide 24 shown again, with a footnote on the asterisk)

    * 20 is an arbitrary value! Better: base the number of instantiations on the number of results returned by the seed query.
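
The scenario 1 parameters translate into a simple measurement loop. The sketch below assumes that the 50 warm-up runs are executed once per query template, cycling through its instances, before any timed run, and that each of the 20 instances is then timed 100 times; the slides do not state how warm-up runs are distributed. `execute` is a hypothetical callable that sends one query to the system under test and blocks until the full result has been fetched. Queries per second are derived as the reciprocal of the mean execution time, which holds for a single client issuing queries sequentially.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark_template(instances: Sequence[str], execute: Callable[[str], object],
                       warmup_runs: int = 50, timed_runs: int = 100) -> dict:
    """Warm up, then time every query instance `timed_runs` times."""
    if not instances:
        raise ValueError("need at least one query instance")
    # Warm-up (assumption): untimed runs cycling through the instances.
    for i in range(warmup_runs):
        execute(instances[i % len(instances)])
    timings = []
    for query in instances:
        for _ in range(timed_runs):
            start = time.perf_counter()
            execute(query)
            timings.append(time.perf_counter() - start)
    mean = statistics.fmean(timings)
    return {
        "timed_runs": len(timings),
        "avg_seconds": mean,
        # Single sequential client: QpS is the reciprocal of the mean time.
        "qps": 1.0 / mean if mean > 0 else float("inf"),
    }
```

With 20 instances and the default parameters, this performs 50 untimed and 2,000 timed executions per query template, matching the 2,000 timed runs per query type on slide 24.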
  26. Results: Average query execution time (no cache)
  27. Results: Average query execution time (with cache)
  28. Results: Queries per second (with cache)

    Factor = the system’s queries per second (QpS) as a percentage of MySQL’s QpS for the same query.

    Query name              Virtuoso  Virtuoso factor  Quest  Quest factor  MySQL
    CSpeciesInformation           15            1.10%    866        65.40%   1324
    CAquariumTrade                84            6.40%    910        69.80%   1303
    CUsedForAquaculture           14            1.10%    850        66.70%   1274
    CommonName                   128           10.20%    850        67.50%   1258
    PicturePage                  149           12.10%    893        72.10%   1238
    Species                      197           16.30%    951        78.40%   1212
    FamilyNominalSpecies         173           15.40%    849        75.40%   1126
    Genus                        157           14.50%    818        75.20%   1087
    FamilyAllfish                155           14.70%    796        75.90%   1049
    CEndemic                      19            1.90%    733        70.70%   1037
    CPotentialAquaculture         45            4.50%    714        70.40%   1014
    CollaboratorPage              26            2.60%    657        65.40%   1006
    FamilyListOfPictures         105           10.60%    728        73.30%    993
    CGameFish                     11            1.10%    639        65.20%    979
    FamilyInformation             17            1.90%    593        65.60%    903
    SpeciesPage                    1            0.10%    578        71.20%    811
    CCommercial                   14            1.80%    541        70.20%    771
    CIntroduced                   14            2.10%    442        67.90%    651
    CAllFish                      28            5.10%    413        74.70%    553
    CPelagic                       9            1.90%    349        74.10%    471
    CReefAssociated                5            1.40%    273        70.40%    388
    CFreshwater                    7            2.30%    217        69.90%    310
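
The factors can be recomputed from the raw QpS numbers (e.g. 15 / 1324 ≈ 1.1% for CSpeciesInformation). The sketch below does that and averages them. Using the arithmetic mean of the per-query factors is an assumption about how the summary figures on slide 29 were obtained; a geometric mean is a common alternative when averaging ratios.

```python
from statistics import fmean
from typing import Dict, List, Tuple

# (query, Virtuoso QpS, Quest QpS, MySQL QpS), copied from the slide 28 table.
QPS: List[Tuple[str, int, int, int]] = [
    ("CSpeciesInformation", 15, 866, 1324), ("CAquariumTrade", 84, 910, 1303),
    ("CUsedForAquaculture", 14, 850, 1274), ("CommonName", 128, 850, 1258),
    ("PicturePage", 149, 893, 1238), ("Species", 197, 951, 1212),
    ("FamilyNominalSpecies", 173, 849, 1126), ("Genus", 157, 818, 1087),
    ("FamilyAllfish", 155, 796, 1049), ("CEndemic", 19, 733, 1037),
    ("CPotentialAquaculture", 45, 714, 1014), ("CollaboratorPage", 26, 657, 1006),
    ("FamilyListOfPictures", 105, 728, 993), ("CGameFish", 11, 639, 979),
    ("FamilyInformation", 17, 593, 903), ("SpeciesPage", 1, 578, 811),
    ("CCommercial", 14, 541, 771), ("CIntroduced", 14, 442, 651),
    ("CAllFish", 28, 413, 553), ("CPelagic", 9, 349, 471),
    ("CReefAssociated", 5, 273, 388), ("CFreshwater", 7, 217, 310),
]


def factors(rows: List[Tuple[str, int, int, int]]) -> Dict[str, Tuple[float, float]]:
    """Per-query (Virtuoso %, Quest %) of MySQL's queries per second."""
    return {name: (100 * v / m, 100 * q / m) for name, v, q, m in rows}


def mean_factors(rows: List[Tuple[str, int, int, int]]) -> Tuple[float, float]:
    """Arithmetic mean of the per-query factors for Virtuoso and Quest."""
    f = factors(rows).values()
    return fmean(v for v, _ in f), fmean(q for _, q in f)


if __name__ == "__main__":
    virtuoso, quest = mean_factors(QPS)
    print(f"Virtuoso: {virtuoso:.1f}% of MySQL, Quest: {quest:.1f}% of MySQL")
```

The averages come out at about 6% for Virtuoso and 71% for Quest, close to the rounded figures in the summary.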
  29. Summary and future work

    • FishMark: application benchmark based on real data & queries
    • Preliminary evaluation (queries per second relative to MySQL):
      ‣ Virtuoso: ~5% of MySQL performance
      ‣ Quest (with cache): ~70% of MySQL performance
    • Future work:
      ‣ Extend the framework for query scenarios 2 and 3
      ‣ Perform comprehensive tests
        ‣ More systems
        ‣ Complete FishDelish data (1.38bn triples)
  30. Contact: bails@cs.man.ac.uk Any questions? Thank you!