FishMark: A Linked Data Application Benchmark @ SSWS 2012, Boston, US

FishMark: A Linked Data Application Benchmark
Samantha Bail, Sandra Alkiviadous, Bijan Parsia, David Workman, Mark Van Harmelen, Rafael S. Goncalves, and Cristina Garilao
SSWS+HPCSW, 9th November 2012

Application benchmarks: Desiderata
• Use real(istic) data, queries, and query mixes
• (Realistic) scalability of the data
  ‣ Scale down data for system to handle
  ‣ Test how system scales
• Compare alternative technologies
  ‣ Linked data vs classic relational DB
• Transparency (what is measured and how?)

FishMark
• FishMark is an application benchmark for
  • SQL application vs
  • equivalent SPARQL application
    ‣ ETL
    ‣ database-to-RDF mapping
    ‣ OBDA
    ‣ ...

Background: FishBase & FishDelish
• FishBase: Database about the world's finned fish species
  • Contains information about ~32,000 species
• fishbase.org: Web front-end to the FishBase DB
  • Provides interface for (canned) queries
  • Backed by MySQL DB
  • DB: 195 tables (3GB)
• FishDelish: RDF graph of fishbase.org
  • Result of D2R conversion
  • 1.38bn triples (250GB)

fishbase.org home page

Common name search results page:
SELECT species.Species, comnames.NameType, species.Genus, countref.C_Code
FROM comnames, species, countref
WHERE comnames.SpecCode=species.SpecCode
  AND comnames.C_Code=countref.C_Code
  AND comnames.ComName="Zebrafish"

Species page:
SELECT species.SpecCode, species.Author, species.FBname, refrens.Author, refrens.Year,
       species.Comments, families.Family, families.Order, families.Class, morphdat.AddChars,
       species.DemersPelag, species.SpeciesRefNo, species.AnaCat, species.PicPreferredName,
       picturesmain.autoctr, picturesmain.PicName, picturesmain.Entered, picturesmain.AuthName
FROM species, refrens, morphdat, families, picturesmain
WHERE species.SpeciesRefNo=refrens.RefNo
  AND species.SpecCode=morphdat.SpecCode
  AND species.FamCode=families.FamCode
  AND species.SpecCode=picturesmain.SpecCode
  AND (species.Genus="Danio" AND species.Species="rerio")

FishMark: Architecture
[Diagram] FishMark components: multi-platform benchmarking framework, FishBase / FishDelish data, query templates. Diagram labels: load data; execute and measure queries.

FishMark: Data
• Several sets of FishBase / FishDelish data
  • Full fishbase.org DB dump (195 tables / 1.38bn triples)
  • Reduced version (10 tables / 20 million triples)
    ‣ Only information needed for queries
  • 3 self-contained scaled sets: 10,000 / 20,000 / 30,000 species
• OBDA components
  • OWL ontology (10 classes, 10 properties, 206 axioms)
  • OBDA model (20 mappings)

FishMark: Queries
• 22 query templates based on typical activities on fishbase.org:
  ‣ Generate search results page for common name search
  ‣ Generate species page for a given species and genus
  ‣ Generate pictures page for a given species
  ‣ ...
• SQL queries ported to SPARQL

Query generation
• Using fixed queries may introduce bias, e.g.
  • common name search for 'salmon': 96 results
  • common name search for 'borna snakehead': 1 result
  ‣ We want to measure performance of the same query type ('search for common name', 'generate species page', ...) with different parameter values
  ‣ Use query templates to generate queries with random parameters

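Common name search, query template:
SELECT species.Species, comnames.NameType, species.Genus, countref.C_Code
FROM comnames, species, countref
WHERE comnames.SpecCode=species.SpecCode
  AND comnames.C_Code=countref.C_Code
  AND comnames.ComName="%ComName%"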

Species page, query template:
SELECT species.SpecCode, species.Author, species.FBname, refrens.Author, refrens.Year,
       species.Comments, families.Family, families.Order, families.Class, morphdat.AddChars,
       species.DemersPelag, species.SpeciesRefNo, species.AnaCat, species.PicPreferredName,
       picturesmain.autoctr, picturesmain.PicName, picturesmain.Entered, picturesmain.AuthName
FROM species, refrens, morphdat, families, picturesmain
WHERE species.SpeciesRefNo=refrens.RefNo
  AND species.SpecCode=morphdat.SpecCode
  AND species.FamCode=families.FamCode
  AND species.SpecCode=picturesmain.SpecCode
  AND (species.Genus="%Genus%" AND species.Species="%Species%")

FishMark: Sample query template
<query id="commonname">
  <paramquery>
    <![CDATA[
      PREFIX fd: <http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/>
      SELECT ?type ?species ?genus ?country ?language
      WHERE {
        ?nameID fd:comnames_ComName "%comname%" .
        ?nameID fd:comnames_NameType ?type .
        ?nameID fd:comnames_SpecCode ?code .
        ?nameID fd:comnames_C_Code ?ccode .
        ?code fd:species_Species ?species .
        ?code fd:species_Genus ?genus .
        ?ccode fd:countref_PAESE ?country .
      }
    ]]>
  </paramquery>
  <parameter>
    <paramname>comname</paramname>
    <paramvaluesquery>
      <![CDATA[
        PREFIX fd: <http://fishdelish.cs.man.ac.uk/rdf/vocab/resource/>
        SELECT ?comname
        WHERE {
          ?nameID fd:comnames_ComName ?comname .
          ?nameID fd:comnames_Language "English" .
        }
      ]]>
    </paramvaluesquery>
  </parameter>
</query>
• paramvaluesquery: seed query
• Select random value from result set of the seed query
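The instantiation step described for the sample template (run the seed query, pick a random value from its result set, substitute it for the %comname% placeholder) could look roughly like the Python sketch below. This is not the FishMark implementation. The shortened template, the function names, and the stubbed seed-query results are assumptions made for illustration; a real run would send the seed query to the system under test.

```python
import random
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the "commonname" template.
TEMPLATE_XML = """
<query id="commonname">
  <paramquery><![CDATA[
    SELECT ?type WHERE { ?nameID fd:comnames_ComName "%comname%" . }
  ]]></paramquery>
  <parameter>
    <paramname>comname</paramname>
    <paramvaluesquery><![CDATA[
      SELECT ?comname WHERE { ?nameID fd:comnames_ComName ?comname . }
    ]]></paramvaluesquery>
  </parameter>
</query>
"""

# Stand-in values; a real seed query would return these from the data set.
STUB_SEED_RESULTS = ["Zebrafish", "Salmon", "Borna snakehead"]


def run_seed_query(seed_query):
    """Hypothetical stub: pretend to execute the seed query."""
    return list(STUB_SEED_RESULTS)


def instantiate(template_xml, rng, seed_runner=run_seed_query):
    """Fill every %name% placeholder with a random seed-query result."""
    root = ET.fromstring(template_xml)
    query = root.findtext("paramquery").strip()
    for param in root.findall("parameter"):
        name = param.findtext("paramname").strip()
        seed_query = param.findtext("paramvaluesquery").strip()
        # Values are inserted verbatim; real code would need to escape them.
        value = rng.choice(seed_runner(seed_query))
        query = query.replace("%" + name + "%", value)
    return query


print(instantiate(TEMPLATE_XML, random.Random(42)))
```

Passing the random generator in explicitly keeps a benchmark run reproducible: the same seed yields the same sequence of query instances.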

Multi-platform benchmarking framework
• Based on Berlin SPARQL Benchmark (BSBM) [1] framework
• Extensions:
  • Query generation from query templates
  • Connection for OBDA systems
  • Supports different query scenarios

[1] Christian Bizer, Andreas Schultz: The Berlin SPARQL Benchmark. In: International Journal on Semantic Web & Information Systems, Vol. 5, Issue 2, Pages 1-24, 2009.

FishMark: Benchmarking scenarios
1. Individual query performance
2. Weighted query mix
3. Typical use case query mix

1. Individual query performance
• Each query is measured independently
• Touches every query in the set
• Useful for isolating problematic queries

2. Weighted query mix
• Obtain frequency of query from fishbase.org server logs
• Execute queries to simulate 'realistic' load of server
• Problem: Does not touch some of the rare queries

Query                       Total / Month   Avg. / Day   Avg. / Hour
SpeciesPage                 96154           3205.13      133.55
CommonName                  31008           1033.60      43.07
By Genus                    13331           444.63       18.53
CountrySpeciesInformation   4429            147.63       6.15
CollaboratorPage            4138            137.93       5.75
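One way to turn the logged frequencies into a query mix is weighted random sampling over the query types. The sketch below uses the monthly totals listed above as weights. The function name and the choice of `random.choices` are assumptions for illustration, not the framework's actual mechanism.

```python
import random
from collections import Counter

# Monthly request totals per query type, taken from the server-log figures.
MONTHLY_COUNTS = {
    "SpeciesPage": 96154,
    "CommonName": 31008,
    "By Genus": 13331,
    "CountrySpeciesInformation": 4429,
    "CollaboratorPage": 4138,
}


def weighted_mix(counts, n, rng):
    """Draw n query types with probability proportional to their counts."""
    names = list(counts)
    weights = [counts[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


mix = weighted_mix(MONTHLY_COUNTS, 1000, random.Random(0))
print(Counter(mix).most_common())
```

Sampling like this also shows the problem noted above: query types with very low weights may never be drawn in a short run.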

3. Typical use case query mix
• Query mix based on typical 'session' of fishbase.org visitor
• Cf. BSBM 'Explore' use case
• Requires chaining of query results

Example session:
1. Search for common name 'Zebrafish'
2. Generate species page for 'Danio rerio'
3. Request picture page for 'Danio rerio'
4. ...
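Chaining means that the result of one query supplies the parameters of the next. The sketch below plays through the example session against an in-memory SQLite database. The three tables are heavily reduced stand-ins that reuse column names from the SQL queries shown earlier, and the rows (SpecCode 1, the picture file name) are toy values invented for this example, not FishBase data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema and rows, for illustration only.
conn.executescript("""
CREATE TABLE species (SpecCode INTEGER, Genus TEXT, Species TEXT);
CREATE TABLE comnames (ComName TEXT, SpecCode INTEGER);
CREATE TABLE picturesmain (SpecCode INTEGER, PicName TEXT);
INSERT INTO species VALUES (1, 'Danio', 'rerio');
INSERT INTO comnames VALUES ('Zebrafish', 1);
INSERT INTO picturesmain VALUES (1, 'toy_picture.jpg');
""")


def run_session(conn, common_name):
    """Run the three session steps, feeding each result into the next query."""
    # Step 1: common name search yields genus and species.
    hit = conn.execute(
        "SELECT species.Genus, species.Species "
        "FROM comnames JOIN species ON comnames.SpecCode = species.SpecCode "
        "WHERE comnames.ComName = ?",
        (common_name,),
    ).fetchone()
    if hit is None:
        return None
    genus, species = hit
    # Step 2: species page lookup uses the genus and species from step 1.
    spec_code = conn.execute(
        "SELECT species.SpecCode FROM species "
        "WHERE species.Genus = ? AND species.Species = ?",
        (genus, species),
    ).fetchone()[0]
    # Step 3: picture page uses the species code from step 2.
    pictures = [
        row[0]
        for row in conn.execute(
            "SELECT PicName FROM picturesmain WHERE SpecCode = ?", (spec_code,)
        )
    ]
    return {"genus": genus, "species": species, "pictures": pictures}


print(run_session(conn, "Zebrafish"))
```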

Preliminary evaluation
• Tested systems:
  • Virtuoso Open Source 6.1.5 triple store
  • ontop Quest 1.7 OBDA System (using a MySQL database)
  • MySQL 5.5 Relational DBMS
• Benchmark parameters for scenario 1
  • 50 warm-up runs
  • 100 timed runs per query instance
  • 20* instantiations of each query template
    ‣ 2000 timed runs per query type
* Arbitrary value! Better: number of instantiations based on the number of results returned by the seed query
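The scenario 1 parameters can be read as a nested measurement loop: 20 instantiations times 100 timed runs gives the 2000 timed runs per query type. The sketch below is one possible reading, not the framework's code. In particular, running the 50 warm-up runs once per query instance is an assumption, and `make_query` and `execute` are hypothetical callables standing in for template instantiation and the database call.

```python
import time

WARMUP_RUNS = 50      # untimed runs (assumed here to apply per query instance)
TIMED_RUNS = 100      # timed runs per query instance
INSTANTIATIONS = 20   # query instances generated per template


def benchmark_template(make_query, execute,
                       warmup=WARMUP_RUNS, timed=TIMED_RUNS,
                       instantiations=INSTANTIATIONS):
    """Return one wall-clock timing in seconds per timed run."""
    timings = []
    for _ in range(instantiations):
        query = make_query()
        for _ in range(warmup):
            execute(query)
        for _ in range(timed):
            start = time.perf_counter()
            execute(query)
            timings.append(time.perf_counter() - start)
    return timings


# Usage with dummy callables in place of a real template and database.
timings = benchmark_template(lambda: "SELECT 1", lambda query: None)
print(len(timings), "timed runs, mean", sum(timings) / len(timings), "s")
```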

[Chart] Results: Avg. query execution time (no cache)

[Chart] Results: Avg. query execution time (with cache)

Results: Queries per second (with cache)
Factor = comparison against MySQL performance

Query name              Virtuoso   Virtuoso factor   Quest   Quest factor   MySQL
CSpeciesInformation     15         1.10%             866     65.40%         1324
CAquariumTrade          84         6.40%             910     69.80%         1303
CUsedForAquaculture     14         1.10%             850     66.70%         1274
CommonName              128        10.20%            850     67.50%         1258
PicturePage             149        12.10%            893     72.10%         1238
Species                 197        16.30%            951     78.40%         1212
FamilyNominalSpecies    173        15.40%            849     75.40%         1126
Genus                   157        14.50%            818     75.20%         1087
FamilyAllfish           155        14.70%            796     75.90%         1049
CEndemic                19         1.90%             733     70.70%         1037
CPotentialAquaculture   45         4.50%             714     70.40%         1014
CollaboratorPage        26         2.60%             657     65.40%         1006
FamilyListOfPictures    105        10.60%            728     73.30%         993
CGameFish               11         1.10%             639     65.20%         979
FamilyInformation       17         1.90%             593     65.60%         903
SpeciesPage             1          0.10%             578     71.20%         811
CCommercial             14         1.80%             541     70.20%         771
CIntroduced             14         2.10%             442     67.90%         651
CAllFish                28         5.10%             413     74.70%         553
CPelagic                9          1.90%             349     74.10%         471
CReefAssociated         5          1.40%             273     70.40%         388
CFreshwater             7          2.30%             217     69.90%         310
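Assuming the factor is simply a system's queries per second expressed as a percentage of the MySQL figure for the same query, it can be recomputed as in the sketch below. The function name is hypothetical. The two sample rows use the first two sets of values from the results table, and the listed throughputs are rounded, so a recomputed factor may differ from the listed one in the last digit for other rows.

```python
def factor_vs_mysql(system_qps, mysql_qps):
    """Throughput of a system as a percentage of MySQL throughput."""
    return 100.0 * system_qps / mysql_qps


# (Virtuoso qps, Quest qps, MySQL qps) for the first two queries in the table.
ROWS = {
    "CSpeciesInformation": (15, 866, 1324),
    "CAquariumTrade": (84, 910, 1303),
}

for name, (virtuoso, quest, mysql) in ROWS.items():
    print(f"{name}: Virtuoso {factor_vs_mysql(virtuoso, mysql):.1f}%, "
          f"Quest {factor_vs_mysql(quest, mysql):.1f}%")
# prints:
# CSpeciesInformation: Virtuoso 1.1%, Quest 65.4%
# CAquariumTrade: Virtuoso 6.4%, Quest 69.8%
```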

Summary and future work
• FishMark: Application benchmark based on real data & queries
• Preliminary evaluation:
  • Virtuoso: 5% of MySQL performance
  • Quest (with cache): 70% of MySQL performance
• Future work:
  • Extend framework for query scenarios 2 and 3
  • Perform comprehensive tests
    ‣ More systems
    ‣ Complete FishDelish data (1.38bn triples)

Any questions? Thank you!
