Soft Cardinality Constraints on XML Data: How Exceptions Prove the Business Rule
Emir Muñoz, Fujitsu Ireland Ltd.
Joint work with F. Ferrarotti, S. Hartmann, S. Link, M. Marin
Nanjing, China, 14th October 2013
Contribution
• We introduce the definition of soft cardinality constraints over XML data.
• We give an efficient, low-degree polynomial-time decision algorithm for the implication problem.
• We evaluate soft cardinality constraints empirically on real XML data.
Emir M. - WISE, Nanjing, China, 14th October 2013
Introduction: Concepts
• Cardinality constraints:
  – Capture information about the frequency with which certain data items occur in a particular context.
• Soft cardinality constraints:
  – Constraints that need to be satisfied on average only, and thus permit violations in a controlled manner.
Introduction: Example (2/2)
• Some cardinality constraints:
  – Every scientist is a member of 2, 3, or 4 research teams.
  – Every technician can work in up to 4 different support teams.
  – A project cannot have more than one manager.
  – In every team, there should be two employees for each expertise level.
• There will probably be exceptions, e.g. a scientist working in 5 or more research teams.
• Such rules are therefore better expressed as soft constraints.
Soft Cardinality Constraints: Definition
• Expressiveness comes from the ability to specify soft upper bounds (soft-max) as well as soft lower bounds (soft-min) on the number of nodes.
• soft-card(Q, (Q′, {Q1, …, Qk})) = (soft-min, soft-max)
  – Q: context path
  – Q′: target path
  – Q1, …, Qk: field paths
  – Slide annotation: soft-min = 1
• The class sits close to some sources of intractability.
Soft Cardinality Constraints: Examples
• Every scientist is a member of 2, 3, or 4 research teams.
  – soft-card(ε, (_.RTeam.Sci, {id})) = (2, 4)
• Every technician can work in up to 4 different support teams.
  – soft-card(ε, (_.STeam.Tech, {id})) = (1, 4)
• A project cannot have more than one manager.
  – soft-card(_, (Manager, Ø)) = (1, 1)
• In every team, there should be two employees for each expertise level.
  – soft-card(_._, (_, {Expertise.S})) = (2, 2)
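As a rough illustration of the first example, the sketch below counts, for each value of the field id, how many target nodes reached by _.RTeam.Sci share that value, and lists the groups whose size falls outside (2, 4). The sample document, the element names Org and Dept, the helper names group_sizes and exceptions, and the translation of _.RTeam.Sci into the ElementTree path */RTeam/Sci are all assumptions made for illustration. This is not the decision algorithm from the talk, and how exceptions are aggregated into soft satisfaction is deliberately left open.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; not taken from the talk.
DOC = """
<Org>
  <Dept>
    <RTeam><Sci><id>s1</id></Sci><Sci><id>s2</id></Sci></RTeam>
    <RTeam><Sci><id>s1</id></Sci><Sci><id>s2</id></Sci></RTeam>
    <RTeam><Sci><id>s1</id></Sci></RTeam>
  </Dept>
  <Dept>
    <RTeam><Sci><id>s1</id></Sci><Sci><id>s3</id></Sci></RTeam>
    <RTeam><Sci><id>s1</id></Sci></RTeam>
  </Dept>
</Org>
"""


def group_sizes(context, target_path, field_paths):
    """Count target nodes under `context` that agree on all field values."""
    counts = Counter()
    for node in context.findall(target_path):
        key = tuple(node.findtext(f) for f in field_paths)
        # Skip target nodes that lack one of the fields.
        if None not in key:
            counts[key] += 1
    return counts


def exceptions(counts, lo, hi):
    """Return the groups whose size lies outside the bounds [lo, hi]."""
    return {key: n for key, n in counts.items() if not lo <= n <= hi}


root = ET.fromstring(DOC)  # context path ε: the root node
counts = group_sizes(root, "*/RTeam/Sci", ["id"])
violations = exceptions(counts, 2, 4)
print(dict(counts))
print(violations)
```

In this sample, s1 appears in five research teams and s3 in only one, so both are reported as exceptions, while s2 stays within the bounds.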
The Implication Problem: Definition and Algorithm
• Let Σ ∪ {φ} be a finite set of (soft) constraints.
• We say that Σ finitely implies φ, denoted by Σ ⊨ φ, if every finite XML tree T that satisfies all σ ∈ Σ also satisfies φ.
Performance Evaluation: Configuration
• We compare the performance against XML keys.
• Machine: Intel Core i7 at 2.8 GHz with 4 GB RAM.
• Documents:
  – 321gone, yahoo (auction data)
  – dblp (bibliographic information on computer science)
  – nasa (astronomical data)
  – SigmodRecord (articles from SIGMOD Record)
  – mondial (world geographic database)
Conclusion
• We introduced an expressive class of soft cardinality constraints, sufficiently flexible to boost XML applications such as data exchange and integration.
• Slight extensions of this class make the associated implication problem intractable.
• We gave an axiomatization for this new class.
• We presented an empirical performance evaluation that indicates the class can be applied efficiently in real use cases.