Slide 1

Slide 1 text

Soft Cardinality Constraints on XML Data
How Exceptions Prove the Business Rule
Emir Muñoz
Fujitsu Ireland Ltd.
Joint work with F. Ferrarotti, S. Hartmann, S. Link, M. Marin
@ Nanjing, China, 14th October 2013

Slide 2

Slide 2 text

Contribution
• Introduce the definition of soft cardinality constraints over XML data.
• Efficient low-degree polynomial time decision algorithm for the implication problem.
• Empirical evaluation of soft cardinality constraints on real XML data.
Emir M. - WISE, Nanjing, China, 14th October 2013

Slide 3

Slide 3 text

Outline
1. Introduction
2. Soft Cardinality Constraints
3. The Implication Problem
4. Performance Evaluation
5. Conclusion

Slide 4

Slide 4 text

Introduction: Concepts
• Cardinality constraints:
  – Capture information about the frequency with which certain data items occur in a particular context.
• Soft cardinality constraints:
  – Constraints which need to be satisfied on average only, and thus permit violations in a controlled manner.
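The difference between the two notions can be sketched in a few lines. The slides do not spell out the formal semantics of "on average", so this sketch assumes it means that the arithmetic mean of the observed counts lies within the bounds; the function names and the sample counts are hypothetical.

```python
from statistics import mean

def satisfies_hard(counts, lo, hi):
    """Hard reading: every single count must lie within [lo, hi]."""
    return all(lo <= c <= hi for c in counts)

def satisfies_soft(counts, lo, hi):
    """Soft reading (assumption): only the mean count must lie within [lo, hi]."""
    if not counts:  # nothing to constrain, so vacuously satisfied
        return True
    return lo <= mean(counts) <= hi

# Hypothetical data: number of research teams per scientist.
teams_per_scientist = [2, 3, 5, 2]

hard_ok = satisfies_hard(teams_per_scientist, 2, 4)
soft_ok = satisfies_soft(teams_per_scientist, 2, 4)
```

Here the scientist with 5 teams breaks the hard bound, while the mean of 3 teams still satisfies the soft one under the stated assumption.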

Slide 5

Slide 5 text

Introduction: Example (1/2)
[Figure: Project within a research institute; labels: support, research]

Slide 6

Slide 6 text

Introduction: Example (2/2)
• Some cardinality constraints:
  – Every scientist is a member of 2, 3, or 4 research teams.
  – Every technician can work in up to 4 different support teams.
  – A project cannot have more than one manager.
  – In every team, there should be two employees for each expertise level.

Slide 7

Slide 7 text

Introduction: Example (2/2)
• Some cardinality constraints:
  – Every scientist is a member of 2, 3, or 4 research teams.
  – Every technician can work in up to 4 different support teams.
  – A project cannot have more than one manager.
  – In every team, there should be two employees for each expertise level.
• There will probably be exceptions (e.g. a scientist working in 5 research teams or more), hence soft constraints.

Slide 8

Slide 8 text

Soft Cardinality Constraints: Definition
• Expressiveness comes from the ability to specify soft upper bounds (soft-max) as well as soft lower bounds (soft-min) on the number of nodes.
• soft-card(Q, (Q´, {Q1,…, Qk})) = (soft-min, soft-max)
  – Q: context path
  – Q´: target path
  – {Q1,…, Qk}: field paths
• With some sources of intractability
• Slide annotation: soft-min = 1
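As a rough illustration, the parts of the soft-card notation can be held in a small record type. This is only a sketch: the class name `SoftCard`, its attribute names, and the check that soft-min does not exceed soft-max are assumptions made for the example, not something the slides define.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class SoftCard:
    """Hypothetical container for soft-card(Q, (Q', {Q1..Qk})) = (soft-min, soft-max)."""
    context: str             # Q, the context path ("" stands for the empty path ε)
    target: str              # Q', the target path
    fields: FrozenSet[str]   # {Q1, ..., Qk}, the field paths
    soft_min: int
    soft_max: int

    def __post_init__(self):
        # Assumption: a lower bound above the upper bound is treated as an error.
        if self.soft_min > self.soft_max:
            raise ValueError("soft_min must not exceed soft_max")

    def __str__(self):
        ctx = self.context or "ε"
        flds = "{" + ", ".join(sorted(self.fields)) + "}" if self.fields else "Ø"
        return (f"soft-card({ctx}, ({self.target}, {flds}))"
                f" = ({self.soft_min}, {self.soft_max})")

example = SoftCard("", "_.RTeam.Sci", frozenset({"id"}), 2, 4)
rendered = str(example)
```

Keeping the record frozen makes constraints hashable, which is convenient if sets of constraints are handled later.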

Slide 9

Slide 9 text

Soft Cardinality Constraints: Examples
• Every scientist is a member of 2, 3, or 4 research teams.
  – soft-card(ε, (_.RTeam.Sci, {id})) = (2, 4)
• Every technician can work in up to 4 different support teams.
  – soft-card(ε, (_.STeam.Tech, {id})) = (1, 4)
• A project cannot have more than one manager.
  – soft-card(_, (Manager, Ø)) = (1, 1)
• In every team, there should be two employees for each expertise level.
  – soft-card(_._, (_, {Expertise.S})) = (2, 2)
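To make the path notation concrete, the following sketch counts, for each context node, how many target nodes agree on their field values. It relies on several simplifying assumptions that are not taken from the slides: paths are dot-separated child labels, `_` matches exactly one child label, field values are compared by element text, and `id` is modelled as a child element. The helper names and the toy document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def follow(nodes, path):
    """Follow a dot-separated label path; '_' matches any one child label."""
    if path in ("", "ε"):          # empty path: stay at the current nodes
        return list(nodes)
    for label in path.split("."):
        nodes = [child for node in nodes for child in node
                 if label == "_" or child.tag == label]
    return nodes

def group_sizes(root, context, target, fields):
    """For each context node, count target nodes that agree on all field values."""
    sizes = []
    for ctx in follow([root], context):
        groups = Counter()
        for tgt in follow([ctx], target):
            key = tuple(tuple(sorted((f.text or "") for f in follow([tgt], q)))
                        for q in fields)
            if all(key):           # skip targets that lack one of the fields
                groups[key] += 1
        sizes.extend(groups.values())
    return sizes

# Hypothetical toy document (element names borrowed from the examples above).
DOC = ET.fromstring(
    "<institute><project>"
    "<RTeam><Sci><id>s1</id></Sci><Sci><id>s2</id></Sci></RTeam>"
    "<RTeam><Sci><id>s1</id></Sci></RTeam>"
    "<Manager/>"
    "</project></institute>"
)

teams_per_scientist = group_sizes(DOC, "ε", "_.RTeam.Sci", ["id"])
managers_per_project = group_sizes(DOC, "_", "Manager", [])
```

In this toy document scientist s1 appears in two research teams and s2 in one, so the resulting counts are the numbers that the bounds (2, 4) would be compared against; the single project has exactly one manager.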

Slide 10

Slide 10 text

The Implication Problem: Definition and Algorithm
• Let Σ ∪ {φ} be a finite set of (soft) constraints.
• We say that Σ finitely implies φ, denoted by Σ ⊨f φ, if every finite XML tree T that satisfies all σ ∈ Σ also satisfies φ.
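The slides do not reproduce the decision algorithm itself, so the sketch below is not that algorithm. It only illustrates one sufficient (not necessary) case of implication, interval weakening: a constraint on identical paths with a wider interval follows from one with a narrower interval. This assumes that satisfaction is monotone in the bounds; the tuple layout and function name are hypothetical.

```python
def implied_by_weakening(sigma, phi):
    """Sufficient check only: phi follows if some constraint in sigma has the
    same context, target and fields and an interval contained in phi's."""
    ctx, tgt, fields, lo, hi = phi
    return any(
        (c, t, f) == (ctx, tgt, fields) and l >= lo and h <= hi
        for (c, t, f, l, h) in sigma
    )

# Constraints as (context, target, fields, soft_min, soft_max) tuples.
sigma = [("ε", "_.RTeam.Sci", frozenset({"id"}), 2, 4)]

wider = ("ε", "_.RTeam.Sci", frozenset({"id"}), 1, 5)
narrower = ("ε", "_.RTeam.Sci", frozenset({"id"}), 3, 4)

wider_implied = implied_by_weakening(sigma, wider)
narrower_implied = implied_by_weakening(sigma, narrower)
```

A `False` result from this check says nothing about whether the implication actually holds; deciding that in general requires the full algorithm.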

Slide 11

Slide 11 text

Performance Evaluation: Configuration
• We compare the performance against XML keys.
• Machine: Intel Core i7 2.8 GHz, with 4 GB RAM
• Documents:
  – 321gone, yahoo (auction data)
  – dblp (bibliographic information on CS)
  – nasa (astronomical data)
  – SigmodRecord (articles from SIGMOD Record)
  – mondial (world geographic db)

Slide 12

Slide 12 text

Performance Evaluation: Results
[Figures: Expressivity; Time. In comparison with previous XML keys.]

Slide 13

Slide 13 text

Conclusion
• We introduced an expressive class of soft cardinality constraints, sufficiently flexible to boost XML applications such as data exchange and integration.
• Slight extensions result in the intractability of the associated implication problem.
• We give an axiomatization for this new class.
• We present an empirical performance test that indicates its efficient application in real use cases.

Slide 14

Slide 14 text

Discussion
• Questions & Answers
  – Soft Cardinality Constraints on XML Data
THANKS!
Emir Muñoz
[email protected]