Smart XML Platform

Smart XML Platform is a solution for companies that have to validate, verify and process huge amounts of data stored in XML files.

Sferanet

May 23, 2018


Transcript

  1. Introduction
     - There are plenty of tools for handling data in JSON
     - Many companies still keep their data in XML format
     - In XML you have a default way to validate against a schema
  2. Business use case
     - Today companies have terabytes of data
     - Usually stored in XML files
     - Files have to be validated against an XSD schema
     - We also need a logical validation: is the information coherent with other data?
  3. Smart XML Platform
     Three steps:
     - Reading the data
     - Perform business logic
       - Validation
       - Aggregation
     - Write the result on HDFS
  4. We made it FAST
     - It runs on Cloudera Distribution including Hadoop (CDH)
     - Archive files are partitioned on HDFS
     - Read and analyzed with Apache Spark
  5. A flexible solution
     - Provide the XML Schema Definition File (XSD)
     - Smart XML Engine will analyze data formats
     - Specify additional business constraints using JSON
     - Specify processing operations via another JSON file
     - Ready to go!
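
The three steps from slide 3, driven by JSON-declared constraints as on slide 5, can be illustrated with a small sketch. This is not the platform's code: the deck does not show its JSON format, so the constraint schema (`field`, `rule`, `value`), the sample `<order>` records, and the function names below are all hypothetical. It uses only the Python standard library in place of Spark, returns the result instead of writing it to HDFS, and leaves out XSD validation because the standard library has no XSD validator.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical constraint file; the real JSON format is not shown in the deck.
CONSTRAINTS_JSON = """
[
  {"field": "amount", "rule": "min", "value": 0},
  {"field": "currency", "rule": "in", "value": ["EUR", "USD"]}
]
"""

# Hypothetical sample data standing in for one XML file.
SAMPLE_XML = """
<orders>
  <order><id>1</id><amount>10.5</amount><currency>EUR</currency></order>
  <order><id>2</id><amount>-3</amount><currency>EUR</currency></order>
  <order><id>3</id><amount>7</amount><currency>GBP</currency></order>
  <order><id>4</id><amount>4.5</amount><currency>USD</currency></order>
</orders>
"""


def read_records(xml_text, row_tag):
    """Step 1: parse the XML and return one dict per row element."""
    root = ET.fromstring(xml_text.strip())
    return [{child.tag: child.text for child in row} for row in root.iter(row_tag)]


def violations(record, constraints):
    """Step 2 (validation): return the constraints that a record breaks."""
    broken = []
    for c in constraints:
        value = record.get(c["field"])
        if c["rule"] == "min" and (value is None or float(value) < c["value"]):
            broken.append(c)
        elif c["rule"] == "in" and value not in c["value"]:
            broken.append(c)
    return broken


def run(xml_text, constraints_json, row_tag="order"):
    constraints = json.loads(constraints_json)
    records = read_records(xml_text, row_tag)
    valid = [r for r in records if not violations(r, constraints)]
    # Step 2 (aggregation): total amount per currency over valid records.
    totals = {}
    for r in valid:
        totals[r["currency"]] = totals.get(r["currency"], 0.0) + float(r["amount"])
    # Step 3: the platform writes to HDFS; this sketch just returns the result.
    return {"valid": len(valid), "rejected": len(records) - len(valid), "totals": totals}


result = run(SAMPLE_XML, CONSTRAINTS_JSON)
print(result)
```

Keeping the rules in data rather than in code is what lets the same engine serve different datasets: only the JSON changes, not the program.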