■ Extracting data from the original sources
■ Quality assuring and cleaning data
■ Conforming the labels and measures in the data to achieve consistency across the original sources
■ Delivering data in a physical format that can be used by query tools, report writers, and dashboards

Source: Ralph Kimball – The Data Warehouse ETL Toolkit

Saturday, 27 October 12
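The four steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the two in-memory sources, the `sales` table, and the function names `extract`, `clean`, `conform`, and `deliver` are all hypothetical, and the standard library's `sqlite3` stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical sources that label the same measure differently.
SOURCE_A = "region,amount\nnorth,10\nsouth,\n"
SOURCE_B = "Region,Amount EUR\nNorth,5\nEast,7\n"


def extract(text):
    """Extract: read raw rows from one source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, amount_key):
    """Quality assure and clean: drop rows whose amount is not a number."""
    good = []
    for row in rows:
        try:
            float(row[amount_key])
        except (TypeError, ValueError):
            continue
        good.append(row)
    return good


def conform(rows, region_key, amount_key):
    """Conform: map source-specific labels and types to one shared layout."""
    return [(row[region_key].strip().lower(), float(row[amount_key]))
            for row in rows]


def deliver(connection, rows):
    """Deliver: store conformed rows in a table that query tools can read."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
deliver(connection,
        conform(clean(extract(SOURCE_A), "amount"), "region", "amount"))
deliver(connection,
        conform(clean(extract(SOURCE_B), "Amount EUR"), "Region", "Amount EUR"))

# Both sources now answer the same query.
totals = connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The row with the empty amount is rejected in the cleaning step, and "north" and "North" end up under one label after conforming.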
[Diagram: Source Systems → Staging Area → Operational Data Store → Datamarts. Sources: structured documents, databases, APIs, Temporary Staging Area. Layers: L0 staging, L1 relational, L2 dimensional.]
Data Pipes with SQLAlchemy

[Overview diagram: Data Governance; Analysis and Presentation; Extraction, Transformation, Loading; Data Sources; Technologies and Utilities]
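clone schema:

    # add a copy of every source column to the (still empty) target table
    for column in src_table.columns:
        target_table.append_column(column.copy())
    # emit CREATE TABLE for the target
    target_table.create()

copy data:

    insert = target_table.insert()
    # read every source row and insert it into the target
    for row in src_table.select().execute():
        insert.execute(row)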
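The same two steps, clone schema and copy data, can be tried without any dependencies. The sketch below swaps SQLAlchemy for the standard library's `sqlite3` module, so it is not the slide's API; the table names `src` and `target` and the sample rows are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Hypothetical source table with a little data.
connection.execute("CREATE TABLE src (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b")])

# Clone schema: read the source column names and declared types, then
# create a target table with the same columns.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = connection.execute("PRAGMA table_info(src)").fetchall()
column_defs = ", ".join('"%s" %s' % (col[1], col[2]) for col in columns)
connection.execute("CREATE TABLE target (%s)" % column_defs)

# Copy data: read every source row and insert it into the target.
placeholders = ", ".join("?" for _ in columns)
rows = connection.execute("SELECT * FROM src").fetchall()
connection.executemany("INSERT INTO target VALUES (%s)" % placeholders, rows)
connection.commit()

copied = connection.execute("SELECT * FROM target ORDER BY id").fetchall()
```

Only column names and declared types are cloned here; constraints, defaults, and indexes would need extra handling.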
text file (CSV) to table:

    reader = csv.reader(file_stream)
    # the header row holds the column names
    columns = next(reader)

    # every CSV field becomes a string column
    for column in columns:
        table.append_column(Column(column, String))
    table.create()

    insert = table.insert()
    for row in reader:
        insert.execute(row)
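A dependency-free variant of the CSV load may help when experimenting. It uses the standard library's `sqlite3` in place of SQLAlchemy, so it illustrates the idea rather than the slide's API; the sample file content and the table name `imported` are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; a real run would use an open file instead.
file_stream = io.StringIO("name,city\nAnna,Bratislava\nPeter,Kosice\n")

reader = csv.reader(file_stream)
columns = next(reader)  # header row holds the column names

connection = sqlite3.connect(":memory:")

# Create one TEXT column per CSV field; double any quote in a column name.
column_defs = ", ".join(
    '"%s" TEXT' % name.replace('"', '""') for name in columns)
connection.execute("CREATE TABLE imported (%s)" % column_defs)

# The reader yields the remaining rows, which are inserted one by one.
placeholders = ", ".join("?" for _ in columns)
connection.executemany(
    "INSERT INTO imported VALUES (%s)" % placeholders, reader)
connection.commit()

loaded = connection.execute(
    "SELECT * FROM imported ORDER BY name").fetchall()
```

All values arrive as text; converting them to numbers or dates belongs to a later transformation step.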
Simple T from ETL
OLAP with Cubes
whole cube:

    cell = Cell(cube)
    browser.aggregate(cell)
    browser.aggregate(cell, drilldown=["date"])

✂ cut:

    cut = PointCut("date", [2010])
    cell = cell.slice(cut)
    browser.aggregate(cell, drilldown=["date"])

[Figure: date dimension with years 2006, 2007, 2008, 2009, 2010 and Total; months Jan, Feb, Mar, Apr; March, April, May, ...]
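What aggregate, cut, and drilldown compute can be shown on plain Python data. The sketch below is not the Cubes API: `FACTS`, `slice_by_year`, and `aggregate` are hypothetical names, and the "date" dimension is reduced to a year and a month.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, amount).
FACTS = [
    (2009, 12, 40.0),
    (2010, 1, 10.0),
    (2010, 1, 5.0),
    (2010, 2, 20.0),
]


def slice_by_year(facts, year):
    """Point cut: keep only the facts for one year."""
    return [fact for fact in facts if fact[0] == year]


def aggregate(facts, drilldown=None):
    """Sum amounts for the whole cell, or per value of one level."""
    if drilldown is None:
        return sum(amount for _, _, amount in facts)
    index = {"year": 0, "month": 1}[drilldown]
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[index]] += fact[2]
    return dict(totals)


# Whole cube: one total, then one total per year.
whole_total = aggregate(FACTS)
by_year = aggregate(FACTS, drilldown="year")

# Cut to 2010, then drill down to the next level (months).
cell_2010 = slice_by_year(FACTS, 2010)
by_month_2010 = aggregate(cell_2010, drilldown="month")
```

Drilling down on the uncut cell gives totals per year, while the same call on the cell cut to 2010 gives totals per month, which mirrors the two drilldown calls on the slide.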
[Diagram: Source Systems → Staging Area → Operational Data Store → Datamarts. Sources: structured documents, databases, APIs, Temporary Staging Area. Layers: L0 staging, L1 relational, L2 dimensional. Annotation: faster]
[Overview diagram: Data Governance; Analysis and Presentation; Extraction, Transformation, Loading; Data Sources; Technologies and Utilities. Annotations: faster; advanced; understandable, maintainable]