

Big Data Spain
November 14, 2013

Developing a Hadoop Based ETL Platform by Esteban Chiner and Ignacio Sales at Big Data Spain 2013

GFT has built an ETL accelerator platform on Hadoop for a large international investment bank. This ETL layer consolidates, enriches and validates financial data gathered from the bank's various source systems and makes it available to the bank's accounting layer in a homogeneous format.
8th Nov 2013
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/2013/conference/developing-a-hadoop-based-etl-platform-for-feed-consolidation




Transcript

  1. Developing a Hadoop Based ETL Platform. Esteban Chiner & Ignacio Sales. Big Data Spain, Madrid, 7th / 8th November.
  2. Agenda
     1 GFT at a glance
     2 Problem description
     3 Design principles
     4 Architecture description
     5 Lessons learnt
     6 Conclusions / Q&A
  3. GFT at a glance: "Big enough to deliver, small enough to care". Focus on the financial services industry:
     - Among the top 10 European IT service providers (FinTech 100 ranking 2012) for the financial services sector, with global reach
     - Long-standing partnerships with more than 15 top-tier institutions
     - About 2,000 employees with extensive industry knowledge
     State-of-the-art consulting, managed services and solutions:
     - Reliable technology services and solutions based on all technologies
     - Delivery teams from onsite and near-shore/offshore projects: excellent quality for the best price
     Commitment to delivery | Passion for technology:
     - We know how to manage risk
     - Our experience enables us to deal with complexity
     - We do co-innovation with our clients
     - We ensure maximum transparency
  4. Problem description. Introduction: an investment bank from 10,000 feet.
     [Diagram: front office, middle office and back office, spanning trade capture, trade validation, trade enrichment, trade settlement, calculation engines, accounting, reconciliations, risk management and reporting]
  5. Problem description. Current state: after a few years, this is how the enterprise architecture looks.
     [Diagram: five sources feed four point-to-point ETLs into four targets, each over its own protocol (FIX / MQ, CSV / SFTP, XML1 / JMS, XML2 / WS, JDBC) and in its own format (Format 1-4)]
  6. Problem description. Future state: a feed consolidation layer. An ESB-pattern implementation for batch and real-time feeds, with unlimited horizontal scalability.
     [Diagram: the same five sources feed a single Feed Consolidation Layer, which feeds the four targets]
  7. Why Hadoop?
     - Horizontal scalability to cope with current and future volumes (non-linear growth)
     - Manage multiple structures of data
     - Develop ETL without being constrained by a rigid relational data model
     - Store all incoming / intermediate / outgoing data: build a data hub for future analytics
  8. Design principles
     - New-feed time-to-market should be reduced to the minimum
     - New mappings and transformations should be easy to develop
     - Horizontal scalability to cope with current and future volumes
     - Avoid vendor lock-in
     - Based on modular components: plug & play
     - Support multiple input formats & delivery mechanisms (batch / real time)
  9. Architecture description. Key design decisions:
     - Use Hadoop in order to provide scalability
     - XML data format
     - XSLT for data transformations (see the sketch after this list)
     - External metadata storage: Oracle
     - Mappings done with an external tool which supports XSLT: Altova MapForce
     - Orchestration divided in two: internal, using Oozie; external, using Tibco BusinessWorks
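     The XSLT decision means each mapping is an ordinary stylesheet that standard Java tooling can run. Below is a minimal sketch, using the JDK's built-in javax.xml.transform API, of how such a mapping might be applied; the file names are hypothetical, and the deck does not show how the platform actually invoked its stylesheets.

        import java.io.File;
        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.stream.StreamResult;
        import javax.xml.transform.stream.StreamSource;

        public class XsltMappingSketch {
            public static void main(String[] args) throws Exception {
                // Compile the mapping stylesheet (e.g. one exported from Altova MapForce).
                Transformer transformer = TransformerFactory.newInstance()
                        .newTransformer(new StreamSource(new File("trade-mapping.xslt")));
                // Apply it to an incoming record and write the canonical XML output.
                transformer.transform(
                        new StreamSource(new File("source-trade.xml")),
                        new StreamResult(new File("canonical-trade.xml")));
            }
        }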
  10. Architecture description. Approach and implementation.
      [Diagram: the ETL layer is a pipeline of pluggable modules (Module 1: filter, Module 2: enrich, Module 3: transform, ... Module n) built on HDFS and MapReduce with Java and XSLT, plus a reference data hub. Data ingestion uses Flume and Sqoop; data delivery uses Sqoop; metadata (setup, process, log) lives in Oracle; orchestration is Oozie internally and Tibco BW externally; monitoring is a GWT application.] A sketch of one such module follows.
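      To make the modular pipeline concrete, here is a minimal sketch of what a filter module might look like as a map-only MapReduce job. The class name, record layout and status check are assumptions for illustration; in the platform described above, the actual filtering rules would come from the external metadata rather than hard-coded strings.

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        // Hypothetical "Module 1 (Filter)": pass through only XML records
        // that carry a VALID status; everything else is dropped here.
        public class FilterModuleMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {

            @Override
            protected void map(LongWritable key, Text record, Context context)
                    throws IOException, InterruptedException {
                if (record.toString().contains("<status>VALID</status>")) {
                    context.write(NullWritable.get(), record);
                }
            }
        }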
  11. Lessons learnt: Big Data in finance.
      - Every record counts
      - No unstructured data: rather, very many diverse structures
      - Provide the right tools for each function: development, testing, production support
      - Learn to move at open-source speed
  12. Lessons learnt: setting up a team.
      - How to ramp up development team skills
      - Good Java programmers make good MapReduce programmers
      - Concepts and the API: easy. MapReduce design: not so easy
      - Working within a framework makes life easier
      - Focus on the "right" tools
  13. Lessons learnt: technical.
      - Handle error records as part of the main workflow: leverage Multiple Outputs capabilities (see the sketch after this list)
      - An MRUnit wrapper to support unit testing of jobs with Multiple Outputs
      - Leverage Hive for testing and production support teams
      - Encrypt and compress data for security and to optimize resource usage
      - Multi-tenancy is a challenge
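      Hadoop's MultipleOutputs class (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs) is the standard mechanism for routing bad records to a side output instead of failing the job. The sketch below shows one plausible shape for this; the validation check and the "errors" output name are assumptions, and the driver must also register the named output via MultipleOutputs.addNamedOutput(job, "errors", TextOutputFormat.class, NullWritable.class, Text.class).

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

        // Hypothetical validating mapper: good records flow to the main
        // output, bad records are diverted to a named "errors" output so
        // they stay part of the workflow rather than killing the job.
        public class ValidatingMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {

            private MultipleOutputs<NullWritable, Text> mos;

            @Override
            protected void setup(Context context) {
                mos = new MultipleOutputs<>(context);
            }

            @Override
            protected void map(LongWritable key, Text record, Context context)
                    throws IOException, InterruptedException {
                if (record.toString().startsWith("<trade>")) {       // placeholder check
                    context.write(NullWritable.get(), record);       // main workflow
                } else {
                    mos.write("errors", NullWritable.get(), record); // error feed
                }
            }

            @Override
            protected void cleanup(Context context)
                    throws IOException, InterruptedException {
                mos.close(); // flush the side outputs
            }
        }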
  14. Conclusions
      - We have seen how a well-known enterprise integration challenge was solved with Big Data technologies
      - We have examined the particular problems posed by integrating Hadoop into a large financial services organization
      - We have validated that Big Data technology is ready for use in this environment, and that its use is justified
      - This is only the beginning…
  15. © Copyright GFT Technologies AG, 2013. GFT IT Consulting. Esteban Chiner Sanz, Senior Architect. Ignacio Sales Saborit, Senior Architect. Web: www.gft.com