An Introduction to Hadoop within the Teradata UDA

© Hortonworks Inc. 2013
Chris Harris
Twitter: cj_harris5
E-mail: [email protected]

Web giants proved the ROI in data products applying data science to large amounts of data
• Amazon: 35% of product sales come from product recommendations
• Netflix: 75% of streaming video viewing results from recommendations
• Prediction of click-through rates

Key use cases in Finance/Insurance
• Customer risk profiling:
  – How likely is this customer to pay back their mortgage?
  – How likely is this customer to get sick?
• Fraud detection:
  – Detect illegal credit card activity and alert the bank/consumer
  – Detect illegal insurance claims
• Internal fraud detection (compliance):
  – Is this employee accessing financial information they are not allowed to access?

Key use cases in Telco/Mobile
• Customer lifetime value (LTV) prediction:
  – What is the LTV for customer X?
• Marketing:
  – Which new mobile phone should we offer customer X so that they remain with us?
  – Location-based advertising
• Failure prediction:
  – When will equipment X in cell tower Y fail?
• Cell tower management:
  – Predict load and bandwidth on cell towers to optimize the network

Key use cases in Healthcare
• Clinical decision support:
  – What is the ideal treatment for this patient?
• Cost management:
  – What is the expected overall cost of treatment for this patient over the life of the disease?
• Diagnostics:
  – Given these test results, what is the likelihood of cancer?
• Epidemic management:
  – Predict the size and location of epidemic spread

Data science is a natural next step after business intelligence
• Business Intelligence: measure & count; simple analytics
  – Dashboards, reports, score-cards
• Data Science: discovery & prediction; complex analytics; "data product"
  – Discovery: affinity analysis, outlier detection, clustering
  – Prediction: recommendation, regression, classification
(Diagram labels: Value; Refine, Extract, Enrich)
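Affinity analysis, one of the discovery techniques listed above, can be sketched as counting how often pairs of items appear in the same basket. Those counts can then rank "bought together" recommendations. This is a minimal Python sketch that assumes baskets arrive as lists of product IDs. The helpers `pair_affinity` and `recommend` and the sample baskets are hypothetical illustrations, not part of the deck.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable

def pair_affinity(baskets: Iterable[list[str]]) -> Counter:
    """Count how often each pair of items appears in the same basket."""
    counts: Counter = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") are one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def recommend(item: str, counts: Counter, top_n: int = 3) -> list[str]:
    """Return the items most often seen together with `item`."""
    scores: Counter = Counter()
    for (a, b), n in counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]

# Hypothetical sample baskets of product IDs.
baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"], ["milk", "coffee"]]
counts = pair_affinity(baskets)
print(counts[("bread", "milk")])  # 2
print(recommend("milk", counts))  # 'bread' ranks first (2 shared baskets)
```

At Hadoop scale, the same pair counting would typically run as a distributed job rather than in one process.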

Unified Data Architecture (diagram)
• Sources: audio & video, images, text, web & social, machine logs, CRM, SCM, ERP
• Platforms: Capture | Store | Refine, Discovery Platform, Integrated Data Warehouse
• Analytic tools: languages, math & stats, data mining, business intelligence, applications
• Users: engineers, data scientists, quants, business analysts, front-line workers, executives, customers / partners, operational systems
(Diagram bands: Big Data Management, Big Data Analytics)

Teradata Unified Data Architecture (diagram)
• Sources: audio & video, images, text, web & social, machine logs, CRM, SCM, ERP
• Platforms: Capture | Store | Refine, Discovery Platform, Integrated Data Warehouse, with Viewpoint and Support
• Analytic tools: languages, math & stats, data mining, business intelligence, applications
• Users: engineers, data scientists, quants, business analysts, front-line workers, executives, customers / partners, operational systems
• Connectors: Teradata Connector for Hadoop, Aster Connector for Hadoop, Aster Teradata Connector, SQL-H, Aster Loader, Teradata Loader

A Brief History of Apache Hadoop
• 2005: Yahoo! creates a team under E14 to work on Hadoop (focus on INNOVATION)
• 2008: The Yahoo! team extends its focus to operations to support multiple projects and growing clusters (focus on OPERATIONS)
• 2011: Hortonworks is created to focus on "Enterprise Hadoop", starting with 24 key Hadoop engineers from Yahoo! (focus on STABILITY)
(Timeline, 2004 to 2013: Apache project established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform)

Leadership that Starts at the Core
• Driving next-generation Hadoop
  – YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
  – More than twice the nearest contributor
• Deeply integrating with the ecosystem
  – Enabling new deployment platforms (e.g. Windows & Azure, Linux & VMware HA)
  – Creating deeply engineered solutions (e.g. Teradata big data appliance)
• All Apache, NO holdbacks
  – 100% of code contributed to Apache

Operational Data Refinery
Collect data and apply a known algorithm to it in a trusted operational process.
1. Capture: capture all data
2. Process: parse, cleanse, apply structure & transform
3. Exchange: push to the existing data warehouse for use with existing analytic tools
(Diagram: traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media) flow into Hadoop's Refine stage, alongside Explore and Enrich. From there they flow out to traditional repositories (RDBMS, EDW, MPP) and applications (business analytics, custom applications, enterprise applications).)
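The three refinery steps can be sketched in plain Python. The sketch makes two assumptions:
- Raw web-log lines use a simplified space-separated format: ISO timestamp, user ID, URL, HTTP status.
- A CSV string is the hand-off to the warehouse bulk loader.

The log format, sample lines and helper names are hypothetical. On a real cluster these steps would run as Hive, Pig or MapReduce jobs rather than in one process.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Capture: hypothetical raw web-log lines, kept exactly as received.
RAW_LOGS = [
    "2013-03-01T10:00:00 u1 /home 200",
    "2013-03-01T10:05:00 u2 /cart 200",
    "garbage line",
    "2013-03-01T11:00:00 u1 /cart 500",
    "2013-03-02T09:00:00 u3 /home 200",
]

Record = Tuple[datetime, str, str, int]

def parse(line: str) -> Optional[Record]:
    """Process (parse, apply structure): typed record, or None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    ts, user, url, status = parts
    try:
        return datetime.fromisoformat(ts), user, url, int(status)
    except ValueError:
        return None

def refine(lines: Iterable[str]) -> list:
    """Process (cleanse, transform): drop malformed and non-200 rows, count daily hits per URL."""
    hits: Counter = Counter()
    for record in (parse(line) for line in lines):
        if record is None:
            continue
        ts, _user, url, status = record
        if status == 200:
            hits[(ts.date().isoformat(), url)] += 1
    return sorted(hits.items())

def export_csv(rows: list) -> str:
    """Exchange: render rows as CSV, ready for a warehouse bulk-load tool."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["day", "url", "hits"])
    for (day, url), n in rows:
        writer.writerow([day, url, n])
    return buf.getvalue()

print(export_csv(refine(RAW_LOGS)))
```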

Key Capability in Hadoop: Late binding
• With traditional ETL, structure must be agreed upon far in advance and is difficult to change.
  – Web logs, click streams, machine-generated and OLTP data → ETL server (store transformed data) → data mart / EDW → client apps
• With Hadoop, capture all data and structure it as business needs evolve.
  – Web logs, click streams, machine-generated and OLTP data → Hortonworks HDP (dynamically apply transformations) → data mart / EDW → client apps
(HDP layers shown: Hadoop Core, Data Services, Operational Services)
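Late binding (often called schema-on-read) can be illustrated with a small Python sketch. Raw lines are stored once, unchanged, and each consumer applies its own structure only when it reads them. The pipe-delimited sample data, the two schemas and the `read_with` helper are hypothetical. In HDP this role is typically played by Hive or HCatalog table definitions laid over files in HDFS.

```python
from typing import Callable, Iterator

# Raw click-stream lines, stored as captured; no structure agreed in advance.
RAW = [
    "2013-03-01|u1|/home|Mozilla/5.0",
    "2013-03-01|u2|/cart|curl/7.29",
    "2013-03-02|u1|/checkout|Mozilla/5.0",
]

Schema = Callable[[list[str]], dict]

def read_with(schema: Schema, lines: list[str]) -> Iterator[dict]:
    """Bind structure at read time: split each raw line and apply the schema."""
    for line in lines:
        yield schema(line.split("|"))

# Schema for today's question: page views per user.
def page_views(f: list[str]) -> dict:
    return {"day": f[0], "user": f[1], "page": f[2]}

# A later, different question over the same raw data: non-browser traffic.
def agents(f: list[str]) -> dict:
    return {"user": f[1], "is_bot": not f[3].startswith("Mozilla")}

views = list(read_with(page_views, RAW))
bots = [r["user"] for r in read_with(agents, RAW) if r["is_bot"]]
print(views[0])  # {'day': '2013-03-01', 'user': 'u1', 'page': '/home'}
print(bots)      # ['u2']
```

Neither schema required reloading or re-transforming the stored lines. In the ETL path, by contrast, only the pre-agreed transformed data is kept.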

Big Data Exploration & Visualization
Collect data and perform iterative investigation for value.
1. Capture: capture all data
2. Process: parse, cleanse, apply structure & transform
3. Exchange: explore and visualize with analytics tools supporting Hadoop
(Diagram: traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media) flow through the Refine, Explore and Enrich stages. From there they flow to traditional repositories (RDBMS, EDW, MPP) and applications (business analytics, custom applications, enterprise applications).)

Visualization Tooling
• Robust visualization and business tooling
• Ensures scalability when working with large datasets
• Native Excel support, web browser support, mobile support

Application Enrichment
Collect data, analyze it and present salient results for online apps.
1. Capture: capture all data
2. Process: parse, cleanse, apply structure & transform
3. Exchange: incorporate data directly into applications
(Diagram: traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media) flow through the Refine, Explore and Enrich stages. From there they flow to traditional repositories (RDBMS, EDW, MPP), a NoSQL store, and custom and enterprise applications.)

Goal: Enhance Hive for BI Use Cases
• More SQL & better performance, moving Hive from batch toward interactive use
• BI use cases: enterprise reports, dashboards / scorecards, parameterized reports, visualization, data mining

Interoperating With Your Tools
(Diagram labels:
- Data sources: traditional sources (RDBMS, OLTP, OLAP); new sources (web logs, email, sensor data, social media); mobile data; OLTP/POS systems
- Data systems; traditional repositories; applications
- Dev & data tools; operational tools
- Viewpoint; Microsoft applications)

Enhancing the Core of Apache Hadoop
Hadoop Core: HDFS, YARN (in 2.0), MapReduce. Platform Services: enterprise readiness.
Deliver high-scale storage & processing with enterprise-ready platform services.
Unique focus areas:
• Bigger, faster, more flexible
  – Continued focus on speed & scale and on enabling near-real-time apps
• Tested & certified at scale
  – Run ~1300 system tests on large Yahoo! clusters for every release
• Enterprise-ready services
  – High availability, disaster recovery, snapshots, security, …
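The MapReduce programming model named on this slide can be sketched locally in Python in three phases:
1. A map function emits key/value pairs.
2. The framework sorts and groups the pairs by key (the shuffle).
3. A reduce function folds each group.

This in-memory word count only simulates that flow. On a cluster, the same mapper and reducer logic would run as distributed tasks (for example via Hadoop Streaming) over data in HDFS. The function names are illustrative.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator

def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1

def reducer(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)

def run_job(lines: Iterable[str]) -> dict[str, int]:
    # Map phase over every input record.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle/sort phase: bring equal keys together, as the framework would.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one call per distinct key.
    return dict(
        reducer(word, (n for _, n in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

print(run_job(["the quick fox", "the lazy dog", "The end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```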

Data Services for Full Data Lifecycle
(Stack: Hadoop Core for distributed storage & processing; Platform Services for enterprise readiness; Data Services: WebHDFS, HCatalog, Hive, Pig, HBase, Sqoop, Flume)
Provide data services to store, process & access data in many ways.
Unique focus areas:
• Apache HCatalog
  – Metadata services for consistent table access to Hadoop data
• Apache Hive
  – Explore & process Hadoop data via SQL & ODBC-compliant BI tools
• Apache HBase
  – NoSQL database for Hadoop
• WebHDFS
  – Access Hadoop files via a scalable REST API
• Talend Open Studio for Big Data
  – Graphical data integration tools
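WebHDFS exposes HDFS operations as HTTP calls of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OP>`. A directory listing uses `op=LISTSTATUS`. This minimal Python sketch builds such a URL and parses the listing response, under the following assumptions:
- The host `namenode.example.com`, the path and the user `etl` are hypothetical.
- Port 50070 is the NameNode HTTP default in Hadoop 1.x/2.x.
- Simple (pseudo) authentication via `user.name` is in use.
- The sample response is trimmed to the fields the code reads.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

def webhdfs_url(host: str, path: str, op: str, port: int = 50070,
                user: Optional[str] = None) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>."""
    params = {"op": op}
    if user:
        params["user.name"] = user  # simple (pseudo) authentication
    quoted = urllib.parse.quote(path.lstrip("/"))
    return f"http://{host}:{port}/webhdfs/v1/{quoted}?{urllib.parse.urlencode(params)}"

def parse_liststatus(body: str) -> list[tuple[str, str, int]]:
    """Extract (name, type, length) from a LISTSTATUS JSON response."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]

def list_dir(host: str, path: str, user: str) -> list[tuple[str, str, int]]:
    """Fetch and parse a directory listing (needs a reachable NameNode)."""
    with urllib.request.urlopen(webhdfs_url(host, path, "LISTSTATUS", user=user)) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))

# Offline usage with a hypothetical, trimmed response body.
sample = ('{"FileStatuses":{"FileStatus":['
          '{"pathSuffix":"part-00000","type":"FILE","length":1024},'
          '{"pathSuffix":"logs","type":"DIRECTORY","length":0}]}}')
print(webhdfs_url("namenode.example.com", "/user/etl/weblogs", "LISTSTATUS", user="etl"))
# http://namenode.example.com:50070/webhdfs/v1/user/etl/weblogs?op=LISTSTATUS&user.name=etl
print(parse_liststatus(sample))  # [('part-00000', 'FILE', 1024), ('logs', 'DIRECTORY', 0)]
```

Because this uses only HTTP and JSON, any thin client can reach HDFS without Hadoop client libraries. That property is what the slide means by access "via scalable REST API".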

HCatalog
Apache HCatalog provides flexible metadata services across tools and external access.
• From: raw Hadoop data; inconsistent, unknown metadata; tool-specific access
• To: table access; aligned metadata; REST API
Metadata Service & Table-level Abstractions
• Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
• Accessibility: share data as tables in and out of HDFS
• Availability: enables flexible, thin-client access via REST API
Shared table and schema management opens the platform.
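HCatalog's REST API is served by WebHCat (formerly Templeton). There, a table's schema can be read with `GET .../templeton/v1/ddl/database/<db>/table/<table>`. This Python sketch builds that URL and turns the response into (column, type) pairs. It assumes a WebHCat server on its default port 50111 with simple `user.name` authentication. The host, database, table and user names, the helper names and the trimmed sample response are hypothetical.

```python
import json
import urllib.parse

def webhcat_url(host: str, *segments: str, user: str, port: int = 50111) -> str:
    """Build a WebHCat URL such as .../templeton/v1/ddl/database/<db>/table/<table>."""
    path = "/".join(urllib.parse.quote(s, safe="") for s in segments)
    query = urllib.parse.urlencode({"user.name": user})
    return f"http://{host}:{port}/templeton/v1/{path}?{query}"

def table_schema(body: str) -> list[tuple[str, str]]:
    """Turn a describe-table JSON response into (column, type) pairs."""
    return [(c["name"], c["type"]) for c in json.loads(body)["columns"]]

url = webhcat_url("hcat.example.com", "ddl", "database", "default",
                  "table", "weblogs", user="analyst")
print(url)
# http://hcat.example.com:50111/templeton/v1/ddl/database/default/table/weblogs?user.name=analyst

# Hypothetical, trimmed response body for the request above.
sample = ('{"columns":[{"name":"day","type":"string"},'
          '{"name":"hits","type":"bigint"}],"database":"default","table":"weblogs"}')
print(table_schema(sample))  # [('day', 'string'), ('hits', 'bigint')]
```

The design point from the slide is that Pig, MapReduce, Hive and external thin clients all read this one shared definition. None of them carries its own copy of the schema.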

Operational Services for Ease of Use
(Stack: Hadoop Core for distributed storage & processing; Platform Services for enterprise readiness; Data Services to store, process and access data; Operational Services: Oozie, Ambari)
Include complete operational services for productive operations & management.
Unique focus area:
• Apache Ambari: provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues

Apache Ambari Dashboard
HDP 1.2: New Ambari Features
• Job Diagnostics: visualize and troubleshoot Hadoop job execution and performance
• Cluster History: view historical job execution & performance
• Instant Insight: view the health of core Hadoop (HDFS, MapReduce) and related projects
• Cluster Navigation: "quick link" buttons jump into the NameNode web UI for a server
• REST interface provides external access to Ambari for existing tools, facilitating integration with Microsoft System Center and Teradata Viewpoint
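External tools reach Ambari through its REST API under `/api/v1/`. For example, `GET /api/v1/clusters` lists the clusters the server manages. This minimal Python sketch builds an authenticated request and parses the cluster list, under the following assumptions:
- The server URL `http://ambari.example.com:8080` is hypothetical; 8080 is Ambari's default port.
- HTTP Basic authentication is used, with hypothetical `admin`/`admin` credentials.
- The sample response is trimmed to the fields the code reads.

How Viewpoint or System Center consume the API is not shown here.

```python
import base64
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # hypothetical Ambari server, default port

def ambari_request(path: str, user: str, password: str) -> urllib.request.Request:
    """Build an authenticated GET request against the Ambari REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"{AMBARI}/api/v1/{path}")
    req.add_header("Authorization", f"Basic {token}")
    return req

def cluster_names(body: str) -> list[str]:
    """Extract cluster names from a GET /api/v1/clusters response."""
    return [item["Clusters"]["cluster_name"] for item in json.loads(body)["items"]]

def fetch_clusters(user: str, password: str) -> list[str]:
    """Call the live API (requires a reachable Ambari server)."""
    with urllib.request.urlopen(ambari_request("clusters", user, password)) as resp:
        return cluster_names(resp.read().decode("utf-8"))

# Offline usage with hypothetical credentials and a trimmed response body.
req = ambari_request("clusters", "admin", "admin")
print(req.full_url)  # http://ambari.example.com:8080/api/v1/clusters
sample = '{"items":[{"Clusters":{"cluster_name":"hdp-prod","version":"HDP-1.2.0"}}]}'
print(cluster_names(sample))  # ['hdp-prod']
```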

Hortonworks Process for Enterprise Hadoop
• Upstream community projects (Apache Hadoop, Pig, Hive, HBase, HCatalog, Ambari and other Apache projects): design & develop, test & patch, release
• Downstream enterprise product (Hortonworks Data Platform): design & develop, integrate & test, package & certify, distribute
• Stable project releases flow downstream; fixed issues flow back upstream
Virtuous cycle when development & issue fixes are done upstream and stable project releases flow downstream.
No lock-in: an integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects.

Deployable Across a Range of Options
(Deployment targets: OS, cloud, VM, appliance. HDP stack: Hadoop Core for distributed storage & processing; Platform Services for enterprise readiness; Data Services to store, process and access data; Operational Services to manage & operate at scale.)
Only Hortonworks allows you to deploy seamlessly across any deployment option:
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances

What is a Data Driven Business?
• DEFINITION: Better use of available data in the decision-making process
• RULE: Key metrics derived from data should be tied to goals
• PROVEN RESULTS: Firms that adopt data-driven decision making have output and productivity 5-6% higher than would be expected given their investments in and usage of information technology*
* "Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?" Brynjolfsson, Hitt and Kim (April 22, 2011)

Hortonworks & Teradata
• Viewpoint integration
  – Common management console for Aster, Teradata and Apache Hadoop
• TVI: Teradata Vital Infrastructure
  – Proactive reliability, availability, and manageability support service
• Aster Connector for Hadoop
  – SQL-H integration
• Teradata Connector for Hadoop
  – Sqoop integration
• Pre-tuned HDFS and MapReduce parameters for big data workloads
• Unified Data Architecture
  – The right technology on the right analytical problems, using best-of-breed technologies
(Diagram labels: SQL-H, SQL-H, Aster-Teradata Connector, Aster Connector for Hadoop, Teradata Connector for Hadoop)

What is Big Data?

What is Hadoop?

Hadoop Components

From Community to the Enterprise

Becoming Data Driven

Thank You! Questions & Answers