Using OpenStack Orchestration for Big Data Workloads

Hart Hoover | @hhoover Using OpenStack Orchestration for Big Data

Big Data

The Digital Economy is Disrupting Everything

2.5 quintillion (2,500,000,000,000,000,000) bytes of data are generated every day. Most of it is never captured or collected, and no action is taken on it. Source: IBM/Cisco

By year-end 2018, 25% of durable goods manufacturers will use data generated by smart machines in their customer-facing sales, billing, and service workflows. By 2018, 6 billion "Things" will also request support. Source: Gartner

Data & Analytics Use Cases
• MEDIA/ENTERTAINMENT: viewer and advertising effectiveness
• COMMUNICATIONS: location-based advertising
• EDUCATION & RESEARCH: experiment sensor analysis
• CONSUMER PACKAGED GOODS: sentiment analysis of what's hot and of problems
• HEALTH CARE: patient sensors, monitoring, EHRs; quality of care
• LIFE SCIENCES: clinical trials; genomics
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: manufacturing quality; warranty analysis
• OIL & GAS: drilling exploration sensor analysis
• FINANCIAL SERVICES: risk & portfolio analysis; new products
• AUTOMOTIVE: auto sensors reporting location and problems
• RETAIL: consumer sentiment; optimized marketing
• LAW ENFORCEMENT & DEFENSE: threat analysis (social media monitoring, photo analysis)
• TRAVEL & TRANSPORTATION: sensor analysis for optimal traffic flows; customer sentiment
• UTILITIES: smart meter analysis for network capacity
• ON-LINE SERVICES / SOCIAL MEDIA: people & career matching; website optimization

What Is Hadoop?
• An open source software platform, built around a distributed file system, that processes vast amounts of data (MapReduce, HBase, HDFS)
• Yahoo! has been one of its biggest contributors; its design is based on Google's published GFS and MapReduce papers
• Attributes:
  • Scalable: stores and processes petabytes of data
  • Economical: processes data across clusters of commonly available servers
  • Efficient: processes data in parallel on the nodes where it is located
  • Reliable: maintains multiple copies of the data
  • Highly available: automatically redeploys computing tasks on failure

https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf

OpenStack Orchestration (Heat)

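What is OpenStack Orchestration?
OpenStack Orchestration (Heat) provisions and manages a collection of cloud resources, called a stack, from a declarative template.
[Diagram: OpenStack overview showing Your Applications, APIs, OpenStack Dashboard, Compute, Networking, Storage, OpenStack Shared Services, and Standard Hardware]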

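OpenStack Orchestration Components
• python-heatclient or python-openstackclient: command-line clients for the Orchestration API
• heat-api: OpenStack-native REST API
• heat-api-cfn: AWS CloudFormation-compatible Query API
• heat-engine: parses templates and orchestrates the creation, update, and deletion of stack resources
• MySQL (database) and RabbitMQ (message queue): supporting services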

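OpenStack Orchestration Workflow: POST
[Diagram: the POST request path through heat-api, RabbitMQ, heat-engine, and MySQL]
heat-api accepts the request and passes it over RabbitMQ to heat-engine, which creates the stack's resources and records their state in MySQL.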

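OpenStack Orchestration Workflow: GET
[Diagram: the GET request path through heat-api, RabbitMQ, heat-engine, and MySQL]
heat-api relays the request over RabbitMQ to heat-engine, which reads the stack's current state from MySQL and returns it.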

HOT (Heat Orchestration Template)

HOT Components
• Sections
• Parameter Constraints
• Pseudo Parameters
• Intrinsic Functions

HOT Sections: Meta
```yaml
heat_template_version: 2015-10-15
description: Deploys a three-tier web application.
```

HOT Sections: Parameters
```yaml
parameters:
  key_name:
    type: string
    label: Key Name
    description: Name of keypair to be used for compute instance
```

HOT Sections: Resources
```yaml
resources:
  my_instance:
    type: OS::Nova::Server
    properties:
      flavor: m1.small
      image: ubuntu-14.04
```

HOT Sections: Outputs
```yaml
outputs:
  instance_ip:
    description: IP address of the deployed compute instance
    value: { get_attr: [my_instance, first_address] }
```
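Put together, the four sections form a complete template. The sketch below joins the slide fragments and, as an addition not shown on the slides, passes the key_name parameter to the server with get_param. The description is adjusted to match what the template actually deploys; the flavor and image names are carried over from the slides and are assumed to exist in your cloud.

```yaml
heat_template_version: 2015-10-15

description: Deploys a single compute instance (minimal example).

parameters:
  key_name:
    type: string
    label: Key Name
    description: Name of keypair to be used for compute instance

resources:
  my_instance:
    type: OS::Nova::Server
    properties:
      flavor: m1.small        # assumed to exist in the target cloud
      image: ubuntu-14.04     # assumed to exist in the target cloud
      # Wire the parameter into the resource; without this line the
      # key_name parameter would be accepted but never used.
      key_name: { get_param: key_name }

outputs:
  instance_ip:
    description: IP address of the deployed compute instance
    value: { get_attr: [my_instance, first_address] }
```

Assuming the file is saved as single_instance.yaml and a keypair named mykey exists (both hypothetical), the stack could be launched with `openstack stack create -t single_instance.yaml --parameter key_name=mykey demo`, and `openstack stack show demo` would then list the instance_ip output.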

Parameter Constraints
• length (string): { min: <lower limit>, max: <upper limit> }
• range (number): { min: <lower limit>, max: <upper limit> }
• allowed_values: [ <value>, <value>, ... ]
• allowed_pattern: <regular expression>
• custom_constraint: <constraint name>, e.g. nova.keypair
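As a sketch of how these constraints attach to parameters, the fragment below declares each one under a parameter's constraints list. The parameter names cluster_name, worker_count, and worker_flavor are hypothetical, and the m1.* flavor names assume the classic default flavors. nova.keypair is one of Heat's built-in custom constraints; it checks that the named keypair exists in Nova.

```yaml
parameters:
  key_name:
    type: string
    constraints:
      # Built-in custom constraint: the keypair must exist in Nova
      - custom_constraint: nova.keypair
  cluster_name:            # hypothetical parameter
    type: string
    constraints:
      - length: { min: 3, max: 20 }
        description: Cluster name must be 3 to 20 characters long.
      # The whole value must match the pattern
      - allowed_pattern: "[a-z][a-z0-9-]*"
        description: Start with a lowercase letter; use only lowercase letters, digits and hyphens.
  worker_count:            # hypothetical parameter
    type: number
    default: 3
    constraints:
      - range: { min: 1, max: 10 }
        description: Between 1 and 10 worker nodes.
  worker_flavor:           # hypothetical parameter; flavor names are assumptions
    type: string
    default: m1.large
    constraints:
      - allowed_values: [ m1.medium, m1.large, m1.xlarge ]
```

Heat checks these constraints when a template is validated or a stack is created, and rejects values that violate them; the optional per-constraint description gives the user a readable reason.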

Pseudo Parameters
Heat defines these for every stack; templates read them with get_param.
• OS::stack_name
• OS::stack_id
• OS::project_id
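Pseudo parameters are read with get_param like any declared parameter. In the sketch below, the resource name hadoop_master and the m1.large flavor are hypothetical; list_join builds a server name such as mystack-master from the stack name, and the stack ID is recorded in the server's metadata.

```yaml
resources:
  hadoop_master:           # hypothetical resource name
    type: OS::Nova::Server
    properties:
      # "<stack name>-master", e.g. mystack-master
      name: { list_join: ['-', [{ get_param: "OS::stack_name" }, master]] }
      flavor: m1.large       # hypothetical flavor
      image: ubuntu-14.04
      metadata:
        # Tag the server with the ID of the stack that created it
        stack_id: { get_param: "OS::stack_id" }
```

The pseudo parameter names are quoted so that the double colons are never misread by a YAML parser in flow context.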

Intrinsic Functions
• get_attr
• get_file
• get_resource
• list_join
• resource_facade
• str_replace
• digest
• repeat
• str_split
• map_merge (requires heat_template_version 2016-04-08 or later)
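Several of these functions combine naturally when bootstrapping cluster nodes. The sketch below assumes a hypothetical shell script, install_worker.sh, stored next to the template: get_file inlines it, str_replace fills in the master's address obtained with get_attr, and the result becomes the worker's user_data. The resource names and flavor are hypothetical, and hadoop_master is assumed to be an OS::Nova::Server defined elsewhere in the same template.

```yaml
resources:
  hadoop_worker:           # hypothetical resource name
    type: OS::Nova::Server
    properties:
      flavor: m1.large       # hypothetical flavor
      image: ubuntu-14.04
      # Hand the script to the instance unmodified
      user_data_format: RAW
      user_data:
        str_replace:
          # get_file reads install_worker.sh (hypothetical), resolved
          # relative to the template's location by the client
          template: { get_file: install_worker.sh }
          params:
            # Every occurrence of __MASTER_IP__ in the script is replaced;
            # hadoop_master is assumed to be defined elsewhere
            __MASTER_IP__: { get_attr: [hadoop_master, first_address] }
```

Because get_attr creates a dependency on hadoop_master, Heat creates the master before the worker without any explicit depends_on.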

Use community templates as a reference, not for production

Big Data Deployment
https://www.openstack.org/software/sample-configs/#big-data
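For a cluster, the worker tier is usually a group of identical servers. One way to express that in HOT is OS::Heat::ResourceGroup. The sketch below is a minimal, hypothetical master-plus-workers layout, not the configuration from the linked sample page; resource names, the flavor, and the worker limits are assumptions, and a real deployment still needs networking, security groups, and Hadoop configuration.

```yaml
heat_template_version: 2015-10-15

description: Minimal sketch of a Hadoop-style master/worker layout (hypothetical).

parameters:
  key_name:
    type: string
    constraints:
      - custom_constraint: nova.keypair
  worker_count:            # hypothetical parameter
    type: number
    default: 3
    constraints:
      - range: { min: 1, max: 10 }

resources:
  hadoop_master:           # hypothetical resource name
    type: OS::Nova::Server
    properties:
      name: { list_join: ['-', [{ get_param: "OS::stack_name" }, master]] }
      flavor: m1.large       # hypothetical flavor
      image: ubuntu-14.04
      key_name: { get_param: key_name }

  hadoop_workers:          # hypothetical resource name
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: worker_count }
      resource_def:
        type: OS::Nova::Server
        properties:
          # %index% is replaced with 0, 1, 2, ... for each group member
          name: worker-%index%
          flavor: m1.large   # hypothetical flavor
          image: ubuntu-14.04
          key_name: { get_param: key_name }

outputs:
  master_ip:
    description: IP address of the master node
    value: { get_attr: [hadoop_master, first_address] }
  worker_ips:
    description: IP addresses of the worker nodes (one list entry per member)
    value: { get_attr: [hadoop_workers, first_address] }
```

Scaling the worker tier then amounts to updating the stack with a new worker_count, for example `openstack stack update -t hadoop.yaml --parameter key_name=mykey --parameter worker_count=5 hadoop`, where the file, keypair, and stack names are hypothetical.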
