Using OpenStack Orchestration for Big Data Workloads

Hart Hoover
January 30, 2017

Talk on OpenStack Orchestration (Project Heat) using the Enterprise Working Group's Big Data reference architecture. The demo was deployed on Cisco Metacloud.

Transcript

  1. 2.5 quintillion bytes (2,500,000,000,000,000,000) of data is generated every day, most of which is never captured, never collected, with no corresponding action taken. Source: IBM/Cisco
  2. By year-end 2018, 25% of durable goods manufacturers will utilize data generated by smart machines in their customer-facing sales, billing, and service workflows. And by 2018, 6 billion “Things” will request support. Source: Gartner
  3. Data & Analytics Use Cases

     • MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
     • COMMUNICATIONS: Location-based advertising
     • EDUCATION & RESEARCH: Experiment sensor analysis
     • CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, problems
     • HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care
     • LIFE SCIENCES: Clinical trials; genomics
     • HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg. quality; warranty analysis
     • OIL & GAS: Drilling exploration sensor analysis
     • FINANCIAL SERVICES: Risk & portfolio analysis; new products
     • AUTOMOTIVE: Auto sensors reporting location, problems
     • RETAIL: Consumer sentiment; optimized marketing
     • LAW ENFORCEMENT & DEFENSE: Threat analysis (social media monitoring, photo analysis)
     • TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment
     • UTILITIES: Smart meter analysis for network capacity
     • ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; web-site optimization
  4. What Is Hadoop?

     • An open source software platform (file system) that processes vast amounts of data (MapReduce, HBase, HDFS)
     • Yahoo! and Google are its biggest contributors
     • Attributes:
       • Scalable: Stores and processes petabytes of data
       • Economical: Processes across clusters of commonly available servers
       • Efficient: Processes in parallel on the nodes where the data is located
       • Reliable: Maintains multiple copies of the data
       • Highly available: Automatically redeploys computing tasks on failures
  5. What is OpenStack Orchestration?

     [Diagram labels: Your Applications; APIs; OpenStack Dashboard; Compute, Networking, Storage; OpenStack Shared Services; Standard Hardware]
  6. HOT

  7. HOT Sections: • Meta • Parameters • Resources • Outputs

         parameters:
           key_name:
             type: string
             label: Key Name
             description: Name of keypair to be used for compute instance
  8. HOT Sections: • Meta • Parameters • Resources • Outputs

         outputs:
           instance_ip:
             description: IP address of the deployed compute instance
             value: { get_attr: [my_instance, first_address] }
  9. Parameter Constraints

         length: (string) { min: <lower limit>, max: <upper limit> }
         range: (number) { min: <lower limit>, max: <upper limit> }
         allowed_values: [ <value>, <value>, ... ]
         allowed_pattern: <regular expression>
         custom_constraint
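
The section and constraint fragments from slides 7 to 9 can be put together into one small template. What follows is a minimal sketch, not the template used in the talk's demo. Only `key_name`, `my_instance`, and the `instance_ip` output come from the slides; the other parameter names, the defaults, the flavor list, and the volume resources are hypothetical values chosen for illustration. The version and description lines at the top are presumably what the "Meta" bullet refers to, though the slides do not say so.

```yaml
heat_template_version: 2015-04-30

description: >
  Illustrative single-node template: one server, one data volume,
  and one output. All values other than key_name, my_instance and
  instance_ip are assumptions made for this sketch.

parameters:
  key_name:
    type: string
    label: Key Name
    description: Name of keypair to be used for compute instance
    constraints:
      # custom_constraint: checks that the keypair exists in Nova
      - custom_constraint: nova.keypair

  instance_name:            # hypothetical parameter
    type: string
    label: Instance Name
    default: bigdata-node
    constraints:
      # length applies to string parameters
      - length: { min: 3, max: 32 }
        description: Name must be 3 to 32 characters long
      - allowed_pattern: "[a-z][a-z0-9-]*"
        description: Lowercase letters, digits and hyphens; must start with a letter

  flavor:                   # hypothetical parameter and flavor names
    type: string
    label: Flavor
    default: m1.medium
    constraints:
      - allowed_values: [ m1.small, m1.medium, m1.large ]

  image:                    # hypothetical parameter, no default on purpose
    type: string
    label: Image
    constraints:
      - custom_constraint: glance.image

  volume_size:              # hypothetical parameter
    type: number
    label: Data Volume Size (GB)
    default: 10
    constraints:
      # range applies to number parameters
      - range: { min: 1, max: 100 }

resources:
  my_instance:
    type: OS::Nova::Server
    properties:
      name: { get_param: instance_name }
      key_name: { get_param: key_name }
      image: { get_param: image }
      flavor: { get_param: flavor }
      # Clouds with more than one tenant network will likely also
      # need a "networks" property here.

  data_volume:
    type: OS::Cinder::Volume
    properties:
      size: { get_param: volume_size }

  data_volume_attachment:
    type: OS::Cinder::VolumeAttachment
    properties:
      instance_uuid: { get_resource: my_instance }
      volume_id: { get_resource: data_volume }

outputs:
  instance_ip:
    description: IP address of the deployed compute instance
    # first_address is the attribute shown on the slide; newer Heat
    # releases mark it deprecated in favour of "addresses"/"networks".
    value: { get_attr: [my_instance, first_address] }
```

Assuming the template is saved as `template.yaml` (a hypothetical file name) and the Heat plugin for the `openstack` client is installed, a stack could be created with something like `openstack stack create -t template.yaml --parameter key_name=<keypair> --parameter image=<image> <stack-name>`. Heat checks the parameter values against the constraints above and rejects the request if any of them fails.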