Embracing automation - An autoscaler mechanism by using Saltstack and Prometheus

JackyTung
September 12, 2018

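Autoscaling is a mechanism that adjusts the number of machines by monitoring service traffic, so that a stable service can be maintained at the lowest possible cost. Existing cloud services such as AWS also provide autoscaling for their own compute instances (EC2). However, the way that mechanism adds and removes instances does not meet our needs.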

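How can autoscaling be implemented with Prometheus (to monitor service traffic) and Saltstack (a configuration management tool)? Why choose Prometheus as the monitoring tool, and why choose Saltstack? What should be considered when designing the conditions that trigger autoscaling? And how do we build the bridge between Prometheus and Saltstack?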


Transcript

  1. About Me
     • Software engineer, HTC DeepQ (AI Platform, Web Development)
     • GitHub: https://github.com/JackyTung
     • Slides: https://speakerdeck.com/jackytung
  2. About DeepQ AI Platform (https://ai-platform.deepq.com/)
     • Purpose: lower the barrier to AI training
     • Features:
       • Simplified model development process
       • Automatic hyper-parameter tuning
       • Optimized training environment
  3. Operation Challenges
     • Maintaining dev/sta/prod environments
     • The number of servers to be monitored keeps increasing
     • Diagnosing problems and providing feedback
     • Automating infrastructure management
  4. Operation Challenges
     • Maintaining dev/sta/prod environments
     • The number of servers to be monitored keeps increasing
     • Diagnosing problems and providing feedback
     • Automating infrastructure management
     • GPU instances account for a large proportion of the cost
     [Diagram: Client Side, Training Task, DeepQ AI Platform, GPU instance]
  5. Candidate solutions
     • Use an existing cloud service solution (e.g. AWS, GCP)
       • Problem: longer deployment time
       • Instances are terminated when they are idle
     • Implement it ourselves
       • Reduce deployment time: reserve instances
       • Scalability: can be applied to existing cloud service instances
  6. Overall architecture
     [Diagram: Prometheus pulls metrics from the target instances; AlertManager notifies the Autoscaler; the Autoscaler calls the Salt master through salt-api; the Salt master executes salt commands to scale the target instances up or down.]
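In this architecture, AlertManager notifies the Autoscaler when a Prometheus alert fires. Below is a minimal Go sketch of that "notify" edge. It assumes the Autoscaler receives notifications through Alertmanager's generic webhook receiver, whose JSON body carries a `status` and a list of `alerts`, each with its own `status` and `labels`. The sketch decodes that body into scale requests. The `env` and `instance_type` label names, the `QueueTooLong` alert name, and the `ScaleRequest` type are hypothetical and do not come from the talk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Alert holds the subset of per-alert fields in an Alertmanager webhook payload.
type Alert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
}

// WebhookMessage is the subset of the Alertmanager webhook body used here.
type WebhookMessage struct {
	Status string  `json:"status"`
	Alerts []Alert `json:"alerts"`
}

// ScaleRequest is a hypothetical internal type: which environment and
// instance type an alert refers to, and whether the alert is firing.
type ScaleRequest struct {
	AlertName    string
	Env          string
	InstanceType string
	Firing       bool
}

// parseWebhook converts an Alertmanager payload into scale requests.
// The "env" and "instance_type" label names are assumptions.
func parseWebhook(body []byte) ([]ScaleRequest, error) {
	var msg WebhookMessage
	if err := json.Unmarshal(body, &msg); err != nil {
		return nil, err
	}
	reqs := make([]ScaleRequest, 0, len(msg.Alerts))
	for _, a := range msg.Alerts {
		reqs = append(reqs, ScaleRequest{
			AlertName:    a.Labels["alertname"],
			Env:          a.Labels["env"],
			InstanceType: a.Labels["instance_type"],
			Firing:       a.Status == "firing",
		})
	}
	return reqs, nil
}

// webhookHandler is the endpoint an Alertmanager webhook receiver would POST to.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	reqs, err := parseWebhook(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	for _, req := range reqs {
		// The scaling decision and the salt-api call would happen here (not shown).
		fmt.Printf("%s firing=%t env=%s type=%s\n", req.AlertName, req.Firing, req.Env, req.InstanceType)
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	// To serve for real (this blocks):
	//   http.HandleFunc("/alerts", webhookHandler)
	//   http.ListenAndServe(":9095", nil)
	sample := []byte(`{"status":"firing","alerts":[{"status":"firing","labels":{"alertname":"QueueTooLong","env":"prod","instance_type":"type1"}}]}`)
	reqs, err := parseWebhook(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", reqs[0])
}
```

Keeping the parsing separate from the HTTP handler means the decoding can be tested without starting a server.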
  7. [Overall architecture diagram, repeated.]
  8. Metric based vs. log based (ref: https://signalfx.com/blog/metric-log-monitoring-really-need/)
     • "Typically, metrics are best used for monitoring, profiling, and alerting."
     • "Logs give you the extra level of detail necessary for troubleshooting, debugging, support, and auditing."
  9. Metric based vs. log based (O = yes, X = no)

                         Metrics                                Logs
     Exact counter       X                                      O
     Error cause         X                                      O
     Network bandwidth   Constant amount (e.g. once per 15 s)   Linear in # of events
     Storage usage       Small (just sampled numbers)           Large (event details)
     Detect incidents    O                                      O
  10. Why Prometheus
      • A metric-based monitoring system
      • Focuses on time-series data monitoring
      • We care about metrics such as:
        • queue length
        • waiting time
        • number of free instances
        • error count
        • CPU, memory, and disk usage
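Metrics like these can also be read back from Prometheus through its HTTP API. Below is a Go sketch that assumes the caller queries `/api/v1/query` directly. The talk's architecture routes decisions through AlertManager instead, so direct querying is only one option. The sketch builds an instant-query URL and parses the `vector` result; in that result each sample value arrives as a `[timestamp, "value"]` pair, with the value encoded as a string. The metric name `deepq_task_queue_length` and the host name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// Sample is one series from an instant-vector result.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// queryResponse mirrors the subset of the /api/v1/query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Error  string `json:"error"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix time, "value as string"]
		} `json:"result"`
	} `json:"data"`
}

// queryURL builds an instant-query URL with the PromQL expression URL-encoded.
func queryURL(base, promql string) string {
	return base + "/api/v1/query?" + url.Values{"query": {promql}}.Encode()
}

// parseVector decodes a successful instant-vector response into samples.
func parseVector(body []byte) ([]Sample, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("prometheus query failed: %s", resp.Error)
	}
	if resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected result type %q", resp.Data.ResultType)
	}
	samples := make([]Sample, 0, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", r.Value[1])
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		samples = append(samples, Sample{Labels: r.Metric, Value: v})
	}
	return samples, nil
}

func main() {
	// Hypothetical queue-length metric, summed per environment.
	fmt.Println(queryURL("http://prometheus:9090", "sum by (env) (deepq_task_queue_length)"))
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"env":"prod"},"value":[1536710400.123,"3"]}]}}`)
	samples, err := parseVector(body)
	if err != nil {
		panic(err)
	}
	for _, s := range samples {
		fmt.Println(s.Labels["env"], s.Value)
	}
}
```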
  11. [Overall architecture diagram, repeated.]
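  12. What is the autoscaler?
      [Plot: # of instances vs. workload, with the down and up thresholds marked]
      Scale condition: scalecondition(0, 5, 0, 1, 270)
        • minimum # of instances: 0
        • maximum # of instances: 5
        • down threshold: 0
        • up threshold: 1
        • interval: 270 secs
      Loops over environments and instance types:

          for _, env := range []string{"dev", "sta", "prod"} {
              for _, instanceType := range []string{"type1", "type2"} {
                  // scalecondition(0, 5, 0, 1, 270) for this env and instanceType
              }
          }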
  13. [Architecture diagram, Saltstack part: the Autoscaler sends a scale command to the Salt master, which executes salt commands to scale the target instances up or down.]
  14. What is Saltstack?
      • A configuration management tool
        • Flexible and scalable enough to maintain tens of thousands of machines
      • A remote execution framework
        • Master / agent (minion) architecture
        • Parallel execution
        • Secure: Salt minion key authentication
  15. Scale-up
      [Diagram: Salt master → target instances]
      The scale-up command looks like this:

          salt-run cloud.action start {target instance}

      It only starts the instance. How do we deploy onto it? Use Saltstack's event-driven system!
  16. What is an event?
      • Everything you care about: authentication, minion start, job events, cloud events, ...
      • Event types: https://docs.saltstack.com/en/latest/topics/event/master_events.html
      • The structure of an event: tag + data
  17. Event and Reactor
      • An event-driven infrastructure
      • Event system: fires off events, enabling third-party applications or external processes to react to behavior within Salt
      • Reactor system: triggers actions in response to an event
      Ref: https://docs.saltstack.com/en/getstarted/overview.html
  18. Summary
      • The autoscaling mechanism can be applied to non-container-based instances
      • Before surveying new techniques, think about the purpose first:
        • Can the services be containerized?
        • Do existing solutions meet our requirements?
        • Monitoring: metrics-based or log-based?
        • Configuration management tool: Ansible, Chef, Puppet, Saltstack, etc.