Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (HPCA 2018)
An introduction to the research paper "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective".
Facebook MLaaS and Datacenter Design for Machine Learning.
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
International Symposium on High-Performance Computer Architecture (HPCA) 2018
Shunya Ueta (@hurutoriya), 2018-03-07
Abstract
“This paper describes the hardware and software infrastructure that supports machine learning at global scale.”
Machine learning is offered as a service (ML-as-a-Service, MLaaS) and serves 2.1 billion users.
Use cases: ranking posts for News Feed, speech and text translation, and photo and real-time video classification.
FAIR System & Network
What Are the Contributions?
● Among MLaaS workloads, computer vision represents only a small fraction of the resource requirements.
● FB relies upon an incredibly diverse set of ML approaches.
○ e.g. SVM, GBDT, Logistic Regression (LR)
● Inference mainly uses CPUs; training uses both CPUs and GPUs.
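To make the CPU inference point concrete, here is a minimal sketch of logistic regression inference in plain Python. It is not taken from the paper: the function name `predict_click_probability`, the weights, and the feature values are all hypothetical, chosen only to show that scoring one example with LR is a dot product followed by a sigmoid, which is the kind of lightweight work that fits CPUs well.

```python
import math


def predict_click_probability(weights, bias, features):
    """Score one example with logistic regression (hypothetical helper)."""
    # Linear part: bias plus the dot product of weights and features.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The sigmoid squashes the score into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Made-up model parameters and input, for illustration only.
weights = [0.5, -0.25, 1.0]
bias = -0.5
features = [1.0, 2.0, 0.5]

probability = predict_click_probability(weights, bias, features)
print(f"predicted probability: {probability:.3f}")
```

The cost per prediction grows linearly with the number of features and needs no accelerator, in contrast to training large DNNs.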
Major Services Leveraging Machine Learning
1. News Feed: ranking algorithms. Almost every user visit involves News Feed.
2. Ads: ML determines which ads to display to a given user.
a. “Practical Lessons from Predicting Clicks on Ads at Facebook,” ADKDD14
3. Search: videos, photos, people, events, etc.
4. Sigma: the general classification and anomaly detection framework.
5. Lumos: extracts high-level attributes and embeddings from an image and its content.
6. Facer: Facebook’s face detection and recognition framework.
7. Language Translation: supports translations for more than 45 languages. [link]
8. Speech Recognition: provides automated captioning for video.
DNN Framework
● PyTorch is optimized for research. It focuses on flexibility, debugging, and dynamic neural networks, which enable rapid experimentation. It is not optimized for production and mobile deployments.
● Caffe2 is optimized for production. It focuses on performance, cross-platform support, and coverage for CNNs, RNNs, and MLPs. It can use third-party libraries such as cuDNN, MKL, and Metal.
Future of MLaaS at Facebook
● ML workloads benefit from SIMD and from specialized convolution or matrix multiplication engines.
● Model compression, quantization, and high-bandwidth memory
○ “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding,” Song Han et al., ICLR16
○ “BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1,” Matthieu Courbariaux et al.
○ “Ternary Neural Networks for Resource-Efficient AI Applications,” Hande Alemdar et al.
● Related work:
○ “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform,” KDD17
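The quantization idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the method of any paper cited above: `quantize_uint8` uses simple uniform 8-bit quantization (Deep Compression itself uses pruning, trained quantization, and Huffman coding), and `binarize` applies only the sign rule (+1 if the weight is at least 0, else -1) without any of the training procedure. All function names and weight values are hypothetical.

```python
def quantize_uint8(weights):
    """Map float weights to integer codes in [0, 255] (uniform quantization)."""
    lo, hi = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights are equal.
    scale = (hi - lo) / 255 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [c * scale + lo for c in codes]


def binarize(weights):
    """Constrain each weight to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


# Made-up weights, for illustration only.
weights = [-0.8, -0.1, 0.0, 0.35, 1.2]

codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("8-bit codes:", codes)
print("max round-trip error:", max_error)
print("binarized:", binarize(weights))
```

Each 8-bit code takes a quarter of the space of a 32-bit float, and the round-trip error stays within half a quantization step, which is the trade-off that makes lower-precision inference attractive when memory bandwidth is the bottleneck.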
Conclusion
● MLaaS at Facebook serves 2.1 billion users.
● Among MLaaS workloads, computer vision represents only a small fraction of the resource requirements.
● FB relies upon an incredibly diverse set of ML approaches.
○ e.g. SVM, GBDT, Logistic Regression (LR)
● Inference mainly uses CPUs; training uses both CPUs and GPUs.