Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (HPCA18)
An introduction to the research paper "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective."
Facebook MLaaS and Datacenter Design for Machine Learning.
supports machine learning at global scale.” 2.1 billion users served.
Machine Learning (ML-as-a-Service): ranking posts for News Feed, speech and text translation, and photo and real-time video classification.
FAIR System & Network
fraction of the resource requirements.
• FB relies on an incredibly diverse set of ML approaches.
  ◦ e.g. SVM, GBDT, Logistic Regression (LR)
• Inference mainly uses CPUs; training uses both CPUs and GPUs.
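As a rough illustration of why a model such as logistic regression is cheap to serve on CPUs, inference reduces to a dot product followed by a sigmoid. The sketch below is a minimal, assumed example and not Facebook's implementation; the function names, weights, and feature values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Logistic regression inference: sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical model with two features.
weights = [0.8, -0.4]
bias = 0.1
print(predict_proba(weights, bias, [1.0, 2.0]))
```

Each prediction costs one multiply-add per feature plus a single exponential, which fits general-purpose CPU cores well.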
Alg. Almost user visit for News Feed.
2. Ads: ML determines which ads to display to a given user.
   a. “Practical Lessons from Predicting Clicks on Ads at Facebook,” ADKDD14
3. Search: videos, photos, people, events, etc.
4. Sigma: the general classification and anomaly detection framework.
5. Lumos: extracts high-level attributes and embeddings from an image and its content.
6. Facer: Facebook’s face detection and recognition framework.
7. Language Translation: supports translations for more than 45 languages.
8. Speech Recognition: provides automated captioning for video.
flexibility, debugging, and dynamic neural networks, which enables rapid experimentation. Not optimized for production and mobile deployments.
• Caffe2 is optimized for production: performance, cross-platform support, and coverage for CNN, RNN, and MLP. Third-party libraries such as cuDNN, MKL, and Metal can be used.
SIMD, specialized convolution or matrix multiplication engines.
• Model compression, quantization, and high-bandwidth memory
  ◦ “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding,” ICLR16, Song Han et al.
  ◦ “BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1,” Matthieu Courbariaux et al.
  ◦ “Ternary Neural Networks for Resource-Efficient AI Applications,” Hande Alemdar et al.
• Related Work:
  ◦ “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform,” KDD17
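To make the quantization and low-precision weight ideas above concrete, here is a minimal sketch in plain Python. It is an assumed, simplified illustration: the 8-bit scheme is a simple min/max linear quantizer (not the trained, clustering-based quantization of Deep Compression), `binarize` follows the deterministic sign rule described in BinaryNet, and `ternarize` uses a fixed threshold chosen here for illustration rather than the training procedure of the ternary networks paper. All function names are hypothetical.

```python
def quantize_uint8(weights):
    """Linear min/max quantization of floats to integer codes in [0, 255]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def binarize(weights):
    """Deterministic sign binarization: +1 if w >= 0, otherwise -1."""
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights, threshold):
    """Map weights to {-1, 0, +1}; magnitudes below the threshold become 0."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)
print(codes)
print(binarize(weights))
print(ternarize(weights, 0.3))
```

With 8-bit codes each weight takes one byte instead of four, and the reconstruction error per weight is at most half a quantization step. Binary and ternary weights shrink storage further and replace multiplications with sign flips or skips.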
MLaaS, Computer Vision represents only a small fraction of the resource requirements.
• FB relies on an incredibly diverse set of ML approaches.
  ◦ e.g. SVM, GBDT, Logistic Regression (LR)
• Inference mainly uses CPUs; training uses both CPUs and GPUs.