There are three common ways a model can be deployed:

1. The ML service code base is integrated within the rest of the backend code base. The model's size and computation requirements add extra load on the backend servers and can slow down the entire system, so this option is usually considered only if the inference process is very light to run.

2. The ML service code base is deployed as its own service, with elastic load balancing for scaling. The model can be large and complex without putting load pressure on the rest of the infrastructure. This is typically the easiest way to deploy a model while ensuring scalability, maintainability, and reliability.

3. The ML service code base is deployed as microservices, so that each component gets its own service. This relieves the rest of the code base and ensures the different components of the ML system can be reused for different purposes; one example is the ML inference manager at RadioAdSpread (www.radioadspread.com).
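To make the standalone-service option concrete, here is a minimal sketch of a model served behind its own HTTP endpoint, assuming Flask as the web framework; the route name, payload shape, and dummy model are illustrative assumptions, not details from the original text.

```python
# Minimal sketch of the "dedicated ML service" pattern: the model runs
# behind its own HTTP endpoint, so the main backend only makes a network
# call and carries none of the model's memory or compute load.
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_model():
    # Placeholder for a real model load (e.g. joblib.load("model.pkl")).
    # A trivial stand-in keeps the sketch self-contained and runnable.
    return lambda features: sum(features)  # dummy "prediction"

model = load_model()

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    features = payload.get("features", [])
    return jsonify({"prediction": model(features)})

if __name__ == "__main__":
    # Behind an elastic load balancer, several identical instances of this
    # service can be scaled independently of the rest of the backend.
    app.run(host="0.0.0.0", port=8080)
```

The backend then calls the service with a plain HTTP request (e.g. `POST /predict` with `{"features": [1, 2, 3]}`), which is what lets the model be scaled, updated, or replaced without touching the rest of the code base.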