• What does it mean to operate a manufacturing system?
• How can an operational policy be modeled and tested?
• What is reinforcement learning?
• How can an operational policy be refined by reinforcement learning?
• How can we learn from informal operational skills?
• What did you learn today?
• A manufacturing system, or a factory, is a huge container of materials.
• However, the materials are not simply kept in the system; they are transformed from raw materials into finished products step by step.
• So, a manufacturing system is also a huge human-machine working system composed of various resources.
• The resources carry out the various steps of the whole transformation process of materials, which we call operations.
• From a functional point of view, the system can also be conceptualized as a collection of those transforming operations.
• Operations can be classified into value-added and non-value-added ones.
– Operations are deemed value-added if they are indispensable for transforming raw materials into final products.
• Most operations are carried out on materials, but some are not.
• However, even some operations performed on materials can be non-value-added (changing locations, lot/batch formations, etc.).
• Carrying out an operation takes time and requires the necessary resources (and materials).
• What does it mean to operate a manufacturing system?
• To operate a manufacturing system is to have its resources carry out operations so that raw materials are transformed into finished products in an organized manner.
• To do this, it must be determined which operations should be performed, when, and by which resources (on which materials).
• From a production and operations management point of view, this decision is essential and critical, because it affects:
– Production lead time
– Utilization of resources
– Inventory levels
– Amount of non-value-added operations, etc.
• The result can be shown and evaluated with the ex-post production schedule, which shows which operations were actually performed, when, and by which resources (on which materials).
• This ex-post schedule is only available retrospectively; it cannot be fully designed or optimized beforehand as an ex-ante schedule, because of:
– Uncertainties (forecast distributions vs. realized values)
– Situatedness and locality (central control vs. autonomy)
• Central scheduling also often suffers from:
– Combinatorial explosion
– Incompatibility with structural changes/improvements (Kaizen)
• Central: system-wide planning
• Local: distributed decisions made autonomously
• Offline: planning made in advance based on the forecast distributions of uncertainties
• Online: real-time decisions made when necessary based on the actual realized situation
• Two typical combinations: central/offline planning and local/online decisions
[Figure: rough-sketch system-wide optimization passes goals, constraints, and other system-wide information to the local/online decisions (real-time autonomous operational policies), which return actual progress and other situational information.]
• How can an operational policy be modeled and tested?
• Several kinds of jobs arrive randomly, and a due date is assigned to each of them.
• A setup operation is necessary when changing the kind of job to be processed on the machine.
• The jobs can be moved with a crane from the entrance to, and between, multiple LIFO (last in, first out) buffers and a machine.
An operational policy can be captured as a transformation from input to output.
• Input: locally observed and communicated information (input variables that describe how the current situation is captured)
• Policy: a non-time-consuming policy in a dynamic environment
• Output: decision, action
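The input-to-output view of a policy can be sketched in code. The following is only an illustration for a machine fed from LIFO buffers: the `Job` and `Observation` types, the rule itself (earliest due date, with setups avoided while there is enough slack), and the `slack_threshold` parameter are all assumptions made here, not something specified in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Job:
    kind: str    # job kind; changing kinds on the machine requires a setup
    due: float   # due date assigned on arrival


@dataclass(frozen=True)
class Observation:
    """Hypothetical input: locally observed and communicated information."""
    now: float                       # current time
    machine_kind: Optional[str]      # kind the machine is currently set up for
    accessible_jobs: Sequence[Job]   # e.g. the top job of each LIFO buffer


def choose_next_job(obs: Observation, slack_threshold: float = 5.0) -> Optional[Job]:
    """Map an observation to a decision using a cheap, rule-based policy.

    Assumed rule: take the most urgent job if its slack is small; otherwise
    prefer a job of the current kind (no setup); otherwise take the most urgent.
    """
    if not obs.accessible_jobs:
        return None
    most_urgent = min(obs.accessible_jobs, key=lambda j: j.due)
    if most_urgent.due - obs.now <= slack_threshold:
        return most_urgent
    same_kind = [j for j in obs.accessible_jobs if j.kind == obs.machine_kind]
    if same_kind:
        return min(same_kind, key=lambda j: j.due)
    return most_urgent
```

The function only reads its input and returns a decision, so it can be evaluated in negligible time whenever a decision is needed, and it can be swapped for another rule without touching the rest of a simulation.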
[Figure: a chain of system states (s0, s1, s2, s3, …, s5) linked by two kinds of transitions: revealed uncertainty (a roll of a die) and the operational decision (made by the policy).]
The system state changes according to the revealed uncertainty and the operational decision made by the policy.
[Figure: inventory example. A chain of system states, each labeled "Stock level & back orders", linked by two kinds of transitions: the revealed demand (how many items are demanded in this period) and the ordering decision (how many to order in this period).]
The system state includes the stock level and the list of back orders, and changes according to the revealed demand quantity and the ordering decision made by the policy.
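Testing a policy on such a model amounts to simulating the state transitions and accumulating cost. The sketch below rests on several assumptions that are not in the slides: the state is reduced to a single net stock value (on-hand stock minus back orders), orders arrive immediately, demand is uniform on 0 to 10, the cost coefficients are arbitrary, and the base-stock rule is just one example of a policy.

```python
import random


def base_stock_policy(net_stock: int, target: int = 20) -> int:
    """Example policy (assumed): order up to a fixed target level."""
    return max(0, target - net_stock)


def simulate(policy, periods: int = 1000, seed: int = 0,
             order_cost: float = 1.0, hold_cost: float = 0.5,
             short_cost: float = 5.0) -> float:
    """Return the average cost per period of `policy` under random demand."""
    rng = random.Random(seed)
    net_stock = 0          # on-hand stock minus back orders
    total_cost = 0.0
    for _ in range(periods):
        order = policy(net_stock)              # operational decision
        demand = rng.randint(0, 10)            # revealed uncertainty
        net_stock = net_stock + order - demand # state transition
        total_cost += (order_cost * order
                       + hold_cost * max(net_stock, 0)
                       + short_cost * max(-net_stock, 0))
    return total_cost / periods


if __name__ == "__main__":
    for target in (10, 20):
        cost = simulate(lambda s, t=target: base_stock_policy(s, t))
        print(f"target={target}: average cost per period = {cost:.2f}")
```

Because the seed fixes the demand sequence, two policies can be compared on exactly the same realized uncertainty.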
• What is reinforcement learning?
Q-table
• In each state, take the action that gives the maximum Q value in the corresponding row.
• When the state is parameterized as s = (x_1, x_2, …), the Q-table can be generalized to a Q-function of the state and the action, Q(s, a).
• The well-known DQN uses a deep neural network (a deep learning model) to approximate this function.
[Table: Q-table with one row per state (s1, s2, s3, s4, s5, …) and one column per action (a1, a2, a3, …).]
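Greedy action selection from a Q-table, and its generalization to a Q-function, can be sketched as follows. The Q values and weights are made-up numbers, and the linear `linear_q` function is only a simple stand-in for illustration; DQN would use a deep neural network in its place.

```python
from typing import Callable, Dict, Sequence

# Q-table: one row per state, one column per action (values are made up).
q_table: Dict[str, Dict[str, float]] = {
    "s1": {"a1": 0.2, "a2": 1.5, "a3": -0.3},
    "s2": {"a1": 0.9, "a2": 0.1, "a3": 0.4},
}


def greedy_action(table: Dict[str, Dict[str, float]], state: str) -> str:
    """Take the action with the maximum Q value in the row of `state`."""
    row = table[state]
    return max(row, key=row.get)


def linear_q(state: Sequence[float], action: str,
             weights: Dict[str, Sequence[float]]) -> float:
    """Assumed Q-function: per-action linear model, last weight is the bias."""
    w = weights[action]
    return sum(wi * xi for wi, xi in zip(w, state)) + w[-1]


def greedy_action_fn(q_fn: Callable[[Sequence[float], str], float],
                     state: Sequence[float], actions: Sequence[str]) -> str:
    """Same greedy rule, but Q values come from a function, not a table."""
    return max(actions, key=lambda a: q_fn(state, a))


if __name__ == "__main__":
    print(greedy_action(q_table, "s1"))
    weights = {"a1": [1.0, 0.0, 0.5], "a2": [0.0, 2.0, 0.0]}
    q_fn = lambda s, a: linear_q(s, a, weights)
    print(greedy_action_fn(q_fn, (1.0, 2.0), ["a1", "a2"]))
```

With a Q-function, states that never appeared during learning still get Q values, which is what makes large or continuous state spaces tractable.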
• How can an operational policy be refined by reinforcement learning?
• State: number of items, including back orders
• Actions: how many to order?
• Transition: dependent on uncertain demand from downstream
• Cost (to be minimized): ordering cost, holding cost, shortage penalty
[Figure: example trajectory in which policy decisions and demand transitions alternate, with a cost attached to each transition.]
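One way to refine an ordering policy for this problem is tabular Q-learning, using the standard update rule (the slides do not state which algorithm is used). Everything numeric below is an assumption for illustration: the demand distribution, the cost coefficients, the clipped state range, the action set, and the learning parameters. The reward is the negative cost, so maximizing Q minimizes cost.

```python
import random

ACTIONS = [0, 5, 10, 15]            # assumed order quantities
MIN_STOCK, MAX_STOCK = -10, 30      # assumed range of the net stock state


def step(net_stock, order, rng, order_cost=1.0, hold_cost=0.5, short_cost=5.0):
    """One transition: the order arrives, demand is revealed, cost is incurred."""
    demand = rng.randint(0, 10)     # uncertain demand from downstream
    nxt = max(MIN_STOCK, min(MAX_STOCK, net_stock + order - demand))
    cost = (order_cost * order
            + hold_cost * max(nxt, 0)
            + short_cost * max(-nxt, 0))
    return nxt, -cost               # reward = negative cost


def train(steps=20000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Learn a Q-table by epsilon-greedy Q-learning on the simulated system."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS}
         for s in range(MIN_STOCK, MAX_STOCK + 1)}
    s = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)             # explore
        else:
            a = max(q[s], key=q[s].get)         # exploit
        s2, r = step(s, a, rng)
        target = r + gamma * max(q[s2].values())
        q[s][a] += alpha * (target - q[s][a])   # Q-learning update
        s = s2
    return q


def greedy_policy(q):
    """Extract the policy: the best-valued order quantity in each state."""
    return {s: max(row, key=row.get) for s, row in q.items()}


if __name__ == "__main__":
    policy = greedy_policy(train())
    for s in (-10, 0, 10, 20):
        print(f"net stock {s:>3}: order {policy[s]}")
```

States that are rarely visited keep their initial Q values, so the extracted policy is only meaningful in the region the learner has actually explored.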
• How can we learn from informal operational skills?
• Naive reinforcement learning becomes inefficient for obtaining a complex operational policy, especially in a multi-agent setting.
• Skillful operational practices provide valuable data for streamlining such a tedious learning process.
• That is, machine agents can learn from human specialists.
• Serious games can be used as a tool for collecting virtual operational data efficiently.
• They may also be used, the other way around, as a tool for training novices in operational skills.
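The slides do not say how machine agents should learn from human data. One simple possibility, shown here purely as an assumption, is behavior cloning in its most basic tabular form: record (state, action) pairs from a specialist, for example during a serious game, and imitate the most frequent action in each state. The demonstration data below is invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def clone_policy(
    demonstrations: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, Hashable]:
    """Return, for each observed state, the action the expert chose most often."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


if __name__ == "__main__":
    # Invented log: (net stock, order quantity chosen by a human player).
    demos = [(0, 10), (0, 10), (0, 5), (12, 0), (12, 0), (-3, 15)]
    print(clone_policy(demos))
```

A policy obtained this way could serve as a starting point that reinforcement learning then refines, instead of learning from scratch.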
• What did you learn today?