LQR Learning Pipelines

Florian Dörfler
September 25, 2024

RantzerFest ECC 2024


  2. Revisiting old problems with old tools in a new

  3. Data-driven pipelines

    • indirect (model-based) approach: data → model + uncertainty → control
    • direct (model-free) approach: direct MRAC, RL, behavioral, …

    [Diagram: identification (ID) of an unknown system $x^+ = f(x, u)$, $y = h(x, u)$ from input $u$ and output $y$]

    • episodic & batch algorithms: collect batch of data → design policy
    • online & adaptive algorithms: measure → update policy → actuate

    well-documented trade-offs concerning
    • complexity: data, compute, & analysis
    • goal: optimality vs (robust) stability
    • practicality: modular vs end-to-end …

    → gold(?) standard: direct, adaptive, optimal yet robust, cheap, & tractable
  4. LQR

    • cornerstone of automatic control
    • parameterization (can be posed as convex SDP, as differentiable program, as …)
    • the benchmark for all data-driven control approaches in last decades, but there is no direct & adaptive LQR

    [Paper excerpt] Here, we view the LQR problem as an $\mathcal{H}_2$-optimization problem, as our method is based on the minimization of (3). As shown in [34, Section 6.4], the controller that minimizes the $\mathcal{H}_2$-norm of $T(K)$ (henceforth, optimal) is unique and can be computed by solving a discrete-time Riccati equation [1]. Alternatively, following [35], this optimal controller can be determined by solving the following program:

    $$\begin{aligned} \underset{P \succeq I,\; K}{\text{minimize}} \quad & \operatorname{trace}(QP) + \operatorname{trace}(K^\top R K P) \\ \text{subject to} \quad & (A + BK)\, P\, (A + BK)^\top - P + I \preceq 0 \end{aligned}$$

    [Footnote 1] Given a stable $p \times m$ transfer function $T$, the squared $\mathcal{H}_2$-norm of $T$ is defined as [34, Section 4.4]:

    $$\|T\|_2^2 := \frac{1}{2\pi} \int_0^{2\pi} \operatorname{trace}\big(T(e^{j\theta})'\, T(e^{j\theta})\big)\, d\theta$$

    [Diagram: plant $x^+ = Ax + Bu + d$, $z = Q^{1/2}x + R^{1/2}u$ in feedback with $u = Kx$; the $\mathcal{H}_2$ norm is taken from the disturbance $d$ to the performance output $z$]

    [Diagram: taxonomy of approaches along two axes: indirect vs direct, offline/batch vs online/adaptive]
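The two characterizations in the excerpt (Riccati equation versus the Gramian-based program) can be compared numerically. The following is a minimal sketch, assuming a small hypothetical system that does not appear in the slides, a plain Riccati value iteration in place of a library solver, and unit-covariance disturbance $d$. At the optimum the constraint of the program holds with equality, and the objective $\operatorname{trace}(QP) + \operatorname{trace}(K^\top R K P)$ should match the trace of the Riccati solution.

```python
import numpy as np

# Hypothetical 2-state, 1-input system (illustrative only, not from the slides).
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)

def dlqr(A, B, Q, R, iters=2000):
    """Riccati value iteration; returns K (for u = K x) and the Riccati solution X."""
    X = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ X @ B, B.T @ X @ A)
        Acl = A + B @ K
        # Equivalent to the standard Riccati recursion, written in closed-loop form.
        X = Q + K.T @ R @ K + Acl.T @ X @ Acl
    K = -np.linalg.solve(R + B.T @ X @ B, B.T @ X @ A)
    return K, X

def gramian(Acl):
    """Solve P = Acl P Acl^T + I by vectorization (requires Acl Schur stable)."""
    n = Acl.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(Acl, Acl),
                            np.eye(n).reshape(-1))
    return vec_p.reshape(n, n)

K, X = dlqr(A, B, Q, R)
Acl = A + B @ K
P = gramian(Acl)  # satisfies (A+BK) P (A+BK)^T - P + I = 0 and P >= I

h2_cost = np.trace(Q @ P) + np.trace(K.T @ R @ K @ P)
riccati_cost = np.trace(X)
spectral_radius = max(abs(np.linalg.eigvals(Acl)))
print(spectral_radius, h2_cost, riccati_cost)
```

The value iteration is chosen here only to keep the sketch dependency-free; a dedicated discrete-time Riccati solver would serve the same purpose.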
  5. Contents

    1. model-based pipeline with model-free elements
       → data-driven parametrization & robustifying regularization
    2. model-free pipeline with model-based elements
       → adaptive method: policy gradient & sample covariance
    3. case studies: academic & power systems/electronics
       → LQR is academic example but can be made useful
  6. Indirect & certainty-equivalence LQR

    [Clipped paper excerpt: subspace relations in state-space data, ordinary least-squares identification, …] Let $W_0 := \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}$, with the data related by $X_1 = A X_0 + B U_0 + D_0$.

    • collect I/O data $(X_0, U_0, X_1)$ with $D_0$ unknown & PE (persistently exciting): $\operatorname{rank} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} = n + m$
    • indirect & certainty-equivalence LQR (optimal in MLE setting)

    [Paper excerpt] […] the matrices $(B, A)$ are replaced by their estimates (7). This approach can be formalized as a bi-level program:

    $$\begin{aligned} \underset{P \succeq I,\; K}{\text{minimize}} \quad & \operatorname{trace}(QP) + \operatorname{trace}(K^\top R K P) \\ \text{subject to} \quad & (\hat{A} + \hat{B}K)\, P\, (\hat{A} + \hat{B}K)^\top - P + I \preceq 0 \\ & \begin{bmatrix} \hat{B} & \hat{A} \end{bmatrix} = \arg\min_{B,\,A} \left\| X_1 - \begin{bmatrix} B & A \end{bmatrix} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} \right\|_F \end{aligned} \qquad (8)$$

    [Annotations: the objective and the first constraint form the certainty-equivalent LQR; the arg min constraint is the least-squares SysID]
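The least-squares identification step of program (8) can be sketched as follows. This is an illustration under assumptions: the system matrices, horizon `T`, Gaussian inputs, and the helper names `simulate` and `least_squares_sysid` are hypothetical and not taken from the slides. The estimate is computed as $[\hat{B}\ \hat{A}] = X_1 W_0^\dagger$, which solves the Frobenius-norm problem when $W_0$ has full row rank.

```python
import numpy as np

def simulate(A, B, T, sigma, rng):
    """One open-loop experiment x+ = A x + B u + d with Gaussian input and noise."""
    n, m = B.shape
    X = np.zeros((n, T + 1))
    U0 = rng.standard_normal((m, T))
    D0 = sigma * rng.standard_normal((n, T))
    for t in range(T):
        X[:, t + 1] = A @ X[:, t] + B @ U0[:, t] + D0[:, t]
    return X[:, :T], U0, X[:, 1:], D0

def least_squares_sysid(X0, U0, X1):
    """Return (B_hat, A_hat) minimizing ||X1 - [B A] W0||_F, with W0 = [U0; X0]."""
    n, m = X0.shape[0], U0.shape[0]
    W0 = np.vstack([U0, X0])
    if np.linalg.matrix_rank(W0) < n + m:
        raise ValueError("data are not persistently exciting: rank W0 < n + m")
    BA_hat = X1 @ np.linalg.pinv(W0)
    return BA_hat[:, :m], BA_hat[:, m:]

# Hypothetical 3-state, 2-input system (illustrative only).
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
rng = np.random.default_rng(0)

# Noise-free data: the estimate recovers (B, A) up to rounding.
X0, U0, X1, _ = simulate(A, B, T=20, sigma=0.0, rng=rng)
B_hat, A_hat = least_squares_sysid(X0, U0, X1)
err_clean = max(np.linalg.norm(A_hat - A), np.linalg.norm(B_hat - B))

# Noisy data: the estimate is perturbed by D0 W0^+.
X0, U0, X1, _ = simulate(A, B, T=20, sigma=0.1, rng=rng)
B_hat_n, A_hat_n = least_squares_sysid(X0, U0, X1)
err_noisy = max(np.linalg.norm(A_hat_n - A), np.linalg.norm(B_hat_n - B))
print(err_clean, err_noisy)
```

Feeding `(A_hat, B_hat)` into any standard LQR solver then gives the certainty-equivalent controller.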
  7. Direct approach from subspace relations in data

    [Clipped paper excerpt: subspace relations in state-space data, ordinary least-squares identification, …] Let $W_0 := \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}$, with the data related by $X_1 = A X_0 + B U_0 + D_0$.

    • PE data: $\operatorname{rank} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} = n + m \;\Rightarrow\; \forall K \ \exists G \ \text{s.t.} \ \begin{bmatrix} K \\ I \end{bmatrix} = \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} G$
    • subspace relations:

    $$A + BK = \begin{bmatrix} B & A \end{bmatrix} \begin{bmatrix} K \\ I \end{bmatrix} = \begin{bmatrix} B & A \end{bmatrix} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} G = (X_1 - D_0)\, G$$

    • data-driven LQR LMIs by substituting $A + BK = (X_1 - D_0)\, G$
    → certainty equivalence by neglecting noise $D_0$: $A + BK = X_1 G$
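The subspace relation can be checked on noise-free data. The sketch below assumes a hypothetical system and an arbitrary (not optimized) gain `K`; it builds one valid $G$ via the pseudoinverse and confirms $X_1 G = A + BK$. It also shows that $G$ is not unique: adding any component from the nullspace of $W_0$ leaves both relations intact when $D_0 = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, T = 2, 1, 10

# Hypothetical system, used only to generate noise-free data (D0 = 0).
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])

X = np.zeros((n, T + 1))
X[:, 0] = rng.standard_normal(n)
U0 = rng.standard_normal((m, T))
for t in range(T):
    X[:, t + 1] = A @ X[:, t] + B @ U0[:, t]
X0, X1 = X[:, :T], X[:, 1:]

W0 = np.vstack([U0, X0])
assert np.linalg.matrix_rank(W0) == n + m  # persistency of excitation

K = np.array([[-0.5, -1.0]])               # arbitrary gain, not optimized
KI = np.vstack([K, np.eye(n)])

# One particular solution of [K; I] = W0 G.
W0_pinv = np.linalg.pinv(W0)
G = W0_pinv @ KI

# Any nullspace component of W0 can be added without violating [K; I] = W0 G.
Pi = np.eye(T) - W0_pinv @ W0              # projector onto ker W0
G_alt = G + Pi @ rng.standard_normal((T, n))

print(np.linalg.norm(X1 @ G - (A + B @ K)))
print(np.linalg.norm(X1 @ G_alt - (A + B @ K)))
```

With noisy data the two choices of $G$ no longer give the same closed-loop matrix $X_1 G$, which is why the nullspace has to be handled explicitly.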
  8. Equivalence: direct + xxx ⇔ indirect

    • direct approach

    $$\begin{aligned} \underset{P \succeq I,\; K,\; G}{\text{minimize}} \quad & \operatorname{trace}(QP) + \operatorname{trace}(K^\top R K P) \\ \text{subject to} \quad & X_1 G P G^\top X_1^\top - P + I \preceq 0 \\ & \begin{bmatrix} K \\ I \end{bmatrix} = \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} G \\ & \left( I - \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\dagger \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} \right) G = 0 \end{aligned} \qquad (15)$$

    → optimizer has nullspace $\ker \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}$ → orthogonality constraint (last line of (15))

    • indirect approach

    [Paper excerpt] […] the matrices $(B, A)$ are replaced by their estimates (7). This approach can be formalized as a bi-level program:

    $$\begin{aligned} \underset{P \succeq I,\; K}{\text{minimize}} \quad & \operatorname{trace}(QP) + \operatorname{trace}(K^\top R K P) \\ \text{subject to} \quad & (\hat{A} + \hat{B}K)\, P\, (\hat{A} + \hat{B}K)^\top - P + I \preceq 0 \\ & \begin{bmatrix} \hat{B} & \hat{A} \end{bmatrix} = \arg\min_{B,\,A} \left\| X_1 - \begin{bmatrix} B & A \end{bmatrix} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} \right\|_F \end{aligned} \qquad (8)$$

    equivalent constraints:

    $$\begin{bmatrix} \hat{B} & \hat{A} \end{bmatrix} = X_1 \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\dagger, \qquad G = \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\dagger \begin{bmatrix} K \\ I \end{bmatrix}$$

    $$\left( X_1 \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\dagger \begin{bmatrix} K \\ I \end{bmatrix} \right) P \left( \ldots \right)^\top - P + I \preceq 0$$
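The step from the direct program with orthogonality constraint to the indirect one can be spelled out in two lines. The derivation below is a sketch using $W_0 := [U_0;\, X_0]$ and the shorthand $\Pi := I - W_0^\dagger W_0$ for the projector onto $\ker W_0$; the symbol $\Pi$ is an assumption here, chosen to match the $\|\Pi G\|$ term that appears later in the deck. It uses only the two linear constraints on $G$ and the least-squares solution $[\hat{B}\ \hat{A}] = X_1 W_0^\dagger$.

```latex
\begin{align*}
G &= W_0^\dagger W_0 G + (I - W_0^\dagger W_0)\, G
   = W_0^\dagger \begin{bmatrix} K \\ I \end{bmatrix} + \Pi G
   = W_0^\dagger \begin{bmatrix} K \\ I \end{bmatrix}
   && \text{(using } W_0 G = \begin{bmatrix} K \\ I \end{bmatrix},\ \Pi G = 0\text{)} \\
X_1 G &= X_1 W_0^\dagger \begin{bmatrix} K \\ I \end{bmatrix}
       = \begin{bmatrix} \hat{B} & \hat{A} \end{bmatrix} \begin{bmatrix} K \\ I \end{bmatrix}
       = \hat{A} + \hat{B} K
\end{align*}
```

Substituting $X_1 G = \hat{A} + \hat{B}K$ into the matrix inequality of the direct program gives exactly the constraint of the certainty-equivalence program.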
  9. Regularized, certainty-equivalent, & direct LQR

    • orthogonality constraint lifted

    [Slide formulas not recoverable from the extraction; surviving symbols: $G P G^\top$, $\|\Pi G\|$, $A + BK = (X_1 - D_0)\, G$, $D_0$]
  10. Performance & robustness certificates

    $$\frac{\{\text{regularized data-driven LQR performance}\} - \{\text{ground-truth performance}\}}{\{\text{ground-truth performance}\}} \in \mathcal{O}\!\left( \frac{\sigma_{\max}(D_0)}{\sigma_{\min}\!\left(\begin{bmatrix} X_0 \\ U_0 \end{bmatrix}\right)} \right) + \text{const.} \cdot \rho$$

    • regularized data-driven LQR performance: realized cost from regularized design with large $\lambda$
    • ground-truth performance: cost if exact system matrices A & B were known
    • SNR (signal-to-noise-ratio): $\sigma_{\min}\!\left(\begin{bmatrix} X_0 \\ U_0 \end{bmatrix}\right) / \sigma_{\max}(D_0)$
    • the left-hand side is a relative performance metric

    Certificate for sufficiently large SNR: the optimal control problem is feasible (robustly stabilizing) with relative performance ~ $\mathcal{O}(1/\mathrm{SNR})$.
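The SNR in the certificate can be computed directly from data matrices. The sketch below reads the ratio as the smallest singular value of the stacked data matrix over the largest singular value of the noise matrix; that reading, the function name `snr`, and the toy matrices are assumptions for illustration. In practice $D_0$ is unknown, so this quantity is available only in simulation or through a bound on the noise.

```python
import numpy as np

def snr(X0, U0, D0):
    """Signal-to-noise ratio sigma_min([U0; X0]) / sigma_max(D0)."""
    W0 = np.vstack([U0, X0])
    # For a wide matrix (T >= n + m), svd returns n + m singular values.
    sigma_min = np.linalg.svd(W0, compute_uv=False).min()
    sigma_max = np.linalg.svd(D0, compute_uv=False).max()
    return sigma_min / sigma_max

# Toy data with n = m = 1 and T = 3, chosen so the singular values are obvious.
X0 = np.array([[2.0, 0.0, 0.0]])
U0 = np.array([[0.0, 1.0, 0.0]])
D0 = np.array([[0.1, 0.0, 0.0]])

snr_value = snr(X0, U0, D0)          # sigma_min = 1, sigma_max = 0.1
snr_noisier = snr(X0, U0, 10.0 * D0)  # ten times more noise
print(snr_value, snr_noisier)
```

Stacking order does not matter for the singular values, so `[U0; X0]` and `[X0; U0]` give the same SNR.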
  11. 11 Numerical case study

    • case study [Dean et al. ’19]: discrete-time system with noise variance $\sigma^2 = 0.01$ & variable regularization coefficient $\lambda$
    • system (from the paper excerpt shown on the slide, Section V “Numerical case study”): $A = \begin{bmatrix} 1.01 & 0.01 & 0 \\ 0.01 & 1.01 & 0.01 \\ 0 & 0.01 & 1.01 \end{bmatrix}$, $B = I$, a discrete-time marginally unstable Laplacian system, with weights $Q = I$ & $R = 10^{-3} I$; taking R small relative to Q favours stabilizing solutions
    • [paper excerpt, cropped on the slide: remarks on Theorems 4.1 & 4.2 & a comparison of the certainty-equivalence approach, robust regularization, & low-rank approximation]
    • [figure: % of stabilizing controllers (100 trials) & median relative performance error vs regularization coefficient $\lambda$; annotation: breaks without regularizer]
    • take-home message: regularization is needed for robustness & performance
    → works… but lame: learning is offline
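The case study above can be reproduced in a few lines. The following is a minimal sketch of the indirect certainty-equivalence baseline (least-squares identification followed by a Riccati iteration), not the regularized direct design from the slides. The experiment length `T = 100`, the unit-variance random inputs, the random seed, and all function names are assumptions made for illustration; only A, B, Q, R, and the noise variance 0.01 come from the slide.

```python
import numpy as np

def dlyap(A, Q):
    """Solve P = A P A^T + Q by vectorization (A must be Schur stable)."""
    n = A.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec_p.reshape(n, n)

def dlqr(A, B, Q, R, iters=500):
    """LQR gain K (convention u = K x) from a Riccati value iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        Acl = A + B @ K
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return K

def lqr_cost(A, B, Q, R, K):
    """trace(Q P) + trace(K^T R K P) with P = I + (A+BK) P (A+BK)^T."""
    Acl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf  # unstable closed loop has infinite cost
    P = dlyap(Acl, np.eye(A.shape[0]))
    return float(np.trace((Q + K.T @ R @ K) @ P))

# System and weights from the case study
A = np.array([[1.01, 0.01, 0.0], [0.01, 1.01, 0.01], [0.0, 0.01, 1.01]])
B, Q, R = np.eye(3), np.eye(3), 1e-3 * np.eye(3)

# One open-loop experiment with noise variance 0.01 (assumed: T = 100, unit-variance inputs)
rng = np.random.default_rng(0)
T, sigma = 100, 0.1
U0 = rng.standard_normal((3, T))
X = np.zeros((3, T + 1))
for k in range(T):
    X[:, k + 1] = A @ X[:, k] + B @ U0[:, k] + sigma * rng.standard_normal(3)
X0, X1 = X[:, :T], X[:, 1:]

# Least-squares estimate [B_hat A_hat] = X1 [U0; X0]^+
BA_hat = X1 @ np.linalg.pinv(np.vstack([U0, X0]))
B_hat, A_hat = BA_hat[:, :3], BA_hat[:, 3:]

K_ce = dlqr(A_hat, B_hat, Q, R)   # certainty-equivalence gain
K_opt = dlqr(A, B, Q, R)          # ground truth, available only in simulation
J_ce, J_opt = lqr_cost(A, B, Q, R, K_ce), lqr_cost(A, B, Q, R, K_opt)
rel_gap = (J_ce - J_opt) / J_opt  # relative performance metric
print(f"relative performance gap: {rel_gap:.2e}")
```

The quantity `rel_gap` is the relative performance metric from the previous slide, evaluated against the ground-truth optimum that is only known in simulation.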
  12. 12 Online & adaptive solutions

    • shortcoming of separating offline learning & online control → cannot improve policy online & cheaply / rapidly adapt to changes
    • (elitist) desired adaptive solution: direct, online (non-episodic/non-batch) algorithms, with closed-loop data, & recursive algorithmic implementation
    • “best” way to improve policy with new data → go down the gradient!
    • [paper shown on the slide: G. Zames, “Adaptive Control: Towards a Complexity-Based General Theory”, Automatica, Vol. 34, No. 10, pp. 1161-1167, 1998; key words: H∞ control, adaptive control, learning control, performance analysis]
    • “adaptive = improve over best control with a priori info” *
    * disclaimer: a large part of the adaptive control community focuses on stability & not optimality
  13. 13 Ingredient 1: policy gradient methods

    • LQR viewed as smooth program (many formulations); following the paper excerpt on the slide, the optimal controller can be determined by solving
      $\min_{P \succeq I,\, K}\ \mathrm{trace}(QP) + \mathrm{trace}(K^\top R K P)$ subject to $(A + BK) P (A + BK)^\top - P + I \preceq 0$
    • after eliminating (unique) P, denote this as $J(K)$
    • $J(K)$ is not convex … but on the set of stabilizing gains K, it’s
      • coercive with compact sublevel sets,
      • smooth with bounded Hessian, &
      • degree-2 gradient dominated: $J(K) - J^* \le \text{const.} \cdot \|\nabla J(K)\|^2$
    • Fact: policy gradient descent $K^+ = K - \eta \nabla J(K)$ initialized from a stabilizing policy converges linearly to $K^*$.
    • [paper shown on the slide: B. Hu, K. Zhang, N. Li, M. Mesbahi, M. Fazel, & T. Başar, “Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies”, Annual Review of Control, Robotics, and Autonomous Systems, 2023, 6:123–58, https://doi.org/10.1146/annurev-control-042920-020021]
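To make the policy gradient iteration concrete, here is a minimal model-based sketch on the case-study system. It uses the standard gradient expression for the convention u = Kx, namely ∇J(K) = 2[(R + BᵀPB)K + BᵀPA]Σ with P and Σ obtained from two Lyapunov equations. The initial gain `K = -0.5 I`, the step size `eta = 0.1`, the iteration count, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def dlyap(A, Q):
    """Solve P = A P A^T + Q by vectorization (A must be Schur stable)."""
    n = A.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec_p.reshape(n, n)

def cost_and_grad(A, B, Q, R, K):
    """J(K) and its gradient for u = K x; K must be stabilizing."""
    Acl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        raise ValueError("K is not stabilizing")
    Sigma = dlyap(Acl, np.eye(A.shape[0]))   # Sigma = I + Acl Sigma Acl^T
    P = dlyap(Acl.T, Q + K.T @ R @ K)        # P = Q + K^T R K + Acl^T P Acl
    J = float(np.trace((Q + K.T @ R @ K) @ Sigma))
    grad = 2.0 * ((R + B.T @ P @ B) @ K + B.T @ P @ A) @ Sigma
    return J, grad

A = np.array([[1.01, 0.01, 0.0], [0.01, 1.01, 0.01], [0.0, 0.01, 1.01]])
B, Q, R = np.eye(3), np.eye(3), 1e-3 * np.eye(3)

K = -0.5 * np.eye(3)   # assumed stabilizing initial gain
eta = 0.1              # assumed step size
costs = []
for _ in range(200):
    J, grad = cost_and_grad(A, B, Q, R, K)
    costs.append(J)
    K = K - eta * grad  # K+ = K - eta * grad J(K)

J_final, grad_final = cost_and_grad(A, B, Q, R, K)
print(f"J: {costs[0]:.4f} -> {J_final:.4f}, |grad| = {np.linalg.norm(grad_final):.1e}")
```

Because of gradient dominance, a vanishing gradient on the set of stabilizing gains certifies that the iterate has reached the global optimum, even though J(K) is not convex.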
  14. 14 Model-free policy gradient methods

    • policy gradient: $K^+ = K - \eta \nabla J(K)$ converges linearly to $K^*$
    • model-based setting: explicit Anderson-Moore formula for $\nabla J(K)$ based on closed-loop controllability + observability Gramians
    • model-free 0th order methods constructing two-point gradient estimate from numerous & very long trajectories → extremely sample inefficient
    • IMO: policy gradient is a potentially great candidate for direct adaptive control but sadly useless in practice: sample-inefficient, episodic, …

    relative performance gap       | ε = 1 | ε = 0.1 | ε = 0.01
    # trajectories (100 samples)   | 1414  | 43850   | 142865
    → ~ $10^7$ samples
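The sample inefficiency of 0th order methods can be illustrated with a two-point estimator. This is a sketch under a strong simplification: the cost oracle below evaluates J(K) exactly through a Lyapunov equation, whereas a truly model-free method would have to approximate each evaluation with long rollouts. The smoothing radius, the numbers of random directions, and all function names are assumptions for illustration.

```python
import numpy as np

def dlyap(A, Q):
    """Solve P = A P A^T + Q by vectorization (A must be Schur stable)."""
    n = A.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec_p.reshape(n, n)

A = np.array([[1.01, 0.01, 0.0], [0.01, 1.01, 0.01], [0.0, 0.01, 1.01]])
B, Q, R = np.eye(3), np.eye(3), 1e-3 * np.eye(3)

def cost(K):
    """Exact J(K); stands in for the rollouts a model-free method would need."""
    Acl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf
    Sigma = dlyap(Acl, np.eye(3))
    return float(np.trace((Q + K.T @ R @ K) @ Sigma))

def exact_grad(K):
    """Model-based gradient, used here only as a reference."""
    Acl = A + B @ K
    Sigma = dlyap(Acl, np.eye(3))
    P = dlyap(Acl.T, Q + K.T @ R @ K)
    return 2.0 * ((R + B.T @ P @ B) @ K + B.T @ P @ A) @ Sigma

def two_point_estimate(K, radius, num_dirs, rng):
    """Average of d * (J(K + rU) - J(K - rU)) / (2r) * U over random unit directions U."""
    d = K.size
    g = np.zeros_like(K)
    for _ in range(num_dirs):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # uniform direction on the unit sphere
        g += d * (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
    return g / num_dirs

def cosine(X, Y):
    return float(np.sum(X * Y) / (np.linalg.norm(X) * np.linalg.norm(Y)))

rng = np.random.default_rng(0)
K = -0.5 * np.eye(3)
g_true = exact_grad(K)
cos_few = cosine(two_point_estimate(K, 0.01, 10, rng), g_true)
cos_many = cosine(two_point_estimate(K, 0.01, 2000, rng), g_true)
print(f"alignment with true gradient: 10 directions {cos_few:.3f}, 2000 directions {cos_many:.3f}")
```

Each direction costs two cost evaluations, and each evaluation would be a full trajectory in practice, which is where the large trajectory counts in the table come from.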
  15. 15 Ingredient 2: sample covariance parameterization

    data: $U_0 = [u(0)\ u(1)\ \cdots\ u(t-1)]$, $X_0 = [x(0)\ x(1)\ \cdots\ x(t-1)]$, $X_1 = [x(1)\ x(2)\ \cdots\ x(t)]$, with $X_1 = A X_0 + B U_0$

    prior parameterization
    • PE condition: full row rank of $\begin{bmatrix} U_0 \\ X_0 \end{bmatrix}$
    • $A + BK = [B\ A] \begin{bmatrix} K \\ I \end{bmatrix} = [B\ A] \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} G = X_1 G$
    • robustness: $G = \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^{\dagger} \begin{bmatrix} K \\ I \end{bmatrix}$ ↔ regularization
    • dimension of all matrices grows with $t$

    covariance parameterization
    • sample covariance $\Lambda = \frac{1}{t} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\top \succ 0$
    • $A + BK = [B\ A] \begin{bmatrix} K \\ I \end{bmatrix} = [B\ A] \Lambda V = \frac{1}{t} X_1 \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\top V$
    • robustness for free without regularization
    • dimension of all matrices is constant + cheap rank-1 updates for online data
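Both parameterizations can be checked numerically on noise-free data. The sketch below builds the data matrices, forms G and V for an arbitrary gain K, and confirms that both reproduce the closed-loop matrix A + BK while only G grows with the number of samples t. The data length, the random inputs, the gain, and the name `W0` for the stacked matrix [U0; X0] are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, t = 3, 3, 20
A = np.array([[1.01, 0.01, 0.0], [0.01, 1.01, 0.01], [0.0, 0.01, 1.01]])
B = np.eye(3)

# Noise-free experiment: X1 = A X0 + B U0
U0 = rng.standard_normal((m, t))
X = np.zeros((n, t + 1))
X[:, 0] = rng.standard_normal(n)
for k in range(t):
    X[:, k + 1] = A @ X[:, k] + B @ U0[:, k]
X0, X1 = X[:, :t], X[:, 1:]

W0 = np.vstack([U0, X0])                      # stacked data [U0; X0], size (m+n) x t
assert np.linalg.matrix_rank(W0) == n + m     # PE condition: full row rank

K = -0.5 * np.eye(n)                          # any gain, chosen for illustration
KI = np.vstack([K, np.eye(n)])                # [K; I]

# Prior parameterization: G has t rows, so it grows with the data length
G = np.linalg.pinv(W0) @ KI
# Covariance parameterization: V has n + m rows regardless of t
Lam = W0 @ W0.T / t
X1bar = X1 @ W0.T / t
V = np.linalg.solve(Lam, KI)

print("G shape:", G.shape, " V shape:", V.shape)
print("max error X1 G    :", np.max(np.abs(X1 @ G - (A + B @ K))))
print("max error X1bar V :", np.max(np.abs(X1bar @ V - (A + B @ K))))
```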
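  16. 16 Covariance parameterization of the LQR

    • state / input sample covariance $\Lambda = \frac{1}{t} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\top$ & $\bar X_1 = \frac{1}{t} X_1 \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}^\top$
    • closed-loop matrix $A + BK = \bar X_1 V$ with $\begin{bmatrix} K \\ I \end{bmatrix} = \Lambda V = \begin{bmatrix} \bar U_0 \\ \bar X_0 \end{bmatrix} V$, where $\bar U_0$ & $\bar X_0$ denote the two block rows of $\Lambda$
    • LQR covariance parameterization after eliminating $K$ with variable $V$, Lyapunov eqn (explicitly solvable), smooth cost $J(V)$ (after removing $P$), & linear parameterization constraint
      $\min_{V,\, P \succ 0}\ \mathrm{trace}(QP) + \mathrm{trace}(V^\top \bar U_0^\top R \bar U_0 V P)$ s.t. $P = I + \bar X_1 V P V^\top \bar X_1^\top$, $I = \bar X_0 V$
    details are not important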
  17. 17 Projected policy gradient with sample covariances

    • data-enabled policy optimization (DeePO): $V^+ = V - \eta\, \Pi_{\bar X_0}(\nabla J(V))$, where $\Pi_{\bar X_0}$ projects on parameterization constraint $I = \bar X_0 V$ & gradient $\nabla J(V)$ is computed from two Lyapunov equations with sample covariances
    • optimization landscape: smooth, degree-1 proj. grad dominance $J(V) - J^* \le \text{const.} \cdot \|\Pi_{\bar X_0}(\nabla J(V))\|$
    • warm-up: offline data & no disturbance
    Sublinear convergence for feasible initialization: $J(V_k) - J^* \le \mathcal{O}(1/k)$.
    note: empirically faster linear rate
    [figure: relative gap $(J(V_k) - J^*)/J^*$; case: 4th order system with 8 data samples]
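The offline warm-up case (noise-free batch data) can be sketched as follows. The gradient expression used here, ∇J(V) = 2(Ū0ᵀRŪ0 + X̄1ᵀPX̄1)VΣ with P and Σ from two Lyapunov equations, is derived from the covariance-parameterized cost in the same way as the standard LQR gradient, and the projection is taken as I − X̄0†X̄0. The system (the 3rd order case-study system rather than the 4th order one on the slide), the data length, the initial gain, the step size, and all function names are assumptions for illustration.

```python
import numpy as np

def dlyap(A, Q):
    """Solve P = A P A^T + Q by vectorization (A must be Schur stable)."""
    n = A.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec_p.reshape(n, n)

def deepo_cost_grad(V, U0bar, X1bar, Q, R):
    """Covariance-parameterized cost J(V) and its (unprojected) gradient."""
    Acl = X1bar @ V                               # data-based closed-loop matrix
    S = U0bar.T @ R @ U0bar
    Sigma = dlyap(Acl, np.eye(Acl.shape[0]))      # Sigma = I + Acl Sigma Acl^T
    P = dlyap(Acl.T, Q + V.T @ S @ V)             # P = Q + V^T S V + Acl^T P Acl
    J = float(np.trace((Q + V.T @ S @ V) @ Sigma))
    grad = 2.0 * (S + X1bar.T @ P @ X1bar) @ V @ Sigma
    return J, grad

rng = np.random.default_rng(0)
n, m, t = 3, 3, 50
A = np.array([[1.01, 0.01, 0.0], [0.01, 1.01, 0.01], [0.0, 0.01, 1.01]])
B, Q, R = np.eye(3), np.eye(3), 1e-3 * np.eye(3)

# Offline, disturbance-free data
U0 = rng.standard_normal((m, t))
X = np.zeros((n, t + 1))
for k in range(t):
    X[:, k + 1] = A @ X[:, k] + B @ U0[:, k]
X0, X1 = X[:, :t], X[:, 1:]
W0 = np.vstack([U0, X0])
Lam = W0 @ W0.T / t
U0bar, X0bar, X1bar = Lam[:m], Lam[m:], X1 @ W0.T / t

# Feasible initialization from an assumed stabilizing gain
K0 = -0.5 * np.eye(n)
V = np.linalg.solve(Lam, np.vstack([K0, np.eye(n)]))
Pi = np.eye(n + m) - np.linalg.pinv(X0bar) @ X0bar   # projector onto null space of X0bar

eta, costs = 0.02, []
for _ in range(3000):
    J, grad = deepo_cost_grad(V, U0bar, X1bar, Q, R)
    costs.append(J)
    V = V - eta * Pi @ grad                          # projected gradient step

K = U0bar @ V
J_final, grad_final = deepo_cost_grad(V, U0bar, X1bar, Q, R)
proj_grad_norm = np.linalg.norm(Pi @ grad_final)
rho = np.max(np.abs(np.linalg.eigvals(A + B @ K)))
print(f"J: {costs[0]:.4f} -> {J_final:.4f}, projected gradient norm {proj_grad_norm:.1e}")
```

The projection keeps the constraint I = X̄0V satisfied along the whole iteration, so every iterate corresponds to a valid gain K = Ū0V.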
  18. 18 Online, adaptive, & closed-loop DeePO

    [block diagram: plant $x^+ = Ax + Bu + d$ in feedback with control $u = K_{t+1} x$; the DeePO policy update block outputs $K_{t+1}$]
    DeePO policy update
    Input: $(X_{0,t+1}, U_{0,t+1}, X_{1,t+1})$, $K_t$
    Output: $K_{t+1}$
    ① update sample covariances: $\Lambda_{t+1}$ & $\bar X_{0,t+1}$
    ② update decision variable: $V_{t+1} = \Lambda_{t+1}^{-1} \begin{bmatrix} K_t \\ I_n \end{bmatrix}$
    ③ gradient descent: $V'_{t+1} = V_{t+1} - \eta\, \Pi_{\bar X_{0,t+1}}(\nabla J_{t+1}(V_{t+1}))$
    ④ update control gain: $K_{t+1} = \bar U_{0,t+1} V'_{t+1}$
    where $X_{0,t+1} = [x(0), x(1), \dots, x(t), x(t+1)]$ & similar for other matrices
    • cheap & recursive implementation: rank-1 update of (inverse) sample covariances, cheap computation, & no memory needed to store old data
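The four steps of the policy update can be sketched as an online loop. This is an illustrative simulation under several assumptions that are not in the slides: 20 open-loop warm-up samples, unit-variance probing noise added to the control, process noise with standard deviation 0.1, initial gain −0.5·I, and step size 0.02. Running sums of rank-1 outer products replace the stored data matrices; the inverse covariance is obtained with a linear solve for simplicity rather than a recursive inverse update. All function and variable names are hypothetical.

```python
import numpy as np

def dlyap(A, Q):
    """Solve P = A P A^T + Q by vectorization (A must be Schur stable)."""
    n = A.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec_p.reshape(n, n)

def true_cost(A, B, Q, R, K):
    """Exact LQR cost of gain K on the true system (simulation-only monitor)."""
    Acl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf
    return float(np.trace((Q + K.T @ R @ K) @ dlyap(Acl, np.eye(len(A)))))

def optimal_cost(A, B, Q, R, iters=500):
    """Ground-truth optimal cost via Riccati value iteration (simulation only)."""
    P = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + K.T @ R @ K + (A + B @ K).T @ P @ (A + B @ K)
    return true_cost(A, B, Q, R, K)

def deepo_update(K, Lam, X1bar, Q, R, eta):
    """Steps 2-4: re-parameterize K, take one projected gradient step, return new gain."""
    m, n = K.shape
    U0bar, X0bar = Lam[:m], Lam[m:]
    V = np.linalg.solve(Lam, np.vstack([K, np.eye(n)]))      # step 2
    Acl, S = X1bar @ V, U0bar.T @ R @ U0bar
    Sigma = dlyap(Acl, np.eye(n))
    P = dlyap(Acl.T, Q + V.T @ S @ V)
    grad = 2.0 * (S + X1bar.T @ P @ X1bar) @ V @ Sigma
    Pi = np.eye(n + m) - np.linalg.pinv(X0bar) @ X0bar
    V = V - eta * Pi @ grad                                   # step 3
    return U0bar @ V                                          # step 4

rng = np.random.default_rng(1)
n = m = 3
A = np.array([[1.01, 0.01, 0.0], [0.01, 1.01, 0.01], [0.0, 0.01, 1.01]])
B, Q, R = np.eye(3), np.eye(3), 1e-3 * np.eye(3)
sigma_d, sigma_e, eta, warmup, horizon = 0.1, 1.0, 0.02, 20, 500

J_opt = optimal_cost(A, B, Q, R)
Sww = np.zeros((n + m, n + m))   # running sum of [u; x][u; x]^T
S1w = np.zeros((n, n + m))       # running sum of x_next [u; x]^T
t, x, K, gaps = 0, np.zeros(n), -0.5 * np.eye(n), []
for step in range(warmup + horizon):
    probing = rng.standard_normal(m)
    u = probing if step < warmup else K @ x + sigma_e * probing
    x_next = A @ x + B @ u + sigma_d * rng.standard_normal(n)
    phi = np.concatenate([u, x])
    Sww += np.outer(phi, phi)                                 # step 1: rank-1 updates
    S1w += np.outer(x_next, phi)
    t += 1
    if step >= warmup:
        K = deepo_update(K, Sww / t, S1w / t, Q, R, eta)
        gaps.append((true_cost(A, B, Q, R, K) - J_opt) / J_opt)
    x = x_next
print(f"relative gap: first {gaps[0]:.3f}, last {gaps[-1]:.2e}")
```

Only the two running sums and the sample count are carried between time steps, which is what makes the update memory-free with respect to old data.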
  19. 19 Underlying assumptions for theoretic certificates

    • initially stabilizing controller: the LQR problem parameterized by offline data $(X_{0,t_0}, U_{0,t_0}, X_{1,t_0})$ is feasible with stabilizing gain $K_{t_0}$.
    • persistency of excitation due to process noise or probing: $\underline{\sigma}(\mathcal{H}_{n+1}(U_{0,t})) \ge \gamma \cdot \sqrt{t}$ with Hankel matrix $\mathcal{H}_{n+1}(U_{0,t})$
    • bounded noise: $\|d(t)\| \le \delta\ \forall t$ → signal-to-noise ratio $SNR := \gamma / \delta$
    • BIBO: there are $\bar u, \bar x$ such that $\|u(t)\| \le \bar u$ & $\|x(t)\| \le \bar x$ (∃ common Lyapunov function?)
  20. 20 Bounded regret of DeePO in adaptive setting

    • average regret performance metric: $\mathrm{Regret}_T := \frac{1}{T} \sum_{t=t_0}^{t_0+T-1} \left( J(K_t) - J^* \right)$
    Sublinear regret: Under the assumptions, there are $\nu_1, \nu_2, \nu_3, \nu_4 > 0$ such that for $\eta \in (0, \nu_1]$ & $SNR \ge \nu_2$, it holds that $K_t$ is stabilizing & $\mathrm{Regret}_T \le \frac{\nu_3}{\sqrt{T}} + \frac{\nu_4}{SNR}$.
    • comments on the qualitatively expected result:
      • analysis is independent of the noise statistics & consistent: $\mathrm{Regret}_{T \to \infty} \to 0$
      • favorable sample complexity: sublinear decrease term matches best rate $\mathcal{O}(1/\sqrt{T})$ of first-order methods in online convex optimization
      • empirically observe smaller bias term: $\mathcal{O}(1/SNR^2)$ & not $\mathcal{O}(1/SNR)$
  21. 21 Comparison case studies

    • same case study [Dean et al. ’19]
    • case 1: offline LQR vs direct adaptive DeePO vs indirect adaptive: rls + dlqr
      [figure: relative performance gap $(J(K_t) - J^*)/J^*$]
      → adaptive outperforms offline
      → direct/indirect rates matching but direct is much(!) cheaper
    • case 2: adaptive DeePO vs 0th order methods

    relative performance gap                              | ε = 1 | ε = 0.1 | ε = 0.01
    # long trajectories (100 samples) for 0th order LQR   | 1414  | 43850   | 142865
    DeePO (# I/O samples)                                 | 10    | 24      | 48
    → significantly less data
  22. 22 Power systems / electronics case study

    • wind turbine becomes unstable in weak grids with nonlinear oscillations
    • converter, turbine, & grid are a black box for the commissioning engineer
    • construct state from time shifts (5 ms sampling) of $y(t)$, $u(t)$ & use DeePO
    [figure: synchronous generator & full-scale converter]
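Constructing a state from time shifts can be sketched as stacking a window of past outputs and inputs. The slides do not specify the number of lags or the ordering inside the state vector, so the helper `shifted_state`, its argument `lags`, and the ordering (most recent sample first) are assumptions for illustration only.

```python
import numpy as np

def shifted_state(y, u, lags):
    """Stack the last `lags` outputs and inputs into one state vector per time step.

    Row for time t (t = lags, ..., len(y)) is
    [y(t-1), ..., y(t-lags), u(t-1), ..., u(t-lags)].
    """
    y = np.asarray(y, dtype=float).reshape(len(y), -1)   # shape (T, p)
    u = np.asarray(u, dtype=float).reshape(len(u), -1)   # shape (T, m)
    states = []
    for t in range(lags, len(y) + 1):
        past_y = y[t - lags:t][::-1].reshape(-1)         # most recent output first
        past_u = u[t - lags:t][::-1].reshape(-1)         # most recent input first
        states.append(np.concatenate([past_y, past_u]))
    return np.array(states)

# Toy signals sampled on a uniform grid (placeholder values, not measurement data)
y = [0.0, 1.0, 2.0, 3.0]
u = [10.0, 11.0, 12.0, 13.0]
states = shifted_state(y, u, lags=2)
print(states)
```

The resulting state sequence can then be fed to a state-feedback method in place of a measured physical state; how many lags are needed depends on the system order, which is unknown in the black-box setting described on the slide.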
  23. 23 Power systems / electronics case study

    [figure (a): active power (p.u.) vs time [s] over 0 to 12 s; phases: probe & collect data, oscillation observed, activate DeePO; curves: without DeePO, with DeePO (100 iterations), with DeePO (1 iteration)]
  24. 24 … same in the adaptive setting with excitation

    [figure (a): active power (p.u.) vs time [s] over 0 to 12 s; phases: probe & collect data, oscillation observed, activate DeePO; curves: without DeePO, with adaptive DeePO]
  25. 25 Conclusions

    • Summary
      • model-based pipeline with model-free block: data-driven LQR parameterization → works well when regularized (note: further flexible regularizations available)
      • model-free pipeline with model-based block: policy gradient & sample covariance → DeePO is adaptive, online, with closed-loop data, & recursive implementation
      • academic case studies & can be made useful in power systems/electronics
    • Future work
      • technicalities: weaken assumptions & improve rates
      • control: based on output feedback & for other objectives
      • further system classes: stochastic, time-varying, & nonlinear
      • open questions: online vs episodic? “best” batch size? triggered?
