Slide 1

Slide 1 text

CᴀRE: Finding Root Causes of Configuration Issues in Highly-Configurable Robots

Md Abir Hossen, Bradley Schmerl, Javier Cámara, Jason M. O’Kane, Ellen C. Czaplinski, Pooyan Jamshidi, David Garlan, Katherine A. Dzurilla, Sonam Kharade

Slide 2

Slide 2 text

Outline: Motivation, Causal Inference, CᴀRE, Results

Slide 3

Slide 3 text

Outline: Motivation, Causal Inference, CᴀRE, Results (current section: Motivation)

Slide 5

Slide 5 text

Configuration Space in Robotic Systems

ROS Navigation Stack components: Recovery Behaviors, Global Planner, Global Costmap, Local Planner, Local Costmap

# Configuration Options:
• Recovery behaviors: Clear costmap 6, Rotate recovery 1
• Global Costmap: 14; Local Costmap: 14
• Algorithms in local planner: BaseLocalPlanner 33, DWA planner 30, Eband planner 32, TEB planner 80, MPC planner 77
• Algorithms in global planner: BaseGlobalPlanner 33, Navfn 30, Carrot planner 32

Constraints:
• Algorithms in local planner: only one can be selected at a time
• Algorithms in global planner: only one can be selected at a time
• Both must be selected

Possible configurations: 2382

Complex interactions between options (within or across components) give rise to a combinatorially large configuration space.
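To make the combinatorial growth concrete, the following is a minimal counting sketch. It assumes every option takes the same number of values (binary by default), that always-active components multiply the count, and that mutually exclusive algorithms add within their group. The function name and the toy option counts are hypothetical and are not the figures from the slide.

```python
from math import prod


def count_configurations(mandatory, exclusive_groups, values_per_option=2):
    """Count configurations under a simplified model (an assumption).

    mandatory: option counts of components that are always active.
    exclusive_groups: lists of option counts; exactly one member of each
        group is active at a time, so its alternatives add up rather than
        multiply with each other.
    """
    total = prod(values_per_option ** n for n in mandatory)
    for group in exclusive_groups:
        total *= sum(values_per_option ** n for n in group)
    return total


# Hypothetical toy counts, chosen only to keep the arithmetic readable.
toy_total = count_configurations(
    mandatory=[2, 1],                  # two always-active components
    exclusive_groups=[[3, 2], [1, 1]], # two groups of alternative algorithms
)
print(toy_total)
```

Even with a handful of binary options the count reaches the hundreds; with dozens of options per component it quickly becomes too large to explore by trial and error.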

Slide 6

Slide 6 text

Motivating Scenarios
• Turtlebot 3: Transform tolerance, Inflation radius
• Husky UGV: Cost scaling factor
• Husky UGV: Inflation radius
• Ocean World Lander Autonomy Testbed: Force threshold, Arm joint angles

Slide 7

Slide 7 text

Challenges
• Trial and error requires non-trivial human effort because the configuration space is large.
• Even after a fix is found, it is not guaranteed to work in different environments.
• Performance influence models suffer from several shortcomings:
  • They do not transfer across robotic platforms and environments
  • They can produce incorrect explanations
  • Collecting data from physical hardware is expensive

Slide 10

Slide 10 text

Incorrect Reasoning About The Robot’s Behavior
• In the observed data, increasing Planner failed increases P(Mission success).
• This is counter-intuitive: more Planner failed should reduce P(Mission success), not increase it.
• Purely statistical models built on this data will be unreliable.

Slide 11

Slide 11 text

Performance Influence Models Might Be Unreliable

Segregating the data by Obstacle cost shows that, within each group, an increase in Planner failed results in a decrease in P(Mission success).
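A trend that reverses once the data is split into groups is an instance of Simpson's paradox. The sketch below reproduces the effect on synthetic numbers: the pooled trend of mission success against planner failures is positive, while the trend inside each obstacle-cost group is negative. The data points, group names, and the `slope` helper are illustrative assumptions, not measurements from the study.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Synthetic (planner_failed, mission_success_rate) pairs, grouped by a
# hypothetical obstacle-cost setting.
runs = {
    "low obstacle cost": [(1, 0.5), (2, 0.4), (3, 0.3)],
    "high obstacle cost": [(6, 0.9), (7, 0.8), (8, 0.7)],
}

# Ignoring the grouping variable mixes the two regimes together.
pooled = [point for group in runs.values() for point in group]
pooled_slope = slope(pooled)

# Conditioning on the grouping variable recovers the within-group trend.
group_slopes = {name: slope(points) for name, points in runs.items()}

print("pooled:", pooled_slope)
for name, value in group_slopes.items():
    print(name, value)
```

A regression fitted on the pooled data would report that planner failures help mission success, which is the kind of incorrect explanation the slide warns about.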

Slide 13

Slide 13 text

Outline: Motivation, Causal Inference, CᴀRE, Results

Slide 14

Slide 14 text

Why Causal Model?

To build reliable models that produce correct explanations.

Causal graph: Obstacle cost → Planner failed → Mission success
• Obstacle cost affects Mission success via Planner failed.
• Causal models recover the correct interactions.

Slide 15

Slide 15 text

Outline: Motivation, Causal Inference, CᴀRE, Results

Slide 16

Slide 16 text

CaRE: End-to-end Pipeline

Stages:
1. Learn causal model from observational data
2. Identify the causal paths
3. Average causal effect estimation

Pipeline diagram (steps 1 to 5):
• Observational Data
• Learn causal model, using Constraints and Edge orientation rules
• Causal Model over nodes C1, C2, C3, C4, C5, E1, E2, E3, P1, P2
• Path's Rank, with example paths C1 → E2 → P1, C4 → E1 → P2, C3 → E2 → P1
• Average causal effect of each option; find the highest perf-affecting config. options
• Root causes, used for Debugging

Examples: C1: goal_distance_bias; E1: position_accuracy; P1: energy_consumption
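Stages 2 and 3 can be illustrated with a small sketch: enumerate the directed paths from configuration options to performance objectives, score each path, and rank them. The graph below reuses the node names from the slide, but the edge effects are made-up numbers, and scoring a path by the mean absolute effect of its edges is an assumption of this sketch rather than a statement of how CaRE computes its ranks.

```python
# Hypothetical causal graph: options (C*) -> metrics (E*) -> objectives (P*).
EDGES = {
    "C1": ["E2"],
    "C3": ["E2"],
    "C4": ["E1"],
    "E1": ["P2"],
    "E2": ["P1"],
}

# Made-up average causal effect (ACE) per edge, for illustration only.
ACE = {
    ("C1", "E2"): 0.6,
    ("C3", "E2"): 0.2,
    ("C4", "E1"): 0.4,
    ("E1", "P2"): 0.5,
    ("E2", "P1"): 0.8,
}


def causal_paths(edges, source, targets):
    """Return every directed path from `source` to a node in `targets`."""
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in targets:
            paths.append(path)
            continue
        for child in edges.get(node, []):
            if child not in path:  # guard against revisiting a node
                stack.append(path + [child])
    return paths


def path_score(path, ace):
    """Mean absolute ACE over the consecutive edges of a path."""
    effects = [abs(ace[(a, b)]) for a, b in zip(path, path[1:])]
    return sum(effects) / len(effects)


def rank_paths(edges, ace, options, objectives):
    """Score all option-to-objective paths and sort them, strongest first."""
    scored = []
    for option in options:
        for path in causal_paths(edges, option, objectives):
            scored.append((path_score(path, ace), path))
    return sorted(scored, key=lambda item: item[0], reverse=True)


ranking = rank_paths(EDGES, ACE, ["C1", "C3", "C4"], {"P1", "P2"})
root_cause_candidates = [path[0] for _, path in ranking]

for score, path in ranking:
    print(" -> ".join(path), round(score, 2))
```

The option at the head of the top-ranked path is the first candidate to inspect when debugging.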

Slide 20

Slide 20 text

Research Questions
• Research Question 1 (Accuracy). To what extent are the root causes determined by CaRE the true root causes of the observed functional faults?
• Research Question 2 (Transferability). To what extent can CaRE accurately detect misconfigurations when deployed on a different platform?
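One common way to quantify Research Question 1 is to compare the predicted root causes with a ground-truth set and report accuracy, precision, and recall. The sketch below assumes that setup; the function name, the option names in the example, and the choice of metrics are illustrative assumptions, not the evaluation protocol of the study.

```python
def root_cause_metrics(predicted, actual, all_options):
    """Compare predicted root causes with ground truth over all options."""
    predicted, actual, universe = set(predicted), set(actual), set(all_options)
    tp = len(predicted & actual)             # correctly flagged options
    fp = len(predicted - actual)             # flagged but not a root cause
    fn = len(actual - predicted)             # missed root causes
    tn = len(universe - predicted - actual)  # correctly left unflagged
    return {
        "accuracy": (tp + tn) / len(universe),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical example: five options, two flagged, two truly at fault.
options = [
    "goal_distance_bias",
    "occdist_scale",
    "transform_tolerance",
    "inflation_radius",
    "cost_scaling_factor",
]
metrics = root_cause_metrics(
    predicted=["occdist_scale", "transform_tolerance"],
    actual=["occdist_scale", "inflation_radius"],
    all_options=options,
)
print(metrics)
```

For Research Question 2, the same comparison could be repeated on a target platform using a model learned on the source platform.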

Slide 21

Slide 21 text

Outline: Motivation, Causal Inference, CᴀRE, Results

Slide 22

Slide 22 text

Results

Experimental Setup: Husky navigating to targets in a Gazebo environment and in a real environment.

Observational Data Collection (diagram labels):
• Monitor; Husky /move_base; set_param_values; send_goal
• Subscribed topics: MoveBaseActionFeedback, /gazebo_ros_battery, /targets_reached
• Recovery tracker (Recovery_behaviors API): conservative_reset, rotate_recovery, aggressive_reset
• rosbag API: Record rosbag, Evaluate rosbag
• Measured values: Battery percentage, RNS, Traveled distance, Mission time, Total time to reach goal
• Output: Observational data

Applying CaRE (table columns: Option, Rank, Path):
• Energy
  • Rank 1: Occdist scale, xy goal tolerance
  • Rank 3: Occdist scale, Goal distance bias
  • Rank 4: Transform tolerance, Combination method
• Mission Success
  • Rank 1: Goal distance bias, Transform tolerance
  • Rank 3: Combination method, yaw goal tolerance
  • Rank 4: Update frequency, Cost scaling factor

Figure: A partial causal model discovered in our experiments using Husky in simulation.
• Navigation Stack sub-systems: Planner (Path dist bias, Occdist scale, Goal dist bias), Costmap 2d (Transform tolerance, Publish frequency)
• Performance metrics: Recovery executed, RNS, Traveled distance
• Performance objectives: Mission success, Energy
• Edges: causal interactions

Evaluation (plots): panels for Energy and Mission success.

Takeaway: Configuration options that rank higher have the strongest influence on the performance objectives.

Slide 23

Slide 23 text

Results

Experimental Setup:
• Source platform and environment: Husky in a Gazebo environment, used to learn the causal model (nodes C1 to C5, E1 to E3, P1, P2).
• Target platform and environment: Turtlebot 3 in a real environment, where the causal model learned from the source is reused when applying CaRE.

Evaluation (plot).

Takeaway: CaRE transfers reasonably well when reusing the causal model learned from the source platform and environment, and achieves higher accuracy than the baseline.

Slide 24

Slide 24 text

Summary (thumbnails of earlier slides: Configuration Space in Robotic Systems, Incorrect Reasoning About The Robot’s Behavior, CaRE, Results)

Challenges
• Robotic systems are highly configurable, with hundreds or even thousands of possible software and hardware configuration options interacting non-trivially.
• Incorrect configuration (misconfiguration) can cause buggy behavior, resulting in both functional and non-functional faults.
• Performance influence models, such as regression models, suffer from several shortcomings, including:
  • Producing incorrect explanations
  • Being non-transferable
  • Expensive training data collection from physical hardware

Key Contributions
• A novel framework for finding root causes of configuration bugs in robotic systems.
• We evaluated CaRE in a comprehensive empirical study in a controlled environment across multiple robotic platforms, including Husky and Turtlebot 3, both in simulation and on physical robots.
• We demonstrated the transferability of the causal models by learning the causal model in the Husky simulator and reusing it on the Turtlebot 3 physical platform.

The Team

https://github.com/softsys4ai/care