CaRE: Finding Root Causes of Configuration Issues in Highly-Configurable Robots

Md Abir Hossen
September 06, 2023

Robotic systems have several subsystems with a huge combinatorial configuration space: hundreds or even thousands of software and hardware configuration options that interact non-trivially. These configurable parameters can be tailored to target specific objectives but, when incorrectly configured, can cause functional faults. Finding the root cause of such faults is challenging because of the exponentially large configuration space and the dependencies between the robot's configuration settings and performance. This paper proposes CaRE, a method for diagnosing the root cause of functional faults through the lens of causality; it abstracts the causal relationships between configuration options and the robot's performance objectives. We demonstrate CaRE's efficacy by diagnosing the root causes of observed functional faults and validating them in experiments on physical robots (Husky and Turtlebot 3) and in simulation (Gazebo). Furthermore, we demonstrate that causal models learned in simulation (Husky in Gazebo) are transferable to physical robots on a different platform (Turtlebot 3).

Transcript

  1. CaRE: Finding Root Causes of Configuration Issues in Highly-Configurable Robots

    Md Abir Hossen, Bradley Schmerl, Javier Cámara, Jason M. O’Kane, Ellen C. Czaplinski, Pooyan Jamshidi, David Garlan, Katherine A. Dzurilla, Sonam Kharade
  2. Configuration Space in Robotic Systems

    ROS Navigation Stack, number of configuration options per component (counts paired with labels in the order they appear on the slide):
    • Recovery Behaviors (both must be selected): Clear costmap 6, Rotate recovery 1
    • Global Costmap: 14
    • Local Costmap: 14
    • Algorithms in global planner (only one can be selected at a time): BaseGlobalPlanner 33, Navfn 30, Carrot planner 32
    • Algorithms in local planner (only one can be selected at a time): BaseLocalPlanner 33, DWA planner 30, Eband planner 32, TEB planner 80, MPC planner 77
    Possible configurations: 2^382.
    Complex interactions between options (intra or inter components) give rise to a combinatorially large configuration space.
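As a quick arithmetic check on the size quoted on this slide, the sketch below sums the option counts that appear in the figure. The pairing of each count with a component is inferred from the order in which labels and numbers appear in the transcript, and treating every option as binary is an assumption made only for illustration. All variable names are hypothetical.

```python
# Option counts as they appear on the slide. The pairing of counts to
# components is inferred from label order and is an assumption.
OPTION_COUNTS = {
    "clear_costmap_recovery": 6,
    "rotate_recovery": 1,
    "global_costmap": 14,
    "local_costmap": 14,
    "base_global_planner": 33,
    "navfn": 30,
    "carrot_planner": 32,
    "base_local_planner": 33,
    "dwa_planner": 30,
    "eband_planner": 32,
    "teb_planner": 80,
    "mpc_planner": 77,
}

total_options = sum(OPTION_COUNTS.values())

# Assumption: every option is treated as binary (on/off), so the raw
# upper bound on the number of configurations is 2 ** total_options.
upper_bound = 2 ** total_options

print(f"{total_options} options in total")
print(f"upper bound has {len(str(upper_bound))} decimal digits")
```

The counts sum to 382, which is consistent with reading the slide's "2382" as 2^382. Real options are often numeric rather than binary, so the true space is likely larger still.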
  4. Motivating Scenarios

    [Figure: example misconfiguration scenarios. Platforms shown: Turtlebot 3, Husky UGV, Ocean World Lander Autonomy Testbed. Configuration options highlighted: transform tolerance, inflation radius, cost scaling factor, force threshold, arm joint angles.]
  5. Challenges

    • Trial-and-error requires non-trivial human effort because the configuration space is large.
    • Even after a fix is found, it is not guaranteed to work in different environments.
    • Performance influence models suffer from several shortcomings:
      • They are non-transferable across different robotic platforms and environments.
      • They can produce incorrect explanations.
      • Collecting data from physical hardware is expensive.
  6. Incorrect Reasoning About The Robot’s Behavior

    In the pooled data, increasing Planner failed increases P(Mission success). More planner failures should reduce mission success, not increase it, so this is counter-intuitive. Purely statistical models built on this data will be unreliable.
  8. Performance Influence Models might be Unreliable

    Segregating the data on Obstacle cost indicates that, within each group, an increase in Planner failed results in a decrease in P(Mission success).
  10. Why Causal Model?

    To build reliable models that produce correct explanations.
    Causal chain: Obstacle cost → Planner failed → Mission success.
    Obstacle cost affects Mission success via Planner failed. Causal models recover the correct interactions.
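The reversal described on the preceding slides (a positive trend in the pooled data, a negative trend within each Obstacle cost group) is an instance of Simpson's paradox. The sketch below reproduces it on a tiny hand-made dataset. The numbers are invented for illustration and are not from the paper, and the data assume that runs with a high obstacle cost show both more planner failures and a higher mission-success rate.

```python
from typing import Dict, List, Tuple


def slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: (planner failures per run, mission-success rate),
# grouped by obstacle cost. Values are made up for illustration only.
DATA: Dict[str, Tuple[List[float], List[float]]] = {
    "low_obstacle_cost": ([1, 2, 3], [0.50, 0.45, 0.40]),
    "high_obstacle_cost": ([5, 6, 7], [0.90, 0.85, 0.80]),
}

# Pooled view: ignore the grouping variable entirely.
pooled_x = [x for xs, _ in DATA.values() for x in xs]
pooled_y = [y for _, ys in DATA.values() for y in ys]
pooled_slope = slope(pooled_x, pooled_y)

# Stratified view: fit one slope per obstacle-cost group.
group_slopes = {name: slope(xs, ys) for name, (xs, ys) in DATA.items()}

print("pooled slope:", round(pooled_slope, 3))
for name, s in group_slopes.items():
    print(name, "slope:", round(s, 3))
```

In this toy data the pooled slope is positive while both per-group slopes are negative, which is why a model fitted to the pooled data would explain the robot's behavior incorrectly.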
  11. CaRE: End-to-end Pipeline

    Stages:
    1. Learn causal model from observational data
    2. Identify the causal paths
    3. Average causal effect estimation
    [Figure: pipeline diagram with numbered steps 1 to 5. Labels: Observational Data; Learn causal model (Edge orientation rules, Constraints); Causal Model over nodes C1 to C5, E1 to E3, P1 and P2; Find highest perf-affecting config. options; Path's Rank (C1 → E2 → P1, C4 → E1 → P2, C3 → E2 → P1); Average causal effect of each option; Config. that has; Root causes; Debugging.]
    Examples: C1: goal_distance_bias; E1: position_accuracy; P1: energy_consumption
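Stages 2 and 3 can be sketched roughly as follows: enumerate the directed paths from configuration options to performance objectives, then rank them by an effect score. This is a minimal sketch, not the authors' implementation. The graph reuses the three paths named on the slide, the per-edge effect values are made up, and scoring a path by the mean absolute per-edge effect is an assumption chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical causal graph: edges with made-up effect magnitudes.
# Node names follow the slide (C = config option, E = intermediate
# metric, P = performance objective); the numbers are illustrative only.
EDGE_EFFECT: Dict[Tuple[str, str], float] = {
    ("C1", "E2"): 0.6,
    ("C3", "E2"): 0.2,
    ("C4", "E1"): 0.3,
    ("E2", "P1"): 0.5,
    ("E1", "P2"): 0.3,
}


def causal_paths(source: str, targets: Set[str]) -> List[List[str]]:
    """Enumerate directed paths from source to any target (graph assumed acyclic)."""
    children: Dict[str, List[str]] = {}
    for parent, child in EDGE_EFFECT:
        children.setdefault(parent, []).append(child)

    found: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node in targets:
            found.append(path)
            return
        for nxt in children.get(node, []):
            dfs(nxt, path + [nxt])

    dfs(source, [source])
    return found


def path_score(path: List[str]) -> float:
    """Assumed scoring rule: mean absolute effect over the edges of the path."""
    effects = [abs(EDGE_EFFECT[(u, v)]) for u, v in zip(path, path[1:])]
    return sum(effects) / len(effects)


options = ["C1", "C3", "C4"]
objectives = {"P1", "P2"}

all_paths = [p for opt in options for p in causal_paths(opt, objectives)]
ranked = sorted(all_paths, key=path_score, reverse=True)

for rank, path in enumerate(ranked, start=1):
    print(rank, " -> ".join(path), round(path_score(path), 2))
```

With these invented numbers the path starting at C1 ranks first, so C1 would be the first candidate to inspect when debugging. In practice the per-edge effects would be estimated from the learned causal model and the observational data rather than hard-coded.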
  15. Research Questions

    • Research Question 1 (Accuracy). To what extent are the root causes determined by CaRE the true root causes of the observed functional faults?
    • Research Question 2 (Transferability). To what extent can CaRE accurately detect misconfigurations when deployed on a different platform?
  16. Results

    Experimental Setup (Observational Data Collection)
    [Figure: data-collection setup. Labels: Husky and Targets in a Real Environment; Husky and Targets in a Gazebo Env.; Monitor; /move_base subscribed topics; set_param_values; send_goal; Battery percentage (/gazebo_ros_battery); RNS; Traveled distance; Recovery tracker (conservative_reset, rotate_recovery, aggressive_reset; Recovery_behaviors API); Mission time (MoveBaseActionFeedback, total time to reach goal); /targets_reached; rosbag API (Record rosbag, Evaluate rosbag); Observational data.]

    Applying CaRE
    [Figure: a partial causal model discovered in our experiments using Husky in simulation. Navigation Stack sub-systems: Planner, Costmap 2d. Configuration options: Path dist bias, Goal dist bias, Occdist scale, Transform tolerance, Publish frequency. Performance metrics: Recovery executed, RNS, Traveled distance. Performance objectives: Mission success, Energy. Edges denote causal interactions.]

    Evaluation
    [Table: option pairs by rank (ranks shown: 1, 3, 4) for the objectives Energy and Mission Success. Pairs listed, in slide order: Occdist scale and xy goal tolerance; Occdist scale and Goal distance bias; Transform tolerance and Combination method; Goal distance bias and Transform tolerance; Combination method and yaw goal tolerance; Update frequency and Cost scaling factor.]

    Takeaway: Configuration options which rank higher have the strongest influence on the performance objectives.
  17. Results

    Experimental Setup
    • Learning the causal model from the source platform and environment: Husky in a Gazebo environment.
    • Reusing the learned causal model from the source on the target platform and environment: Turtlebot 3 in a real environment.
    [Figure: the learned causal model (nodes C1 to C5, E1 to E3, P1 and P2) is carried over from source to target before applying CaRE and running the evaluation.]

    Takeaway: CaRE transfers reasonably well when reusing the causal model learned from the source platform and environment, and achieves higher accuracy than the baseline.
  18. Recap of Configuration Space in Robotic Systems, Incorrect Reasoning About The Robot’s Behavior, CaRE, and Results

    Challenges
    • Robotic systems are highly configurable, with hundreds or even thousands of possible software and hardware configuration options interacting non-trivially.
    • Incorrect configuration (misconfiguration) can cause buggy behavior, resulting in both functional and non-functional faults.
    • Performance influence models, such as regression models, suffer from several shortcomings, including:
      • Producing incorrect explanations
      • Being non-transferable
      • Expensive training data collection from physical hardware

    Key Contributions
    • A novel framework for finding root causes of configuration bugs in robotic systems.
    • We evaluated CaRE in a comprehensive empirical study in a controlled environment across multiple robotic platforms, including Husky and Turtlebot 3, both in simulation and on physical robots.
    • We demonstrated the transferability of the causal models by learning the causal model in the Husky simulator and reusing it on the Turtlebot 3 physical platform.

    https://github.com/softsys4ai/care