
We Are Much More Similar Than We Are Different: Comparing IT vs. OT Using the Roles of SOC Analyst and Control Room Operator, Marina Krotofil

In the era of the fashionable trend toward OT and IT integration, IT terminology and worldview are usually carried over to OT automatically, with no attempt to understand the specifics of industrial control systems (ICS) or the mentality of the personnel involved in controlling physical processes. This approach leads to misunderstanding and a complete lack of trust between these teams. In this talk, Marina Krotofil compares the IT and OT worlds from a technical point of view, using the roles of SOC analyst and control room operator as examples. She shows that these roles are fundamentally similar and, at the same time, differ substantially in their philosophy of tuning alarms and notifications, their incident response principles, and the particulars of other processes. The talk is meant to be informative rather than instructional, and aims to start a substantive professional discussion.

Join RUSCADASEC, a community of like-minded people working on ICS cybersecurity
https://t.me/RUSCADASEC

RUSCADASEC

May 08, 2020

Transcript

  1. Marina Krotofil *Vyacheslav Kopeytsev IT vs. OT: WE ARE MUCH

    MORE SIMILAR THAN WE ARE DIFFERENT – COMPARING PROCESS CONTROL ROOM AND SOC OPERATIONS RUSCADASEC CON, 8 May, 2020
  2. Industry 4.0 Horror: IT-OT Conversion Corporate IT Industrial IT Information

    Technology (IT) Operational Technology (OT) Informatics Physical Process Process Engineering
  3. Industry 4.0 Horror: IT-OT Conversion Corporate IT Industrial IT Information

    Technology (IT) Operational Technology (OT) Informatics Attacker’s goal Process Engineering
  4. Frequent request from OT operators Could you please design an

    infrastructure in such a secure way that no monitoring would be necessary (e.g., network monitoring, log collection & review)
  5. Second argument attempt https://headtopics.com/images/2020/1/24/newsweek/escaped-prisoner-sends-postcard-to-prison-directors-greetings-from-thailand-1220530432657240065.webp

  6. Second argument attempt

  7. Successful argument attempt

  8. Layers of safety protections

  9. Layers of security protections https://commons.wikimedia.org/wiki/File:Video_tape_archive_(6498650083).jpg

  10. Agenda SOC vs. Control Room Use-case: OT incident Conclusions

  11. IT/OT convergence: SOC analyst and Control Room operator IT

    / analyst OT / operator
  12. The only common discussion point?

  13. SOC vs. Control Room Operations https://habr.com/ru/company/croc/blog/353324/ https://en.wikipedia.org/wiki/Distributed_control_system#/media/File:Leitstand_2.jpg

  14. SOC analyst and Control Room operator

    SOC analyst:
    • Monitoring of IT infrastructure
    • Reacts to Alerts
    • Protects from threats (mostly human factor)
    • Responsible for security
      o Confidentiality
      o Integrity
      o Availability
    • Frequently outsourced
    • Room for creativity in processes
    • Wears clothing of choice

    Control Room operator:
    • Monitoring of physical processes*
    • Reacts to Alarms
    • Protects from hazards (mostly natural causes)
    • Responsible for safety
      o Uptime
      o Max of economic profit
      o (Safety and pollution)
    • Mostly in-house
    • Very standardized processes
    • Wears protective clothing
  15. *In some cases: Monitoring of supporting infrastructure Physical process

    Supporting infrastructure
  17. Alert vs. Alarm

    • An Alert is a signal that draws attention to something. An alert state refers to a longer period of time during which increased attention remains in effect.
    • An Alarm is a short warning that draws immediate attention to a danger. It usually does not refer to a longer period of time.
  19. Safety Protection Layers: „Financial Alarms“

  20. Maximization of economic profit https://www.slideserve.com/Antony/the-birth-of-asm

  22. Commonality: Novel Challenges

    IT (SOC):
    • Typical monitoring object
      o Security controls/infrastructure
    • Unforeseen events which invalidate security assumptions
      o Unexpected interdependencies due to infrastructure complexity

    OT (control room):
    • Typical monitoring object
      o Physical process
    • Unforeseen events which invalidate safety assumptions
      o Unexpected process upsets due to human-in-the-system
  23. https://i2.wp.com/staging.gbhackers.com/wp-content/uploads/2017/01/BvJniTg-2.png?resize=1068%2C727&ssl=1 Security Operations Center (SOC)

  24. SOC: Typical components Events Events Events Events Rules Visualization Ticket

    Systems External rules / IoCs Correlation / SIEM
  27. SOC: Sources of events  Security infrastructure (endpoint security, IDPS,

    DLP, VPN, FW, honeypots, etc.)  Network infrastructure (routers, switches, AP, DBs (SQL/Oracle, LDAP, Radius))  Client endpoints (security and Windows events, application logs)  Web and email servers  Servers (OS and application logs)  Virtualization infrastructure  Usage of user / service accounts  Non-log information (asset inventory, vulnerability reports, network maps, configs)  Etc.
  28. Pyramid of pain https://www.oreilly.com/library/view/intelligence-driven-incident-response/9781491935187/ch04.html

  29. Indicators of compromise https://threatpost.com/misunderstanding-indicators-of-compromise/117560/

  30. SOC: “Tiers of Ticket Response” Distribution of responsibilities between tiers

    may vary:  Tier 1 – Alert analyst (frequently outsourced)  Tier 2 – Incident Responder (sometimes outsourced)  Tier 3 – Subject Matter Expert/ Hunter • SOC engineer • Incident responder • Reverse engineer • Threat intelligence analyst
  31. https://eng.heroya-industripark.no/var/site/storage/images/media/images/statoil2/statoil-kontrollrom/67513-1-nor-NO/statoil-kontrollrom_size-medium.jpg Control Room in an industrial plant

  32. Control room: Typical components Events Events Events Events DCS HMI

    Shift report APC, Predictive maintenance Historian Eng. workstation
  33. OT: Sources of data Process data • Pre-alarm • Low

    (LL) / high (HH) limits • Rate of change  Equipment status, diagnostics  Safety systems  Alarms from packaged units  F&G systems  Video surveillance feed http://blog.canarylabs.com/2016/06/27/a-guide-to-the-best-data-historian-software-a-review-of-the-canary-historian-versus-rockwell-factorytalk-and-osisoft-pi
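The process-data sources listed on this slide (pre-alarm, low/high limits, rate of change) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `AlarmLimits` fields, the `evaluate` function and the labels it returns are not taken from the talk or from any DCS product, and treating the inner L/H limits as the pre-alarm band is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlarmLimits:
    """Hypothetical limits for one process variable (illustrative names)."""
    low_low: float    # LL limit
    low: float        # assumed pre-alarm on the low side
    high: float       # assumed pre-alarm on the high side
    high_high: float  # HH limit
    max_rate: float   # maximum allowed change per second


def evaluate(value: float, previous: Optional[float], dt_s: float,
             limits: AlarmLimits) -> List[str]:
    """Return the alarm conditions that are active for the current sample."""
    active: List[str] = []
    # Absolute limits: the outer limit takes precedence over the pre-alarm.
    if value >= limits.high_high:
        active.append("HH")
    elif value >= limits.high:
        active.append("H (pre-alarm)")
    if value <= limits.low_low:
        active.append("LL")
    elif value <= limits.low:
        active.append("L (pre-alarm)")
    # Rate of change: needs a previous sample and a positive time step.
    if previous is not None and dt_s > 0:
        if abs(value - previous) / dt_s > limits.max_rate:
            active.append("ROC")
    return active
```

A real alarm system would add hysteresis, on/off delays and priorities on top of such checks; the sketch only shows how the three kinds of conditions differ.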
  34. Control room: Events Sources Events Sources ANSI/ISA-18.2-2016 Management of Alarm

    Systems for the Process Industries
  35. Human Machine Interface (HMI) https://1.bp.blogspot.com/-4f6UFLToOpI/Ti3jII9fLzI/AAAAAAAAAYY/oQXHuuAr2c8/s1600/wincc-flexible-runtime-screen-02-1024.jpg

  36. HMI alarms white-paper-alarm-management-deltav-en-57058.pdf

  37. Fundamental design of HMI The operator interface allows operators to focus their

    mental resources on controlling the process rather than on interacting with the system software. This means the operator console is clear and easy to use, placing minimal demands on the intellectual and physical resources that console operators need to understand and interact with the process control system. https://www.controleng.com/articles/creating-an-asm-compliant-hmi-goes-deeper-than-screen-color-selection/
  38. SOC vs. Control Room: Alarm Tuning https://habr.com/ru/company/croc/blog/353324/

  39. Definition of “expensive” differs in IT and OT

  40. Definition of “urgency” differs in IT and OT

  41. Definition of “urgency” differs in IT and OT On average,

    companies take about 197 days to identify and 69 days to contain a breach according to IBM. https://www.ibm.com/downloads/cas/AEJYBPWA
  42. Definition of “urgency” differs in IT and OT At 1:23

    pm reactor cooling problem identified. At 1:33 pm the reactor burst and its contents exploded, killing 4 and injuring 38 people https://www.csb.gov/t2-laboratories-inc-reactive-chemical-explosion/
  43. IT alert prioritization: Criticality of security control https://www.tenable.com/sites/drupal.dmz.tenablesecurity.com/files/images/sc-dashboards/NEW%20-%20NIST_80053_Dashboard_Drupal_Screenshot_revised.png

  44. IT alert prioritization: Attacker progression

  45. IT alert prioritization: Asset criticality Customer serving servers Critical application

    servers / DBs
  46. SOC: Alarm tuning  Threat driven: Outbound traffic to known

    C2 server  Policy driven: Usage of domain admin account  Anomaly centric: High volume scanning from a single workstation Mostly heuristic alarm threshold tuning  Goal is to minimize false positives and noise  Alerting on known IoC or obvious threats such as usage of domain admin credentials  Setting up a threshold for AV alerts or brute force activities  Alerting based on behavioral patterns
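The "threshold for brute force activities" item on this slide can be sketched as a sliding-window counter. This is a minimal illustration, not how any particular SIEM implements it. The function name, the event format `(timestamp_seconds, source)` and the default threshold and window are assumptions chosen for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def brute_force_alerts(events: Iterable[Tuple[float, str]],
                       threshold: int = 5,
                       window_s: float = 60.0) -> List[Tuple[float, str]]:
    """Scan failed-login events, given as (timestamp_seconds, source) sorted by time.

    Returns one (timestamp, source) alert each time a source accumulates
    `threshold` failures within `window_s` seconds.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Tuple[float, str]] = []
    for ts, source in events:
        window = recent[source]
        window.append(ts)
        # Drop failures that have fallen out of the time window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((ts, source))
            window.clear()  # start counting again to avoid repeated alerts
    return alerts
```

Raising `threshold` or shortening `window_s` reduces false positives at the cost of missing slower attacks, which is the heuristic trade-off the slide refers to.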
  47. OT: Alarm management guidelines REDACTED REDACTED REDACTED Source: Internet

  48. OT: Target alarm rate

  49. OT: Alarm prioritization

  50. Parameters involved in establishing alarm setting Operating envelope

  51. Advanced process control https://blog.yokogawa.com/blog/what-is-apc https://www.mec-value.com/english/solution/system/advanced.html

  52. Uncertainties involved in alarm settings

  53. Alarm response time https://www.controlglobal.com/assets/00_images/2015/08/CG1508-AlarmsFeat2-Fig2-2.jpg

  54. Abnormal Situation Management (ASM) Consortium The ASM Consortium promotes their

    vision by conducting research, testing and evaluation that contribute to the successful reduction of abnormal situations in chemical processes. https://www.amazon.com/Effective-Console-Operator-HMI-Design/dp/1514203855
  55. Layers of HMI views What is displayed in each level

    is plant (customer) specific, there is only general guidance:  Level 1 – plant overview  Level 2 – unit overview  Level 3 – equipment overview  Level 4 – trends / elements of control logic https://www.asmconsortium.net/Documents/2009%20ASM%20Displays%20GL%20Webinar%20v014.pdf Trends are one of the most important displays Effective Text/ Object Size
  56. IT vs. OT: Continuous Operations and Incident Response https://habr.com/ru/company/croc/blog/353324/

  57. Threat hunting and proactive monitoring IT  Proactive vulnerability scanning

     Search of affected systems based on recent IoCs OT  Mostly monitor sensor signals trends to detect early signs of process deviation https://i.ytimg.com/vi/TC0y6vXDvRw/maxresdefault.jpg
  58. Playbooks, SoP, Help, etc. https://www.emerson.com/documents/automation/white-paper-alarm-help-deltav-en-57056.pdf https://flexibleir.com/cyber-security-incident-response-playbook IT: Playbook OT:

    Alarm help
  59. Incident response in IT

  60. OT: Incident reporting and information sharing  Large incidents such

    as pollution and safety incidents (regulation violation) must be reported & investigation reports are made public • Learning opportunity  Non-safety related incidents are not made public  Indicators of process upset are not publicly shared • E.g., sensor signatures or specific calculations • Generally unique to each facility/unit • Proprietary knowledge https://www.csb.gov/bp-america-refinery-explosion/
  61. IT: Cyber threat information exchange https://www.us-cert.gov/Information-Sharing-Specifications-Cybersecurity https://www.circl.lu/services/misp-malware-information-sharing-platform/

  62. IT: Cyber threat information exchange vs. push vs. pull

  63. OT: Predictive maintenance or Indicators of Equipment Failure/process upset (proprietary)

    http://www.mantis-project.eu/deep-learning-for-predictive-maintenance/
  64. OT: Alternatives for information exchange

  65. IT humor https://vangogh.teespring.com/og_pic/84274975/3250578/front.jpg?v=2019-08-05-04-00&background-image=wood&effects=inner-glow

  66. OT: Historical storage – need & requirement https://commons.wikimedia.org/wiki/File:Video_tape_archive_(6498650083).jpg

  67. Historians: Lossy and lossless compression https://commons.wikimedia.org/wiki/File:Video_tape_archive_(6498650083).jpg https://www.osisoft.com/pi-system/pi-capabilities/pi-system-tools/pi-vision/ http://www.dataparc.com/2016/01/12/process-data-compression-why-its-a-bad-idea/
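As an illustration of lossy historian compression, the following is a minimal sketch of a deadband filter, one common approach in which a sample is stored only if it differs enough from the last stored value. The slide does not say which algorithm it shows, so the choice of deadband filtering, the function name and the sample format are all assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


def deadband_compress(samples: List[Sample], deadband: float) -> List[Sample]:
    """Keep a sample only if it moved more than `deadband` from the last kept value.

    The discarded samples cannot be recovered, which is why this is lossy.
    """
    if not samples:
        return []
    kept = [samples[0]]  # always keep the first sample as the reference
    for ts, value in samples[1:]:
        if abs(value - kept[-1][1]) > deadband:
            kept.append((ts, value))
    return kept
```

A lossless alternative would store every sample and rely on general-purpose compression of the stored data instead of discarding points.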

  68. IT world: Still not so advanced  Ring buffer for

    storing logs on the host (e.g., a few days)  pcap upon event detection  Forensic data are collected on a case-by-case basis • Only specific logs • Hard drive image • Memory dump  An example of an “advanced” solution: storing logs instead of a disk image • Reduces storage from GB/TB to MB • Takes a system as input and extracts relevant log/event data https://plaso.readthedocs.io/en/latest/sources/user/Users-Guide.html
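The ring buffer mentioned on this slide can be sketched with Python's `collections.deque`, which discards the oldest entry once `maxlen` is reached. The class name and the capacity-based retention are assumptions for illustration; the slide describes retention in terms of time (a few days), which a real implementation would map to a size or age limit.

```python
from collections import deque
from typing import Deque, List


class LogRingBuffer:
    """Fixed-capacity log store: the newest entries overwrite the oldest ones."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently drops items from the left when full.
        self._entries: Deque[str] = deque(maxlen=capacity)

    def append(self, line: str) -> None:
        self._entries.append(line)

    def snapshot(self) -> List[str]:
        """Return the retained entries, oldest first."""
        return list(self._entries)
```

The consequence for forensics is visible directly: anything older than the buffer capacity is gone by the time an incident is investigated.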
  69. Use Case: OT Cyber Incident https://habr.com/ru/company/croc/blog/353324/

  70. Railway tank car filling facilities  Periodic system crashes

    at random times for several months (DoS)  No communication with ERM system  Vendor support team arrives & reboots system EVERYTHING WORKS AGAIN!! Vendor concludes that the system works fine Scheduled security assessment finds traces of DoublePulsar infection 06.2017 Start of facility DoS 09.2017 Security Assessment
  73. What did DoublePulsar do in OT network? Initial

    dropper Servers Workstations Other networks SMB exploit tasksche.exe kill switch zip archive Keys Text of demand Tor config dll in memory Encryption Ransom GUI Other stuff
  75. DoS root cause 10.2017 Incident Response

    IP                 Infection time
    172.23.22.181      28.07.2017 02:29:22
    192.168.204.7      29.07.2017 08:51:09
    192.168.23.68      29.07.2017 11:08:32
    172.23.12.94       29.07.2017 14:14:46
    192.168.145.37     30.07.2017 02:25:42
    192.168.59.123     30.07.2017 05:44:58
    172.23.4.153       30.07.2017 13:59:21
    192.168.154.82     30.07.2017 15:50:43
    192.168.123.173    30.07.2017 17:26:26
    192.168.126.49     31.07.2017 05:49:03
    172.23.16.200      31.07.2017 08:08:27
  76. DoS root cause 10.2017 Incident Response The malicious program

    generates IP addresses to attack in the format: <O1>.<O2>.<O3>.<O4>
  77. Undocumented network connections 2015 Firewall bypass  Startup and

    commissioning in 2015  Vendor's subcontractor requested remote access via TeamViewer
  78. WannaCry attack 05.2017 Global WannaCry attack IT Network OT

    Network Subsidiary
  79. WannaCry attack 05.2017 WannaCry attack IT Network OT Network

    Subsidiary
  80. WannaCry attack 05.2017 Global WannaCry attack IT + SOC

    personnel  Closed relevant ports on FW between company & subsidiary  Made sure to install necessary updates and update AV signatures to prevent/detect WannaCry threat Sent email to OT colleagues with the recommendation to install relevant Windows updates  Sent email to subsidiary to inform about ongoing attack from their site
  81. WannaCry attack 05.2017 Global WannaCry attack 2015 Firewall bypass

    06.2017 Start of facility DoS 09.2017 Security Assessment 10.2017 Incident Response IT Network OT Network Contractor Subsidiary
  82. Key Takeaways IT & OT teams could have worked closer

    together  The OT team could have explained to IT colleagues that patching is unwanted due to availability concerns  The IT team could have shared relevant information about security incidents with OT colleagues so that the OT team: • Would have had a better perception of the risk posed by remote access that bypasses the FW • Could have distinguished a cyber incident from a system failure • Could have helped the IT team to resolve the cyber incidents
  83. TRITON incident  During code injection, safety PLC generated alarms

     Why was there no reaction from the operators? http://www.supracontrols.com/TriconexSOE%20PC_Interface.aspx Collaboration between OT & IT would have made it possible to identify the cyber incident during the first plant trip
  84. Conclusions https://habr.com/ru/company/croc/blog/353324/

  85. Commonality: Next generation products ;-) Next generation Firewall (NGFW) High

    performance HMI (HPHMI) https://www.marketexpert24.com/wp-content/uploads/2019/09/Next-Generation-Firewall-NGFW-Market-1.jpg https://www.pas.com/PAS/media/pas-assets/resources/white%20papers/maximize-operator-effectiveness-part-2.pdf
  86. Conclusions  Even if the activities of the SOC and

    control room are very similar, it is important to be aware of each other's differences: • Priorities • Vocabulary • Context  Both IT and OT have their own strengths and offer opportunities for cross-field learning • Make IT-OT convergence a reality! • Make Industrie 4.0 great again!
  87. Let’s talk! Marina Krotofil @marmusha