IEEE DSC - 01 FEB 2021 - JAPAN

Harry Trinh
January 28, 2021

http://nsclab.org/dsc2021/program.html

BDF-SDN: A Big Data Framework for DDoS Attack Detection in Large-Scale SDN-Based Cloud
Dinh Phuc Trinh and Park Minho

Transcript

  1. DSC 2021: The IEEE Conference on Dependable and Secure Computing

    Fukushima, Japan | 30 Jan - 2 Feb, 2021
    BDF-SDN: A Big Data Framework for DDoS Attack Detection in Large-Scale SDN-Based Cloud
    D. Phuc Trinh, Minho Park - Soongsil University, Seoul, South Korea
    [email protected]
  2. Outline

    I. Problem Statement, Motivation, Contributions
    II. Preliminaries
    III. BDF-SDN: Proposed Framework
    IV. Experimental Setup
    V. Performance Evaluation
    VI. Conclusion & Future Work
  3. I. Problem Statement

    Currently, there are two main approaches to DDoS attack detection in the SDN environment.

    • A threshold-based approach: several traffic indicators, such as traffic rate and packet delay, are tracked; if an indicator exceeds a predetermined threshold, an attack may be occurring in the network. This approach results in high false-error rates because the threshold is fixed.
    • A feature-based approach: machine-learning algorithms classify normal and attack traffic. This performs better and is widely adopted by researchers. However, it is highly resource-intensive and cannot perform reliably, because traditional data processing workflows are limited in how much data they can handle in large-scale networks such as cloud servers and data centers.

    In addition, big data analytics based on machine learning and deep learning techniques require a scalable big data solution to process data and make predictions in real time with high accuracy, robustness, and efficiency.
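The threshold-based approach described on this slide can be illustrated with a minimal sketch. The field names, threshold values, and sample numbers below are hypothetical and chosen only to show why a fixed threshold misclassifies: a legitimate burst crosses the limit, while an attack that stays just under it goes unnoticed.

```python
from dataclasses import dataclass


@dataclass
class TrafficSample:
    """One polling interval of switch statistics (hypothetical fields)."""
    packets_per_sec: float
    avg_delay_ms: float


# Hypothetical hard thresholds; a real deployment would have to tune these.
MAX_PACKETS_PER_SEC = 10_000.0
MAX_DELAY_MS = 200.0


def is_suspicious(sample: TrafficSample) -> bool:
    """Flag the interval if any indicator exceeds its fixed threshold."""
    return (sample.packets_per_sec > MAX_PACKETS_PER_SEC
            or sample.avg_delay_ms > MAX_DELAY_MS)


samples = {
    "normal": TrafficSample(1_200.0, 15.0),
    "flash_crowd": TrafficSample(12_000.0, 40.0),  # legitimate burst, flagged
    "slow_attack": TrafficSample(9_500.0, 180.0),  # attack kept under the limits
}
verdicts = {name: is_suspicious(s) for name, s in samples.items()}
print(verdicts)
```

The flash crowd is a false positive and the slow attack is a false negative, which is the weakness the feature-based approach tries to address.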
  4. I. Motivation

    Given the difficulties above:

    • We propose a scalable, highly available, and fault-tolerant framework to process big data in an SDN context, where applications such as feature-based DDoS detection need a framework to operate efficiently and reliably.
    • Our simulations showed that such applications gain performance and reliability with the framework, and that this workflow is a must in a large-scale SDN network.
  5. I. Our Contributions

    In summary, our major contributions are as follows:

    • First, we investigate the limitations of machine learning workflows in existing DDoS detection schemes for large-scale SDN networks.
    • Second, we propose a complete framework, called Big Data Framework for Large-Scale SDN (BDF-SDN), to overcome these limitations and to exploit distributed resources in an SDN context efficiently.
    • Third, we implemented our framework in a real computing environment: OpenStack integrated with an SDN environment, using the ONOS controller together with big data platforms such as Apache Kafka, Apache Hadoop, and Apache Spark.
    • Lastly, we present a comprehensive performance analysis of BDF-SDN.
  6. II. Preliminaries

    Figure 1: Big Data integrated with SDN architecture
  7. II. Preliminaries - Apache Kafka

    • Apache Kafka is a distributed event streaming platform used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
    • Kafka stores key-value messages that come from arbitrarily many processes called producers. The data can be divided into partitions within different topics. Within a partition, messages are strictly ordered by their offsets (the position of a message within the partition), and they are indexed and stored together with a timestamp. Other processes, called consumers, read messages from partitions.

    Figure 2: Kafka architecture
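The partition and offset model described on this slide can be sketched with a small in-memory stand-in. This is not the Kafka client API: `ToyTopic`, `Record`, and the CRC32-based partition choice are illustrative assumptions, used only to show that records with the same key land in one partition and receive consecutive offsets there.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int        # position of the record within its partition
    key: str
    value: str
    timestamp: float   # stored together with the record


@dataclass
class ToyTopic:
    """In-memory stand-in for a topic; not the Kafka client API."""
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, value: str) -> tuple:
        # Same key -> same partition. CRC32 is an illustrative hash choice.
        p = zlib.crc32(key.encode()) % self.num_partitions
        log = self.partitions[p]
        record = Record(len(log), key, value, time.time())
        log.append(record)
        return p, record.offset

    def consume(self, partition: int, from_offset: int = 0) -> list:
        # A consumer reads a partition in offset order from a chosen position.
        return self.partitions[partition][from_offset:]


topic = ToyTopic(num_partitions=3)
placements = [topic.produce("switch-1", f"flow-stats-{i}") for i in range(3)]
p = placements[0][0]
offsets = [r.offset for r in topic.consume(p)]
print(placements, offsets)
```

Because ordering is only guaranteed inside a partition, keying records by switch identifier (as assumed here) would keep each switch's statistics in order while still spreading load across partitions.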
  8. II. Preliminaries - Hadoop Distributed File System

    • Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS has a master-slave architecture.
    • An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, which manage the storage attached to the nodes they run on.
    • Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode. The NameNode and DataNode are pieces of software designed to run on commodity machines, which means they can be deployed on a wide range of machines.

    Figure 3: HDFS architecture
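The split between NameNode metadata and DataNode storage can be sketched with a toy model. Everything here is an assumption made for illustration: the class name, the 4-byte block size, the replication factor of 2, and the round-robin placement. For brevity the toy NameNode also performs the writes, whereas real HDFS clients send block data to DataNodes directly.

```python
from itertools import cycle

BLOCK_SIZE = 4     # bytes; tiny on purpose (real HDFS blocks are far larger)
REPLICATION = 2    # hypothetical replication factor for this sketch


class ToyNameNode:
    """Decides the block-to-DataNode mapping and records it as metadata."""

    def __init__(self, datanodes):
        self.datanodes = datanodes      # name -> {block_id: bytes}
        self.block_map = {}             # path -> [(block_id, [node names])]
        self._next_node = cycle(sorted(datanodes))

    def write(self, path, data):
        entries = []
        for i in range(0, len(data), BLOCK_SIZE):
            block_id = f"{path}#blk{i // BLOCK_SIZE}"
            # Round-robin placement; assumes REPLICATION <= number of nodes.
            targets = [next(self._next_node) for _ in range(REPLICATION)]
            for name in targets:
                self.datanodes[name][block_id] = data[i:i + BLOCK_SIZE]
            entries.append((block_id, targets))
        self.block_map[path] = entries

    def read(self, path):
        # Fetch each block from the first DataNode listed for it.
        return b"".join(self.datanodes[names[0]][block_id]
                        for block_id, names in self.block_map[path])


datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}
namenode = ToyNameNode(datanodes)
namenode.write("/logs/flows", b"flow-stats")
print(namenode.block_map["/logs/flows"])
print(namenode.read("/logs/flows"))
```

The file's bytes live only in the `datanodes` dictionaries; the NameNode holds just the block list and replica locations, which is why it can stay a single lightweight master.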
  9. II. Preliminaries - Apache Spark

    Apache Spark is an open-source distributed cluster-computing framework and a unified analytics engine for big data processing, with built-in modules for:

    • Machine learning
    • Streaming
    • SQL
    • Graph processing

    Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Spark has its architectural foundation in the Resilient Distributed Dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way.

    Figure 4: Apache Spark ecosystem
  10. III. BDF-SDN Our Proposed Framework

    Figure 5: The complete scheme of BDF-SDN
  11. III. BDF-SDN Our Proposed Framework

    Algorithm 1 shows a typical DDoS attack traffic classification using Logistic Regression. There, points is re-evaluated on every iteration, which is unnecessary and wastes resources. To avoid this, the persist() function provided by Apache Spark (shown on the slide) can be used so that points is evaluated once and the dataset held in memory is reused on each iteration. This saves computation time and resources.
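The effect of persisting points can be sketched without a Spark cluster. The `LazyDataset` class below is a toy stand-in for a lazily evaluated dataset, not the Spark API, and the sample points, learning rate, and iteration count are hypothetical. In PySpark the corresponding step would be calling `persist()` (or `cache()`) on the RDD or DataFrame before the training loop.

```python
import math


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset; not the Spark API."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = None
        self._persisted = False
        self.evaluations = 0      # how often the source was re-evaluated

    def persist(self):
        self._persisted = True
        return self

    def collect(self):
        if self._persisted and self._cache is not None:
            return self._cache    # reuse the copy held in memory
        self.evaluations += 1
        data = self._loader()
        if self._persisted:
            self._cache = data
        return data


def load_points():
    # Hypothetical (feature, label) pairs: label 1 = attack, 0 = normal.
    return [(0.2, 0), (0.4, 0), (1.6, 1), (1.9, 1)]


def train(points, iterations=50, lr=0.5):
    """Logistic regression by batch gradient descent on one feature."""
    w, b = 0.0, 0.0
    for _ in range(iterations):
        data = points.collect()   # re-evaluated unless persisted
        gw = gb = 0.0
        for x, y in data:
            err = 1.0 / (1.0 + math.exp(-(w * x + b))) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b


lazy = LazyDataset(load_points)
cached = LazyDataset(load_points).persist()
model_lazy = train(lazy)
model_cached = train(cached)
print(lazy.evaluations, cached.evaluations)
```

Both runs learn the same weights; the only difference is how many times the input is re-evaluated, which is where the time and resource savings come from when the input is a large distributed dataset.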
  12. IV. Experimental Setup

    Figure 6: Our testbed
    Figure 7: Cluster Settings
  13. IV. Experimental Setup

    Figure 8: Kafka Cluster Settings
    Figure 9: Hadoop Cluster Settings
  14. IV. Experimental Setup

    Figure 10: Spark Cluster Settings
  15. IV. Experimental Setup

    Figure 11: Apache Spark 3.0 architecture
  16. V. Framework Performance

    Figure 12: Spark MLlib benchmark, including CPU, memory, and disk usage
  17. V. Framework Performance

    Figure 13: Kafka cluster benchmark
    Figure 14: HDFS cluster benchmark
  18. VI. Conclusion

    • In this work, we proposed a high-performance framework for machine-learning-based DDoS attack solutions.
    • Our framework covers all stages of detecting DDoS attacks, from collecting streaming statistics from different OpenFlow switches to removing detected abnormal flows.
    • In future work, we expect to improve the proposed framework and to compare it with other works using more evaluation criteria.
  19. Thank You

    [email protected]