Granular network traffic classification using chained incremental learning

Faiz Zaki
March 26, 2021

Slides presented for PhD candidature defense at Universiti Malaya, Malaysia.

Transcript

  1. PhD Candidature Defense
     Name: Muhammad Faiz bin Mohd Zaki – 17021637

    Granular network traffic classification using chained incremental learning
    Muhammad Faiz bin Mohd Zaki – 17021637/3
    Department of Computer System & Technology, Universiti Malaya
    Supervisor: Associate Professor Dr. Nor Badrul Anuar bin Jumaat
    Co-Supervisor: Honorary Professor Dr. Abdullah bin Gani
    26 March 2021
  2. Introduction

    Network management tasks such as network monitoring become more complex as computer networks evolve. To manage this complexity, network traffic classification plays a pivotal role by giving network administrators better visibility at various levels of classification granularity.
  3. Introduction (cont.)

    Granular network traffic classification focuses on classifying traffic into more detailed classes such as application names and services. Although some studies managed to classify at the application service level (Gil et al., 2016), they focused on inter-application services rather than services from the same application (intra-application services), which have received less attention.
  4. Aim

    This study aims to classify network traffic at both the application name and intra-application service levels. We achieve this aim by proposing a novel approach that uses chained incremental learning to carry out the granular classification.

    Contributions
    Secondary contribution: ground truth
  5. Statement of Objectives

    The objectives of this study are as follows:
    ▪ To study appropriate discriminators to classify applications and intra-application services from network traffic.
    ▪ To design an incremental learning model using the Adaptive Random Forest algorithm for network traffic classification.
    ▪ To develop the designed model to classify network traffic based on applications and intra-application services.
    ▪ To evaluate the developed model against existing approaches and streaming traffic using appropriate metrics.
  6. Statement of Problem

    ▪ Most works managed to classify down to application protocols and names (Takyi et al., 2020; Sun et al., 2019; Cao & Fang, 2016).
    ▪ There is less focus on granular classification, particularly intra-application services.
    ▪ A policy applied at a coarse-grained classification level affects the entire class.

    As such, this study seeks a method capable of classifying network traffic into the intra-application services that exist within the originating application.
  7. Significance of Study

    ▪ With higher classification granularity, network administrators achieve greater control for network management.
    ▪ Administrators gain the ability to implement fine-grained network policies.
    ▪ Novel features for granular classification from this research pave the way for applications in various areas such as parental filtering.
  8. Scope of Research

    This study is primarily concerned with classifying network traffic by application name and intra-application service. The proposed model uses statistical features based on payload length with incremental learning. Initially, this study covers encrypted and non-encrypted traffic in a high-speed LAN of up to 1 Gbps.
  9. Literature Review

    ▪ Numerous publications have appeared from 2013 to the present day, indicating continued interest in the field.
    ▪ Moore, Nguyen and Armitage are among the prominent researchers in this field, producing a number of seminal papers (A. W. Moore & Papagiannaki, 2005; A. Moore et al., 2005; Nguyen & Armitage, 2008).
    ▪ There are five categories of classification output based on the literature: application protocol, type, name, service and binary classification.
    ▪ The trend is moving towards applying deep learning methods to automate feature selection and classification processes.
  10. Literature Review (cont.)

  11. Literature Review (cont.)

  12. Literature Review (cont.)

    Recent publications focusing on granular classification
  13. Research Methodology

    To achieve the aim and objectives, this study followed a structured research methodology.
  14. Experimental methodology

    This study conducted three related experiments, which follow their own structured experimental methodology.
  15. Research Methodology

    Result analysis ongoing
  16. Creating a reliable ground truth for network traffic classification

    • One of the issues in network traffic classification is the lack of publicly available ground truth.
    • Privacy concerns are among the main causes.
    • Experiment 1 develops a ground truth creation tool, Grano-GT (Zaki et al., 2021).
    • It is built on four main engines and is able to build browser-based traffic ground truth reliably.
    • It works by isolating traffic from a target browser tab, matching string signatures, and further isolating traffic using temporal features, i.e., interarrival times and a time threshold.
  17. Creating a reliable ground truth for network traffic classification

    • The accuracy of Grano-GT’s ground truth was validated using nDPI, with more than 95% average accuracy across five applications.
    • Five other applications had to be excluded due to unavailable signatures.
    • nDPI can label traffic at the application name level at most. Therefore, reliability at the application-service level is validated using the Kolmogorov-Smirnov test.
    • The Kolmogorov-Smirnov test checks whether a sample belongs to a reference distribution. In this study, it is used to check whether two traffic samples are from the same distribution.
    • Calculate the empirical distribution function (EDF) of each sample using $F(t) = \frac{\text{number of elements in sample} \le t}{n}$
    • Get the largest difference between the EDFs of the two samples, $\Delta_{edf}$.
    • Compare $\Delta_{edf}$ against the critical value (threshold), $D_{crit,0.05} = \frac{1.358}{\sqrt{n}}$
    • If $\Delta_{edf}$ is lower than $D_{crit,0.05}$, the two samples are taken to be from the same distribution.
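The comparison steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Grano-GT's code: it assumes both samples have the same size n, reads the slide's threshold as 1.358 divided by the square root of n, and uses hypothetical function names (`edf`, `max_edf_gap`, `same_distribution`).

```python
from bisect import bisect_right
from math import sqrt


def edf(sample):
    """Return F(t) = (number of elements in sample <= t) / n as a function of t."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def max_edf_gap(sample_a, sample_b):
    """Largest absolute difference between the two EDFs."""
    f_a, f_b = edf(sample_a), edf(sample_b)
    # Both EDFs are step functions, so the gap only changes at observed values.
    return max(abs(f_a(t) - f_b(t)) for t in set(sample_a) | set(sample_b))


def same_distribution(sample_a, sample_b, coefficient=1.358):
    """Apply the decision rule; ASSUMES both samples have the same size n."""
    n = len(sample_a)
    d_crit = coefficient / sqrt(n)  # assumed reading of the slide's threshold
    return max_edf_gap(sample_a, sample_b) < d_crit
```

Two samples whose values interleave closely pass the check, while a sample shifted well away from the other fails it.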
  18. Creating a reliable ground truth for network traffic classification

    [Figures and tables: EDF values of intra-application services; EDF values of inter-application services; list of collected ground truth; accuracy using nDPI]
  19. Granular network traffic classification using chained incremental learning

    • Experiments 2 and 3 demonstrate the reliability and robustness of the proposed approach in classifying network traffic at the application name and intra-application service levels.
    • Using the ground truth from Experiment 1, we extracted 46 initial statistical features based on payload length. To speed up feature extraction, we used multi-core processing (16 CPU cores).
    • Based on correlation analysis, we selected the seven best features as the finalized set.
    • We trained two adaptive random forest classifiers, chained together, to produce multi-label classification.
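The chaining idea can be sketched as follows. This is only an illustration under stated assumptions: `OnlineCentroid` is a deliberately simple stand-in incremental learner, not Adaptive Random Forest; the class and method names, the two toy features, and the labels `app_a`, `a-video` and so on are hypothetical. The sketch also assumes the second stage is trained on the true application name and, at prediction time, receives the first stage's output as an extra feature.

```python
from collections import defaultdict


class OnlineCentroid:
    """Stand-in incremental learner (NOT Adaptive Random Forest).

    Keeps a running mean per class and predicts the nearest mean.
    """

    def __init__(self):
        self.counts = defaultdict(int)
        self.means = {}

    def learn_one(self, x, y):
        self.counts[y] += 1
        mean = self.means.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            mean[i] += (value - mean[i]) / self.counts[y]  # incremental mean

    def predict_one(self, x):
        if not self.means:
            return None

        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.means[label]))

        return min(self.means, key=sq_dist)


class ChainedClassifier:
    """Stage 1 predicts the application; stage 2 predicts the service."""

    def __init__(self):
        self.app_model = OnlineCentroid()
        self.service_model = OnlineCentroid()
        self.app_ids = {}  # crude numeric encoding of application names

    def _with_app(self, x, app):
        app_id = self.app_ids.setdefault(app, len(self.app_ids))
        return list(x) + [float(app_id)]  # application becomes one more feature

    def learn_one(self, x, app, service):
        self.app_model.learn_one(x, app)
        self.service_model.learn_one(self._with_app(x, app), service)

    def predict_one(self, x):
        app = self.app_model.predict_one(x)
        if app is None:
            return None, None
        return app, self.service_model.predict_one(self._with_app(x, app))


# Toy stream of (features, application, service) records.
stream = [
    ((10.0, 1.0), "app_a", "a-video"),
    ((10.0, 5.0), "app_a", "a-comment"),
    ((2.0, 1.0), "app_b", "b-browse"),
    ((2.0, 5.0), "app_b", "b-post"),
]
chain = ChainedClassifier()
for x, app, service in stream:
    chain.learn_one(x, app, service)
print(chain.predict_one((10.2, 1.1)))
```

Because both stages learn one record at a time, the chain can keep updating as new traffic arrives, which is the property incremental learning is chosen for.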
  20. Granular network traffic classification using chained incremental learning

    [Figures: ma_5 patterns for different application names; ma_5 patterns for inter-application services]
  21. Granular network traffic classification using chained incremental learning

    [Figure: ma_5 patterns for intra-application services]

    • The figures show significant differences among the patterns of application names and among those of inter-application services.
    • Intra-application services also showed pattern differences.
    • The feature (moving average) captured the dynamics of payload lengths in the applications.
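A moving-average feature over payload lengths can be computed as in the sketch below. It assumes that `ma_5` denotes a simple moving average over a window of five packets, which the slides do not state explicitly; the function name and the sample values are illustrative.

```python
from collections import deque


def moving_average(payload_lengths, window=5):
    """Yield the mean of the last `window` payload lengths once the window is full."""
    recent = deque(maxlen=window)
    total = 0
    for length in payload_lengths:
        if len(recent) == window:
            total -= recent[0]  # value about to be evicted by append()
        recent.append(length)
        total += length
        if len(recent) == window:
            yield total / window


# Hypothetical payload lengths (bytes) from one flow.
print(list(moving_average([100, 200, 300, 400, 500, 600])))
```

The running total avoids re-summing the window for every packet, which matters when features are extracted from streaming traffic.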
  22. Granular network traffic classification using chained incremental learning

    [Figure: Classification results for application name (precision, recall, F1)]

    • Experiment 2 recorded 100% average accuracy, while average precision, recall and f-measure were 99%.
    • The difference is due to imbalanced data, hence the need for the precision, recall and f-measure metrics.
    • To avoid overfitting, the classification model went through hyperparameter tuning using random search cross-validation.
    • The classification from Experiment 2 (i.e., application name) is used as the 8th feature for Experiment 3.
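The gap between accuracy and the other metrics on imbalanced data can be seen in a small worked example. The sketch below assumes macro averaging (an unweighted mean over classes), which the slides do not specify, and uses made-up labels and counts purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision, recall and F1."""
    labels = sorted(set(y_true))
    rows = [per_class_metrics(y_true, y_pred, label) for label in labels]
    return tuple(sum(column) / len(labels) for column in zip(*rows))


# 98 flows of a majority class and 2 of a minority class (made-up numbers);
# the classifier labels everything as the majority class.
y_true = ["major"] * 98 + ["minor"] * 2
y_pred = ["major"] * 100
print(accuracy(y_true, y_pred))
print(macro_average(y_true, y_pred))
```

In this toy case accuracy stays at 98% even though the minority class is never detected, whereas macro recall drops to 50%, so the per-class metrics expose the weakness that accuracy hides.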
  23. Granular network traffic classification using chained incremental learning

    [Figures: eight charts of precision, recall and F1]
  24. Granular network traffic classification using chained incremental learning

    [Figures: precision, recall and F1 for NFL-react and NFL-video; precision, recall and F1 for RD-browse, RD-comment, RD-post and RD-react]

    • Experiment 3 demonstrated the ability to classify intra-application services.
    • Intra-application services exhibited more similar characteristics (as they come from the same application) than inter-application services, and are thus more complex to classify.
    • The average accuracy was 97% with 84% precision, 85% recall and 87% f-measure.
  25. Granular network traffic classification using chained incremental learning

    [Figure: Accuracy at the application name level for DISTILLER, FLIC, Deep Packet, RBRN, Kampeas (2018) and Zaki]
    [Figure: Accuracy at the application service level for ConvNet, FlowPic, Baris (2017), Bakhshi (2016) and Zaki]

    • This study also compared its performance against existing works using the ISCXVPN2016 dataset.
    • We selected the most similar available literature that used the ISCXVPN2016 dataset at the particular classification granularity.
    • On this dataset, the proposed approach achieved 98% average accuracy at both the application name and service levels, after appropriate hyperparameter tuning.
    • The proposed approach matched the performance of Deep Packet (98%) and is slightly lower than FlowPic (99.7%), both of which used complex deep learning algorithms.
  26. Limitations

    • The ground truth creation tool proposed in Experiment 1 is limited to browser-based traffic. Future work includes expanding its capability to create ground truth for desktop and mobile-based applications.
    • The features used in Experiments 2 and 3 are hand-crafted, thus requiring human effort. Future work includes exploring granular network traffic classification with automatic feature representation.
    • The capability of the proposed approach was slightly affected when intra-application services exhibited very similar characteristics. However, the overall performance remained strong. Future work includes feature improvement through feature engineering.
  27. Conclusion

    • Although there are numerous works on network traffic classification, classifying traffic at the intra-application service level still lacks exploration.
    • The lack of publicly available network traffic ground truth is a long-standing issue in the domain.
    • Presented Grano-GT, a ground truth collection tool developed to create reliable granular ground truth.
    • Presented a novel approach using chained incremental learning to classify network traffic at the application name and intra-application service levels.
    • The proposed approach achieved f-measure scores of 99% at the application name level and 87% at the intra-application service level.
    • When compared to existing studies, the proposed approach recorded 98% average accuracy for both granularity levels.
  28. References

    Aceto, G., Ciuonzo, D., Montieri, A., & Pescapé, A. (2021). DISTILLER: Encrypted traffic classification via multimodal multitask deep learning. Journal of Network and Computer Applications, 102985. doi:10.1016/j.jnca.2021.102985

    Bakhshi, T., & Ghita, B. (2016). On Internet Traffic Classification: A Two-Phased Machine Learning Approach. Journal of Computer Networks and Communications, 2016, 1-21. doi:10.1155/2016/2048302

    Bu, Z., Zhou, B., Cheng, P., Zhang, K., & Ling, Z. (2020). Encrypted Network Traffic Classification Using Deep and Parallel Network-in-Network Models. IEEE Access, 8, 132950-132959. doi:10.1109/ACCESS.2020.3010637

    Kampeas, J., Cohen, A., & Gurewitz, O. (2018). Traffic Classification Based on Zero-Length Packets. IEEE Transactions on Network and Service Management, 15, 1049-1062. doi:10.1109/TNSM.2018.2825881

    Lotfollahi, M., Jafari Siavoshani, M., Shirali Hossein Zade, R., & Saberian, M. (2020). Deep packet: a novel approach for encrypted traffic classification using deep learning. Soft Computing, 24(3), 1999-2012. doi:10.1007/s00500-019-04030-2

    Mun, H., & Lee, Y. (2021). Internet Traffic Classification with Federated Learning. Electronics, 10(1). doi:10.3390/electronics10010027

    Salman, O., Elhajj, I. H., Chehab, A., & Kayssi, A. (2019). A Multi-level Internet Traffic Classifier Using Deep Learning. 2018 9th International Conference on the Network of the Future (NOF): IEEE.

    Shapira, T., & Shavitt, Y. (2019, 29 April-2 May). FlowPic: Encrypted Internet Traffic Classification is as Easy as Image Recognition. Paper presented at the IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS).

    Yamansavascilar, B., Guvensan, M. A., Yavuz, A. G., & Karsligil, M. E. (2017). Application identification via network traffic classification. 2017 International Conference on Computing, Networking and Communications (ICNC): IEEE.

    Zheng, W., Gou, C., Yan, L., & Mo, S. (2020). Learning to Classify: A Flow-Based Relation Network for Encrypted Traffic Classification. Paper presented at the Proceedings of The Web Conference 2020, Taipei, Taiwan. doi:10.1145/3366423.3380090
  29. Publications

    • Tahaei, H., Afifi, F., Asemi, A., Zaki, F., & Anuar, N. B. (2020). The rise of traffic classification in IoT networks: A survey. Journal of Network and Computer Applications, 154, 102538. doi:10.1016/j.jnca.2020.102538 (secondary contribution)
    • Zaki, F., Gani, A., & Anuar, N. B. (2020, 28-29 Feb). Applications and use Cases of Multilevel Granularity for Network Traffic Classification. Paper presented at the 2020 16th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Langkawi, Malaysia.
    • Zaki, F., Gani, A., Tahaei, H., Furnell, S., & Anuar, N. B. (2021). Grano-GT: A granular ground truth collection tool for encrypted browser-based Internet traffic. Computer Networks, 184, 107617. doi:10.1016/j.comnet.2020.107617
  30. Thank you!

    Slides available at https://speakerdeck.com/mfaizmzaki