Please use this identifier to cite or link to this item: https://openscholar.ump.ac.za/handle/20.500.12714/1064
Title: A big data analysis model for futuristic insights in cybersecurity networks.
Authors: Leutle, Mangilanyane Precious.
University of Mpumalanga
Keywords: Big data; Cybersecurity; Cyber threat detection; Cyberattacks; Machine learning; Algorithms; Classification; Random Forest; Supervised learning
Issue Date: May-2026
Abstract: Big data has changed how organizations collect, process, store, and use information for informed decision-making. It is produced every day in various forms and is characterized by the 5Vs: volume, velocity, variety, veracity, and value. Processing such data is a challenge because many organizations rely on traditional transactional processing systems that are increasingly overwhelmed by the data generated through the wide adoption of Information Technology and the Internet. This complicates both security governance and data analysis, and organizations have had to enhance their cybersecurity measures to prevent malicious data from penetrating the network. By analyzing big data, organizations can gain better insights and derive value. This motivated the study to use machine learning to develop a big data analysis model that provides futuristic insights into cybersecurity networks. Futuristic insights refer to the model’s proactive approach of predicting previously unknown threat patterns from learned historical characteristics, rather than the traditional reactive signature-matching approach. Machine learning has been explored as a proactive approach to cyber threat detection in recent years; however, studies in this field remain limited. This study examined the application of machine learning algorithms to identify cyber threats within a network. It followed an experimental research strategy, which involved synthetically generating data that mimics real-world data containing current cyber threats in cybersecurity networks. The original dataset consisted of 50,000 records and six features; new features were later engineered from these because some of the original features had limited predictive power for the target variable.
Algorithms known to work well for classification tasks were trained, including Random Forest, Gradient Boosting Classifier, K-Nearest Neighbors, Logistic Regression, Decision Tree, Gaussian Naive Bayes, and Multilayer Perceptron. Each algorithm was further tuned to bring it to its best performance. Four evaluation metrics (accuracy, precision, recall, and F1-score) were used to determine the best-performing algorithm for building the final model. The best-performing model achieved 97% accuracy.
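For readers unfamiliar with the four evaluation metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed for a binary threat/benign classifier. It is not taken from the dissertation: the function name binary_metrics and the sample labels are hypothetical and are used only for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells for the chosen positive label.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard each ratio against a zero denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = threat, 0 = benign (illustrative, not study data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Accuracy alone can be misleading when threats are rare in the data, which is one common reason for reporting precision, recall, and F1-score alongside it.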
Description: Dissertation (Master(Computing))--University of Mpumalanga, 2026
URI: https://openscholar.ump.ac.za/handle/20.500.12714/1064
Appears in Collections: Dissertation / Thesis

Files in This Item:
File                                         Description    Size      Format
Mangilanyane-Precious-Leutle-201626764.pdf   Dissertation   1.86 MB   Adobe PDF

Items in UMP Scholarship are protected by copyright, with all rights reserved, unless otherwise indicated.