Predicting the type of Cyber attack based on Network Packets (Intrusion) using Machine Learning models
https://huggingface.co/spaces/raghavtwenty/ids-prediction
git clone https://github.com/raghavtwenty/ids-prediction.git
cd ids-prediction/
pip install -r requirements.txt
cd gradio/
gradio ids_ml_gradio.py
http://127.0.0.1:7860/
A firewall alone does not provide adequate protection against modern cyber threats. Malware and other malicious content are often delivered over legitimate types of traffic, such as email or web traffic. Solving this problem requires going a step further and examining the network traffic itself, which is where an Intrusion Detection System plays a major role.
An Intrusion Detection System (IDS) is a network security technology originally built for detecting vulnerability exploits against a target application or computer. The IDS is a listen-only device. The IDS monitors traffic and reports results to an administrator.
Typical intrusion detection systems look for known attacks or for abnormal deviations from set norms. A signature-based IDS monitors inbound network traffic for specific patterns and sequences that match known attack signatures. Anomalous patterns in the network traffic are then sent up the stack for further investigation at the protocol and application layers of the OSI (Open Systems Interconnection) model.
An IDS is placed out of the real-time communication band (the path between the information sender and receiver) within your network infrastructure, so it works purely as a detection system. Instead of sitting inline, it leverages a SPAN or TAP port for network monitoring and analyzes a copy of the inline network packets (fetched through port mirroring) to make sure the streaming traffic is not malicious or spoofed in any way. The IDS efficiently detects infected elements with the potential to impact your overall network performance, such as malformed information packets, DNS poisoning, port scans, and more.
An IDS is installed either on your network (network-based IDS) or on a client system (host-based IDS).
The goal is to predict the type of cyber attack that is most likely to have occurred in a network. Using past network logs from a server, we train several machine learning models and choose the most suitable one for prediction. For each new input, the chosen model classifies the type of cyber attack that has the highest chance of occurrence.
- Security operations center (SOC) analysts.
- Incident responders.
- Cyber Security analysts.
- Anyone with adequate knowledge of networking can experiment with this.
This dataset contains 5000 records of features extracted from network port statistics, intended to help protect modern computer networks from cyber attacks. Each record is classified into one of 5 classes.
Switch ID - The switch through which the network flow passed.
Port Number - The switch port through which the flow passed.
Received Packets - Number of packets received by the port.
Received Bytes - Number of bytes received by the port.
Sent Bytes - Number of bytes sent by the port.
Sent Packets - Number of packets sent by the port.
Port alive Duration (S) - The time port has been alive in seconds.
Packets Rx Dropped - Number of packets dropped by the receiver.
Packets Tx Dropped - Number of packets dropped by the sender.
Packets Rx Errors - Number of receive errors.
Delta Received Packets - Change in the number of packets received by the port.
Delta Received Bytes - Change in the number of bytes received by the port.
Delta Sent Bytes - Change in the number of bytes sent by the port.
Delta Sent Packets - Change in the number of packets sent by the port.
Delta Port alive Duration (S) - Change in the time the port has been alive, in seconds.
Delta Packets Rx Dropped - Change in the number of packets dropped by the receiver.
Delta Packets Tx Dropped - Change in the number of packets dropped by the sender.
Delta Packets Rx Errors - Change in the number of receive errors.
Delta Packets Tx Errors - Change in the number of transmit errors.
Connection Point - Network connection point expressed as a pair of the network element identifier and port number.
Total Load/Rate - Obtain the current observed total load/rate (in bytes/s) on a link.
Total Load/Latest - Obtain the latest total load bytes counter viewed on that link.
Unknown Load/Rate - Obtain the current observed unknown-sized load/rate (in bytes/s) on a link.
Unknown Load/Latest - Obtain the latest unknown-sized load bytes counter viewed on that link.
Latest bytes counter - Latest bytes counted in the switch port.
is_valid - Indicates whether this load was built on valid values.
Table ID - Returns the Table ID values.
Active Flow Entries - Returns the number of active flow entries in this table.
Packets Looked Up - Returns the number of packets looked up in the table.
Packets Matched - Returns the number of packets that successfully matched in the table.
Max Size - Returns the maximum size of this table.
TARGET --- Label - Label types for intrusions - Normal:0, Blackhole:1, TCP-SYN:2, PortScan:3, Diversion:4
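The label encoding above can be captured in a small lookup table. The following is a minimal sketch, not the project's actual code: only the label-to-integer mapping comes from the TARGET description, while the names `LABEL_TO_ID`, `ID_TO_LABEL`, `decode_predictions`, `class_distribution`, and the sample predictions are hypothetical.

```python
from collections import Counter

# Mapping taken from the TARGET description; the variable names are illustrative.
LABEL_TO_ID = {
    "Normal": 0,
    "Blackhole": 1,
    "TCP-SYN": 2,
    "PortScan": 3,
    "Diversion": 4,
}
# Reverse lookup: integer class id -> readable attack name.
ID_TO_LABEL = {class_id: name for name, class_id in LABEL_TO_ID.items()}


def decode_predictions(class_ids):
    """Translate integer class ids into readable attack names."""
    return [ID_TO_LABEL[class_id] for class_id in class_ids]


def class_distribution(class_ids):
    """Count how often each attack name appears in a list of class ids."""
    return Counter(decode_predictions(class_ids))


# Made-up model output, used only to demonstrate the helpers.
sample_ids = [0, 2, 2, 3, 1, 0, 4]
decoded = decode_predictions(sample_ids)
distribution = class_distribution(sample_ids)
print(decoded)
print(distribution)
```

Keeping the mapping in one place means the same table can be used when encoding the training labels and when turning a prediction back into a name for display.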
- Exploratory Data Analysis (EDA)
- Cleaning
- Sampling
- Scaling
- Visualization
- Naive Bayes
- Random Forest
- XGBoost
- Cross Validation
- Hyperparameter Tuning
- Accuracy
- Confusion Matrix
- Precision
- Recall
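The evaluation metrics listed above are all derived from the confusion matrix. The sketch below shows one way to compute them in plain Python for the five classes; it is an illustration under assumptions, not the project's evaluation code. The function names and the tiny `y_true` / `y_pred` lists are made up for the example.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Build a matrix where rows are true classes and columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_cls, pred_cls in zip(y_true, y_pred):
        matrix[true_cls][pred_cls] += 1
    return matrix


def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def precision(matrix, cls):
    """Of the samples predicted as `cls`, the fraction that truly are `cls`."""
    predicted = sum(row[cls] for row in matrix)
    return matrix[cls][cls] / predicted if predicted else 0.0


def recall(matrix, cls):
    """Of the samples that truly are `cls`, the fraction predicted as `cls`."""
    actual = sum(matrix[cls])
    return matrix[cls][cls] / actual if actual else 0.0


# Made-up labels: 0=Normal, 1=Blackhole, 2=TCP-SYN, 3=PortScan, 4=Diversion.
y_true = [0, 0, 1, 2, 2, 3, 4, 4]
y_pred = [0, 1, 1, 2, 2, 3, 4, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=5)
print("accuracy:", accuracy(cm))
for cls in range(5):
    print(cls, "precision:", precision(cm, cls), "recall:", recall(cm, cls))
```

For an IDS, per-class recall is worth watching alongside overall accuracy, since a missed attack (a low recall on an attack class) can be hidden by a large number of correctly classified normal records.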
Best hyperparameters for XGBoost
gamma: 0
learning_rate: 0.1
max_depth: 7
min_child_weight: 1
subsample: 0.9
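As a rough sketch, the values above could be passed to a classifier as keyword arguments. This assumes the scikit-learn style wrapper `XGBClassifier` from the `xgboost` package is used, which this README does not confirm; the names `BEST_XGB_PARAMS` and `build_classifier` are hypothetical.

```python
# Values copied from the list above; the dictionary name is illustrative.
BEST_XGB_PARAMS = {
    "gamma": 0,
    "learning_rate": 0.1,
    "max_depth": 7,
    "min_child_weight": 1,
    "subsample": 0.9,
}


def build_classifier(params=None):
    """Create an untrained XGBoost classifier with the given settings.

    Assumption: the xgboost package is installed. The import is local so that
    the parameter dictionary can be inspected without it.
    """
    from xgboost import XGBClassifier

    return XGBClassifier(**(params or BEST_XGB_PARAMS))


print(BEST_XGB_PARAMS)
```

A call such as `build_classifier().fit(X_train, y_train)` would then train the model, where `X_train` and `y_train` stand for the preprocessed features and integer labels.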
After preprocessing the dataset, Naive Bayes, Random Forest, and XGBoost were used to classify the test dataset. Over multiple trials, XGBoost achieved an average accuracy of 94% on the test dataset, while the other algorithms were less accurate. Because XGBoost outperformed the other models and offers high scalability, robustness, and stable performance, it was chosen for deployment.
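Cross-validation is listed among the techniques used, but the exact procedure is not described here. One common way to organize repeated trials and average their accuracy is k-fold cross-validation, sketched below in plain Python. The function name `k_fold_indices` and the choice of 5 folds are assumptions for illustration only.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Every sample appears in exactly one test fold. In practice the data
    would usually be shuffled (or stratified by class) before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# With the 5000 records of this dataset and 5 folds (an assumed setting),
# each trial trains on 4000 records and tests on the remaining 1000.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(5000, 5)):
    print(fold, len(train_idx), len(test_idx))
```

Averaging the per-fold accuracy gives a figure that depends less on one particular train/test split than a single trial would.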
Companies realize the limitations of a standard IDS, and some are responding by building bigger and better products for their customers. New IDS solutions may come with a lower administrative burden. They may rely on machine learning to lower the risk of false positives, so staff have less to examine every day, and vendors may update them simultaneously, so the system always has access to up-to-date information in real time.