Machine Learning With SecOps
We all know that security is a big concern for every large firm. There are many kinds of cyber attacks, such as DoS, SQL injection (SQLi), and malware attacks. In this article I consider a DoS attack in which the same IP hits the web server many times.
In this article I create a CI/CD pipeline for this case. Logstash takes its input from the web server log. Every web server generates a log that stores information such as IP addresses, HTTP response codes, the number of bytes transferred, and timestamps, and Logstash writes the fields we need to a CSV file. This is where machine learning comes into play: the ML code reads the CSV generated by Logstash and applies the K-means clustering algorithm, which groups the IPs into clusters. I set a threshold of 100 hits: the cluster that holds the IPs exceeding it is treated as the attacking cluster, and its IPs are blocked automatically.
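To make the threshold idea concrete, here is a minimal sketch that only counts hits per IP and flags any IP above 100. The sample IPs and the helper name `ips_over_threshold` are made up for illustration; the real pipeline reads its data from the Logstash CSV and also applies K-means.

```python
from collections import Counter

THRESHOLD = 100  # same limit as in this article


def ips_over_threshold(ips, threshold=THRESHOLD):
    """Return the IPs that appear more than `threshold` times, sorted."""
    hits = Counter(ips)
    return sorted(ip for ip, count in hits.items() if count > threshold)


# Made-up sample: one noisy client and one normal client
sample = ["10.0.0.5"] * 150 + ["10.0.0.9"] * 3
print(ips_over_threshold(sample))
```

A plain count like this is a useful baseline to compare the clustering result against.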
So now let's jump straight to the code.
- First I create the Logstash pipeline, where the input comes from the httpd log
input {
  file {
    id => "input"
    path => ["/logs/access_log"]
    start_position => "beginning"
  }
}
filter {
  grok {
    match => ["message", "%{COMBINEDAPACHELOG}"]
  }
  mutate {
    convert => {
      "response" => "integer"
      "bytes" => "integer"
    }
  }
  date {
    match => ["timestamp", "dd/MMM/YYYY:HH:mm:ss Z"]
    locale => "en"
    remove_field => "timestamp"
  }
}
output {
  csv {
    id => "output"
    fields => ["clientip", "response"]
    path => "/usr/share/logstash/logs.csv"
  }
}
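To see what the grok filter pulls out of each log line, here is a rough Python sketch that extracts the fields this pipeline uses (`clientip`, `response`, `bytes`) from a line in Apache combined log format. The regular expression is a simplified stand-in, not the actual `COMBINEDAPACHELOG` pattern, and the sample line and the helper name `parse_line` are made up for illustration.

```python
import re

# Simplified stand-in for the parts of %{COMBINEDAPACHELOG} used in this pipeline
LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return a dict of the extracted fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Mirror the mutate/convert step: response and bytes become integers.
    # Treating a "-" byte count as 0 is an assumption made for this sketch.
    fields["response"] = int(fields["response"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


# Made-up sample line in combined log format
line = ('192.168.1.10 - - [10/Jun/2020:13:55:36 +0530] '
        '"GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.61.1"')
row = parse_line(line)
print(row["clientip"], row["response"])
```

Only `clientip` and `response` end up in the CSV, since those are the two entries in the `fields` list of the csv output.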
Now run this pipeline using Docker.
# docker run -it --name logstash --rm --net elknet -v /var/log/httpd:/logs -v /mlsecops:/conf/apache.conf logstash:7.7.1 -f /conf/apache.conf
And here is my ML Code
import subprocess

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the CSV written by Logstash: one row per request (client IP, HTTP status code)
dataset = pd.read_csv('/mlsecops/logs.csv', names=['IP', 'Web_Code'])
dataset = dataset.dropna()

# Count the requests for each (IP, status code) pair
dataset = dataset.groupby(['IP', 'Web_Code']).Web_Code.agg('count').to_frame('Count').reset_index()

# Add a serial number that stands in for the non-numeric IP column
dataset.insert(0, 'SNo', range(len(dataset)))
dataset.head(5)

# Scale the numeric features (SNo, Web_Code, Count) and cluster them
train_data = dataset.drop(['IP'], axis=1)
sc = StandardScaler()
scaled_data = sc.fit_transform(train_data)
model = KMeans(n_clusters=4)
pred = model.fit_predict(scaled_data)

# Note: 'IP_Scaled' holds the scaled SNo column
data_with_pred = pd.DataFrame(scaled_data, columns=['IP_Scaled', 'Web_Code_Scaled', 'Count_Scaled'])
data_with_pred['Cluster'] = pred
final_data = pd.concat([dataset, data_with_pred], axis=1, sort=False)

# The cluster that most often contains rows with more than 100 hits is the one to block
heavy_clusters = final_data.loc[final_data['Count'] > 100, 'Cluster']
cluster_to_block = heavy_clusters.mode().iloc[0] if not heavy_clusters.empty else None

# Block list (columns: Block_IP, Status), read and written at the same path
# so that the Status column persists between runs
BLOCK_LIST = '/mlsecops/DoS.csv'
Block_IP_data = pd.read_csv(BLOCK_LIST)

# Add every IP from the cluster to block that is not in the list yet
for ip in final_data.loc[final_data['Cluster'] == cluster_to_block, 'IP'].unique():
    if ip not in Block_IP_data['Block_IP'].values:
        new_row = pd.DataFrame([{'Block_IP': ip, 'Status': 'No'}])
        Block_IP_data = pd.concat([Block_IP_data, new_row], ignore_index=True)

# Drop traffic from each IP that is not blocked yet, then mark it as blocked
for index, row in Block_IP_data.iterrows():
    if row['Status'] == 'No':
        subprocess.run(['iptables', '-A', 'INPUT', '-s', str(row['Block_IP']), '-j', 'DROP'])
        Block_IP_data.loc[index, 'Status'] = 'Yes'

Block_IP_data.to_csv(BLOCK_LIST, index=False)
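Because the IPs come from a log file that clients can influence, it may be worth validating each value before handing it to `iptables`. The sketch below uses the standard `ipaddress` module and accepts IPv4 addresses only. The helper names `is_valid_ipv4` and `build_block_command` are hypothetical and not part of the code above, and the command is only built here, not executed.

```python
import ipaddress


def is_valid_ipv4(value):
    """Return True if `value` parses as a plain IPv4 address."""
    try:
        ipaddress.IPv4Address(str(value))
        return True
    except ValueError:
        return False


def build_block_command(ip):
    """Build the iptables argument list for one IP, or raise ValueError."""
    if not is_valid_ipv4(ip):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    # Passing a list (not a shell string) avoids shell interpretation of the value
    return ["iptables", "-A", "INPUT", "-s", str(ip), "-j", "DROP"]


print(build_block_command("203.0.113.7"))
print(is_valid_ipv4("1.2.3.4; rm -rf /"))
```

The returned list could be passed to `subprocess.run` in place of a formatted shell string.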
This is my Jenkins CI/CD pipeline:
Job 1:
Job 2:
Job 3:
This is my CI/CD pipeline.