My colleague was tasked with generating reports from the log files churned out by Sidewinder 7.0.
Something like this...
------------------------------------------------------------
SOURCE ADDRESS     DESTINATION ADDRESS    SERVICE   COUNT   RULE
------------------------------------------------------------
xxx.xxx.xxx.xxx    xxx.xxx.xxx.xxx        ping      15000   12345
------------------------------------------------------------
DESTINATION IP     COUNT     PERCENTAGE
------------------------------------------------------------
xxx.xxx.xxx.xxx    17281     8.3333%
------------------------------------------------------------
SOURCE IP          COUNT     PERCENTAGE
------------------------------------------------------------
xxx.xxx.xxx.xxx    17281     8.3333%
------------------------------------------------------------
SERVICE NAME       COUNT     PERCENTAGE
------------------------------------------------------------
ping               17281     8.3333%
------------------------------------------------------------
DATE               COUNT     PERCENTAGE
------------------------------------------------------------
28-02-2010         17281     8.3333%
------------------------------------------------------------
HOURS              COUNT     PERCENTAGE
------------------------------------------------------------
12AM - 1AM         17281     8.3333%
------------------------------------------------------------
Points to note:
There was an existing Perl script that does the conversion from 6.0 -> 7.0, but converting one month's worth of logs takes roughly 12 hours.
Solution:
Since I'm more trained in C#, I chose to write a Windows application for him, and to put my design and algorithm skills to the test.
Some sample log data:
28-02-2010 00:00:00 Local0.Notice 172.16.20.3 Feb 27 23:56:57 auditd: date="Feb 27 23:56:57 2010 UTC",fac=f_ping_proxy,area=a_proxy,type=t_nettraffic,pri=p_major,pid=63779,ruid=0,euid=0,pgid=63779,logid=0,cmd=pingp,domain=Ping,edomain=Ping,hostname=xxxxxxxxx,event=proxy traffic end,service_name=ping,netsessid=4b89b107000c933b,srcip=xxx.xxx.xxx.xxx,srcburb=dmz2,protocol=1,dstip=xxx.xxx.xxx.xxx,dstburb=external,bytes_written_to_client=40,bytes_written_to_server=40,rule_name=xxxxxxxxx,cache_hit=0,request_status=0,start_time="Sat Feb 27 23:55:51 2010"
The first problem I realised was that the supposed "CSV" (comma-separated values) wasn't really a CSV; it was part tab-delimited and part comma-delimited. Well, it wasn't that hard to tokenize/split it: just do a string.Split('\t') followed by a string.Split(',').
string[] splitByTabArray = new string[0];
string[] splitByCommaArray = new string[0];
if (!string.IsNullOrEmpty(line))
{
    // The leading columns (timestamp, facility, source host, ...) are tab-delimited.
    splitByTabArray = line.Split('\t');
}
if (splitByTabArray.Length > 4 && !string.IsNullOrEmpty(splitByTabArray[4]))
{
    // The fifth tab-delimited column holds the comma-delimited auditd: payload.
    splitByCommaArray = splitByTabArray[4].Split(',');
}
So now we have two string arrays. The next problem is that I've only shown you one line of sample log; the actual log is much more complicated. Why do I say so? Because in the auditd: section, the number of fields varies from line to line, so you can't rely on a fixed position for each value. To make things easier, I asked my colleague which columns are actually used to generate the reports.
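One way to cope with that (a sketch of the idea, not necessarily how my tool ended up doing it) is to stop relying on positions altogether: split each auditd token on '=' and keep the pairs in a dictionary, so the report columns can be looked up by name.
// Sketch: map each key=value token from the auditd: section into a
// dictionary (needs System.Collections.Generic), so fields can be
// fetched by name instead of by position.
Dictionary<string, string> fields = new Dictionary<string, string>();
foreach (string token in splitByCommaArray)
{
    int eq = token.IndexOf('=');
    if (eq > 0)
    {
        string key = token.Substring(0, eq).Trim();
        string value = token.Substring(eq + 1).Trim('"');
        fields[key] = value;
    }
}
The obvious caveat: a naive comma split breaks apart any quoted value that itself contains a comma, so this only holds for logs shaped like the sample above.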
After that, I made a struct.
public struct Report
{
    public DateTime Time { get; set; }
    public string Date { get; set; }
    public string Type { get; set; }
    public string SrcIp { get; set; }
    public string DestIp { get; set; }
    public string Pid { get; set; }
    public string ServiceName { get; set; }
}
The reason I created a struct is that I have to group records, as you can see from the sample report file. But a struct alone doesn't solve the problem, you might think. Yup, it doesn't, but it gives me typed objects that LINQ can query, so I can do grouping, selecting, joining, etc.
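Before any querying happens, each parsed line gets packed into one of these structs. A rough sketch of that step (assuming the fields dictionary from the earlier sketch, that splitByTabArray[0] holds the leading "28-02-2010 00:00:00" timestamp, and a List<Report> named allReports; needs System and System.Globalization):
// Helper: fetch a field by name, or null if this line doesn't have it.
Func<string, string> field = key =>
{
    string value;
    return fields.TryGetValue(key, out value) ? value : null;
};

Report report = new Report
{
    Time = DateTime.ParseExact(splitByTabArray[0], "dd-MM-yyyy HH:mm:ss",
                               CultureInfo.InvariantCulture),
    Date = splitByTabArray[0].Substring(0, 10),  // keep just "28-02-2010"
    Type = field("type"),
    SrcIp = field("srcip"),
    DestIp = field("dstip"),
    Pid = field("pid"),
    ServiceName = field("service_name")
};
allReports.Add(report);  // allReports feeds the LINQ queries below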
A sample query:
// TYPE here is a constant defined elsewhere in the app (presumably the
// audit record type of interest, e.g. "t_nettraffic" in the sample log).
var topDest = (from oneReport in allReports
               where oneReport.DestIp != null && oneReport.Type.Equals(TYPE)
               group oneReport by oneReport.DestIp into groupedReports
               orderby groupedReports.Count() descending
               select new { groupedReports.Key, Count = groupedReports.Count(), Total = allReports.Count }).Take(10);
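The percentage column then falls straight out of Count and Total. A quick sketch of printing the top-destinations table (the column widths are just my guess at the report layout shown above):
// Sketch: render the grouped results in the report's column layout.
foreach (var dest in topDest)
{
    // Count/Total is this destination's share of all records, e.g. 8.3333%.
    double percentage = dest.Count * 100.0 / dest.Total;
    Console.WriteLine("{0,-20} {1,10} {2,10:0.0000}%",
                      dest.Key, dest.Count, percentage);
}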
I've decided not to put the full source code up as it's still not perfect or optimized, and... I have no idea, I just want to keep it until someone out there who really needs it contacts me, and I'll share it.
A brief outline of the solution:
- Tokenize all the required data
- Use a struct so each record becomes a typed object that LINQ can group, select, and join
And to top it off, I managed to convert 60 MB of logs in 15 seconds, and 7 GB in less than an hour. Talk about efficiency :)