Wednesday, March 31, 2010

Generating reports from McAfee Sidewinder 7.0 log files

Problem:

My colleague was tasked with generating reports from the log files churned out by Sidewinder 7.0.

Something like this.....



------------------------------------------------------------
SOURCE ADDRESS      DESTINATION ADDRESS   SERVICE   COUNT   RULE
------------------------------------------------------------
xxx.xxx.xxx.xxx     xxx.xxx.xxx.xxx       ping      15000   12345


------------------------------------------------------------
DESTINATION IP      COUNT   PERCENTAGE
------------------------------------------------------------
xxx.xxx.xxx.xxx     17281   8.3333%


------------------------------------------------------------
SOURCE IP           COUNT   PERCENTAGE
------------------------------------------------------------
xxx.xxx.xxx.xxx     17281   8.3333%


------------------------------------------------------------
SERVICE NAME        COUNT   PERCENTAGE
------------------------------------------------------------
ping                17281   8.3333%


------------------------------------------------------------
DATE                COUNT   PERCENTAGE
------------------------------------------------------------
28-02-2010          17281   8.3333%


------------------------------------------------------------
HOURS               COUNT   PERCENTAGE
------------------------------------------------------------
12AM - 1AM          17281   8.3333%


Points to note:

There was an existing Perl script that handles the conversion from 6.0 -> 7.0, but converting one month's worth of logs takes roughly 12 hours to complete.

Solution:

Since I'm more trained in C#, I chose to write a Windows application for him, and to put my design and algorithm skills to the test.

Some sample log data:

28-02-2010 00:00:00 Local0.Notice 172.16.20.3 Feb 27 23:56:57 auditd: date="Feb 27 23:56:57 2010 UTC",fac=f_ping_proxy,area=a_proxy,type=t_nettraffic,pri=p_major,pid=63779,ruid=0,euid=0,pgid=63779,logid=0,cmd=pingp,domain=Ping,edomain=Ping,hostname=xxxxxxxxx,event=proxy traffic end,service_name=ping,netsessid=4b89b107000c933b,srcip=xxx.xxx.xxx.xxx,srcburb=dmz2,protocol=1,dstip=xxx.xxx.xxx.xxx,dstburb=external,bytes_written_to_client=40,bytes_written_to_server=40,rule_name=xxxxxxxxx,cache_hit=0,request_status=0,start_time="Sat Feb 27 23:55:51 2010"

The first problem I realised was that the supposed "CSV" (comma-separated values) isn't really a CSV; it's part tab-delimited and part comma-delimited. Well, it wasn't that hard to tokenize/split: just do a string.Split('\t') followed by a string.Split(',').

string[] splitByTabArray = null;
string[] splitByCommaArray = null;

// Split the line on tabs first; the fifth tab-delimited field holds the auditd data.
if (!string.IsNullOrEmpty(line))
{
    splitByTabArray = line.Split('\t');
}
// Then split the auditd field on commas into its key=value tokens.
if (splitByTabArray != null && splitByTabArray.Length > 4
    && !string.IsNullOrEmpty(splitByTabArray[4]))
{
    splitByCommaArray = splitByTabArray[4].Split(',');
}
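
For context, line comes from reading the log file one line at a time. A minimal sketch of that read loop, with a placeholder file path; streaming matters once the logs run into gigabytes:

// Stream the log one line at a time so even a multi-gigabyte file
// never has to fit in memory. The file path is just a placeholder.
using (var reader = new System.IO.StreamReader(@"C:\logs\sidewinder-2010-02.log"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // split by tab, then by comma, as shown above
    }
}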

So now we have two string arrays. The next problem is that I've only shown you one line of sample log; the actual log is much more complicated. Why do I say so?

Because in the auditd: section, the number of key=value tokens to be split varies from line to line. To make things easier, I asked my colleague which columns are used to generate the reports.
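
One way to cope with that variability is to fold the comma-split tokens into a key/value dictionary and look up only the columns we need. A rough sketch (ParseAuditdFields is my own hypothetical helper, not from the original code):

// Hypothetical helper: map the comma-split auditd tokens (key=value pairs)
// into a dictionary, so lookups work no matter how many fields a line has.
// Requires System.Collections.Generic.
static Dictionary<string, string> ParseAuditdFields(string[] commaTokens)
{
    var fields = new Dictionary<string, string>();
    foreach (string token in commaTokens)
    {
        int eq = token.IndexOf('=');
        if (eq > 0)
        {
            string key = token.Substring(0, eq);
            string value = token.Substring(eq + 1).Trim('"');
            fields[key] = value; // a duplicate key simply overwrites
        }
    }
    return fields;
}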

After that, I made a struct.

public struct Report
{
    public DateTime Time { get; set; }
    public string Date { get; set; }
    public string Type { get; set; }
    public string SrcIp { get; set; }
    public string DestIp { get; set; }
    public string Pid { get; set; }
    public string ServiceName { get; set; }
}

The reason I created a struct is that I have to group records, as you can see from the sample report. But a struct alone doesn't solve that problem, you might think. Yup, it doesn't, but a typed collection of Report gives me LINQ support, so I can do grouping, selecting, joining, etc.
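
To tie the parsing and the struct together, each log line ends up as one Report in a list that the queries run over. A sketch, assuming the hypothetical fields dictionary from earlier; the key names come straight from the sample log line above:

var allReports = new List<Report>();

// Inside the per-line loop, after the tab/comma splitting:
Dictionary<string, string> fields = ParseAuditdFields(splitByCommaArray);
allReports.Add(new Report
{
    Date        = splitByTabArray[0],   // e.g. "28-02-2010"
    Type        = fields.ContainsKey("type") ? fields["type"] : null,
    SrcIp       = fields.ContainsKey("srcip") ? fields["srcip"] : null,
    DestIp      = fields.ContainsKey("dstip") ? fields["dstip"] : null,
    Pid         = fields.ContainsKey("pid") ? fields["pid"] : null,
    ServiceName = fields.ContainsKey("service_name") ? fields["service_name"] : null
});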

A sample query:

var topDest = (from oneReport in allReports
               where oneReport.DestIp != null && oneReport.Type.Equals(TYPE)
               group oneReport by oneReport.DestIp into groupedReports
               orderby groupedReports.Count() descending
               select new { groupedReports.Key, Count = groupedReports.Count(), Total = allReports.Count })
              .Take(10);
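
From there, printing a section of the report is just a walk over the query results. A minimal sketch for the destination-IP table; the column widths are my own guess at the layout:

Console.WriteLine(new string('-', 60));
Console.WriteLine("{0,-20}{1,8}{2,13}", "DESTINATION IP", "COUNT", "PERCENTAGE");
Console.WriteLine(new string('-', 60));
foreach (var row in topDest)
{
    // Share of all parsed records that hit this destination IP.
    double percentage = row.Count * 100.0 / row.Total;
    Console.WriteLine("{0,-20}{1,8}{2,12:F4}%", row.Key, row.Count, percentage);
}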
I've decided not to put the full source code up, as it's still not perfect or optimized, and... I have no idea, I just want to keep it until someone out there who really needs it contacts me, and then I'll share it.

A brief outline of the approach:

  1. Tokenize all the required data
  2. Use a struct so the records are typed objects with LINQ support

And to top it off, I managed to process 60MB of logs in 15 seconds, and 7GB in less than an hour. Talk about efficiency :)
