Monday, May 7, 2012

How to: Using R to Analyze IIS Logs

IIS logs are basically CSV files (with a space as the separator), except that one file may contain several headers. R is a powerful open-source analytics language, so here is a quick demo of how to use R for ad-hoc analysis of IIS logs.
  The IIS log files look like the following:
image
Comment lines begin with #, and so do the header lines. R has a built-in data importer for CSV files (read.csv), and it has a lot of options, as shown here:
image

For an IIS log file, we only need to uncomment one header line. This tells R how to parse the log file correctly while still using # as the comment.char. So just uncomment the 4th line above by removing the #Fields: at the beginning of the line.

image
Then open your favorite R IDE; I will use RStudio.
Load the raw file into a data frame named iis, and run names to get the column names. This confirms that the file was parsed correctly.
image
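Since the screenshot is not available here, the loading step might look roughly like this. It is only a sketch: the sample log lines, the temporary file, and the field list are made up for illustration, and the #Fields: prefix on the 4th line has already been removed as described above.

```r
# Hypothetical sample in IIS W3C format, written to a temp file so the
# example is self-contained. Line 4 is the field list with "#Fields: " removed.
logfile <- tempfile(fileext = ".log")
writeLines(c(
  "#Software: Microsoft Internet Information Services 7.5",
  "#Version: 1.0",
  "#Date: 2012-05-07 00:00:00",
  "date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken",
  "2012-05-07 00:00:01 10.0.0.1 GET /default.aspx - 80 - 10.0.0.5 Mozilla/5.0 200 0 0 15",
  "2012-05-07 00:00:02 10.0.0.1 GET /site.css - 80 - 10.0.0.5 Mozilla/5.0 200 0 0 3",
  "2012-05-07 00:00:03 10.0.0.1 GET /missing.html - 80 - 10.0.0.6 Mozilla/5.0 404 0 2 7"
), logfile)

# Space-separated values; lines starting with "#" are skipped as comments.
iis <- read.csv(logfile, header = TRUE, comment.char = "#", sep = " ")

# read.csv rewrites names that are not valid R identifiers,
# so "c-ip" becomes "c.ip" and "sc-status" becomes "sc.status".
names(iis)
```

If names shows something like V1, V2, or a single long column, the header line was probably not uncommented.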
Then you can run typeof(iis) to see that it is a list object (a data frame is stored as a list of columns), and nrow and ncol to query the record count and column count.
image
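As a small sketch of those checks, here they are on a tiny made-up data frame standing in for the loaded log (the two columns and their values are assumptions for illustration only):

```r
# Hypothetical stand-in for the loaded log.
iis <- data.frame(c.ip = c("10.0.0.5", "10.0.0.5", "10.0.0.6"),
                  sc.status = c(200L, 200L, 404L))

typeof(iis)  # "list": a data frame is a list of equal-length columns
class(iis)   # "data.frame"
nrow(iis)    # record count
ncol(iis)    # column count
```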

Now, let’s do some basic analysis

Q1: Group the results by response code and plot them.
image
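One way to do this, sketched on made-up status codes, is to count with table and pass the result to barplot. The column name sc.status is what read.csv would produce from the sc-status field; the data here is an assumption for illustration.

```r
# Hypothetical response codes standing in for the real log.
iis <- data.frame(sc.status = c(200, 200, 304, 404, 200, 500, 304))

# Count the records per response code.
bystatus <- table(iis$sc.status)
print(bystatus)

# One bar per response code.
barplot(bystatus, col = rainbow(length(bystatus)),
        xlab = "response code", ylab = "req Count")
```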

Or group by request:
image

Q2: Get the top 5 URLs by request count:
image
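A possible version of this query, using the same head(sort(table(...))) pattern as Q5 further down. The URLs and counts are made up, and cs.uri.stem is assumed to be the column holding the requested path.

```r
# Hypothetical requests: six distinct URLs with different hit counts.
urls <- c("/default.aspx", "/site.css", "/about.aspx",
          "/logo.png", "/contact.aspx", "/robots.txt")
iis <- data.frame(cs.uri.stem = rep(urls, times = c(6, 5, 4, 3, 2, 1)),
                  stringsAsFactors = FALSE)

# Count requests per URL, sort descending, keep the first five.
top5 <- head(sort(table(iis$cs.uri.stem), decreasing = TRUE), 5)
print(top5)
```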

Q3: Count all the .css requests and get the top 10:
image
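A sketch of one way to do it: filter the URLs with a regular expression, then count as before. The sample data is made up, and matching on a case-insensitive ".css" suffix of cs.uri.stem is an assumption about what counts as a .css request.

```r
# Hypothetical requests mixing stylesheets with other resources.
iis <- data.frame(
  cs.uri.stem = c("/css/site.css", "/default.aspx", "/css/site.css",
                  "/css/print.css", "/logo.png", "/css/site.css",
                  "/Theme/Main.CSS"),
  stringsAsFactors = FALSE)

# TRUE for URLs that end in ".css", ignoring case.
is_css <- grepl("\\.css$", iis$cs.uri.stem, ignore.case = TRUE)
sum(is_css)  # total number of .css requests

# as.character drops unused factor levels if the column was read as a factor.
topcss <- head(sort(table(as.character(iis$cs.uri.stem[is_css])),
                    decreasing = TRUE), 10)
print(topcss)
```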

Q4: Combine all the logs in one folder and put all the data together

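basedir="F:/inetpub/logs/LogFiles/W3SVC1/"

# Read one log file. Assumes its "#Fields: " prefix has already been removed,
# so the field list is the first non-comment line.
LoadData=function (filename)
{
  read.csv(filename,header=TRUE,comment.char="#",sep=" ")
}

data=data.frame()

# pattern is a regular expression, so match a literal ".log" suffix
files=list.files(basedir,pattern="\\.log$")
for(f in files)   # iterating over files directly is safe when the folder is empty
{
  temp=LoadData(paste(basedir,f,sep=""))
  print(nrow(temp))   # rows read from this file
  data=rbind(data,temp)
}
nrow(data)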
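The loop above relies on every file having had its #Fields: line uncommented by hand. As an alternative sketch, the prefix can be stripped in memory while reading, which also copes with header blocks repeated in the middle of a file. The helper read_iis_log, the temporary folder, and the sample lines are all hypothetical, and the sketch assumes every file uses the same field list.

```r
# Hypothetical helper: read one raw IIS log without editing it by hand.
read_iis_log <- function(path) {
  lines <- readLines(path)
  # Take the column names from the first "#Fields: " line.
  fields <- grep("^#Fields: ", lines, value = TRUE)[1]
  header <- sub("^#Fields: ", "", fields)
  # Drop every "#" line, including header blocks repeated mid-file.
  body <- lines[!grepl("^#", lines)]
  read.table(text = c(header, body), header = TRUE, sep = " ",
             comment.char = "", quote = "", stringsAsFactors = FALSE)
}

# Demo on two small made-up files in a temporary folder.
demo_dir <- file.path(tempdir(), "iis_demo")
dir.create(demo_dir, showWarnings = FALSE)
hdr <- c("#Software: Microsoft Internet Information Services 7.5",
         "#Version: 1.0",
         "#Date: 2012-05-07 00:00:00",
         "#Fields: date time cs-method cs-uri-stem c-ip sc-status")
writeLines(c(hdr,
             "2012-05-07 00:00:01 GET /default.aspx 10.0.0.5 200",
             hdr,
             "2012-05-07 00:10:00 GET /site.css 10.0.0.6 200"),
           file.path(demo_dir, "u_ex120507.log"))
writeLines(c(hdr, "2012-05-08 00:00:01 GET /missing.html 10.0.0.5 404"),
           file.path(demo_dir, "u_ex120508.log"))

# Read every .log file and stack the results in one step.
files <- list.files(demo_dir, pattern = "\\.log$", full.names = TRUE)
all_logs <- do.call(rbind, lapply(files, read_iis_log))
nrow(all_logs)
```

Building the list first and calling rbind once avoids growing the data frame inside a loop, which gets slow with many large files.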


Q5: Get the distribution of requests by client (identify who sends the most requests)

reqbyiptop10= head(sort(table(iis$c.ip),decreasing=TRUE),10)

ba=barplot(reqbyiptop10,col=rainbow(length(reqbyiptop10)),ylim=c(0,max(reqbyiptop10)*1.2),ylab="req Count")
text(ba,reqbyiptop10,reqbyiptop10,pos=3)


You get something like this:
image

1 comment:

Alan Nicholas said...


# Here's how to read the log file and assign the column names.

logfile = "W3C_log_filename"
logcols = read.table(logfile, header = FALSE, sep = " ", skip = 3, nrows = 1, comment.char = "")
iislog = read.table(logfile, header = FALSE, sep = " ",comment.char = "#")
logcols[,1] <- NULL
names(iislog) <- unlist(logcols[1,])
View(iislog)

 