Introduction.
One of the most important measures of website activity is resource access statistics. This information serves many purposes: optimizing website content, improving marketing campaigns, and running diagnostic tests. The web server saves detailed resource access information to its log file(s).
There are many applications and tools, such as "WebTrends Log Analyser" (by http://www.webtrends.com), that can parse web server activity logs, compile statistics, and display them in a user-friendly format. Most of these programs report resource access statistics for a fixed time interval. Such report generators also need some time to process the log files and prepare the reports.
In this article we present a simple ASP.NET application that walks through the web server activity logs, parses them on the fly, and displays a summary statistics report for each fixed time interval (day, month, year) in chronological order.
Log File Parsing.
The ASP.NET application needs access to the web server activity log files in order to parse them. For demo purposes we assume that the test web server saves all log files on the same PC where the ASP.NET application runs. All we need to do is read the log files in the appropriate order, parse each of them, and count all occurrences of the given key phrase, lexeme, or resource name.
We also assume that the web server starts a new log file each day and names it using the file mask "exYYYYMMDD.log", where YYYY is the year of the log file's creation date, MM the month, and DD the day. This lets us determine each file's date from its name without parsing its contents.
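As an illustration of the naming convention, the sketch below builds a log file name from a date and recovers the date from a name. The constant values "ex{0}.log" and "yyyyMMdd" are assumptions inferred from the "exYYYYMMDD.log" mask, and the module and function names (LogFileNames, BuildLogFileName, TryGetLogDate) are hypothetical; they are not part of the original application.

```vbnet
Imports System
Imports System.Globalization

Module LogFileNames
    ' Assumed values: inferred from the "exYYYYMMDD.log" mask described in the text.
    Private Const LogNameFormat As String = "ex{0}.log"
    Private Const LogNameDateFormat As String = "yyyyMMdd"

    ' Builds the log file name for one day, e.g. 7 March 2004 gives "ex20040307.log".
    Public Function BuildLogFileName(ByVal logDate As DateTime) As String
        Return String.Format(LogNameFormat, _
            logDate.ToString(LogNameDateFormat, CultureInfo.InvariantCulture))
    End Function

    ' Recovers the date from a file name without opening the file.
    ' Returns False when the name does not follow the assumed mask.
    Public Function TryGetLogDate(ByVal fileName As String, ByRef logDate As DateTime) As Boolean
        ' "ex" (2) + "YYYYMMDD" (8) + ".log" (4) = 14 characters.
        If fileName.Length <> 14 _
           OrElse Not fileName.StartsWith("ex", StringComparison.Ordinal) _
           OrElse Not fileName.EndsWith(".log", StringComparison.Ordinal) Then
            Return False
        End If
        Return DateTime.TryParseExact(fileName.Substring(2, 8), LogNameDateFormat, _
            CultureInfo.InvariantCulture, DateTimeStyles.None, logDate)
    End Function

    Sub Main()
        Console.WriteLine(BuildLogFileName(New DateTime(2004, 3, 7)))
    End Sub
End Module
```

Formatting with the invariant culture keeps the digits in the file name independent of the server's regional settings, which is one reasonable choice here rather than a requirement of the article.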
Finally, the algorithm for iterating through the log files and counting the lines that contain the specified phrase is shown below:
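' Requires: Imports System.IO
' Counts the lines of one log file that contain checkWord (case-insensitive).
Private Function ProcessFile(ByVal fileName As String, ByVal checkWord As String) As Integer
    Dim wordCount As Integer = 0
    ' FileShare.ReadWrite allows reading while the web server is still writing to the file.
    Dim fs As FileStream = New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
    Dim sr As StreamReader = New StreamReader(fs)
    Dim s As String
    s = sr.ReadLine()
    ' ReadLine returns Nothing at the end of the file.
    Do While Not s Is Nothing
        If s.ToUpper().IndexOf(checkWord.ToUpper()) > -1 Then
            wordCount += 1
        End If
        ' Advance to the next line; without this the loop never ends.
        s = sr.ReadLine()
    Loop
    sr.Close()
    fs.Close()
    Return wordCount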
End Function
' Sums the matches over the daily log files from startDate to endDate inclusive.
' LogNameFormat, LogNameDateFormat and LogPath are page-level settings not shown in this listing.
Private Function ProcessFilesByDate(ByVal checkWord As String, ByVal startDate As DateTime, ByVal endDate As DateTime) As Integer
    Dim totalWordCount As Integer = 0
    Dim dt As DateTime = startDate
    Do While dt <= endDate
        Dim file0 As String = String.Format(LogNameFormat, dt.ToString(LogNameDateFormat))
        file0 = String.Format("{0}\{1}", LogPath, file0)
        If File.Exists(file0) Then
            Dim wordCount As Integer = ProcessFile(file0, checkWord)
            totalWordCount += wordCount
            ' Record the per-day count for the report (helper not shown in this listing).
            AddLogFileWordCount(dt.ToString("dd MMM yyyy"), wordCount)
        End If
        dt = dt.AddDays(1)
    Loop
    Return totalWordCount
End Function
Displaying the Statistics on the Web Page.
The resource access statistics can be displayed chronologically for each time interval. This representation is helpful when you want to know the download statistics of a specified resource for each time interval (e.g., daily). The code below extends the file-enumerating algorithm from the previous section with a helper that adds one table row per log file:
' Table control that receives the report rows.
Protected tblLogFileWordCount As System.Web.UI.WebControls.Table
' Appends a row with the log file name (20% width) and its match count (80% width).
Private Sub PrintLogFileWordCount(ByVal file As String, ByVal wordCount As Integer)
    Dim row As TableRow = New TableRow()
    tblLogFileWordCount.Rows.Add(row)
    Dim cell As TableCell = New TableCell()
    row.Cells.Add(cell)
    cell.Width = Unit.Percentage(20)
    cell.Text = String.Format("{0}:", Path.GetFileName(file))
    cell = New TableCell()
    row.Cells.Add(cell)
    cell.Width = Unit.Percentage(80)
    cell.Text = wordCount.ToString()
End Sub
Multithreaded Download Statistics.
Many users rely on special programs to download large files more efficiently. Such programs (download managers) usually fetch a single web resource in several download threads simultaneously, and the web server writes a separate log record for each thread. To keep our log parser from counting these duplicated records, we extract the user's IP address from each log record and check it against all previously extracted IPs:
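' IPs seen so far; used to count each client only once.
Private ipList As Hashtable = New Hashtable()
' Returns True the first time an IP is seen and remembers it. An empty IP is never stored.
Private Function IsNewIp(ByVal ipString As String) As Boolean
    Dim result As Boolean = Not ipList.Contains(ipString)
    If result AndAlso (Not ipString.Equals(String.Empty)) Then
        ipList.Add(ipString, ipString)
    End If
    Return result
End Function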
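To see how the duplicate check behaves, the following sketch runs the same idea over a few made-up log lines. It is a simplified stand-in rather than the article's code: the module and function names (DuplicateIpDemo, ThirdField, CountUniqueHits) are hypothetical, the sample lines are invented, and it assumes the client IP is the third space-separated field of each record, as in a "date time c-ip ..." layout.

```vbnet
Imports System
Imports System.Collections

Module DuplicateIpDemo
    ' Returns the third space-separated field of a log line, or "" if it is missing.
    ' Assumption: the client IP is the third field ("date time c-ip ...").
    Public Function ThirdField(ByVal line As String) As String
        Dim parts() As String = line.Split(" "c)
        If parts.Length >= 3 Then
            Return parts(2)
        End If
        Return String.Empty
    End Function

    ' Counts lines that mention checkWord, counting each client IP only once.
    Public Function CountUniqueHits(ByVal lines() As String, ByVal checkWord As String) As Integer
        Dim seen As New Hashtable()
        Dim count As Integer = 0
        For Each line As String In lines
            If line.ToUpper().IndexOf(checkWord.ToUpper()) > -1 Then
                Dim ip As String = ThirdField(line)
                Dim isNew As Boolean = Not seen.Contains(ip)
                ' As in IsNewIp, an empty IP is reported as new but never stored.
                If isNew AndAlso ip.Length > 0 Then
                    seen.Add(ip, ip)
                End If
                If isNew Then
                    count += 1
                End If
            End If
        Next
        Return count
    End Function

    Sub Main()
        ' Invented records: three download threads from one client, one hit from another.
        Dim lines() As String = New String() { _
            "2004-03-07 10:15:01 192.168.0.5 GET /files/setup.zip 206", _
            "2004-03-07 10:15:01 192.168.0.5 GET /files/setup.zip 206", _
            "2004-03-07 10:15:02 192.168.0.5 GET /files/setup.zip 206", _
            "2004-03-07 11:02:40 10.0.0.17 GET /files/setup.zip 200", _
            "2004-03-07 11:05:00 10.0.0.17 GET /index.htm 200"}
        Console.WriteLine(CountUniqueHits(lines, "setup.zip")) ' prints 2
    End Sub
End Module
```

In this sketch the table of seen IPs is local to one call, so every call starts fresh. Whether the table should be reset per file, per day, or per report is a design decision that depends on how repeat visitors ought to be counted.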
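' Returns the third space-separated field of a log line (the client IP in the
' log layout assumed here), or an empty string if that field cannot be found.
Private Function GetIp(ByVal line As String) As String
    ' Locate the second space, which precedes the third field.
    Dim ind As Integer = line.IndexOf(" ")
    If ind > -1 Then
        ind = line.IndexOf(" ", ind + 1)
    End If
    If ind > -1 Then
        ' The third space marks the end of the field.
        Dim indEnd As Integer = line.IndexOf(" ", ind + 1)
        If indEnd > -1 Then
            Return line.Substring(ind + 1, indEnd - ind - 1)
        End If
    End If
    Return String.Empty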
End Function
' Counts matching lines as before, but counts each client IP only once.
Private Function ProcessFile(ByVal fileName As String, ByVal checkWord As String) As Integer
    Dim wordCount As Integer = 0
    Dim fs As FileStream = New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
    Dim sr As StreamReader = New StreamReader(fs)
    Dim s As String
    s = sr.ReadLine()
    ' ReadLine returns Nothing at the end of the file.
    Do While Not s Is Nothing
        If s.ToUpper().IndexOf(checkWord.ToUpper()) > -1 Then
            If IsNewIp(GetIp(s)) Then
                wordCount += 1
            End If
        End If
        ' Advance to the next line; without this the loop never ends.
        s = sr.ReadLine()
    Loop
    sr.Close()
    fs.Close()
    Return wordCount
End Function
This code is constantly being refined and improved, and your comments and suggestions are always welcome.
NOTE: This article was converted from C# to VB.NET using a conversion tool. The original article can be found on C# Corner (www.c-sharpcorner.com).