Scripting Games Task #5 Commentary

In today's challenge, you were asked to scan multiple IIS log files for IPv4 addresses and create a list of unique IP addresses.

Rather than going through all the sophisticated entries you submitted, I’d like to focus on the beef and present the core part of a sample solution that does the trick.

Scanning Multiple Log Files For IP Addresses

Here is the core sample code that solves the puzzle:

$regex_IP = '(?:80 - )((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))'

$path = 'C:\Users\Tobias\Downloads\LogFiles\LogFiles\*\*'

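# Enumerate every log file, then read each one in a single chunk
Get-ChildItem $path | ForEach-Object {
    foreach($line in (Get-Content -Path $_.FullName -ReadCount 0))
    {
        # Emit the captured client IP address for each matching line
        if ($line -match $regex_IP) { $matches[1] }
    }
} | Sort-Object { [System.Version]$_ } -Unique   # drop duplicates, sort numerically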

Let’s take a look at the code and how it accomplishes the job.

Scanning Multiple Files

All log files are located under the folder “LogFiles”, but each one sits in its own subfolder. The easiest way to reach them is to use multiple wildcards, which is why the code uses this path:

$path = 'C:\Users\Tobias\Downloads\LogFiles\LogFiles\*\*'

With this path, Get-ChildItem retrieves everything located directly inside each subfolder of LogFiles: the first wildcard matches the subfolders, the second matches their content. If the subfolders contain other files as well, you can refine this by adding a file extension to the second wildcard.
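As a rough sketch of that refinement, the second wildcard can carry an extension. The helper name Get-LogFile and the default extension “.log” are assumptions made for illustration, not part of the original solution:

```powershell
# Hypothetical helper: lists files with a given extension that sit
# exactly one subfolder below $Root (the "root\*\*.ext" pattern).
function Get-LogFile {
    param(
        [Parameter(Mandatory = $true)][string]$Root,
        [string]$Extension = '.log'   # assumed extension; adjust as needed
    )

    # First wildcard: every subfolder. Second wildcard: matching files only.
    $subfolders = Join-Path -Path $Root -ChildPath '*'
    $pattern    = Join-Path -Path $subfolders -ChildPath "*$Extension"

    Get-ChildItem -Path $pattern
}

# Usage (path taken from the article):
# Get-LogFile -Root 'C:\Users\Tobias\Downloads\LogFiles\LogFiles'
```

Files with any other extension in the subfolders are simply not returned, so the rest of the code never tries to parse them.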

Finding Text Lines With IPv4 Addresses

To scan each log file for lines that contain IPv4 addresses, the most efficient approach is using a regular expression pattern.

However, when you look at the log file content, you will notice that each line contains two IP addresses, and the client IP address is the second one. To select it, the regular expression starts with some static text (an anchor, here “80 - ”) that directly precedes each client address in these logs.

$regex_IP = '(?:80 - )((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))'

Notice the use of parentheses and “?:”. In a regular expression, each pair of parentheses forms a group (a subexpression). When a group starts with “?:”, it is non-capturing: it still takes part in matching, but its text is not stored as a result. So the only group without “?:” is the one that captures the IP address we are looking for.
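To see how “?:” shifts the group numbering, here is a small sketch. The sample text and the simplified pattern [\d.]+ are stand-ins chosen for brevity; unlike the real pattern, they do not validate the address:

```powershell
# Made-up sample text, not taken from a real log
$text = 'port 80 - 10.1.2.3'

# Both groups capture: group 1 is the anchor, group 2 is the address
$null = $text -match '(80 - )([\d.]+)'
$withCapture = $Matches[1], $Matches[2]    # '80 - ' and '10.1.2.3'

# Anchor marked non-capturing: the address becomes group 1
$null = $text -match '(?:80 - )([\d.]+)'
$withoutCapture = $Matches[1]              # '10.1.2.3'
$groupCount = $Matches.Count               # 2 entries: keys 0 and 1
```

With the non-capturing anchor, the script never has to know how many helper groups precede the address.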

To extract the IP addresses from each line, you use the operator -match. It takes the regular expression and returns $true if the line matches the pattern.

So the code uses an If condition. When -match returns $true, the results are stored in the automatic variable $matches: $matches[0] is always the text matched by the entire pattern, and $matches[1] is the first capturing group (the first pair of parentheses without “?:”), so it returns the IP address.
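The following sketch runs the pattern against a single line. Both sample lines are made up for illustration (the addresses are placeholders), so treat the field layout as an assumption about what the challenge logs look like:

```powershell
$regex_IP = '(?:80 - )((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))'

# Hypothetical log line: server IP first, client IP after the "80 - " anchor
$line = '2013-04-01 12:00:01 192.168.0.10 GET /default.htm - 80 - 203.0.113.7 Mozilla/5.0 200 0 0 15'

if ($line -match $regex_IP) {
    $wholeMatch = $Matches[0]   # '80 - 203.0.113.7' (anchor plus address)
    $clientIP   = $Matches[1]   # '203.0.113.7'
}

# A line without the anchor does not match...
$headerMatched = '#Software: Microsoft Internet Information Services' -match $regex_IP   # $false

# ...and a failed -match leaves $Matches untouched, so it still holds the old result
$staleValue = $Matches[1]       # '203.0.113.7'
```

The last two lines show why the If condition matters: reading $matches[1] without checking the result of -match would repeat the address from an earlier line.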

Increasing Performance

Note the foreach statement in the code: instead of piping the log file content to ForEach-Object, the code uses a classic foreach loop to scan it.

It also uses Get-Content with the parameter -ReadCount 0, which causes the cmdlet to return all lines as one string array in a single chunk instead of emitting them one at a time. This increases performance by approximately 6000%, so scanning large log files takes seconds, not minutes.
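If you want to check the difference on your own machine, a comparison along these lines is one way to do it. The temporary file, its 5000 generated lines, and the variable names are assumptions for this sketch, and the timings will vary with file size and hardware:

```powershell
# Build a throwaway file with generated lines (not real log data)
$tempFile = Join-Path ([System.IO.Path]::GetTempPath()) ("readcount-demo-{0}.log" -f [Guid]::NewGuid())
1..5000 | ForEach-Object { "line $_ 80 - 10.0.0.$($_ % 256)" } | Set-Content -Path $tempFile

$counts = @{ Pipeline = 0; ReadCount = 0 }

# Variant 1: Get-Content emits one line at a time into the pipeline
$pipelineTime = Measure-Command {
    Get-Content -Path $tempFile | ForEach-Object {
        if ($_ -match '80 - ') { $counts.Pipeline++ }
    }
}

# Variant 2: -ReadCount 0 returns all lines at once, scanned with foreach
$readCountTime = Measure-Command {
    foreach ($line in (Get-Content -Path $tempFile -ReadCount 0)) {
        if ($line -match '80 - ') { $counts.ReadCount++ }
    }
}

'Pipeline:     {0:N0} ms' -f $pipelineTime.TotalMilliseconds
'-ReadCount 0: {0:N0} ms' -f $readCountTime.TotalMilliseconds

Remove-Item -Path $tempFile
```

Both variants visit the same lines, so the two counters end up equal; only the elapsed time differs.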

Excluding Duplicate IP Addresses

Since log files may contain duplicate IP addresses, the code excludes duplicates with Sort-Object -Unique. Used as-is, it compares the addresses as text, so the resulting list is sorted alphabetically (“10.0.0.10” comes before “10.0.0.2”). It would still find and eliminate duplicates.

To get a numerically sorted result (which was not a requirement), you can pass a script block to Sort-Object that converts each IPv4 address to System.Version. A version number can consist of four numeric parts, just like an IPv4 address, so this type can be used for correct IPv4 sorting.

...| Sort-Object { [System.Version]$_ } -Unique
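A short sketch with a handful of made-up addresses shows the difference between the two sort orders:

```powershell
# Sample addresses chosen for illustration; '10.0.0.2' appears twice
$ips = '10.0.0.10', '10.0.0.2', '192.168.1.1', '10.0.0.2', '9.255.0.1'

# Text comparison: duplicates removed, but '10.0.0.10' sorts before '10.0.0.2'
$asText = $ips | Sort-Object -Unique
# 10.0.0.10, 10.0.0.2, 192.168.1.1, 9.255.0.1

# Comparison as System.Version: each of the four parts is compared as a number
$asVersion = $ips | Sort-Object { [System.Version]$_ } -Unique
# 9.255.0.1, 10.0.0.2, 10.0.0.10, 192.168.1.1
```

The script block only supplies the sort key, so the output still consists of the original strings rather than System.Version objects.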

I hope this provided some useful tips and techniques.

See you next time!

Tobias

P.S.

Ah, and if you or your company are thinking about a professional PowerShell workshop, contact me! I run a lot of PowerShell training courses for beginners and advanced users throughout Europe and would be happy to assist you!

We offer training in German, English, French, Italian, and Spanish.

You can reach me here: tobias.weltner(AT)email.de.