PowerShell Pipeline

Using Regular Expressions with PowerShell To Locate Data

Speed up information retrieval with these expressions.

Whether you are parsing logs or validating input, regular expressions are a great way to accomplish both. The problem is that regular expressions can be hard, and the difficulty only grows as the patterns you need become more complex. There is also a reason why using regular expressions is sometimes called "prayer-based parsing": if your RegEx pattern isn't tightened down, you could inadvertently allow the wrong data through.

Ok, so now that I have scared you away from using regular expressions, I am going to reel you back in with some examples that show some simple things that you can do using regular expressions. First we should look at some of the basic patterns and what they mean.

Pattern    Meaning
\w         Matches a word character (a letter, a digit, or an underscore)
\d         Matches a digit
[a-z]      Matches a single lowercase letter in the range a-z
.          Matches any single character except a newline
^          Matches the starting point of a string
$          Matches the ending point of a string
Value      Matches the exact value given
+          Matches one or more of the preceding item
{2}        Matches exactly the specified number of the preceding item

That was just a quick look at a handful of the possible patterns that you can use. With PowerShell, there are a few ways to perform a match. You can use -match and -notmatch to test single strings, or you can use Select-String to search entire files or even a single string. Depending on how you run these commands, you might get back a single result, or you might get back many results that meet the patterns you supply.

Let's start off with a simple match of a value:

'test' -match 'test'

The return value is a Boolean value of True. That by itself might be enough to know that it is a match, but the automatic variable $Matches is also populated with the match data in a hash table.

Figure 1. A simple RegEx match.

If the result were $False, the $Matches variable would not be updated. This is important to remember because, if you had a match before, the value from that earlier match would remain in the variable. It is therefore not a good idea to test the contents of $Matches; rely on the Boolean result instead.
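To make the stale-data pitfall concrete, here is a minimal sketch. The variable names ($first, $second, $staleValue, $inputText, $found) and the sample strings are made up for illustration and are not from the original column.

```powershell
# First comparison succeeds, so $Matches is populated.
$first = 'test' -match 'test'

# Second comparison fails; $Matches still holds the data from the first one.
$second = 'test' -match 'nomatch'
$staleValue = $Matches[0]

# Safer: branch on the Boolean result and read $Matches only inside the branch.
$inputText = 'abc'   # hypothetical input with no digits
if ($inputText -match '\d+') {
    $found = $Matches[0]
}
else {
    $found = $null
}
```

Reading $Matches only inside the branch that saw a True result keeps a leftover value from an earlier comparison out of your results.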

Let's run through some more examples just to better understand how some of the patterns work:

 

# Match one or more letters
'test' -match '[A-Za-z]+'
$Matches

# Match part of a string (the first 2 letters)
'test' -match '[A-Za-z]{2}'
$Matches

# Match the first 2 digits (the number is converted to a string)
1234 -match '\d{2}'
$Matches

# Match word characters (letters, digits, and underscores)
'test1234' -match '\w+'
$Matches

Figure 2. Various RegEx match examples.

As you can see, we can match a wide variety of things, and you could then write the results of the matches to another file or group the data if needed.
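The examples so far do not exercise the ^, $, and . patterns from the earlier list, so here is a small sketch of how they behave. The sample values and variable names are hypothetical and chosen only for illustration.

```powershell
# Hypothetical sample values for illustration.
$samples = 'ab12', 'xab12', 'ab1234', 'abX2'

# ^ and $ force the pattern to cover the whole string:
# exactly two letters followed by exactly two digits.
$anchored = @($samples | Where-Object { $_ -match '^[a-z]{2}\d{2}$' })

# Without anchors, the same pattern can match anywhere inside the string.
$unanchored = @($samples | Where-Object { $_ -match '[a-z]{2}\d{2}' })

# An unescaped . stands in for any one character; \. requires a literal period.
$dotAny = 'abc' -match 'a.c'
$dotLiteral = 'abc' -match 'a\.c'
```

With these samples, only 'ab12' passes the anchored test, while the unanchored pattern also accepts 'xab12' and 'ab1234' because they contain a matching substring. This is the kind of looseness the "tightened down" warning at the start is about.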

OK, so that was just some simple matching to get an idea of how we can use regular expressions to detect whether the input data matches our patterns. Let's do something a little closer to a real-world example, such as finding an IP address within some text. Because this is searching for an IP rather than validating one, I am going to use a simpler RegEx pattern to sift through the text and grab the IP address: (?:\d{1,3}\.){3}\d{1,3}

So what does this actually do? I'll break it down to give you a better idea of what is actually happening.


Element    Meaning
(          Begins a group
?:         Tells the RegEx engine not to capture what the group matches
\d         Matches a single digit
{1,3}      We are looking for 1 to 3 digits
\.         Uses the backslash to escape the period so it is treated literally
)          Closes the group of 1 to 3 digits followed by a period
{3}        We are looking for exactly 3 instances of the group (1 to 3 digits followed by a period)
\d         Matches a digit again
{1,3}      Lastly, we are looking for 1 to 3 of those digits
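To see what the ?: part of the breakdown changes, the following sketch compares the non-capturing pattern with a capturing variant of it. The capturing variant and the variable names are my own additions for illustration and do not appear in the original column.

```powershell
$ip = '192.168.1.58'

# Non-capturing group: $Matches holds only the overall match under key 0.
$null = $ip -match '(?:\d{1,3}\.){3}\d{1,3}'
$nonCapturingKeys = @($Matches.Keys)

# Capturing group: the same text matches, but $Matches gains a key 1
# holding the last repetition of the group.
$null = $ip -match '(\d{1,3}\.){3}\d{1,3}'
$capturingKeys = @($Matches.Keys | Sort-Object)
$lastGroupValue = $Matches[1]
```

Both patterns find the same address. The non-capturing form simply keeps $Matches free of a group entry that nobody asked for.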

In the end we can run the following line and see that we have located the IP address.

# This is a simple RegEx for IP and should not be used for validation, only searching
'The IP address is 192.168.1.58 on computer SERVER1' -match '(?:\d{1,3}\.){3}\d{1,3}'
$Matches

Figure 3. Matching an IP Address.

Pretty cool, right? By the way, if you want to see a more complex RegEx pattern that would be used to validate an IP address (because an IP of 450.9.569.2 is outside the valid range, where each octet must be 0 to 255), you would use something like this:

^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$ 

There is a lot happening here, but if you test this against a valid IP and an invalid one, it should work exactly as you would expect.
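As a quick check of that claim, this sketch runs both the simple search pattern and the stricter validation pattern against the two addresses mentioned in this article. The variable and property names are illustrative assumptions.

```powershell
$strictPattern = '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
$loosePattern  = '(?:\d{1,3}\.){3}\d{1,3}'

$candidates = '192.168.1.58', '450.9.569.2'

# Build one result object per address showing how each pattern judges it.
$results = foreach ($candidate in $candidates) {
    [pscustomobject]@{
        Address = $candidate
        Loose   = $candidate -match $loosePattern
        Strict  = $candidate -match $strictPattern
    }
}
$results
```

The loose pattern accepts both addresses because it only checks the shape (four dot-separated groups of digits), while the strict pattern rejects 450.9.569.2 because 450 and 569 exceed 255.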

I'm going to wrap this up with a look at using Select-String to parse log files for IP addresses. This cmdlet works much like the -match operator, except that the $Matches variable is not used. Instead, you get back every line that matches, but in a form that you might not expect, as shown below.

'The IP address is 192.168.1.58 on computer SERVER1' | Select-String -Pattern '(?:\d{1,3}\.){3}\d{1,3}'

Figure 4. Results returned when using Select-String.

This isn't exactly the most helpful thing if you just want the value (this is actually useful when working with logs or other files, but more on that in a minute).

What you are getting back is a Microsoft.PowerShell.Commands.MatchInfo object, which contains properties that make the value of the match easier to locate.

Figure 5. A look at the returned object from Select-String.
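If you want to explore the object yourself, a minimal sketch like the following lists a few of its properties. The variable names $text and $info are my own. Line, LineNumber, Pattern, and Matches are properties of the MatchInfo object.

```powershell
$text = 'The IP address is 192.168.1.58 on computer SERVER1'
$info = $text | Select-String -Pattern '(?:\d{1,3}\.){3}\d{1,3}'

# The full input line, its line number, and the pattern that was used.
$info | Select-Object -Property Line, LineNumber, Pattern

# Matches holds the individual regular expression match objects,
# each with details such as the matched text and its offset in the line.
$info.Matches | Select-Object -Property Value, Index, Length
```

Running `$info | Get-Member` shows the complete list of members if you need more than these.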

Now let's just get the IP address.

('The IP address is 192.168.1.58 on computer SERVER1' | Select-String -Pattern '(?:\d{1,3}\.){3}\d{1,3}').Matches.Value

Now we have our lone IP address that was matched. Imagine that we have a log file with a bunch of IP addresses in it and we need to know all of the IPs as well as a count for each unique IP address that is in the log. Using Select-String is an excellent way to gather this information.

(Select-String -Path .\SomeLog.txt -Pattern '(?:\d{1,3}\.){3}\d{1,3}').Matches.Value |
    Group-Object | Select-Object -Property Count, Name

Figure 6. Finding all IP addresses in a log file.

In this case, we had only three unique IP addresses and can tell how many times each appears in the log.
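One thing to keep in mind: by default, Select-String reports only the first match on each line. If a log line can hold more than one address, the -AllMatches parameter returns the rest. The sketch below uses made-up in-memory log lines (not the SomeLog.txt file above) so that it can run on its own.

```powershell
# Hypothetical log lines; the second line contains two addresses.
$logLines = @(
    'Connection from 10.0.0.5 accepted'
    'Forwarded 10.0.0.5 to 192.168.1.58'
    'Connection from 10.0.0.7 accepted'
)
$pattern = '(?:\d{1,3}\.){3}\d{1,3}'

# Default behavior: at most one match per line.
$firstOnly = @($logLines | Select-String -Pattern $pattern |
    ForEach-Object { $_.Matches.Value })

# -AllMatches: every match on every line.
$all = @($logLines | Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches.Value })

# Count how often each address appears.
$all | Group-Object | Select-Object -Property Count, Name
```

Here the default run misses 192.168.1.58 because it is the second address on its line, while the -AllMatches run picks it up.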

As we have seen, using regular expressions can be a very useful tool to parse log files and pull information that can be used in a variety of ways. While we only covered the basics, hopefully this will help you out in the future with your log parsing needs!
