Prof. Powershell

Regular Expressions, Part 2

A few more tricks for working with regular expressions.

In our last lesson we started looking at the basics of using regular expressions in PowerShell. Let's see what else we can do.

In addition to character classes, we can also specify ranges by enclosing values in a set of brackets:

PS C:\> "Powershell" -match "[a-j]"
True
PS C:\> $matches

Name      Value
----      -----
0         e

We told PowerShell to match on any letter between a and j. The -match operator ignores case by default; use -cmatch when you want the match to be case sensitive:

PS C:\> "Powershell" -cmatch "[M-Z]"
True
PS C:\> $matches

Name      Value
----      -----
0         P
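
To see how case is handled, it can help to run the same range through both -match and its case-sensitive sibling, -cmatch. This is only a quick sketch; the $text variable is just for illustration.

```powershell
$text = 'Powershell'

# -match ignores case, so a lowercase range still finds the capital P
$text -match '[m-z]'     # True
$matches[0]              # P

# -cmatch is case sensitive: the first lowercase letter between m and z is o
$text -cmatch '[m-z]'    # True
$matches[0]              # o

# There is no capital letter between A and J in this string
$text -cmatch '[A-J]'    # False
```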

But we can also combine them:

PS C:\> '12Power$he||' -match "[a-zA-Z]"
True
PS C:\> $matches

Name      Value
----      -----
0         P

One thing you'll learn is that there is sometimes more than one way to write a pattern. But as long as you get the result you expect and you can understand what you've written, I wouldn't worry about it. Remember those quantifiers?

PS C:\> '12Power$he||' -match "[a-zA-Z]+"
True
PS C:\> $matches

Name      Value
----      -----
0         Power

Here, we are matching on one or more consecutive alphabetic characters, regardless of case. Now for the part that causes a little head spinning at first -- I can also match on characters that do NOT match this pattern:

PS C:\> '12Power$he||' -match "[^a-zA-Z]+"
True
PS C:\> $matches

Name      Value
----      -----
0         12

Notice the caret inside the bracket. I had explained that ^ is used as an anchor, but when it is the first character inside the square brackets it negates the set, meaning "match any character that is not listed." This leads to another potential problem -- matching a special character like the $.

In a regular expression pattern, the $ is the end-of-string anchor. But any special character can be escaped with a backslash (\), as in this example:

PS C:\> '12Power$he||' -match "\$\w+"
True
PS C:\> $matches

Name      Value
----      -----
0         $he

The match stops before the vertical bars because they are not word characters, so \w+ doesn't include them. The vertical bar is also a special character inside a pattern, which I'll demo in a second. Notice that my string is in single quotes. If I use double quotes, PowerShell will see $he as a variable and try to expand it, which isn't what I want.
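
If you'd rather not escape special characters by hand, the .NET Regex class has a static Escape() method that does it for you. Here is a minimal sketch; the variable names are only for illustration.

```powershell
# Literal text we want to find; single quotes keep PowerShell from expanding $he
$literal = '$he||'

# [regex]::Escape() puts a backslash in front of each regex metacharacter
$pattern = [regex]::Escape($literal)
$pattern                          # \$he\|\|

# The escaped pattern now matches the text literally, bars included
'12Power$he||' -match $pattern    # True
$matches[0]                       # $he||
```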

Other common characters you need to escape are the period, question mark, plus and, of course, the backslash. So to see if something looks like an IP address you might use a pattern like this:

PS C:\> "172.16.10.12" -match "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
True
PS C:\> $matches

Name      Value
----      -----
0         172.16.10.12

The pattern is four groups of 1-3 digits separated by periods, each of which is escaped. This doesn't guarantee it is a valid IP address but it looks like one. It is possible to construct a pattern that includes some validation, but let's stick to simple things for now.
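
For the curious, one way to add a little validation without making the pattern itself more complicated is to capture each group of digits and check the numbers afterward. This is only a sketch under that assumption; Test-IPv4Like is a made-up function name, not a built-in cmdlet.

```powershell
function Test-IPv4Like {
    param([string]$Text)

    # The anchors require the whole string to be four dot-separated groups
    # of 1-3 digits; the parentheses capture each group into $matches[1]..[4]
    if ($Text -match '^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$') {
        foreach ($i in 1..4) {
            # Each octet must fall in the range 0-255
            if ([int]$matches[$i] -gt 255) { return $false }
        }
        return $true
    }
    return $false
}

Test-IPv4Like '172.16.10.12'    # True
Test-IPv4Like '999.16.10.12'    # False
Test-IPv4Like '172.16.10'       # False
```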

Or perhaps you want to validate a UNC path:

PS C:\> "\\file01\public\foo" -match "^\\\\\w+\\\w+$"
False

This is where it gets tricky. Each backslash in the UNC needs to be escaped with another backslash. Plus I'm using the \w character class, which also starts with a backslash. But even though it looks complicated, once you understand some basics, it isn't too hard to decipher. Did you notice I'm also using the start and end anchors? In this case, the match fails. Can you tell what data the regular expression pattern is expecting?

PS C:\> "\\file01\public" -match "^\\\\\w+\\\w+$"
True

When developing code with regular expressions, it is important to test with values that you know should match and those that you expect to fail.
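
One way to make that kind of testing repeatable is to keep a small table of sample values together with the result you expect for each. The sketch below does this for the UNC pattern; the sample values and variable names are my own.

```powershell
# The UNC pattern from above, in single quotes so nothing gets expanded
$pattern = '^\\\\\w+\\\w+$'

# Sample values mapped to the result we expect from -match
$cases = [ordered]@{
    '\\file01\public'     = $true
    '\\file01\public\foo' = $false
    'file01\public'       = $false
}

$results = foreach ($value in $cases.Keys) {
    [pscustomobject]@{
        Value    = $value
        Expected = $cases[$value]
        Actual   = $value -match $pattern
    }
}

# Show only the cases where the pattern disagreed with our expectation;
# no output means every case behaved as predicted
$results | Where-Object { $_.Expected -ne $_.Actual }
```

A table like this is also a good place to add edge cases. For example, \w does not match a hyphen, so a server name such as file-01 would fail this pattern.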

Before we wrap up today's lesson let's revisit the pipe character. Normally in PowerShell we think of | as the pipeline operator, but in a regular expression pattern it indicates OR:

PS C:\> "Powershell Rules" -match "rocks|rules|rulez"
True
PS C:\> $matches

Name      Value
----      -----
0         Rules

I'm going to match on any of the three words. But you know what? Let's simplify this a bit with what we learned earlier:

PS C:\> "Powershell Rules" -match "rocks|rule[sz]"
True
PS C:\> "Powershell Rule" -match "rocks|rule[sz]"
False

I want to match on either "rocks" or "rule" followed by an 's' or a 'z'. Here's a more practical example:

PS C:\> get-process | where {$_.company -match "^(Microsoft|Google)"} | select name,company

This will return all processes whose company name starts with either Microsoft or Google. The parentheses matter: without them the ^ anchor applies only to Microsoft, and Google would match anywhere in the name. To find the opposite I would probably use the -notmatch operator instead of futzing with the regular expression pattern.

PS C:\> get-process | where {$_.company -notmatch "^(Microsoft|Google)"} | select name,company
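
To see how the ^ anchor interacts with the pipe character, here is a small sketch that runs two patterns against a made-up list of company names (the names are purely illustrative). When the left side of -match is an array, the operator returns the matching elements instead of True or False.

```powershell
# Hypothetical company names for illustration only
$companies = 'Microsoft Corporation', 'Google LLC', 'Not Google Inc.', 'Contoso'

# Without parentheses, ^ anchors only the first alternative,
# so Google can appear anywhere in the string
$ungrouped = $companies -match '^Microsoft|Google'
$ungrouped    # Microsoft Corporation, Google LLC, Not Google Inc.

# With parentheses, ^ applies to both alternatives
$grouped = $companies -match '^(Microsoft|Google)'
$grouped      # Microsoft Corporation, Google LLC
```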

Next time, I'll be back to give you one more basic regular expression lesson.

About the Author

Jeffery Hicks is an IT veteran with over 25 years of experience, much of it spent as an IT infrastructure consultant specializing in Microsoft server technologies with an emphasis in automation and efficiency. He is a multi-year recipient of the Microsoft MVP Award in Windows PowerShell. He works today as an independent author, trainer and consultant. Jeff has written for numerous online sites and print publications, is a contributing editor at Petri.com, and a frequent speaker at technology conferences and user groups.
