PowerShell How-To

Filtering Command Output in PowerShell

In which Adam demonstrates the truth in the old PowerShell adage: "The more you can limit the number of objects returned to the pipeline, the faster your code will run."

Lots of commands return more objects than you actually want. Get-ChildItem can return a list of files on a storage volume, but it's not realistic to enumerate the entire volume just to find one file. For that matter, you wouldn't just type Get-VM and go through the hundreds of virtual machines you've got just to see what's happening on a single one. We need to limit that output somehow.

To do that, we have a few options in PowerShell. There's a saying in the PowerShell world to "filter left." This means it's best practice to limit the number of objects that are returned from commands as close to the source of the output as possible. Generally, the more you can limit the number of objects returned to the pipeline, the faster your code will run.

Lots of methods exist to put this into practice, but two popular ones are the Filter parameter, found on Get-ChildItem, the Get-AD* commands and many others, and the more generic Where-Object command. Each will do the job of filtering output, but the difference in performance and memory consumption can be great.

For example, PowerShell has a concept of providers. Each of these providers can have its own built-in filtering system that PowerShell exposes via the Filter parameter. It's generally better to use the Filter parameter than Where-Object because Filter hands the filtering instructions down to the provider, which limits the output at the source, rather than pulling all of those objects out and then filtering them at the pipeline level. The fewer objects you send down the pipeline, the faster your code will run.

To demonstrate this concept, let's look at a couple of different ways of filtering files in a folder -- although this technique could apply across a number of different scenarios.

I have a folder with 10,000 text files in it, with each file name incrementing by one: 1.txt, 2.txt, 3.txt, et cetera.
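To reproduce a folder like this one, a setup along the following lines should work. This is a minimal sketch, not the script used for the timings in this article; the $testPath variable and the loop are assumptions, and the files are created empty.

```powershell
# Hypothetical setup script (not from the article): builds the test folder.
$testPath = 'C:\testing'   # folder used in the examples; change to suit your machine

# Create the folder if it does not already exist.
New-Item -Path $testPath -ItemType Directory -Force | Out-Null

# Create 1.txt through 10000.txt as empty files.
# -Force overwrites (empties) any file that already has the same name.
1..10000 | ForEach-Object {
    $file = Join-Path -Path $testPath -ChildPath "$_.txt"
    New-Item -Path $file -ItemType File -Force | Out-Null
}
```

Creating 10,000 files one at a time can take a little while, so expect a pause before the prompt returns.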

PS> (Get-ChildItem -Path C:\testing\).Count
10000

Let's say I want to find all of the files that have a "1" in the name. We need to filter the total results somehow. One way to do that is by using the Filter parameter on Get-ChildItem. This allows the user to specify what kind of files should be returned at the file system level. When using Filter, you must adhere to the syntax the provider expects; for the file system, that means wildcard patterns. For this example, to find all files with a "1" in the name, I can do something like this:

PS> Get-ChildItem -Path C:\testing\ -Filter '*1*.txt'

On my computer, this takes about 143 milliseconds.

PS> Measure-Command { Get-ChildItem -Path C:\testing\ -Filter '*1*.txt' }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 142
Ticks             : 1429123
TotalDays         : 1.6540775462963E-06
TotalHours        : 3.96978611111111E-05
TotalMinutes      : 0.00238187166666667
TotalSeconds      : 0.1429123
TotalMilliseconds : 142.9123

Let's now use the more generic Where-Object command, which forces Get-ChildItem to enumerate all of the files on the file system, pass them to PowerShell and only then, at the pipeline, filter the results. Notice the time difference.

PS> Measure-Command { Get-ChildItem -Path C:\testing\ | Where-Object { $_.Name -like '*1*.txt' } }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 619
Ticks             : 6197156
TotalDays         : 7.17263425925926E-06
TotalHours        : 0.000172143222222222
TotalMinutes      : 0.0103285933333333
TotalSeconds      : 0.6197156
TotalMilliseconds : 619.7156

For the exact same result, we've increased the time by more than 4x! I think I'll stick with using Filter.

We can even use the where() method, which is faster than Where-Object, and it's still substantially slower than Filter.

PS> Measure-Command { (Get-ChildItem -Path C:\testing\).Where({ $_.Name -like '*1*.txt' }) }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 429
Ticks             : 4296713
TotalDays         : 4.9730474537037E-06
TotalHours        : 0.000119353138888889
TotalMinutes      : 0.00716118833333333
TotalSeconds      : 0.4296713
TotalMilliseconds : 429.6713
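A single Measure-Command run can be skewed by disk caching and whatever else the machine is doing, so it may be worth averaging several runs before drawing conclusions. The following is one way that could be sketched; the $testPath, $runs, $tests and $results names are illustrative assumptions, and the numbers you get will differ from the ones shown above.

```powershell
# Sketch: average several timed runs of each filtering approach.
$testPath = 'C:\testing'   # assumed location of the test files
$runs     = 5              # arbitrary number of repetitions

# The three approaches compared in this article, keyed by a display name.
$tests = [ordered]@{
    'Filter parameter' = { Get-ChildItem -Path $testPath -Filter '*1*.txt' }
    'Where-Object'     = { Get-ChildItem -Path $testPath | Where-Object { $_.Name -like '*1*.txt' } }
    'where() method'   = { (Get-ChildItem -Path $testPath).Where({ $_.Name -like '*1*.txt' }) }
}

$results = foreach ($name in $tests.Keys) {
    # Time each approach $runs times and keep the elapsed milliseconds.
    $timings = 1..$runs | ForEach-Object {
        (Measure-Command -Expression $tests[$name]).TotalMilliseconds
    }

    [pscustomobject]@{
        Method    = $name
        AverageMs = [math]::Round(($timings | Measure-Object -Average).Average, 1)
    }
}

$results
```

The very first run against a folder is often the slowest, so a larger $runs value should give a steadier average.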

Use Filter whenever possible. If the command doesn't have a Filter parameter, look through the command's parameters to see whether it has another kind of filtering mechanism. The Where-Object command and where() method are universal and can be applied to any object being returned by any command, but that universality comes at a performance cost.
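One quick way to look through a command's parameters is to ask Get-Command for them and keep the ones whose names hint at filtering. This is only a rough sketch: the $commandName and $pattern variables are assumptions, and the name pattern is a guess at common conventions, so a command may well filter through a parameter this does not catch.

```powershell
# Sketch: list parameters on a command whose names suggest built-in filtering.
$commandName = 'Get-ChildItem'             # command to inspect
$pattern     = 'Filter|Include|Exclude'    # assumed naming hints, not an exhaustive rule

$filterLike = (Get-Command -Name $commandName).Parameters.Keys |
    Where-Object { $_ -match $pattern } |
    Sort-Object

$filterLike
```

Once a likely parameter turns up, Get-Help with the -Parameter switch, for example Get-Help Get-ChildItem -Parameter Filter, should describe what it accepts.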

About the Author

Adam Bertram is a 20-year veteran of IT. He's an automation engineer, blogger, consultant, freelance writer, Pluralsight course author and content marketing advisor to multiple technology companies. Adam also founded the popular TechSnips e-learning platform. He mainly focuses on DevOps, system management and automation technologies, as well as various cloud platforms mostly in the Microsoft space. He is a Microsoft Cloud and Datacenter Management MVP who absorbs knowledge from the IT field and explains it in an easy-to-understand fashion. Catch up on Adam's articles at adamtheautomator.com, connect on LinkedIn or follow him on Twitter at @adbertram or the TechSnips Twitter account @techsnips_io.

