PowerShell Pipeline
Find Data in Excel Using PowerShell
For those times when converting Excel files to .CSV just doesn't make sense.
Sometimes you need to scan some files for a piece of data like a string, phrase or some number, and one of those files just happens to be an Excel spreadsheet.
You could open up the file, launch the search window and begin looking at and taking note of every instance that you find. Or you could automate this using PowerShell by converting the file to a .CSV, then using Import-Csv to work through the data that way.
If neither of those approaches does what you want -- perhaps you don't want to convert the file to .CSV because there's more than one sheet in the file, or you just feel like converting it is too much of a manual step in your automation process -- then you are in luck! I will show you how to take a workbook in Excel and go through each spreadsheet in the file and search for data that not only shows you the data, but also the row and column number that it is located in.
First, I need an Excel spreadsheet with some data in it. Fortunately, I was able to download a sample spreadsheet here that has some data that I can use for my searches. The file is called "Sample - SuperStore.xls" and contains three sheets with various information in them, as shown in the images below.
Now that we know what we are working with, we can connect to the Excel COM object so that we can begin working with the object to search for some data.
$Excel = New-Object -ComObject Excel.Application
Now we will open up the Excel workbook by using the Workbooks.Open() method. Make sure that when you do this, you are providing the full path name for the file. Not doing this will result in the method throwing an error.
$Workbook = $Excel.Workbooks.Open('C:\users\proxb\Downloads\Sample - Superstore.xls')
I will now connect to the first spreadsheet and display its name so we can make sure we are looking at the Orders sheet.
$workSheet = $Workbook.Sheets.Item(1)
$WorkSheet.Name
When working with Excel, we use 1 to signify that we are on the first sheet, unlike arrays where they are zero-indexed.
We will begin searching for our data -- in this case, it is "Nebraska" -- using the Cells.Find() method and supplying the search string as the method parameter. If it does find data, it will be available to view as an object and also will be used as a parameter value in another method.
$Found = $WorkSheet.Cells.Find('Nebraska')
In this case, I found a cell that has "Nebraska" in it. Now we can take the object returned and use it to find the address of the data and save it as the beginning address to show the first item we found. At some point in our search, it will come back to the first match, so we want to stop when we get to that point.
$BeginAddress = $Found.Address(0,0,1,1)
$BeginAddress
I now have enough information to build a PowerShell object that contains the data found, as well as the column and row information and the address that I showed in the image above.
[pscustomobject]@{
WorkSheet = $Worksheet.Name
Column = $Found.Column
Row =$Found.Row
Text = $Found.Text
Address = $BeginAddress
}
From here, we want to continue the search until we reach the top of our matches.
Do {
$Found = $WorkSheet.Cells.FindNext($Found)
$Address = $Found.Address(0,0,1,1)
If ($Address -eq $BeginAddress) {
BREAK
}
[pscustomobject]@{
WorkSheet = $Worksheet.Name
Column = $Found.Column
Row =$Found.Row
Text = $Found.Text
Address = $Address
}
} Until ($False)
Here, you can see that I use the $found object from the initial search as the parameter value in the Cells.FindNext() method. I keep track of each address of the subsequent matches and as soon as I come across the first address, I break out of the Do loop so I do not have duplicate data showing up. Once this completes, I find a total of 38 matches on this particular sheet. Now I know how many times "Nebraska" shows up in the spreadsheet, and where.
I also want to add that you can use wild-card searches in these examples. Using an asterisk (*) will search for multiple characters, and using a question mark (?) will only assume a single character wild-card match to look for.
Something like this would make for a great function, right? Well, I agree, and wrote a small function that you can use to search a full workbook for some data and have it return the data I provided above, but also across all of the worksheets. The full code, called "Search-Excel," is available at the end of this article and takes parameters of the full file path and the search string.
Search-Excel -Source 'C:\users\proxb\Downloads\Sample - Superstore.xls' -SearchText Nebraska |
Format-Table
As you can see, it finds all of the "Nebraska" entries under the Orders sheet but doesn't find anything on the other two sheets.
You can use PowerShell to search for various pieces of data within an Excel workbook to include all of the worksheets, which can be useful to quickly determine how much of a particular piece of data is in the workbook. Or you can use it to quickly scan an Excel document for sensitive data. Either way, this will hopefully be a useful tool for your PowerShell toolbox!
Search-Excel Source Code
Function Search-Excel {
[cmdletbinding()]
Param (
[parameter(Mandatory)]
[ValidateScript({
Try {
If (Test-Path -Path $_) {$True}
Else {Throw "$($_) is not a valid path!"}
}
Catch {
Throw $_
}
})]
[string]$Source,
[parameter(Mandatory)]
[string]$SearchText
#You can specify wildcard characters (*, ?)
)
$Excel = New-Object -ComObject Excel.Application
Try {
$Source = Convert-Path $Source
}
Catch {
Write-Warning "Unable locate full path of $($Source)"
BREAK
}
$Workbook = $Excel.Workbooks.Open($Source)
ForEach ($Worksheet in @($Workbook.Sheets)) {
# Find Method https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-find-method-excel
$Found = $WorkSheet.Cells.Find($SearchText) #What
If ($Found) {
# Address Method https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-address-property-excel
$BeginAddress = $Found.Address(0,0,1,1)
#Initial Found Cell
[pscustomobject]@{
WorkSheet = $Worksheet.Name
Column = $Found.Column
Row =$Found.Row
Text = $Found.Text
Address = $BeginAddress
}
Do {
$Found = $WorkSheet.Cells.FindNext($Found)
$Address = $Found.Address(0,0,1,1)
If ($Address -eq $BeginAddress) {
BREAK
}
[pscustomobject]@{
WorkSheet = $Worksheet.Name
Column = $Found.Column
Row =$Found.Row
Text = $Found.Text
Address = $Address
}
} Until ($False)
}
Else {
Write-Warning "[$($WorkSheet.Name)] Nothing Found!"
}
}
$workbook.close($false)
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$excel)
[gc]::Collect()
[gc]::WaitForPendingFinalizers()
Remove-Variable excel -ErrorAction SilentlyContinue
}
About the Author
Boe Prox is a Microsoft MVP in Windows PowerShell and a Senior Windows System Administrator. He has worked in the IT field since 2003, and he supports a variety of different platforms. He is a contributing author in PowerShell Deep Dives with chapters about WSUS and TCP communication. He is a moderator on the Hey, Scripting Guy! forum, and he has been a judge for the Scripting Games. He has presented talks on the topics of WSUS and PowerShell as well as runspaces to PowerShell user groups. He is an Honorary Scripting Guy, and he has submitted a number of posts as a to Microsoft's Hey, Scripting Guy! He also has a number of open source projects available on Codeplex and GitHub. His personal blog is at http://learn-powershell.net.