Prof. Powershell

PowerShell 3 Web of Wow, Part 1

PowerShell has cmdlets for working with Web resources, including those up in the cloud. Let's take a look in this first of a three-part series.

When PowerShell 3 came out along with Windows Server 2012 and Windows 8, you probably heard a lot of talk about the cloud and how to manage within it. In this week's lesson we'll start a short series on some new PowerShell 3 cmdlets that make it very easy to work with services and data delivered from the Internet. Actually, even though I'm going to use publicly available Internet resources, the cmdlets should work with any Web-like data, even if it comes from within your intranet.

In PowerShell 2 if we wanted to retrieve data from a Web site, we had to turn to the .NET Framework and build wrapper functions around the WebClient class. But now in version 3 we have a cmdlet which makes this much, much easier. At its simplest all you need to do is provide a URL to Invoke-WebRequest:

PS C:\> Invoke-WebRequest http://jdhitsolutions.com/blog

You might be prompted to complete the request. But you get back a nicely packaged object (see Fig. 1).

Figure 1. Provide a URL to Invoke-WebRequest and get back a nicely packaged object.
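
For comparison, here is a minimal sketch of the kind of wrapper function the PowerShell 2 approach called for. The function name Get-WebPage is hypothetical and only for illustration; the WebClient class and its DownloadString method are part of the .NET Framework.

```powershell
# Hypothetical helper name; a sketch of the PowerShell 2-era approach.
function Get-WebPage {
    param(
        [Parameter(Mandatory=$true)]
        [string]$Url
    )

    $client = New-Object System.Net.WebClient
    try {
        # DownloadString returns the response body as one plain string
        $client.DownloadString($Url)
    }
    finally {
        # Release the underlying connection resources
        $client.Dispose()
    }
}

# Example call (not run here):
# Get-WebPage http://jdhitsolutions.com/blog
```

All you get back from a wrapper like this is raw text. Invoke-WebRequest needs no wrapper and returns an object with properties you can use directly.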

To make this easier to work with, I'll re-run the command and save the results to a variable.

PS C:\> $blog = Invoke-WebRequest http://jdhitsolutions.com/blog

You can also redirect output to a text file right from the cmdlet:

PS C:\> Invoke-WebRequest http://jdhitsolutions.com/blog -OutFile c:\work\jdhitblog.txt

The text file doesn't contain the formatted object; it contains the complete HTML output. But using the formatted object saved to a variable offers a number of advantages. First, you can use the StatusCode property to verify the Web site:

PS C:\> $blog.StatusCode
200
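
If you want to check a site from a script, the status test can be wrapped in a small function. This is only a sketch under a few assumptions: the name Test-WebSite is hypothetical, -UseBasicParsing is used because only the status code is needed, and the catch block relies on my understanding that Invoke-WebRequest raises an error for failed requests, with the HTTP response (when there is one) attached to the exception.

```powershell
# Hypothetical helper name; a sketch, not a definitive implementation.
function Test-WebSite {
    param(
        [Parameter(Mandatory=$true)]
        [string]$Url
    )

    try {
        # -UseBasicParsing skips DOM parsing, which is not needed for a status check
        $response = Invoke-WebRequest -Uri $Url -UseBasicParsing -ErrorAction Stop
        $code = [int]$response.StatusCode
    }
    catch {
        # Assumption: HTTP errors carry a Response object on the exception.
        # Connection failures have no response, so the code stays $null.
        $code = $null
        $errResponse = $_.Exception.Response
        if ($errResponse) { $code = [int]$errResponse.StatusCode }
    }

    New-Object PSObject -Property @{
        Url        = $Url
        StatusCode = $code
        Ok         = ($code -eq 200)
    }
}

# Example call (not run here):
# Test-WebSite http://jdhitsolutions.com/blog
```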

Want to find all of the links on the page? No need to try to parse the HTML. Use the PowerShell object:

PS C:\> $blog.links[0]

innerHTML : The Lonely Administrator
innerText : The Lonely Administrator
outerHTML : <A title="The Lonely Administrator"
            href="http://jdhitsolutions.com/blog/"
            rel=home>The Lonely Administrator</A>
outerText : The Lonely Administrator
tagName   : A
title     : The Lonely Administrator
href      : http://jdhitsolutions.com/blog/
rel       : home

Depending on the source you might be able to take it further. Since this is the page from my blog, there are permalink entries which I can discover by browsing through all the links. Once I know what to look for, I can filter for them and display the relevant information:

PS C:\> $blog.links | where title -match "permalink" | Select @{Name="Article";Expression={$_.InnerText}},@{Name="Link";Expression={$_.href}} | format-list

The result is in Figure 2.

Figure 2. Filtering the results.
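
If you find yourself repeating that filter, it can be packaged as a pipeline function. The sketch below assumes link objects with the title, innerText and href properties shown earlier; the function name Select-LinkByTitle and its Pattern parameter are hypothetical.

```powershell
# Hypothetical helper name; assumes link objects like those in $blog.links.
function Select-LinkByTitle {
    param(
        [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
        $Link,

        # Regular expression matched against each link's title
        [string]$Pattern = 'permalink'
    )

    process {
        # Links with no title are skipped because $null never matches
        if ($Link.title -match $Pattern) {
            $Link | Select-Object @{Name='Article';Expression={$_.innerText}},
                                  @{Name='Link';Expression={$_.href}}
        }
    }
}

# Example call (not run here):
# $blog.links | Select-LinkByTitle | Format-List
```

Because the function writes objects rather than formatted text, the results can just as easily be sorted or exported.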

The other nice benefit with Invoke-WebRequest is that the HTML is also parsed for you and stored as an object property:

PS C:\> $wr = Invoke-WebRequest $url
PS C:\> $wr.ParsedHtml

What this means for you, if you have experience using the Document Object Model, is that it is much easier to "screen-scrape" (see Fig. 3).

Figure 3. Screen scraping just got easier with PowerShell 3.

PS C:\> $h = $wr.ParsedHtml.getElementsByTagName("H3")
PS C:\> $h | where classname -eq 'title' | select InnerText

innerText
---------
Get Your PowerShell Object Properties In Order
A Better View of PowerShell Help
HTML Bits and Pieces, Part 3
HTML Bits and Pieces, Part 2
HTML Bits and Pieces, Part 1
Pushing the ENV:
Organize Your Scripts with PowerShell ISE 3
What's In Your Pipeline?
For PowerShell Help, It Takes a Community
Working with Values and Variables in PowerShell
I Can Type That Command In 10 Characters
Math Matters
Certificate Certainty
Regular Expressions, Part 3
Regular Expressions, Part 2
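
The same two steps can be folded into a reusable function. This is a sketch that assumes you pass in a document object exposing getElementsByTagName, such as $wr.ParsedHtml; the function name Get-HeadingText and its parameters are hypothetical.

```powershell
# Hypothetical helper name; expects a DOM-style document such as $wr.ParsedHtml.
function Get-HeadingText {
    param(
        [Parameter(Mandatory=$true)]
        $Document,

        [string]$TagName = 'H3',

        [string]$ClassName = 'title'
    )

    # Collect every element with the tag, keep those with the matching
    # class name, and emit only their text
    $Document.getElementsByTagName($TagName) |
        Where-Object { $_.className -eq $ClassName } |
        ForEach-Object { $_.innerText }
}

# Example call (not run here):
# Get-HeadingText -Document $wr.ParsedHtml
```

The tag and class names are parameters because they depend entirely on how the page you are scraping is built.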

The heading example is only meant to demonstrate that parsing the document object is pretty simple. Invoke-WebRequest has a number of additional features, so be sure to look at the full help and examples. Next time we'll look at an even easier way to consume Web data.

Important Note: There have been reports of severe memory leaks and other problems when using Invoke-WebRequest in the PowerShell ISE. I strongly recommend using this cmdlet only from the PowerShell console.


About the Author

Jeffery Hicks is an IT veteran with over 25 years of experience, much of it spent as an IT infrastructure consultant specializing in Microsoft server technologies with an emphasis in automation and efficiency. He is a multi-year recipient of the Microsoft MVP Award in Windows PowerShell. He works today as an independent author, trainer and consultant. Jeff has written for numerous online sites and print publications, is a contributing editor at Petri.com, and a frequent speaker at technology conferences and user groups.
