Downloading and reading web pages from the command line is a fascinating exercise. For Linux users, the wget command is a heavenly gift. On Windows, things get a little more hectic: you can open web pages from cmd, but actually working with the data is time consuming. So this article is about how to open and read web pages from PowerShell.
The steps included in this process are:
- Open the webpage
- Extract HTML Title, Description, Keywords
- Avoid URLs Matching Any of a Set of Patterns
- Setting a Maximum Response Size
- Setting a Maximum URL Length
- Using the Disk Cache
- Crawling the Web
- Get Referenced Domains
- GetBaseDomain
- Must-Match Patterns
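As a taste of the "extract title, description, keywords" step, here is a minimal sketch using Invoke-WebRequest (PowerShell 3+) and regular expressions. The URL is just a placeholder, and the regexes assume simple, double-quoted meta tags:

```powershell
# Minimal sketch: pull title, description and keywords out of a page's HTML.
# Assumes PowerShell 3+ and network access; example.com is a placeholder URL.
$response = Invoke-WebRequest -Uri 'http://example.com'
$html = $response.Content

$title       = [regex]::Match($html, '<title>(.*?)</title>').Groups[1].Value
$description = [regex]::Match($html, '<meta\s+name="description"\s+content="(.*?)"').Groups[1].Value
$keywords    = [regex]::Match($html, '<meta\s+name="keywords"\s+content="(.*?)"').Groups[1].Value

$title
$description
$keywords
```

A real HTML parser would be more robust, but for a quick look at a page this is often enough.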
Now let's start with the commands.
Start --> Fast Search Server 2010 for SharePoint (right-click --> Run as Administrator)
The Short Version
Add the SharePoint PowerShell cmdlets
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
Create and configure the content source (enter a URL that won't mind being crawled; perhaps your blog page?)
$contentSSA = "FASTContent"
$startaddress = [enter a URL here]
$contentsourcename = "Web site crawl"
$contentsource = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $contentSSA -Type Web -name $contentsourcename -StartAddresses $startaddress -MaxSiteEnumerationDepth 0
Start the crawl
$contentsource.StartFullCrawl()
$contentsource.CrawlStatus
Keep executing $contentsource.CrawlStatus until the status changes to CrawlCompleting and then Idle
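Instead of re-typing $contentsource.CrawlStatus by hand, you can poll it in a loop (a sketch; the ten-second interval is an arbitrary choice):

```powershell
# Poll the crawl status until it settles at Idle.
while ($contentsource.CrawlStatus -ne 'Idle') {
    Start-Sleep -Seconds 10
    $contentsource.CrawlStatus
}
```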
Execute a search
The Long Version
Again, there isn't much reason to go over all the steps, as they don't really change from run to run. So let's clarify a few things.
$contentsource = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $contentSSA -Type Web -name $contentsourcename -StartAddresses $startaddress -MaxSiteEnumerationDepth 0
It is interesting to note that the New-SPEnterpriseSearchCrawlContentSource cmdlet defaults to the Custom crawl rule, which reads all pages and all links found at the starting URL. We set MaxSiteEnumerationDepth to zero, which makes the crawler read only the content at the site we started at, rather than letting it go into ADD mode, becoming easily distracted and chasing down every car that goes by.
Another method :
(New-Object System.Net.WebClient).DownloadFile($url, $localFileName)
In PowerShell v3, the Invoke-WebRequest cmdlet:
Invoke-WebRequest -Uri $url -OutFile $localFileName
Another option is with the Start-BitsTransfer cmdlet:
Start-BitsTransfer -Source $source -Destination $destination
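For all three download methods the variables are just placeholders; a usage sketch might look like this (the URL and local path below are assumptions, not anything the article prescribes):

```powershell
# Placeholder values for the three download examples above.
$url           = 'http://example.com/file.zip'
$localFileName = "$env:TEMP\file.zip"

# WebClient (works on PowerShell 2+)
(New-Object System.Net.WebClient).DownloadFile($url, $localFileName)

# Invoke-WebRequest (PowerShell 3+)
Invoke-WebRequest -Uri $url -OutFile $localFileName

# BITS (requires the BitsTransfer module)
Import-Module BitsTransfer
Start-BitsTransfer -Source $url -Destination $localFileName
```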
There are at least four (not two) ways to open a URL in the default browser from PowerShell.
1. Run the browser's exe file with our URL as a parameter.
How do we get the exe file path of the default browser?
Function Get-DefaultBrowserPath {
    # Get the default browser path from the registry
    New-PSDrive -Name HKCR -PSProvider Registry -Root HKEY_CLASSES_ROOT | Out-Null
    $browserPath = ((Get-ItemProperty 'HKCR:\http\shell\open\command').'(default)').Split('"')[1]
    return $browserPath
}
Call it:
Get-DefaultBrowserPath
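Once you have the browser path, launching a URL with it might look like this (a sketch using Start-Process; the URL is just an example):

```powershell
# Open a URL with the default browser found above.
$browser = Get-DefaultBrowserPath
Start-Process -FilePath $browser -ArgumentList 'http://www.gurucore.com'
```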
Simplest way:
just type start 'http://www.gurucore.com' in PowerShell or cmd.
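Equivalently, Start-Process will hand a URL straight to the default browser without looking up the exe path at all:

```powershell
# Start-Process with a URL delegates to the default browser.
Start-Process 'http://www.gurucore.com'
```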
