In this post I’m continuing with the implementation of the Get-WebForBrokenLinks function.

Get-WebForBrokenLinks -Web $subweb

Before we can have a look at finding all broken links within a site, we will need to identify where the broken links may be stored. A quick look at SharePoint gives me the following locations:

  • List items
  • Pages
  • Documents in libraries
  • Web Parts

For now I’m going to look at the easiest option: list items.

I’m going to start with the function skeleton, making the lists available using Load and ExecuteQuery:

Function Get-WebForBrokenLinks {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$True,
                   ValueFromPipeline=$True,
                   ValueFromPipelineByPropertyName=$True,
                   HelpMessage='Web to be scanned for broken links')]
        [Microsoft.SharePoint.Client.Web]
        $Web
    )
    begin {
        Write-Host "Scanning: " $Web.Url
    }
    process {
        $Web.Context.Load($Web.Lists)
        $Web.Context.ExecuteQuery()
        ... # This is where the rest of the code needs to appear
    }
    end {
        Write-Host "Completed scanning: " $Web.Url
    }
}

Now I need to go through the lists and the list items:

ForEach ($list in $web.Lists) {
    $items = Get-PnPListItem -List $list
    foreach ($item in $items) {
        ....
    }
}
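One practical tweak, which is my own addition and not part of the original script: $web.Lists also contains hidden system lists (galleries, caches and so on) that only add noise to the scan. Assuming the Hidden property is available after the Load call, a variant of the loop that skips them could look like this:

```powershell
ForEach ($list in $web.Lists) {
    # Skip hidden system lists (galleries, caches, etc.)
    # - assumes the Hidden property was returned by the Load call
    if ($list.Hidden) { continue }
    $items = Get-PnPListItem -List $list
    foreach ($item in $items) {
        ....
    }
}
```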

So now I’m getting the items for all of my lists. Now it becomes important to understand which types of fields SharePoint has, as we step through all the fields in all the items of all the lists.

foreach ($fieldValue in $item.FieldValues) {
    foreach ($value in $fieldValue.Values) {
        if ($null -ne $value) {
            switch ($value.GetType().Name) {
                ....
            }
        }
    }
}

Now all we need to do is handle all the data types that may contain URLs. So what are the data types? And which ones could possibly contain a URL?

To find this out I added a default option to my switch:

default {
$type = $value.GetType()
Write-Error "Not supported type: $type"
}

Then I kept rerunning my script until I had collected all the data types. I found the following data types in my lists:

  • Guid
  • Int32
  • ContentTypeId
  • DateTime
  • FieldUserValue
  • FieldLookupValue
  • Boolean
  • Double
  • String[]
  • FieldUrlValue
  • String

Most of these couldn’t possibly contain a URL, e.g. Guid. Building up my switch, I get the following script:

switch ($value.GetType().Name) {
    "Guid"             { } # Ignore
    "Int32"            { } # Ignore
    "ContentTypeId"    { } # Ignore
    "DateTime"         { } # Ignore
    "FieldUserValue"   { } # Ignore
    "FieldLookupValue" { } # Ignore
    "Boolean"          { } # Ignore
    "Double"           { } # Ignore
    "String[]" {
        ...
    }
    "FieldUrlValue" {
        ...
    }
    "String" {
        ...
    }
    default {
        $type = $value.GetType()
        Write-Error "Not supported type: $type"
    }
}

OK, so far I only need to write code for 3 field types. I’m going to start with FieldUrlValue. This type is easier than String because a String field may contain other text as well:

if ($value.Url.Contains("https://") -or $value.Url.Contains("http://")) {
    try {
        if ((Invoke-WebRequest $value.Url -DisableKeepAlive -UseBasicParsing -Method Head).StatusCode -ne 200) {
            Write-Host "Broken link:" $value.Url
        }
    }
    catch {
        Write-Host "Broken link:" $value.Url
    }
}
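One caveat with -Method Head: some servers reject HEAD requests or serve a friendly error page with a 200 status, so a link can be reported broken (or healthy) incorrectly. A hedged variant, with a fallback to GET that is my own addition and the helper name Test-Link my own invention, could look like this:

```powershell
function Test-Link {
    param([string]$Url)
    try {
        # Try a lightweight HEAD request first
        $r = Invoke-WebRequest $Url -DisableKeepAlive -UseBasicParsing -Method Head
        return ($r.StatusCode -eq 200)
    }
    catch {
        try {
            # Some servers refuse HEAD; fall back to a full GET before
            # declaring the link broken
            $r = Invoke-WebRequest $Url -DisableKeepAlive -UseBasicParsing -Method Get
            return ($r.StatusCode -eq 200)
        }
        catch { return $false }
    }
}
```

This won’t help with sites that return 200 for their custom “page not found” pages; catching those would need an inspection of the response body.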

So we are now ready to answer the next critical question: how do I recognize a URL in text? I’ve seen solutions with regular expressions, and although this might be a good way (if you can get it to work!), I’m hoping that I have found an easier way.

It all starts with the assumption that a URL doesn’t contain a space. So if I have a text with a URL in it, a split by space gives me an array:

$string = "text https://sharepains.com/anylocation/anypage.html some more text"
$string.split(" ")
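Those two lines, using nothing more than the built-in .NET string methods, give one array element per word; filtering for the element that starts with a URL prefix pulls the link out:

```powershell
$string = "text https://sharepains.com/anylocation/anypage.html some more text"
# Split on spaces and keep only the tokens that look like URLs
$string.Split(" ") | Where-Object { $_.StartsWith("http://") -or $_.StartsWith("https://") }
# -> https://sharepains.com/anylocation/anypage.html
```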

OK, this will almost work, but not if there isn’t a space before or after the URL. So other than spaces, what else could be separating URLs from the surrounding text?
I’m first having a look at the HTML:

<a href="http://testurl">Link</a>

As all I’m interested in is getting a variable with a clean URL in it, I could just split by `"` as well.

For string fields this results in the following piece of code:

if ($value.Contains("https://") -or $value.Contains("http://")) {
    $words = $value.Split(" ")
    foreach ($word in $words) {
        $quotesplitwords = $word.Split('"')
        foreach ($quotesplitword in $quotesplitwords) {
            if ($quotesplitword.Contains("https://") -or $quotesplitword.Contains("http://")) {
                try {
                    if ((Invoke-WebRequest $quotesplitword -DisableKeepAlive -UseBasicParsing -Method Head).StatusCode -ne 200) {
                        Write-Host "Broken link:" $quotesplitword
                    }
                }
                catch {
                    Write-Host "Broken link:" $quotesplitword
                }
            }
        }
    }
}
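Since the same token scan will be needed again later for pages and documents, the splitting logic can be pulled out into a small reusable helper. This is a sketch of my own; the function name Get-UrlsFromText is not part of the original script:

```powershell
function Get-UrlsFromText {
    param([string]$Text)
    # Split on spaces, then on double quotes, and keep any token
    # that looks like a URL
    $Text.Split(" ") |
        ForEach-Object { $_.Split('"') } |
        Where-Object { $_.Contains("https://") -or $_.Contains("http://") }
}

Get-UrlsFromText 'text <a href="http://testurl">Link</a> more'
# -> http://testurl
```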

This code now gives only one false positive:

Office 365 - Check your site for broken links in SharePoint Online - Part 2 Microsoft Office 365, Microsoft SharePoint Online brokenlinks

If URLs appear in text without being actual clickable hyperlinks, the script will still flag them. In fact, any text containing an http reference that can’t be resolved will be flagged as a broken link. For now I’m going to live with that, though I’m not sure if this will be OK for the remaining locations that may contain broken links.

So this now covers finding broken URLs within list items. There is still quite a bit of work to do:

  • Pages
  • Documents in libraries
  • Web Parts

But these elements will be covered in the next part of this series. Now that we have code that finds URLs within text, we are halfway there.


By Pieter Veenstra

Business Applications Microsoft MVP working as the Head of Power Platform at Vantage 365. You can contact me using contact@sharepains.com

