In this post I’m continuing with the implementation of the Get-WebForBrokenLinks function.

Get-WebForBrokenLinks -Web $subweb

Before we can have a look at finding all broken links within a site, we will need to identify where the broken links may be stored. A quick look at SharePoint gives me the following locations:

  • List items
  • Pages
  • Documents in libraries
  • Web Parts

For now I’m going to look at the easiest option: list items.

I’m going to start with the function skeleton, making the lists available using Load and ExecuteQuery:

Function Get-WebForBrokenLinks {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$True,
                   ValueFromPipeline=$True,
                   ValueFromPipelineByPropertyName=$True,
                   HelpMessage='Web to be scanned for broken links')]
        [Microsoft.SharePoint.Client.Web]
        $Web
    )
    begin {
        Write-Host "Scanning: " $Web.Url
    }
    process {
        $Web.Context.Load($Web.Lists)
        $Web.Context.ExecuteQuery()
        ... # This is where the rest of the code needs to appear
    }
    end {
        Write-Host "Completed scanning: " $Web.Url
    }
}

Now I need to go through the lists and the list items:

ForEach ($list in $web.Lists) {
    $items = Get-PnPListItem -List $list
    foreach ($item in $items) {
        ....
    }
}
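One practical tweak, which is my own addition and not part of the original script: $web.Lists also contains hidden system lists (galleries, caches and so on) that only add noise to the scan. Assuming the Hidden property is available after the Load call, a variant of the loop that skips them could look like this:

```powershell
ForEach ($list in $web.Lists) {
    # Skip hidden system lists (galleries, caches, etc.)
    # - assumes the Hidden property was returned by the Load call
    if ($list.Hidden) { continue }
    $items = Get-PnPListItem -List $list
    foreach ($item in $items) {
        ....
    }
}
```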

So now I’m getting the items for all of my lists. Now it becomes important to understand which types of fields SharePoint has, as we step through all the fields in all the items of all the lists.

foreach ($fieldValue in $item.FieldValues) {
    foreach ($value in $fieldValue.Values) {
        if ($null -ne $value) {
            switch ($value.GetType().Name) {
                ....
            }
        }
    }
}

Now all we need to do is handle all the data types that may contain URLs. So what are the data types? And which ones could possibly contain a URL?

To find this out I added a default option to my switch:

default {
$type = $value.GetType()
Write-Error "Not supported type: $type"
}

Then I kept rerunning my script until I had collected all the data types. I found the following data types in my lists:

  • Guid
  • Int32
  • ContentTypeId
  • DateTime
  • FieldUserValue
  • FieldLookupValue
  • Boolean
  • Double
  • String[]
  • FieldUrlValue
  • String

Most of these couldn’t possibly contain a URL, e.g. Guid. Building up my switch, I get the following script:

switch ($value.GetType().Name) {
    "Guid"             { } # Ignore
    "Int32"            { } # Ignore
    "ContentTypeId"    { } # Ignore
    "DateTime"         { } # Ignore
    "FieldUserValue"   { } # Ignore
    "FieldLookupValue" { } # Ignore
    "Boolean"          { } # Ignore
    "Double"           { } # Ignore
    "String[]" {
        ...
    }
    "FieldUrlValue" {
        ...
    }
    "String" {
        ...
    }
    default {
        $type = $value.GetType()
        Write-Error "Not supported type: $type"
    }
}

OK, so far I only need to write code for 3 field types. I’m going to start with FieldUrlValue. This type is easier than String because a String field may contain other text as well:

if ($value.Url.Contains("https://") -or $value.Url.Contains("http://")) {
    try {
        if ((Invoke-WebRequest $value.Url -DisableKeepAlive -UseBasicParsing -Method Head).StatusCode -ne 200) {
            Write-Host "Broken link:" $value.Url
        }
    }
    catch {
        Write-Host "Broken link:" $value.Url
    }
}
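One caveat with -Method Head: some servers reject HEAD requests or serve a friendly error page with a 200 status, so a link can be reported broken (or healthy) incorrectly. A hedged variant, with a fallback to GET that is my own addition and the helper name Test-Link my own invention, could look like this:

```powershell
function Test-Link {
    param([string]$Url)
    try {
        # Try a lightweight HEAD request first
        $r = Invoke-WebRequest $Url -DisableKeepAlive -UseBasicParsing -Method Head
        return ($r.StatusCode -eq 200)
    }
    catch {
        try {
            # Some servers refuse HEAD; fall back to a full GET before
            # declaring the link broken
            $r = Invoke-WebRequest $Url -DisableKeepAlive -UseBasicParsing -Method Get
            return ($r.StatusCode -eq 200)
        }
        catch { return $false }
    }
}
```

This won’t help with sites that return 200 for their custom “page not found” pages; catching those would need an inspection of the response body.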

So we are now ready to answer the next critical question: how do I recognize a URL in text? I’ve seen solutions with regular expressions, and although this might be a good way (if you can get it to work!), I’m hoping that I have found an easier way.

It all starts with the assumption that a URL doesn’t contain a space. So if I have a text with a URL in it, a split by space gives me an array:

$string = "text https://sharepains.com/anylocation/anypage.html some more text"
$string.split(" ")
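Those two lines, using nothing more than the built-in .NET string methods, give one array element per word; filtering for the element that starts with a URL prefix pulls the link out:

```powershell
$string = "text https://sharepains.com/anylocation/anypage.html some more text"
# Split on spaces and keep only the tokens that look like URLs
$string.Split(" ") | Where-Object { $_.StartsWith("http://") -or $_.StartsWith("https://") }
# -> https://sharepains.com/anylocation/anypage.html
```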

OK, this will almost work, but not if there isn’t a space before or after the URL. So other than spaces, what else could be separating URLs from the surrounding text?
I’m first having a look at the HTML:

<a href="http://testurl">Link</a>

As all I’m interested in is getting a variable with a clean URL in it, I could just split by `"` as well.

For string fields this results in the following piece of code:

if ($value.Contains("https://") -or $value.Contains("http://")) {
    $words = $value.Split(" ")
    foreach ($word in $words) {
        $quotesplitwords = $word.Split('"')
        foreach ($quotesplitword in $quotesplitwords) {
            if ($quotesplitword.Contains("https://") -or $quotesplitword.Contains("http://")) {
                try {
                    if ((Invoke-WebRequest $quotesplitword -DisableKeepAlive -UseBasicParsing -Method Head).StatusCode -ne 200) {
                        Write-Host "Broken link:" $quotesplitword
                    }
                }
                catch {
                    Write-Host "Broken link:" $quotesplitword
                }
            }
        }
    }
}
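Since the same token scan will be needed again later for pages and documents, the splitting logic can be pulled out into a small reusable helper. This is a sketch of my own; the function name Get-UrlsFromText is not part of the original script:

```powershell
function Get-UrlsFromText {
    param([string]$Text)
    # Split on spaces, then on double quotes, and keep any token
    # that looks like a URL
    $Text.Split(" ") |
        ForEach-Object { $_.Split('"') } |
        Where-Object { $_.Contains("https://") -or $_.Contains("http://") }
}

Get-UrlsFromText 'text <a href="http://testurl">Link</a> more'
# -> http://testurl
```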

This code now gives only one false positive:

Office 365 - Check your site for broken links in SharePoint Online - Part 2 Microsoft Office 365, Microsoft SharePoint Online brokenlinks

If URLs appear in text without being actual clickable hyperlinks, the script will still flag them. In fact, any text containing an http reference that can’t be resolved will be flagged as a broken link. For now I’m going to live with that, though I’m not sure if this will be OK for the remaining locations that may contain broken links.

So this now covers finding broken URLs within list items. There is still quite a bit of work to do:

  • Pages
  • Documents in libraries
  • Web Parts

But these elements will be covered in the next part of this series. Now that we have code that finds URLs within text, we are halfway there.


By Pieter Veenstra

Business Applications Microsoft MVP working as the Head of Power Platform at Vantage 365. You can contact me using contact@sharepains.com

