Checking for broken links

Problem

Website owners are encouraged to regularly scan their websites for broken links. A broken link is a frustrating dead end for users.


Most 404 pages aren't very friendly. Is yours?

Links can break because the destination website deletes a page, goes offline, or changes its URL structure. In none of these cases are you likely to be notified, so you need to check your own website regularly.

Solution

The Central Web Unit recommends Xenu's Link Sleuth (XLS), a free application that automatically checks every link on a website using a process called spidering. Given an initial page, XLS finds all the links on that page and checks whether any are broken, then repeats the process on each working page within your site. XLS checks links to external websites, but does not follow them any further.
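The spidering process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how XLS itself is implemented: `spider` and the caller-supplied `fetch(url)` function (assumed to return an HTTP status code and the page's HTML) are hypothetical, and any status of 400 or above is treated as broken. Like XLS, it checks external links but only follows links within the starting site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def spider(start_url, fetch):
    """Breadth-first link check starting at start_url.

    fetch(url) -> (status_code, html) is a hypothetical, caller-supplied
    function. Returns {broken_url: set of pages that link to it}.
    """
    site = urlparse(start_url).netloc
    results = {start_url: fetch(start_url)}  # every URL checked so far
    broken = {}
    # Only working pages are crawled for further links.
    queue = deque([start_url] if results[start_url][0] < 400 else [])
    while queue:
        page = queue.popleft()
        parser = LinkExtractor()
        parser.feed(results[page][1])
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            url, _ = urldefrag(urljoin(page, href))
            if urlparse(url).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, etc.
            if url not in results:
                results[url] = fetch(url)  # check each URL only once
                is_internal = urlparse(url).netloc == site
                # Follow working internal links; external links are
                # checked but never crawled.
                if is_internal and results[url][0] < 400:
                    queue.append(url)
            if results[url][0] >= 400:
                broken.setdefault(url, set()).add(page)
    return broken
```

Passing `fetch` in as a parameter keeps the crawl logic separate from networking, so the sketch can be tried against a small in-memory site before being pointed at a real one.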

We use XLS to check the UNSW corporate website on a weekly basis, and other websites less frequently. Your needs may vary. We also advise running XLS on new websites prior to launch, and immediately after.

CWU offers assistance with XLS but does not have the resources to provide training or conduct broken link checks on your behalf.

IMPORTANT: Do not use XLS while logged into a website as an administrator! Because XLS does its job by "clicking" every link, it will also click administrator links that may modify or delete content on your website. Make sure you are logged out of your website in every browser installed on your system.

Mac alternative

Integrity is a free OS X application; however, the Central Web Unit has not yet field-tested it and cannot provide instructions at this time. (Please let us know if you have a positive or negative experience with it.)

How to use XLS for common tasks

Check an entire site for broken links

  • From the File menu, select "Check URL"
  • Specify the initial URL to check (e.g. your homepage)
  • Tick "Check external links" if you want to test whether links to other websites are working
  • When the check is completed, generate a report and look for "Broken links, ordered by link" or "Broken links, ordered by page" (different ways of ordering the same information)
  • Alternatively, right-click a red URL and select "URL Properties". The URL Properties window shows which pages link to that broken URL.
  • To show only broken URLs, press Ctrl+B or use the View menu.
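The two broken-link reports contain the same (page, link) pairs, just grouped differently. The sketch below uses made-up sample records to show the idea; the function names are hypothetical and are not part of XLS.

```python
from collections import defaultdict

# Made-up broken-link records: (page containing the link, broken URL).
BROKEN = [
    ("/news", "/old-report.pdf"),
    ("/about", "/old-report.pdf"),
    ("/about", "http://partner.example/gone"),
]


def ordered_by_link(records):
    """For each broken URL, list the pages that link to it."""
    groups = defaultdict(list)
    for page, link in records:
        groups[link].append(page)
    return {link: sorted(pages) for link, pages in sorted(groups.items())}


def ordered_by_page(records):
    """For each page, list the broken URLs it contains."""
    groups = defaultdict(list)
    for page, link in records:
        groups[page].append(link)
    return {page: sorted(links) for page, links in sorted(groups.items())}


print(ordered_by_link(BROKEN))
# {'/old-report.pdf': ['/about', '/news'], 'http://partner.example/gone': ['/about']}
print(ordered_by_page(BROKEN))
# {'/about': ['/old-report.pdf', 'http://partner.example/gone'], '/news': ['/old-report.pdf']}
```

Ordering by link is handy when one dead URL is linked from many places; ordering by page is handy when you are fixing pages one at a time.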

Exclude parts of a site from a check

You may want to exclude database or archival material if there is a lot of it, or if it does not need frequent checking.
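One simple way to express an exclusion is a list of URL path prefixes that a checker skips. The sketch below assumes hypothetical `/archive/` and `/db/` sections and a hypothetical `should_check` function; it illustrates the idea only and is not how XLS's own settings are configured.

```python
from urllib.parse import urlparse

# Hypothetical sections to leave out of routine checks.
EXCLUDED_PREFIXES = ("/archive/", "/db/")


def should_check(url, excluded=EXCLUDED_PREFIXES):
    """Return False if the URL's path starts with an excluded prefix."""
    return not urlparse(url).path.startswith(tuple(excluded))
```

A crawler could call a filter like this before queuing each URL. Matching on the path with a trailing slash means `/archive/2009/report.html` is skipped while an unrelated page such as `/archived.html` is still checked.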

Generate a list of all links for Excel

This creates a file that records every link on every page found by XLS. You can use it to find every page on your site that links to a particular URL (for instance, a page that is being deleted and needs to be de-linked from everywhere).

  • After a check has finished, select "File: Export Page Map to TAB separated file..."
  • Name and save the file
  • Open the file using Excel or similar program
  • OriginPage is the page that was checked. LinkToPage is a link found on the origin page. LinkToPageStatus records whether the linked page is OK, not found, password protected, skipped, etc.
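If you prefer a script to Excel, the exported file can be read with Python's `csv` module. This sketch assumes the file has a header row with the OriginPage, LinkToPage and LinkToPageStatus columns described above, and that broken links carry the status text "not found"; check a real export to confirm the exact column names and status values. The function names are hypothetical.

```python
import csv


def _rows(lines):
    """Read a page map export: tab-separated, with a header row (assumed)."""
    return csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def pages_linking_to(lines, target_url):
    """Sorted, de-duplicated OriginPage values whose LinkToPage is target_url."""
    return sorted({row["OriginPage"] for row in _rows(lines)
                   if row["LinkToPage"] == target_url})


def broken_links(lines, broken_statuses=("not found",)):
    """(OriginPage, LinkToPage) pairs whose status is in broken_statuses."""
    return [(row["OriginPage"], row["LinkToPage"]) for row in _rows(lines)
            if (row["LinkToPageStatus"] or "").strip().lower() in broken_statuses]
```

For example, opening a hypothetical export with `open("pagemap.txt", newline="")` and passing it to `pages_linking_to` along with the URL of a page being deleted returns the pages that still link to it. Depending on how the file was saved, you may need to pass an `encoding` argument to `open()`.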

Check a password-protected site

To do this, tick "Ask for password or certificate when needed" under the "Options: Preferences" menu item. Note: this only works for servers that ask for a username and password in a popup. Using XLS on a site where you must log in is considerably harder, but see FAQ point 26 for ideas.
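The popup login mentioned above is usually HTTP authentication. As a rough sketch, assuming the server uses HTTP Basic authentication (Digest authentication works differently and is not covered here), a single protected page can be checked with Python's standard library; `check_protected` and `basic_auth_header` are hypothetical names. Basic credentials are only base64-encoded, not encrypted, so send them over HTTPS only.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def check_protected(url, username, password, timeout=10):
    """Return the HTTP status code for a page behind Basic authentication."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 if the credentials were rejected
```

A result of 200 means the page loaded with the supplied credentials; 401 means they were refused. Sites that use a login form instead of a popup need a different approach.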