URL shuffling and Google juice

8 Oct 2012

[Image: huntsman spider]

I thought I’d write up something I did at work today, in case it’s useful to other “webmasters”.

We have a site which, through evolution, history and multiple editors, has ended up with a cluttered URL space. It happens.

Reorganising the URL space is worthwhile for tidiness, to enable future redesigns, and to ensure good navigation (and hence Googlability). I wanted to do it in a way that kept existing links working and preserved search engines’ knowledge of our pages. In other words, every old URL needed a 301 (permanent) redirect to its new location.
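One way to express those redirects, assuming the site runs on Apache with mod_alias (the post doesn’t say which web server is involved), is to generate one `RedirectMatch 301` line per moved page from an old-to-new mapping. This is a minimal sketch; `BASE`, `redirect_lines` and the dict-based mapping are hypothetical, and it assumes paths contain no spaces.

```python
import re

# Hypothetical base URL of the site; substitute your own.
BASE = "https://www.example.com"


def redirect_lines(mapping):
    """Turn an {old_path: new_path} dict into Apache mod_alias directives.

    RedirectMatch with an anchored, escaped pattern is used instead of a
    plain Redirect, because Redirect matches by prefix and would also
    redirect any longer paths that happen to start with old_path.
    Assumes paths contain no spaces.
    """
    lines = []
    for old, new in sorted(mapping.items()):
        if old == new:
            continue  # unchanged pages need no redirect
        pattern = "^" + re.escape(old) + "$"
        lines.append(f"RedirectMatch 301 {pattern} {BASE}{new}")
    return lines


if __name__ == "__main__":
    for line in redirect_lines({"/news.html": "/about/news.html",
                                "/index.html": "/index.html"}):
        print(line)
```

The anchoring matters when an old page and a new folder share a prefix: an unanchored `Redirect 301 /news ...` would also catch `/news/2012.html`.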

First, I needed a good list of the URLs across the site. Content management systems can usually produce one, but I regularly run Xenu’s Link Sleuth, a free program I recommend for checking the links on your website (or on a mirror of it). It makes it easy to produce a list of all the accessible pages, and you can run it again when you’ve finished to make sure you haven’t left any broken links.
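If the crawl results are saved as a plain list of URLs, one per line (an assumption; adapt the parsing to whatever export format you actually use), a few lines of Python can reduce them to the unique page paths on your own host. A sketch only; `site_paths` and the host name are hypothetical.

```python
from urllib.parse import urlsplit


def site_paths(lines, host):
    """Reduce a list of crawled URLs to sorted, unique paths on one host.

    Fragments (#...) and query strings are dropped, since they do not
    identify separate pages for the purposes of a reorganisation.
    """
    paths = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        if parts.netloc.lower() != host.lower():
            continue  # skip links to other sites
        paths.add(parts.path or "/")  # a bare host means the home page
    return sorted(paths)


if __name__ == "__main__":
    crawl = ["http://www.example.com/a.html", "http://other.org/x",
             "http://www.example.com/a.html#top", "http://www.example.com"]
    print(site_paths(crawl, "www.example.com"))
```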

I could’ve put the URLs in a big list, added a column next to it for the new locations, and got busy with “fill down” in Excel. Instead, I wrote one script that let me do the reorganisation by drag-and-drop, creating subfolders and renaming things as I wished, and another that turned the result into the list of operations needed to carry it out.
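The post doesn’t describe the format the scripts use, so the following is only a sketch of the second step under assumed inputs: a complete dict mapping every old path to its new path, with unchanged pages mapping to themselves. It rejects two pages landing on the same URL, refuses chained renames that would overwrite a file before it has been moved, and creates directories before moving files into them. `plan_operations` and the tuple-based operation format are hypothetical.

```python
import posixpath


def _parent_dirs(path):
    """Yield every ancestor directory of path, excluding the root."""
    d = posixpath.dirname(path)
    while d not in ("/", ""):
        yield d
        d = posixpath.dirname(d)


def plan_operations(mapping):
    """Turn a complete {old_path: new_path} mapping into mkdir/move steps."""
    seen = {}
    for old, new in mapping.items():
        if new in seen:
            raise ValueError(f"{old} and {seen[new]} both map to {new}")
        seen[new] = old

    moves = sorted((o, n) for o, n in mapping.items() if o != n)
    sources = {o for o, _ in moves}
    for old, new in moves:
        if new in sources:
            # A chained or swapped rename would overwrite a file that has
            # not been moved yet; resolve it by hand via a temporary name.
            raise ValueError(f"{new} is both a source and a target")

    # Lexicographic sorting puts each parent directory before its children.
    dirs = sorted({d for _, new in moves for d in _parent_dirs(new)})
    return [("mkdir", d) for d in dirs] + [("move", o, n) for o, n in moves]


if __name__ == "__main__":
    for op in plan_operations({"/news.html": "/about/news/index.html",
                               "/team.html": "/about/team.html",
                               "/index.html": "/index.html"}):
        print(op)
```

The same old-to-new mapping is also a natural source for the 301 redirects, which keeps the file moves and the redirects in step.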

So far I’ve only made the change on an internal mirror of the website, because I want to test it thoroughly before touching the live site.
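Testing on the mirror can include checking that every old URL now answers with a 301 pointing at the right place. Here is a minimal sketch using Python’s standard `http.client`, which, unlike `urllib.request`, does not follow redirects, so the raw status and `Location` header are visible. The function name is hypothetical, it sends `HEAD` requests to avoid downloading page bodies, and it assumes the server returns `Location` in exactly the form passed as `expected_location`.

```python
import http.client
from urllib.parse import urlsplit


def check_redirect(old_url, expected_location, timeout=10):
    """Return None if old_url answers 301 -> expected_location, else a message.

    http.client does not follow redirects, so the raw response is inspected.
    """
    parts = urlsplit(old_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
    finally:
        conn.close()
    if resp.status != 301:
        return f"{old_url}: expected 301, got {resp.status}"
    if location != expected_location:
        return f"{old_url}: redirects to {location}, expected {expected_location}"
    return None
```

Running it over the whole old-to-new list and printing every non-`None` result gives a quick report of anything that still needs fixing before the live switch.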

Here is the code in case it’s useful to anybody else.

URL shuffle on GitHub

The accompanying image is a snap of the biggest arachnid I have ever had to catch in a box. Not an experience to be repeated!

Tags: owlstone, webmaster, google, seo, code, github