9 September 2011
When I'm writing my blog I often add links to relevant websites. A quick Google search and a click on the link, and I can easily copy the site's address into my blog. If, however, the link goes directly to a PDF file, then it's usually just a case of right-clicking and choosing "Copy Link Location" (I'm a Firefox user; if you are using Internet Explorer it's called "Copy Shortcut"). Normally this works OK as well, except that Google likes to track the number of people following its links. This isn't always the case, but sometimes you end up with a url that goes to Google rather than directly to the website. Stripping out the Google part manually is difficult, as the embedded url is encoded as a web-safe string.
An example may help explain this:
I recently linked to a PDF document on the Worcestershire County Council website (on this blog post regarding road safety near Redditch schools).
The url that the Google search result provided is:
http://www.google.co.uk/url?sa=t&source=web&cd=3&ved=0CHQQFjAC&url=http%3A%2F%2Fwww.worcestershire.gov.uk%2Fcms%2Fidoc.ashx%3Fdocid%3D114da69e-2bf5-4b9e-949d-52c07d878560%26version%3D-1&rct=j&q=worcestershire%20EXPERIMENTAL%20ADVISORY%2020MPH%20SCHOOL%20SAFETY%20ZONE%20SIGNS&ei=QfNlTqOWDIPq0gHGuY2WCg&usg=AFQjCNH5TXwGSEJ7pmqmQ_7iZxQHMYXncg&sig2=Mi0TBj-74JvCy-WteiJggw&cad=rja
which is 376 characters long! [Try fitting that into a 140 character tweet without shortening].
As you can see the first 67 characters are the bits added by Google and do not relate to the website I wanted to link to. Also the encoding adds additional characters so the actual url is 291 characters.
This is a pretty extreme example, but some sites do have some long urls, particularly for attached documents within CMS systems.
Google is not the only site that does this. For example, Facebook does a similar thing with some links from its pages.
I've therefore created some JavaScript code that can strip out the Google part of the url.
First it looks for the string &url= which denotes the start of the real url. It then unescapes the characters (using a built-in JavaScript function). However, as some characters need to remain encoded (particularly spaces), it then converts them back.
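The steps above can be sketched roughly as follows. This is not the contents of urlfunctions.js, just a minimal illustration of the same idea; the function name `stripGoogleRedirect` is made up for this example, and it uses `decodeURIComponent` in place of the older `unescape` built-in (which is now deprecated).

```javascript
// Hypothetical helper: not the original urlfunctions.js code.
function stripGoogleRedirect(trackedUrl) {
  const marker = "&url=";
  const start = trackedUrl.indexOf(marker);

  // No embedded url found, so hand back the input unchanged.
  if (start === -1) {
    return trackedUrl;
  }

  // Everything after "&url=" is treated as the real url.
  const embedded = trackedUrl.slice(start + marker.length);

  // Decode all %XX sequences using the built-in function.
  // Assumption: the input is well formed; decodeURIComponent
  // throws a URIError on a malformed sequence such as "%ZZ".
  const decoded = decodeURIComponent(embedded);

  // Spaces must stay encoded in a url, so convert them back.
  return decoded.replace(/ /g, "%20");
}

console.log(
  stripGoogleRedirect(
    "http://www.google.co.uk/url?sa=t&url=http%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1"
  )
);
```

Only spaces are converted back here; other characters that should stay encoded would need the same treatment.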
The alternative would be to only unescape the characters that need to be (eg. %3A = :, %2F = /, %3F = ?, %3D = =), but I found it easier using the built-in function rather than using a lot of regular expression manipulation.
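For comparison, that alternative might look something like the sketch below. The lookup table holds only the four codes listed above, so anything else (for example %26 for &) would need adding by hand; the names `SELECTED_CODES` and `decodeSelected` are made up for this example.

```javascript
// Hypothetical lookup table: only the codes named in the text above.
const SELECTED_CODES = {
  "%3A": ":",
  "%2F": "/",
  "%3F": "?",
  "%3D": "="
};

// Decode only the listed codes and leave everything else
// (including %20 for spaces) exactly as it was.
function decodeSelected(encoded) {
  return encoded.replace(/%3A|%2F|%3F|%3D/gi, function (match) {
    // Hex digits may be lower case, so normalise before the lookup.
    return SELECTED_CODES[match.toUpperCase()];
  });
}

console.log(decodeSelected("http%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1"));
```

The drawback is the one already mentioned: every extra character to decode means another table entry and another branch in the regular expression.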
Try the code below:
Or download the urlfunctions.js - javascript file.
Note that this is very basic. It will only work when the real url is embedded using &url= as the last part of the url, and it doesn't include much error checking. There is no security risk because this only runs as JavaScript in the user's browser, but it could be used as the basis of an improved version that would handle different formatting etc.
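As one possible direction for such an improved version, the sketch below uses the standard `URL` and `URLSearchParams` objects found in current browsers and Node.js, so the embedded url no longer has to be the last part of the string. The function name `extractEmbeddedUrl` and the `paramName` argument are assumptions made for this example, not part of urlfunctions.js.

```javascript
// Hypothetical improved version: picks out a single query
// parameter instead of taking everything after "&url=".
function extractEmbeddedUrl(trackedUrl, paramName = "url") {
  let parsed;
  try {
    parsed = new URL(trackedUrl);
  } catch (err) {
    // Not a parseable absolute url, so return it untouched.
    return trackedUrl;
  }

  // searchParams.get() returns the decoded value, or null
  // when the parameter is not present.
  const embedded = parsed.searchParams.get(paramName);
  if (embedded === null) {
    return trackedUrl;
  }

  // As before, keep spaces encoded in the result.
  return embedded.replace(/ /g, "%20");
}

console.log(
  extractEmbeddedUrl(
    "http://www.google.co.uk/url?sa=t&url=http%3A%2F%2Fexample.com%2Fdoc%3Fid%3D1%26v%3D2&rct=j"
  )
);
```

Passing a different `paramName` would let the same function cope with sites that embed the real url under another parameter name.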