Recently on one of the sites I manage, I noticed a lot of pages showing up in Google Analytics that were not a part of the site. These all began with the word “translate” and were of course regular pages viewed on the site but using Google’s Translation service.
The problem arises from the way translation services work: Google’s servers open the page on your site, translate it, and then display it to the visitor through a page located on Google’s server. When the Google Analytics JavaScript starts up, it quite rightly concludes that you are trying to track a page on Google’s system, with a complicated URI beginning with “translate”.
This was polluting my data and generally making things look untidy, so I fired up my regex tester and came up with a couple of filters for cleaning up the data. It’s relatively simple to do, as the full URI of the page being visited is contained within the query string of the translation page and therefore in the Request URI variable that is submitted to Google Analytics.
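To make that concrete, here is a small Python sketch of where the original address sits inside a translated page's Request URI. The sample URI and the example.com hostname are made up for illustration, and real translation URLs may carry different parameters.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical Request URI, as it might be reported for a translated page.
request_uri = (
    "/translate_c?hl=de&sl=nl"
    "&u=http://www.example.com/about/contact.html&prev=/search"
)

# Split off the query string and decode it into a dict of lists.
query = parse_qs(urlsplit(request_uri).query)

# The "u" variable holds the full address of the page that was translated.
original_url = query["u"][0]
print(original_url)
```

Google Analytics filters can't parse query strings like this, which is why the filters below rely on regular expressions instead.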
Hostname
Create a new Advanced Filter and call it “Google Translations (Hostname)”. Select Request URI from the Field A drop-down and enter the following pattern into the input box:
/translate.*u=http://([^/]*)
The URI will begin with “/translate” so we match this first. We then use a “.*” to disregard any text between this and the querystring variable “u” that contains the URI of the page being translated. The URL will begin with “http://” so we match this and then create a bracketed match to capture the hostname. We use the “[^/]” character class so that it stops capturing the hostname when it encounters a “/”, as these can’t exist in hostnames.
In the Constructor, select Hostname from the drop-down box and enter $A1 in the input box. This will put our captured hostname where it belongs.
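As a rough way to try the pattern outside Google Analytics, the sketch below applies it with Python's re module. The function name and the sample values are hypothetical, and Python's regex engine is only standing in for the one the filter uses, so treat it as an approximation.

```python
import re

# Pattern from the filter above; group 1 plays the role of $A1.
HOSTNAME_PATTERN = re.compile(r"/translate.*u=http://([^/]*)")

def rewrite_hostname(request_uri, hostname):
    """Return the captured hostname, or the original one if nothing matches."""
    match = HOSTNAME_PATTERN.search(request_uri)
    return match.group(1) if match else hostname

# Hypothetical values for illustration only.
uri = (
    "/translate_c?hl=de&sl=nl"
    "&u=http://www.example.com/about/contact.html&prev=/search"
)
print(rewrite_hostname(uri, "translate.google.com"))             # www.example.com
print(rewrite_hostname("/about/contact.html", "www.example.com"))  # www.example.com
```

Ordinary page views don't match the pattern, so their hostname is left alone.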

Request URI
We still need to capture the URI of our page, so we create another Advanced Filter called “Google Translation (Request URI)”. We then select Request URI from the Field A drop-down box and enter a similar pattern to before:
/translate.*u=http://([^/]*)/?([^&]*)
We start by matching and capturing the same as we did before, but then we extend it a little. We need to capture everything after the hostname (i.e. everything after the first “/”). But we only want the URI of the page, not the rest of the querystring, so we use the “[^&]” character class to stop when we hit an ampersand (i.e. the next query variable).
What makes it slightly more difficult is that there’s no guarantee that there will be a “/” character to capture. If the visitor translates your homepage by entering http://www.yourdomain.com then there’s no backslash to capture. This is why we put the backslash outside the brackets and modify it with a “?”, signifying that it may or may not be there.
In the Constructor, we select Request URI and enter “/$A2”. This way, we are always starting the Request URI with a backslash and then adding whatever else we may have captured from the query variable. This means that page views on your homepage will be logged as a single backslash as they normally would be.
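The same kind of check works for the Request URI filter. This is a minimal sketch, again using Python's re module as a stand-in for the filter engine, with a hypothetical function name and made-up sample URIs.

```python
import re

# Pattern from the filter above; group 2 plays the role of $A2.
REQUEST_URI_PATTERN = re.compile(r"/translate.*u=http://([^/]*)/?([^&]*)")

def rewrite_request_uri(request_uri):
    """Mimic the "/$A2" constructor; leave non-matching URIs untouched."""
    match = REQUEST_URI_PATTERN.search(request_uri)
    if match is None:
        return request_uri
    return "/" + match.group(2)

# Hypothetical translated page view and translated homepage view.
page = (
    "/translate_c?hl=de&sl=nl"
    "&u=http://www.example.com/about/contact.html&prev=/search"
)
home = "/translate_c?hl=de&sl=nl&u=http://www.example.com"

print(rewrite_request_uri(page))  # /about/contact.html
print(rewrite_request_uri(home))  # /
```

One caveat shows up when experimenting: if the translated address has no “/” after the hostname and is followed by further query variables, “[^/]*” will run past the ampersand into them. Using “[^/&]*” for the hostname group would be a stricter alternative in that case.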

Problem Solved
So we’ve successfully rewritten both the Hostname and Request URI, eliminating any of those annoying translation URLs. It’s worth noting the order in which the filters are applied. We need to rewrite the hostname first, as otherwise we would overwrite the Request URI and no longer be able to match the hostname from the old URI. Filter ordering problems like this are quite common, so if you’ve got other filters set up on your profile, take some time to establish what modifies what and create an order so that they don’t interfere with each other.
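The ordering issue can be illustrated with a toy model in which each filter is a function that rewrites one field of a hit. The dictionary layout, function names and sample hit are all assumptions made for this sketch; it is not how Google Analytics represents data internally.

```python
import re

HOSTNAME_PATTERN = re.compile(r"/translate.*u=http://([^/]*)")
REQUEST_URI_PATTERN = re.compile(r"/translate.*u=http://([^/]*)/?([^&]*)")

def hostname_filter(hit):
    """Copy the captured hostname into the hit, if the Request URI matches."""
    m = HOSTNAME_PATTERN.search(hit["request_uri"])
    return {**hit, "hostname": m.group(1)} if m else hit

def request_uri_filter(hit):
    """Replace the Request URI with "/" plus the captured path, if it matches."""
    m = REQUEST_URI_PATTERN.search(hit["request_uri"])
    return {**hit, "request_uri": "/" + m.group(2)} if m else hit

def apply_filters(hit, filters):
    """Run the filters one after another, each seeing the previous result."""
    for f in filters:
        hit = f(hit)
    return hit

# Hypothetical hit for a translated page.
raw = {
    "hostname": "translate.google.com",
    "request_uri": "/translate_c?hl=de&u=http://www.example.com/about/",
}

good = apply_filters(raw, [hostname_filter, request_uri_filter])
bad = apply_filters(raw, [request_uri_filter, hostname_filter])
print(good)  # hostname and Request URI both rewritten
print(bad)   # Request URI rewritten, hostname still the translation host
```

In the second ordering the Request URI has already been reduced to “/about/” by the time the hostname filter runs, so its pattern no longer finds anything to match.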
I’m currently looking into other translation services around the Web (Babelfish comes to mind) and cooking up some filters to work with these but, from my traffic at least, it seems that Google pretty much has the translation market sewn up.
Hi Andy
Thanks for your explanation on filters in Google Analytics.
I do however have one question.
I installed the filters as you described, but I still get the /translate urls through. (like: /translate_c?hl=desl=nlu=http://www.middelburg.nl/prev=/search?q=middelburghl=declient=firefoxhs=7pGrls=com.yahoo:de:officialrurl=translate.google.deusg=ALkJrhgm4XS6-ktHnbJXmj0Y-4SKbru6qQ )
Besides that I noticed that you refer to backslash whereas I can only see slash. Do you mean slash ?
Like in: / Or backslash like in: \ ?
Thanks for putting this out on the Web
Edwin