Finding Web Site Pages Missing Analytics

23 Feb, 2012  |  under Analytics

A frequent problem I see in evaluating web site traffic is that some site pages are missing Google Analytics code, preventing them from being tracked. Very often, these are form submission “thank you” pages (which are ideal Goal Pages in Analytics!) or other pages that have “funny” templates relative to the rest of a site.

The procedure I use to track these down works like this:

  1. Spider the site to capture all files, ignoring images and other assets (the URL here is a placeholder for your own site):
    wget --recursive --random-wait -w 1 -R gif,png,jpg,pdf,css,js,mov http://www.example.com/
  2. Look for pages that DON’T have Analytics, again ignoring image files, css, etc. (grep -c prints file:count, so a trailing :0 marks a page where the tracking ID never appears):
    find . -type f -print0 | xargs -0 grep -c UA-12341234 | grep -Ev 'png|gif|jpg|images|pdf|css|robots\.txt|\.js' | grep ':0$'

Works like a champ to ferret out weird pages. You can of course use the same approach to grep for other strings that need to be on all site pages.
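
If you run this check often, the second step can be wrapped in a small shell function. This is only a sketch of one way to do it: the function name find_untracked and its two arguments are my own invention, and it swaps the grep -c / grep ':0$' pair for grep -L, which lists files that contain no match. grep -L is available in GNU and BSD grep but is not guaranteed on every system.

```shell
#!/bin/sh
# find_untracked DIR STRING
# Hypothetical helper (name and arguments are illustrative, not from any tool):
# prints every file under DIR that does NOT contain STRING, skipping the same
# asset types the pipeline above filters out.
find_untracked() {
    dir=$1
    needle=$2
    # Exclude assets by name, and anything under an images/ directory.
    # "-exec ... {} +" hands filenames to grep safely, spaces included,
    # and runs nothing at all when no files are found.
    # grep -L lists files with no match; -F treats the string literally.
    find "$dir" -type f \
        ! -name '*.png' ! -name '*.gif' ! -name '*.jpg' \
        ! -name '*.pdf' ! -name '*.css' ! -name '*.js' \
        ! -name 'robots.txt' ! -path '*/images/*' \
        -exec grep -L -F -e "$needle" {} + | sort
}
```

After the wget run, something like find_untracked www.example.com UA-12341234 would list the untracked pages, assuming wget saved the mirror into a directory named after the host. Passing a different string as the second argument covers the "other strings" case as well.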


