How to prevent Google Analytics sampling data

We all know the horrible sign in Google Analytics that looks like this:

You want to analyze a bunch of data, but that sign tells you the report will be based on sampled (incomplete) data. So what exactly is "sampling", and how can you prevent it?

What is sampling

After Google Analytics has gathered the raw, unaggregated data collected by the tracking script, it processes that data into understandable and useful visit data. From that visit data, all available standard reports are pre-calculated and stored. That means, for example, that when you open the "Top Content" report, Google Analytics can show it to you within seconds because most of the calculations have already been done.

But if you ask for data that is not pre-calculated, Google Analytics has to search for it in the stored visit data, and you could hit the sampling trigger. As you can imagine, this is a very heavy process that costs a lot of resources. Google has decided that if your query covers more than 500.000 visits or 1.000.000 lines of data (dimension values), it will sample the data to save time. That means that only 500.000 visits or 1.000.000 dimension values are used to create the report you're asking for, and Google Analytics will provide you with a certain range:

That range indicates the lower and upper bounds between which the true value lies, at a significance level of 5%. In other words, Google Analytics is 95% confident that the true value falls within the presented range.
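To make the idea of such a range concrete, here is a minimal sketch of a textbook 95% interval for a count extrapolated from a sample. It assumes a simple random sample and the normal approximation; how Google Analytics actually computes its range is not described here, and the function name and example numbers are purely illustrative.

```python
import math

def sampled_estimate(total_visits, sample_size, sample_matches, z=1.96):
    """Return (low, estimate, high) for a count extrapolated from a sample.

    z=1.96 corresponds to a 95% confidence level under the normal
    approximation. This is an assumption for illustration, not the
    documented Google Analytics method.
    """
    if not 0 < sample_size <= total_visits:
        raise ValueError("sample_size must be between 1 and total_visits")
    if not 0 <= sample_matches <= sample_size:
        raise ValueError("sample_matches must be between 0 and sample_size")
    # Share of sampled visits that match the segment or report row.
    p = sample_matches / sample_size
    # Standard error of that share (no finite population correction).
    std_error = math.sqrt(p * (1 - p) / sample_size)
    # Scale the share and its margin up to the full number of visits.
    estimate = p * total_visits
    margin = z * std_error * total_visits
    return estimate - margin, estimate, estimate + margin

# Hypothetical numbers: 2.000.000 visits in the period, 500.000 sampled,
# 50.000 of the sampled visits match the segment.
low, estimate, high = sampled_estimate(2_000_000, 500_000, 50_000)
print(f"{estimate:.0f} visits, 95% range {low:.0f} to {high:.0f}")
```

The larger the sample is relative to the variation in the data, the narrower the range becomes, which is why a report that covers fewer visits is more trustworthy.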

How to prevent sampling

If you really want unsampled data in your reports while segmenting, make sure you select a date range that contains fewer than 500.000 visits. If you want to analyze a larger date range, export the numbers for two (or more) smaller date ranges to Excel and combine them there.
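The date range splitting described above could be planned with a small script. The sketch below assumes you already have the number of visits per day (for example from an export); the function name and the sample numbers are hypothetical, and the 500.000 limit is simply the threshold quoted in this article.

```python
from datetime import date, timedelta

VISIT_LIMIT = 500_000  # sampling threshold quoted in this article

def split_date_range(daily_visits, limit=VISIT_LIMIT):
    """Group consecutive days into ranges whose total visits stay below `limit`.

    `daily_visits` is a list of (date, visits) tuples sorted by date.
    Returns a list of (start_date, end_date, total_visits) tuples.
    A single day that already reaches the limit gets a range of its own
    and would still be sampled.
    """
    ranges = []
    start = end = None
    total = 0
    for day, visits in daily_visits:
        # Close the current range if adding this day would reach the limit.
        if start is not None and total + visits >= limit:
            ranges.append((start, end, total))
            start, total = None, 0
        if start is None:
            start = day
        end = day
        total += visits
    # Keep the last, partially filled range.
    if start is not None:
        ranges.append((start, end, total))
    return ranges

# Hypothetical site with 40.000 visits per day over 30 days.
first = date(2011, 1, 1)
daily = [(first + timedelta(days=i), 40_000) for i in range(30)]
for start, end, total in split_date_range(daily):
    print(start, end, total)
```

Each resulting range can be exported separately and the numbers added up afterwards. Keep in mind that this only works for metrics that can be summed, such as visits or pageviews; unique visitors cannot simply be added across ranges.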

Another solution is to create multiple profiles that each track a smaller part of your site. Those profiles won't hit the 500.000 limit as quickly as the main profile does.

The other sampling threshold, the maximum of 1.000.000 dimension values, is not reached very often. In most cases the Top Content report is the first one to hit that limit. Google will only retrieve 1.000.000 URLs for a specific period, which comes down to 1.000.000 divided by the number of days in your selected date range. For example, 2 months will give you 1.000.000 / 60 = roughly 16.667 unique URLs per day. So it could be that the URL you were looking for is not in the report. The solution is to select a date range in which the number of unique URLs is less than 1.000.000.
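The arithmetic above is easy to turn into a quick check. This is a minimal sketch that takes the 1.000.000 limit and the "divide by the number of days" rule from this article at face value; the function names are made up for illustration. Integer division rounds down, so 60 days gives 16666 rather than the rounded 16667.

```python
ROW_LIMIT = 1_000_000  # maximum number of URLs per report, as quoted in this article

def daily_row_budget(days_in_range, row_limit=ROW_LIMIT):
    """Approximate number of unique URLs kept per day for a date range."""
    if days_in_range < 1:
        raise ValueError("days_in_range must be at least 1")
    return row_limit // days_in_range

def max_days_without_truncation(unique_urls_per_day, row_limit=ROW_LIMIT):
    """Longest date range (in days) whose per-day budget covers the site's URLs."""
    if unique_urls_per_day < 1:
        raise ValueError("unique_urls_per_day must be at least 1")
    return row_limit // unique_urls_per_day

print(daily_row_budget(60))                 # budget per day for a 2 month range
print(max_days_without_truncation(25_000))  # hypothetical site with 25.000 URLs per day
```

If your site serves more unique URLs per day than the budget for your selected range, shortening the range until the budget fits is the safest choice.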

Final word

I know, these are not the solutions you really want, and for huge accounts they are useless. But in many cases, selecting a proper date range and exporting the data per range is a good solution.



12 thoughts on “How to prevent Google Analytics sampling data”

  1. You might also want to consider using a 3rd party application and connect it via the Google Analytics API. There are a lot of tools that can handle the heavy lifting.

      1. Hi Andre,

        I just ran a report using the Data Feed Query Explorer that had two rows of data in it and got the dreaded "This result is based on sampled data" message at the bottom, next to the "Get Data" button. It looks like the API uses exactly the same sampling method as the GA UI.

  2. All,

    I agree with Brendan. The API works the same as the UI. My results show fewer than 250,000 visits, but each time I run the same query I get a different result.


    1. It's not about the results: if the data you're querying contains more than 500.000 visits, you will hit the sampling threshold. How many visits do you have in the period you want data from?

    1. Interesting tool. My guess is that they download the data through the API in small parts (per day or per hour) to get unsampled data. I will definitely have a look at it.
