We all know the horrible sign in Google Analytics that looks like this:
You want to analyze a large set of data, but that sign tells you the data will be sampled (incomplete). So what exactly is "sampling", and how can you prevent it?
What is sampling
After Google Analytics has gathered the raw, unaggregated data collected by the tracking script, it processes that data into understandable and useful visit data. All available standard reports are then pre-calculated from that visit data and stored. That means, for example, that if you request the "Top Content" report, Google Analytics can show it to you in seconds because most of the calculations are already done.
But if you ask for data that is not pre-calculated, Google Analytics has to search for it in the stored visit data, and you could hit the sampling trigger. As you can imagine, this is a heavy process that costs a lot of resources. Google has decided that if a query has to search through more than 500,000 visits or 1,000,000 lines of data, it will sample the data to save time. That means only 500,000 visits or 1,000,000 dimensions are used to create the report you're asking for, and Google Analytics will give you a range for the reported numbers:
That range shows the lower and upper bounds between which the true value lies, at a significance level of 5%. In other words, you can be 95% confident that the true value falls within the presented range.
How to prevent sampling
If you really want to use unsampled data in your reports while segmenting, make sure you select a date range that contains fewer than 500,000 visits. If you want to analyze a larger date range, export the numbers for two (or more) smaller date ranges to Excel and combine them there.
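As a rough illustration of choosing those smaller date ranges, the sketch below greedily groups consecutive days so that each group stays under the visit threshold. It assumes you already have the number of visits per day (for example from an export). The function name `split_date_ranges` and the sample figures are hypothetical and not part of Google Analytics.

```python
from datetime import date, timedelta

VISIT_LIMIT = 500_000  # sampling threshold quoted in this article


def split_date_ranges(daily_visits, limit=VISIT_LIMIT):
    """Group consecutive days into ranges with fewer than `limit` visits each.

    daily_visits: list of (date, visits) tuples in chronological order.
    Returns a list of (start_date, end_date, total_visits) tuples.
    Note: a single day that already reaches the limit still becomes
    its own range, because a day cannot be split any further here.
    """
    ranges = []
    start = end = None
    total = 0
    for day, visits in daily_visits:
        # Close the current range if adding this day would reach the limit.
        if start is not None and total + visits >= limit:
            ranges.append((start, end, total))
            start, total = None, 0
        if start is None:
            start = day
        end = day
        total += visits
    if start is not None:
        ranges.append((start, end, total))
    return ranges


# Hypothetical example: 30 days with 40,000 visits per day.
first_day = date(2012, 1, 1)
sample = [(first_day + timedelta(days=i), 40_000) for i in range(30)]
for start, end, total in split_date_ranges(sample):
    print(start, end, total)
```

Each resulting range can then be exported separately and the numbers added up in Excel, as described above.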
Another solution is to create multiple profiles that each track a smaller part of your site. Within those profiles you won't hit the 500,000 limit as quickly as in the main profile.
The other sampling threshold, the maximum of 1,000,000 dimensions, is not reached very often. In most cases the Top Content report is the first one to hit that limit. Google will only retrieve 1,000,000 URLs for a specific period, which works out to 1,000,000 divided by the number of days in your selected date range. For example, 2 months will give you 1,000,000/60 = 16,667 unique URLs per day. So it could be that the URL you were looking for is not in the report. The solution is to select a date range in which the number of unique URLs is less than 1,000,000.
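The arithmetic above can be written as a tiny helper for trying out different date ranges. This is only a sketch of the calculation as described in this article. The name `urls_per_day` is made up for illustration, and rounding to the nearest whole URL is an assumption chosen to match the 16,667 figure.

```python
ROW_LIMIT = 1_000_000  # maximum number of URLs quoted in this article


def urls_per_day(days_in_range, row_limit=ROW_LIMIT):
    """Approximate number of unique URLs available per day in a date range."""
    if days_in_range < 1:
        raise ValueError("days_in_range must be at least 1")
    # Rounding to the nearest integer is an assumption, see the note above.
    return round(row_limit / days_in_range)


print(urls_per_day(60))  # the two-month example from the text
print(urls_per_day(30))  # a one-month range for comparison
```

The shorter the date range, the more unique URLs fit in the report for each day.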
I know, these are not the solutions you really want, and for huge accounts they are useless. But in many cases, exporting the data per carefully chosen date range is a good solution.