Dr. Pete on Nat Rep
12-4-2011
Dr. Pete
Dear Dr. Pete:
I am often asked to do a "Nat Rep" sample, but each researcher asks me to do it slightly differently. What exactly do they mean by Nat Rep and why isn't it always the same?
-Nathan
Dear Nat:
When researchers ask for a "Nat Rep" sample, they actually mean that the population of interest is the entire population of the country in question and that the sample should reflect this in its structure.
At its best, the "Nat Rep" sample will "look like" the population irrespective of how it is viewed. The numbers of men vs. women will match the national proportions, the percentage in each age group or each region will exactly match the population, etc. On non-demographic measures, such as product ownership or psychographics, you can also expect to find the sample matches the population.
Sounds simple enough, but where do we get the reference data in order to compare our sample to the population? From the census? Countries that have a census (and not all do) only conduct them about once every 10 years, so this data is quickly out of date. If we don't use census data, then we need to use market research data and the discussion becomes circular - how did that data get to be "Nat Rep?"
Most researchers use quota samples to get to a "Nat Rep." With a quota sample you set the absolute number of people you want to interview per quota cell and continue interviewing until all the cells are filled. Your sample will be 100% guaranteed to reflect the population based on the demographics you choose to target. However, the guarantee applies to these variables and these variables alone; everything else is subject to sampling error.
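As a rough illustration of how quota targets fall out of population proportions, here is a minimal Python sketch. The age shares are made-up numbers for the example, not census figures, and the function name quota_targets is purely hypothetical. It rounds using the largest-remainder approach so that the targets add up to the sample size, which is one reasonable choice among several.

```python
from math import floor


def quota_targets(proportions, sample_size):
    """Turn population shares into whole-number quota targets.

    Uses largest-remainder rounding so the targets sum to sample_size.
    Assumes the shares in `proportions` add up to 1.
    """
    raw = {cell: share * sample_size for cell, share in proportions.items()}
    targets = {cell: floor(value) for cell, value in raw.items()}
    shortfall = sample_size - sum(targets.values())
    # Give the leftover interviews to the cells with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda cell: raw[cell] - targets[cell], reverse=True)
    for cell in by_remainder[:shortfall]:
        targets[cell] += 1
    return targets


# Hypothetical age shares, for illustration only.
age_shares = {"16-34": 0.30, "35-54": 0.35, "55+": 0.35}
targets = quota_targets(age_shares, 1000)
print(targets)
```

Interviewing then continues until every cell reaches its target. Only the variables used to build the targets are controlled this way.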
Take the example of age. If you set quotas on 16-34, 35-54, and 55+, then your sample will come back in the correct proportions. If you then choose to analyze this data by 16-20, 21-30, 31-40, 41-50, and 51+, there is no guarantee that the sample will still look correct. It is extremely unlikely that all the variables in your survey will in fact match the population - although they should be close.
The extent to which you can quota control a sample depends on the sample size and reference data. Imagine you have six age breaks, two genders, and 15 regions. As a fully interlocking grid, this is 180 cells. If your sample size is only 100, then it is not possible to fill all the cells. Even if your sample size is 250, the cell size calculation may mean you need to interview (in theory) only half a person of a certain age, gender, and region, so this cell will contain no one.
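The arithmetic behind this can be sketched in a few lines of Python. The grid dimensions come from the example above; the 0.2% population share for a small cell is an assumed figure chosen only to show how "half a person" arises.

```python
# Grid from the example: six age breaks, two genders, 15 regions.
age_breaks, genders, regions = 6, 2, 15
cells = age_breaks * genders * regions  # fully interlocking grid


def expected_cell_size(population_share, sample_size):
    """Interviews a cell would need if the sample mirrored the population."""
    return population_share * sample_size


# Average interviews per cell if every cell were the same size.
average_per_cell = {n: n / cells for n in (100, 250, 1000)}
for n, avg in average_per_cell.items():
    print(f"sample of {n}: {avg:.2f} interviews per cell on average")

# Assumed share for one small cell (0.2% of the population), for illustration.
small_cell = expected_cell_size(0.002, 250)
print(f"small cell at a sample of 250: {small_cell:.2f} interviews")
```

With a sample of 100 there are fewer interviews than cells, so some cells must stay empty. With 250 the average is above one, but real cells are not equal in size, and the smaller ones can still come out below a whole person.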
If you don't get a good "Nat Rep" sample, it is usually possible to weight the data. Weighting will (of course) happily deal with making half a person where required. However, you do need something in each cell to weight (you can't make half a person out of nothing).
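A minimal sketch of simple cell weighting in Python follows, assuming each respondent in a cell gets the weight "population share divided by sample share". The shares and counts are invented for the example, and cell_weights is a hypothetical helper, not a reference to any particular weighting package.

```python
def cell_weights(population_shares, sample_counts):
    """Return (weights, empty_cells) for simple cell weighting.

    weight = population share / sample share for each cell.
    Cells with no respondents cannot be weighted and are reported separately.
    """
    total = sum(sample_counts.values())
    weights = {}
    empty_cells = []
    for cell, pop_share in population_shares.items():
        count = sample_counts.get(cell, 0)
        if count == 0:
            empty_cells.append(cell)  # nothing to weight up
            continue
        weights[cell] = pop_share / (count / total)
    return weights, empty_cells


# Hypothetical figures: the sample has too many women relative to the population.
weights, empty = cell_weights({"men": 0.49, "women": 0.51}, {"men": 40, "women": 60})
print(weights, empty)

# A cell with no interviews shows up as unweightable.
_, missing = cell_weights({"north": 0.5, "south": 0.3, "east": 0.2},
                          {"north": 60, "south": 40})
print(missing)
```

In the first call, men are weighted up and women weighted down so that the weighted sample matches the assumed population split. In the second, the "east" cell has no respondents, so no weight can rescue it.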
What variables should we use to get a "Nat Rep" sample? There is no single definitive answer; it depends on the research institute and the local research industry. In the UK, for example, you might be asked to quota on age, gender, region, and social class. In the US, it might be age, gender, region, and ethnicity. In Belgium, it might be age, gender, region, and language spoken. Age, gender, and region are fairly common and often allied with something that differentiates by economic status. This could be income, education, social class, or home ownership, for example.
Cheers!
-Dr. Pete