False Distributions and Reason Magazine

John Vandivier

Survey question wording can result in a misleading response distribution.

<a href="http://reason.com/blog/2015/05/16/if-jeff-sessions-anti-immigration-and-fr">A recent Reason article</a> stated the following:

Republican animus against iimmigration [sic] is wildly out of line with the rest of the country. Fully 84 percent of Republicans say they are "dissatisfied" with current levels of immigration (presumably, they want them decreased). Yet just 39 percent of all Americans say they are dissatisfied and want to see a decrease.

I replied, and the brilliant Krzysztof Ostaszewski seconded, as follows:

<a href="http://i.imgur.com/Qm72vrv.png"><img class="aligncenter" src="http://i.imgur.com/Qm72vrv.png" alt="" width="526" height="244" /></a>

So we are in agreement that the conclusion is bad due to a poor analytical assumption: that everyone who is "dissatisfied" wants a decrease. Now, what if the survey question had allowed for more distributed responses, such as:

  1. Less immigration
  2. Keep immigration constant
  3. More immigration
Categories 1 and 3 would both have been counted under the less specific "dissatisfied with current levels." The three-way question does more than split satisfied from dissatisfied: it reveals a trend pattern with respect to a principle, in this case the degree of immigration. Trend patterns are far more meaningful.
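
To make that concrete, here is a minimal Python sketch. The shares are invented for illustration and are not taken from any survey; `collapse_to_binary` is a hypothetical helper. It shows two opposite three-way distributions collapsing to the same "dissatisfied" number.

```python
def collapse_to_binary(shares):
    """Collapse three-way shares (percentages) into satisfied/dissatisfied.

    Anyone who wants less OR more immigration counts as dissatisfied.
    """
    dissatisfied = shares["less"] + shares["more"]
    return {"dissatisfied": dissatisfied, "satisfied": shares["same"]}


# Hypothetical numbers, chosen only so both cases sum to 100.
mostly_wants_less = {"less": 70, "same": 16, "more": 14}
mostly_wants_more = {"less": 14, "same": 16, "more": 70}

print(collapse_to_binary(mostly_wants_less))
print(collapse_to_binary(mostly_wants_more))
```

Both calls print the same binary result, so the binary question alone cannot tell these two populations apart.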

Getting away from this particular case a bit, it's possible in theory to collapse a normal distribution into a distribution which appears to be logarithmic or exponential by asking a survey question in a less than ideal way. Consider the question, "On a scale from 1 to 10, how much change would you like to see in the level of immigration, where 1 is very little and 10 is a lot?"

If most respondents want a large change, you might see something like an exponential curve moving up and to the right, where the y axis is the count of individuals who chose each number. However, if you asked the question again with very little being 10 and a lot being 1, you would see the mirror image: an L-shaped curve that looks logarithmic.

Now, consider if you changed the question to "On a scale from 1 to 10, how much change would you like to see in the level of immigration, where 1 is a large reduction in the level, 5 is no change, and 10 is a large increase in the level?"

Ceteris paribus, the data would be expected to transform so that the curve is now U-shaped, like an inverted parabola or an inverted normal distribution, because the respondents who wanted a lot of change now split between the two ends of the scale. If the subject matter were an issue where people prefer the status quo to a change, we would see something like a normal distribution, and if it were an issue where people are perfectly indifferent between a change and the status quo, we would see a flat distribution.
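
The relationship between the directional question and the "how much change" question can be sketched in Python. This is an illustration under stated assumptions: the counts are invented, and the scale is recoded to a symmetric -4 to +4 range (0 meaning no change) rather than 1 to 10, so that each amount of change has exactly two directions. `fold_to_magnitude` is a hypothetical helper.

```python
from collections import Counter


def fold_to_magnitude(signed_counts):
    """Add the counts for -k and +k together, discarding direction."""
    magnitude = Counter()
    for value, count in signed_counts.items():
        magnitude[abs(value)] += count
    return dict(sorted(magnitude.items()))


# Hypothetical U-shaped answers to the directional question.
u_shaped = {-4: 40, -3: 20, -2: 10, -1: 5, 0: 2, 1: 5, 2: 10, 3: 20, 4: 40}

# Answers to "how much change?" implied by the same respondents.
folded = fold_to_magnitude(u_shaped)
print(folded)  # {0: 2, 1: 10, 2: 20, 3: 40, 4: 80}

# Reversing the coding (largest change gets the lowest number) mirrors the curve.
reversed_coding = {4 - amount: count for amount, count in folded.items()}
print(dict(sorted(reversed_coding.items())))  # {0: 80, 1: 40, 2: 20, 3: 10, 4: 2}
```

The folded counts rise steeply to the right and the reversed coding gives the L-shape, even though the underlying directional data is symmetric and U-shaped.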

The last case is perhaps the most interesting, because it implies that a binary choice can show a large majority even when an accurate distribution shows no demonstrable social preference for the option that wins the binary choice. For example, a hypothetical:

Should we change the level of regulation? (Binary)

Yes: 67%

No: 33%

Media: "Americans demand regulatory changes!" (Or worse yet "Americans demand more regulation," etc.)

Should we change the level of regulation? (Ternary, the minimum to detect a real trend)

More: 33%

No change: 33%

Less: 33%

Media: "Americans indifferent to regulatory changes!" Of course they wouldn't run that story, although it is the accurate interpretation. They would probably make it something more like "Americans heavily split on the extremely controversial new debate around regulatory change!"
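
The arithmetic behind this hypothetical can be checked with a short Python sketch. It assumes each ternary answer is chosen by exactly one third of respondents, and `summarize` is a hypothetical helper. It reports the share wanting any change alongside the net direction of that change.

```python
def summarize(shares):
    """Return the percent wanting any change and the net direction (more minus less)."""
    want_change = shares["more"] + shares["less"]
    net_direction = shares["more"] - shares["less"]
    return {
        "want_change_pct": round(100 * want_change),
        "net_direction_pct": round(100 * net_direction),
    }


# Assumption: an exact three-way split of respondents.
third = 1 / 3
ternary = {"more": third, "no_change": third, "less": third}

print(summarize(ternary))  # {'want_change_pct': 67, 'net_direction_pct': 0}
```

The 67 percent "yes" from the binary question appears here too, but the net direction is zero, which is what the binary headline hides.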