Aaron Shaw had an interesting post at the Dolores Labs blog last week that examined how using different question scales in surveys can elicit very different responses:

You can ask “the crowd” all kinds of questions, but if you don’t stop to think about the best way to ask your question, you’re likely to get unexpected and unreliable results. You might call it the GIGO theory of research design.

To demonstrate the point, I decided to recreate some classic survey design experiments and distribute them to the workers in Crowdflower’s labor pools. For the experiments, every worker saw only one version of the questions and the tasks were posted using exactly the same title, description, and pricing. One hundred workers did each version of each question and I threw out the data from a handful of workers who failed a simple attention test question. The results are actual answers from actual people.

Shaw asked both samples the same question but altered the scale of the available answers:

Low Scale Version:
About how many hours do you spend online per day?
(a) 0 – 1 hour
(b) 1 – 2 hours
(c) 2 – 3 hours
(d) More than 3 hours

High Scale Version:
About how many hours do you spend online per day?
(a) 0 – 3 hours
(b) 3 – 6 hours
(c) 6 – 9 hours
(d) More than 9 hours

He found a statistically significant difference between the responses to the high and low scale versions of the question. More specifically, more people responded that they spent more than 3 hours online per day when presented with the high scale question. Correspondingly, more people exposed to the low scale responded that they spend 3 hours or less online per day. What accounts for this? Shaw hypothesizes that it is the result of satisficing:

[...] it happens when people taking a survey use cognitive shortcuts to answer questions. In the case of questions about personal behaviors that we’re not used to quantifying (like the time we spend online), we tend to shape our responses based on what we perceive as “normal.” If you don’t know what normal is in advance, you define it based on the midpoint of the answer range. Since respondents didn’t really differentiate between the answer options, they were more likely to have their responses shaped by the scale itself.

These results illustrate a sticky problem: it’s possible that a survey question that is distributed, understood, and analyzed perfectly could give you completely inaccurate results if the scale is poorly designed.

It’s an important point: how you ask a question can have a significant impact on the answers you get. Put another way, you need to pay as much attention to the design and structure of your questions (and answers) as to the content of those questions.
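To make the comparison concrete, here is a minimal sketch of how one might test for a difference like the one Shaw reports. It collapses each version of the question at the 3-hour mark (the only boundary the two scales share) and runs a Pearson chi-square test on the resulting 2x2 table. The counts are hypothetical placeholders, not Shaw's data, the function name `chi_square_2x2` is my own, and I don't know which test Shaw actually used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one response")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# HYPOTHETICAL counts for illustration only (not Shaw's results).
# Each tuple is (3 hours or less, more than 3 hours).
low_scale = (60, 40)    # low scale: options (a)-(c) vs. option (d)
high_scale = (40, 60)   # high scale: option (a) vs. options (b)-(d)

stat, p = chi_square_2x2(*low_scale, *high_scale)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

If the scale had no effect, the share of respondents above 3 hours should be roughly the same in both rows, so a small p-value suggests the answer options themselves are shaping the responses.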

A number of commenters chimed in about when it is better to use scale versus open-ended questions. One major advantage of scale questions that comes immediately to mind is that they don’t require analysts to spend additional time coding answers before beginning their analysis. While open-ended questions may avoid the issue of satisficing (which I am not convinced they do, since respondents could easily reference their own subjective scale or notions), they do place an additional burden on the analyst. For short, small-n surveys this isn’t much of an issue. However, once you start scaling up in terms of n and the number of questions, it can become problematic. Once you get into coding, all sorts of issues can arise (subjectivity and bias, data entry errors, etc.). Some crowdsourcing applications like Crowdflower may provide a convenient and reliable platform for coding (as I’ve mentioned before), but at some level researchers will always have to make an intelligent trade-off between scale and open-ended questions.
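As a rough illustration of what that coding step involves, here is a sketch that maps free-text answers to the hours question onto the two scales from Shaw's experiment. The function name `code_answer`, the regular expression, and the rule that boundary values (exactly 3 hours, say) fall into the lower bucket are all my own assumptions, and anything without a number in it still needs a human coder.

```python
import re

# (upper bound in hours, label) pairs taken from the two answer scales above.
LOW_SCALE = [(1, "0-1 hour"), (2, "1-2 hours"), (3, "2-3 hours")]
HIGH_SCALE = [(3, "0-3 hours"), (6, "3-6 hours"), (9, "6-9 hours")]

def code_answer(text, scale, top_label):
    """Map a free-text answer to a scale bucket, or None if no number is found.

    Only the first number in the text is used, so an answer like
    "1 or 2 hours" is coded as 1 hour.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None  # e.g. "all day": left for a human coder
    hours = float(match.group())
    for upper, label in scale:
        # ASSUMPTION: a value exactly on a boundary goes in the lower bucket.
        if hours <= upper:
            return label
    return top_label

for answer in ["about 4 hours", "2.5", "all day"]:
    print(answer, "->",
          code_answer(answer, LOW_SCALE, "More than 3 hours"), "|",
          code_answer(answer, HIGH_SCALE, "More than 9 hours"))
```

Even this toy version has to make judgment calls (which number to take, where boundary values go, what to do with "all day"), which is exactly where subjectivity and errors creep in as n grows.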
