Of all the quotes in the infinitely quotable movie “The Princess Bride” (“Have fun storming the castle!” “As you wish.” “Never get involved in a land war in Asia!”), one has always stood out for me:
Vizzini: HE DIDN’T FALL? INCONCEIVABLE.
Inigo Montoya: You keep using that word. I do not think it means what you think it means.
If I’ve encountered one problem more than any other in marketing data, I’d call it the Inigo Montoya problem: the dangers of using misleading data.
Photo courtesy of, let’s see here, The Daily News. Huh.
The problem seems so widespread–and so dangerous–that I’ll address it over two columns:
- How misleading data occur
- How to prevent or work around bad data
In theory, bad data shouldn’t exist. Of course, as Homer Simpson noted, “In theory, Communism works. In theory.” How do bad data arise?
It pays to remember the aphorism that computers don’t do what you want them to do; they do what you programmed them to do. By extension, the data those computers pump out report what they’ve been programmed to report, which is not necessarily what the variable names suggest. Common sources of misunderstanding include:
- Terms and conditions. No, I’m not talking about those agreements you don’t bother to read when you download a new OS. I mean the “terms and conditions” of a data point: what has to happen for it to be recorded. Back in the day, Virginia used to rise to the top of all states in visitor demographics on web analytics suites. Why? Because so much consumer traffic went through AOL, which was headquartered in Virginia and routed its members’ traffic through servers there, that it tipped the scale. Fast forward to today, and most of us know that email marketing platforms miscount opens. They count the firing of an invisible 1×1 pixel, which does not happen when the recipient has images turned off and does happen when the email client displays the email in a preview pane without the recipient actually opening it.
- Poor data gathering assumptions. Here, I use “data” to include preference center choices as well as survey and other market research data. Did you ever fill out a preference center form and find a question for which none of the multiple-choice answers described you? Or maybe the question asked you to pick one but you thought two (or three or more) described you? That’s what I mean. Even with the advent of tools like SurveyMonkey, fielding market research remains a time-consuming and usually expensive exercise for marketers. Similarly, preference centers remain deceptively hard to build. As a result, even experienced marketers don’t always get the questions right the first time.
- Dumb shit. And sometimes, the people handling the data make inexplicable errors. A few years ago, I analyzed the member database of a global hotel loyalty program. After running some tabs, I found that the second-most popular language in Saudi Arabia was Chinese (#1 was English). I scratched my head and wondered whether Saudi Arabia had a large Chinese expatriate community. Finding nothing to corroborate this theory, I looked at the member-level data and saw that these alleged Chinese speakers had names like “Daoud,” “Mohammed” and “Kareem.” I never found out what really happened, but I suspect that the system only recognized one non-Roman character set: Chinese. As a result, when faced with forms in Arabic, it assumed that they were in Chinese. Point is, some data errors crop up because of bonehead plays like this one.
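To make the email-open example above concrete, here is a minimal Python sketch of how pixel-based counting can drift from reality. Everything in it is an illustrative assumption rather than any platform's actual logic: the `Delivery` record, its field names, the `pixel_fired` rule and the four sample recipients are all made up for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    """One delivered email (hypothetical record, for illustration only)."""
    recipient: str
    images_enabled: bool   # does the email client load remote images?
    preview_pane: bool     # was the message rendered in a preview pane?
    actually_read: bool    # did the person really open and read it?


def pixel_fired(d: Delivery) -> bool:
    """Assumed rule: the 1x1 pixel loads whenever the message is
    rendered (read or previewed) AND images are turned on."""
    rendered = d.actually_read or d.preview_pane
    return rendered and d.images_enabled


def summarize(deliveries):
    """Compare what the platform would report with what really happened."""
    return {
        "reported_opens": sum(pixel_fired(d) for d in deliveries),
        "actual_opens": sum(d.actually_read for d in deliveries),
        # Real opens the pixel never saw (images off).
        "missed": sum(d.actually_read and not pixel_fired(d) for d in deliveries),
        # Pixel fires with no real open behind them (preview pane).
        "phantom": sum(pixel_fired(d) and not d.actually_read for d in deliveries),
    }


sample = [
    Delivery("a@example.com", images_enabled=True, preview_pane=False, actually_read=True),
    Delivery("b@example.com", images_enabled=False, preview_pane=False, actually_read=True),
    Delivery("c@example.com", images_enabled=True, preview_pane=True, actually_read=False),
    Delivery("d@example.com", images_enabled=True, preview_pane=False, actually_read=False),
]

result = summarize(sample)
print(result)
```

In this made-up sample the reported total happens to match the true total, yet one real open was missed and one reported open never happened. The number called “opens” looks right while describing the wrong people, which is the Inigo Montoya problem in miniature.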
Up next: how to protect yourself against bad data. In the meantime, please share any bad data horror stories in the comments.