Does a finding matter just because it’s “statistically” significant? Must it remain unconvincing just because it isn’t? Is bigger always better?
By Kerry Edelstein
December 17, 2017
In June 2017, NPR published an article about scientific terms that are also used colloquially, but that carry a different, and very specific, meaning in the context of science.
One word in this article stuck out to me, as it relates in particular to scientists of the data science variety: significant.
Back in the early 90s, I had the accidental foresight to pursue a degree in biometry & statistics at a university that was, during the time I was there, ranked fourth in the world for its interdisciplinary statistics program. I wasn’t doing it to be cool; this was before Google, Facebook, neuro-marketing tools, or the phrase “big data.” I was just a teenager who was good at math, who figured applying it to real life would be interesting.
The amiable department head of our statistics program made a comment one day in class that caught my attention and made me chuckle: Not everything statistically significant is practically significant. He encouraged us to think outside the confines of mathematical import and understand when and why a finding conceptually matters. To this day, it’s both the most memorable thing I learned in a college classroom, and arguably the most useful in my professional life. Thanks, Chuck. I can call you that now, right? (Side note: Chuck also gave me great advice as I was graduating: “Take the job where you’ll learn the most.” It was wonderfully valuable and practically significant advice – advice that I passed on to my nephew when he graduated college.)
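As big data has collided with market research, I’ve been surprised to find that I regularly encounter big data analysts who forget the distinction between practical and statistical significance. And there are three myths I typically witness:
1. A statistically significant finding necessarily matters.
2. If it’s not statistically significant, it shouldn’t be believed.
3. Bigger sample sizes are always better.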
Let’s unpack all three of those myths.
Myth #1: A statistically significant finding necessarily matters.
Would that this were true – it would make data interpretation much easier. Unfortunately, it’s not. Ask enough people a question, and even a 0.2-point difference will be statistically significant. Sometimes that small difference is tremendously important – it can mean the difference between winning and losing an election, for example.
But sometimes it’s not. For example, if I’m trying to select the right creative for an ad campaign, and 50.1% of people prefer option A, while 49.9% choose option B, I’m not categorically declaring A the winner with the insight of “Move forward with Creative A.” More likely, I’m looking at who chose A vs. B, and building an argument to serve different creative to different audiences, because neither is a home run. Statistically, Creative A was the winner. Practically, that doesn’t much matter.
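As a rough sketch of how sample size alone can push a tiny gap over the significance line, the snippet below runs a one-sample z-test of a preference share against an even 50/50 split at several sample sizes. The 50.1% share, the sample sizes, the 5% threshold, and the function name are illustrative assumptions, not figures from any real study.

```python
import math


def preference_z_test(share_a: float, n: int) -> tuple[float, float]:
    """One-sample z-test of an observed share against a 50/50 split.

    Returns (z, two-sided p-value) using the normal approximation.
    """
    null_share = 0.5
    standard_error = math.sqrt(null_share * (1 - null_share) / n)
    z = (share_a - null_share) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


if __name__ == "__main__":
    share_a = 0.501  # hypothetical: 50.1% of respondents prefer Creative A
    for n in (10_000, 1_000_000, 10_000_000):
        z, p = preference_z_test(share_a, n)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"n={n:>10,}  z={z:5.2f}  p={p:.4f}  {verdict} at the 5% level")
```

The gap between the two options is identical in every row; only the sample size changes. Whether that gap is worth acting on is a separate, practical question that the p-value does not answer.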
Myth #2: If it’s not statistically significant, it shouldn’t be believed.
Myth #1 has a corollary myth: findings must be statistically significant in order to have practical significance.
Well, if that were true, there would be no focus group insights. But even when hard data is involved, sometimes practically significant insights surface from the swell of statistical insignificance.
During the first few years of Research Narrative, we worked with a company that does interactive TV advertising, to evaluate the effectiveness of their ad campaigns. During our first project, as we pulled together the interactive materials to test aided ad recall, one by one we each got sucked into an 8-minute video of the Imagineering of California Adventure’s Cars Land. I knew they were onto something when our survey programmer stopped to watch the video as he encoded it, pronouncing, “Wow, that’s really cool! Now I want to take my kids there!” Such enthusiasm was not afforded to every video he encoded.
As we started to research the performance of these emerging and superbly cool interactive TV ads, we found two things. 1. People who had access to the interactive platform seemed to love it. And 2. Not many people had access. Because interactive TV was experimental and nascent at the time, adoption was still low. Sample sizes were small, and double-digit lifts in awareness and purchase intent were often only directional; rare was the occasion when an improvement was statistically significant. Still, the direction of the findings was always the same. It was consistently upward, sometimes dramatically so. The platform clearly worked. If distribution broadened, it was going to be a home run.
We didn’t need a 95% confidence interval to know that was practically significant.
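One hedged way to put a number on "the direction was always the same" is a sign test, which ignores effect sizes and asks only how likely a run of same-direction results would be if each campaign were a coin flip. This is an illustrative sketch, not the analysis used on that project: the function name and the campaign counts are hypothetical, and it assumes the campaigns are independent of one another.

```python
from math import comb


def sign_test_p_value(n_up: int, n_total: int) -> float:
    """One-sided sign test.

    Probability of seeing at least n_up upward results out of n_total
    if each result were equally likely to move up or down.
    """
    favorable = sum(comb(n_total, k) for k in range(n_up, n_total + 1))
    return favorable / 2 ** n_total


if __name__ == "__main__":
    # Hypothetical counts: the article does not say how many campaigns were tested.
    for n_campaigns in (3, 5, 8):
        p = sign_test_p_value(n_campaigns, n_campaigns)
        print(f"{n_campaigns} of {n_campaigns} campaigns up: p = {p:.4f}")
```

Each individual lift can fall short of significance on its own small sample while the pattern across campaigns is still hard to attribute to chance.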
Myth #3: Bigger sample sizes are always better.
If I were the betting type, I could have retired in my 30s simply by placing bets on the number of digital media “data scientists” who argued that a 100,000-person sample of active social media users is better than a 1,000-person general population sample. Be gone, surveys.
The logic error here is assuming that a big sample is representative of the people you want (need) to hear from. The practice behind it has a name, convenience sampling, and its pitfalls are among the first lessons you learn in a reputable statistics class. Fall for this trap, and you’ll have a statistically significant finding among a biased sample of people who don’t represent your target market. It’s a great way to arrive at the wrong answer.
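A small simulation can make the trap concrete. The sketch below compares a 100,000-person sample drawn only from active social media users with a 1,000-person simple random sample of everyone. Every rate in it is a made-up assumption chosen for illustration, and the function names are hypothetical.

```python
import random

# Hypothetical population: all rates below are made-up for illustration.
ACTIVE_SHARE = 0.30       # share of the population active on social media
LIKE_RATE_ACTIVE = 0.70   # share of active users who like the product
LIKE_RATE_OTHERS = 0.40   # share of everyone else who like the product
TRUE_RATE = ACTIVE_SHARE * LIKE_RATE_ACTIVE + (1 - ACTIVE_SHARE) * LIKE_RATE_OTHERS


def convenience_sample_estimate(n: int, rng: random.Random) -> float:
    """Estimate the like rate from active social media users only."""
    likes = sum(rng.random() < LIKE_RATE_ACTIVE for _ in range(n))
    return likes / n


def random_sample_estimate(n: int, rng: random.Random) -> float:
    """Estimate the like rate from a simple random sample of everyone."""
    likes = 0
    for _ in range(n):
        is_active = rng.random() < ACTIVE_SHARE
        rate = LIKE_RATE_ACTIVE if is_active else LIKE_RATE_OTHERS
        likes += rng.random() < rate
    return likes / n


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so reruns give the same draws
    big_biased = convenience_sample_estimate(100_000, rng)
    small_random = random_sample_estimate(1_000, rng)
    print(f"true rate in the population:      {TRUE_RATE:.3f}")
    print(f"100,000 active social users only: {big_biased:.3f}")
    print(f"1,000 drawn from everyone:        {small_random:.3f}")
```

Under these assumptions the large sample gives a very precise estimate of the wrong quantity, while the small random sample is noisier but centered on the population's true rate.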
Anyone who does online campaign brand lift research has probably bumped into big samples with questionable projectability. Last year, we consulted with an online publisher that was seeing what we refer to as “negative brand lift” on their site’s ad campaigns. Their data suggested that people who saw the ads (the “exposed” group) were less familiar with advertised brands than people who didn’t see the ads (the “control” group). The research company behind this data touts its large samples as superior, but the reality is that their path to large samples involves not knowing who’s in them. And given the counter-intuitive findings, it’s safe to say that with this client, the control and exposed groups were routinely not comparable.
Every online publisher we’ve worked with uses the above firm to do brand lift research. And yet, their strategy for driving large sample sizes doesn’t just compromise data accuracy; it actually prevents it. The myth of “bigger is better” is perhaps the biggest area of wasteful spending we observe in our consulting. Bigger is only better when it’s representative of the greater whole.
We haven’t even touched upon methodologies that offer convincing insights without the benefit of data at all. The series of focus groups that offer universal agreement. The in-home ethnographies that provide an “ah-hah!” moment that pieces a puzzle together. Open-ended comments that all offer the same sentiment.
To us, what significance really boils down to is this: Do you believe it, and does it matter? Sometimes it does take a big sample size and a p-value below .01 to answer yes to both questions. But just as often, it does not.
How’s that for a practically significant insight?