Data inflation can destroy the integrity of your data—and perhaps your reputation. Consider this scenario.
Six months ago, your company rolled out a completely new version of your web app. Product management claims the rollout has been a great success, customer feedback has been positive, and they’re now pushing to sunset the old version.
As the analyst, you’ve been asked for a few reports, including total visitor sessions for the new version vs. the old version over the last month. You dutifully log in to your web analytics tool, pull some reports, and start building the comparison.
You are shocked by the numbers. Sessions for the old version aren’t nearly as low as they ought to be in comparison to the new version. Total visits are lower, but only slightly. And a few other key metrics are higher! You look at a few other reports, and unfortunately, the numbers don’t support product management’s recommendation.
You present your findings and recommend against sunsetting the old version for now because the metrics show that user adoption needs work. Product management glares at you from across the table.
What’s Going On?
In one data quality audit that included the 200 top Internet Retail websites, ObservePoint found that across the board, data quality problems are rampant. Missing data, data inflation, and data leakage are experienced by even the most sophisticated digital marketers.
One lesser-known data quality problem, data inflation, is exactly what it sounds like: digital data that is reported as higher than it really is. Inflation can affect web analytics data such as traffic counts, advertising data such as clicks and impressions, sales attribution data, or a variety of other important metrics. And as the data gets passed up into other systems, the effects of the inflation are magnified.
What causes such a sinister breach in data quality? A relatively simple but common mistake: duplicate analytics tags on a single page.
Duplicate Tags Cause Data Inflation
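When more than one instance of a tag fires on a page, data inflation happens. Each time a tag fires, something gets counted, and when a tag fires more than once, that something gets counted more than once as well. A page that carries two copies of a pageview tag, for example, reports two pageviews for every real one.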
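To make the arithmetic concrete, here is a minimal sketch in Python. The function names are hypothetical, and it assumes the simplest case: every copy of the tag fires exactly once per real pageview, and inflation is expressed as the percentage by which the reported count exceeds the true count.

```python
def reported_pageviews(true_pageviews: int, tag_instances: int) -> int:
    """Pageviews the analytics tool would report if every copy of the
    tag fires once per real pageview (an assumed, simplified model)."""
    return true_pageviews * tag_instances


def inflation_percent(true_count: int, reported_count: int) -> float:
    """Inflation as the percentage by which reported exceeds true."""
    return (reported_count - true_count) / true_count * 100


# 1,000 real pageviews on a page that carries three copies of the tag
reported = reported_pageviews(1000, 3)
print(reported)                           # 3000
print(inflation_percent(1000, reported))  # 200.0
```

Real deployments are messier than this, since a duplicate may fire on only some pages or some events, but the direction of the error is always the same: up.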
Common Causes of Duplicate Tags
Digital marketers have become much more disciplined about tag placement. In the past, tags might be placed anywhere on the page: in the header, in the body, in the footer, or in all three. Sometimes we were lucky if tags made it onto the page at all.
Even so, tags still get coded into multiple places on a page, copied over from other pages, or duplicated in some other way. Regardless of the process that causes it, once a tag is duplicated, you’ve got data inflation.
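One rough way to spot this kind of duplication is to scan a page’s HTML for script sources that appear more than once. The sketch below uses only the Python standard library. The URLs and the sample page are made up for illustration, and it only sees tags written into the static HTML as external scripts; inline snippets and tags injected at runtime would need a different approach, such as watching the network requests a page actually makes.

```python
from collections import Counter
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_duplicate_scripts(html: str) -> dict:
    """Return {script src: count} for every src that appears more than once."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    counts = Counter(collector.sources)
    return {src: n for src, n in counts.items() if n > 1}


# Hypothetical page: the same analytics tag in the header and the body
sample_page = """
<html>
  <head><script src="https://analytics.example.com/tag.js"></script></head>
  <body>
    <p>Product page</p>
    <script src="https://analytics.example.com/tag.js"></script>
    <script src="https://cdn.example.com/app.js"></script>
  </body>
</html>
"""

print(find_duplicate_scripts(sample_page))
# {'https://analytics.example.com/tag.js': 2}
```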
More common today, especially among advanced digital teams, is the deployment of a Tag Management System (TMS). TMSs have reduced the problem significantly, but they have not eliminated it. No TMS can prevent the old-fashioned deployment of tags directly to pages (whether on purpose or by accident), or control tags that aren’t inside the TMS.
Our website audits have uncovered websites where a TMS was deployed yet duplicate tags existed for some digital measurement tools. This is a problem, since it is tempting to overlook tags that aren’t represented inside the TMS.
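A simple cross-check between what actually fires on a page and what the TMS is supposed to manage can surface both problems at once. The following is a sketch only: the tag names, the list of observed fires, and the TMS inventory are all hypothetical inputs that you would have to collect from your own pages and your own TMS.

```python
from collections import Counter


def audit_tags(observed_fires, tms_inventory):
    """Compare tags observed firing on a page against the TMS inventory.

    observed_fires: one entry per tag fire seen on the page (assumed input)
    tms_inventory:  names of the tags the TMS is configured to deploy
    """
    counts = Counter(observed_fires)
    return {
        # Tags on the page that the TMS knows nothing about
        "outside_tms": sorted(set(counts) - set(tms_inventory)),
        # Tags that fired more than once, whether or not the TMS manages them
        "fired_more_than_once": sorted(t for t, n in counts.items() if n > 1),
    }


# Hypothetical page: the analytics tag fires twice (once from the TMS,
# once hard-coded), and a heatmap tag was added outside the TMS.
observed = ["web_analytics", "web_analytics", "ad_pixel", "heatmap"]
inventory = {"web_analytics", "ad_pixel"}

print(audit_tags(observed, inventory))
# {'outside_tms': ['heatmap'], 'fired_more_than_once': ['web_analytics']}
```

Note that the duplicated tag in this example is one the TMS does manage, which is exactly the case that is easy to miss if you only look inside the TMS.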
Because stakeholders don’t always follow best practices when deploying tags, tag deployments erode, decay, and fall out of compliance with the initial “clean” deployment over time.
Maintaining accurate tagging is ultimately your responsibility as a digital marketer or analyst.
You Must Be Vigilant Against Data Inflation
At ObservePoint, we have a mantra: The only thing worse than no data is bad data. This is a shared conviction between our founders and every ObservePoint employee.
I should note that the scenario at the beginning of this article is a true story. After the fact, we were asked to audit the sites in question. We found that the old product pages had 300-500% data inflation caused by duplicate tags, and the company concluded that its new product release was probably a success after all.
Unfortunately, this knowledge wasn’t available at the moment the data was being used. Executives made a revenue-impacting decision based on bad data.