Data quality — the obstacle to data-driven marketing
Bad data. It seems to be the modern villain keeping us from a truly data-driven business environment. It keeps popping up in conversations, reports, planning, and forecasting, and it is easy and straightforward to blame bad data for keeping us from reaching our goals or conclusions.
According to a study in the Harvard Business Review, half (50%) of people working with data are wasting their time hunting for data, finding and correcting errors, and searching for confirmatory sources for the data they don’t trust!
Harping on bad data as the reason for, or the major blockage to, achieving our goals only scratches the surface: bad data is a symptom with many possible causes. Let’s dig in.
What is bad data?
First, what do we consider bad data, often described as “dirty” or “rogue” data, to be? In layman’s terms, it’s data that contains errors such as spelling or punctuation faults, incomplete or outdated information, duplicate instances in the database, and incorrect data associations. Bad data is data that our teams don’t trust, or worse, data that we trust but shouldn’t.
So what causes bad data? A lot of things. Bad data is an outcome, a manifestation of a series of events that cause it.
Here’s a look at the most common forms one by one, their root causes, and potential steps for a quick resolution.
Incomplete data
Cause: This can come in a couple of forms: completely missing or partially filled data. Incompleteness limits not only the insights we can derive from the data (such as reporting and analytics) but also any data-driven operations (like AI/ML).
Cure: Implement data creation “gatekeepers” that stop incomplete data from being created in the first place. In form fills, for example, aid users with typeahead or auto-complete functionality that leans on a robust set of external referential data to fill in the form. Practice governance by ensuring mandatory fields are populated intelligently, enforced through data quality checks.
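To make the idea concrete, here is a minimal sketch of such a gatekeeper in Python. The field names and rules are illustrative assumptions, not a prescribed schema; your mandatory fields come from your own governance standards.

```python
# A minimal "gatekeeper" sketch: flag or reject records whose mandatory
# fields are missing before they are written to the database.
# The field list and the email check below are illustrative only.

MANDATORY_FIELDS = ["company_name", "country", "email"]

def validate_record(record: dict) -> list[str]:
    """Return a list of data quality issues; an empty list means the record passes."""
    issues = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            issues.append(f"missing mandatory field: {field}")
    # A simple format check layered on top of the completeness check.
    email = record.get("email", "")
    if email and "@" not in email:
        issues.append("email does not look valid")
    return issues

record = {"company_name": "Acme Ltd", "country": "", "email": "buyer@acme.example"}
problems = validate_record(record)
if problems:
    print("Rejected at the gate:", problems)  # e.g. missing mandatory field: country
```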
Duplicate data
Cause: This is when records unintentionally share attributes with other records in the database. When duplicate data is present in your data ecosystem, the consequences may include overcounting when aggregating data (producing incorrect values for reports and analyses), wasted outreach efforts, and confusion. Managing the business becomes increasingly challenging as the effects of duplicate data compound.
Cure: Understanding which “dupes” you’ll keep, let go, or archive requires governance informed by the needs of the business. Master your data through clustering (i.e., match/merge) techniques: bring like versions of a record together as members of a cluster, pick the best version as the main entity, and keep the rest as members of that group. This is a systematic way to dedupe your data. Since not all duplicates are created equal, you might want to keep some (due to business or regulatory needs) within a manageable cluster. This is the concept of a golden record.
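For illustration, here is a deliberately simplified match/merge sketch in Python. It is a toy stand-in for a production match/merge engine: the name-only comparison and the 0.85 similarity threshold are assumptions for the example. Similar records are grouped into a cluster, and the most complete member is promoted to the golden record.

```python
# A minimal match/merge sketch: records with sufficiently similar names are
# grouped into a cluster, and the most complete record in each cluster is
# promoted to the "golden record". The rest stay as members of the cluster.
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster_records(records: list[dict]) -> list[list[dict]]:
    clusters: list[list[dict]] = []
    for rec in records:
        for cluster in clusters:
            if similar(rec["name"], cluster[0]["name"]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

def golden_record(cluster: list[dict]) -> dict:
    # Pick the member with the most populated attributes as the main entity.
    return max(cluster, key=lambda r: sum(1 for v in r.values() if v))

records = [
    {"name": "Acme Ltd", "phone": "555-0100", "city": ""},
    {"name": "ACME Ltd.", "phone": "555-0100", "city": "London"},
    {"name": "Globex Corp", "phone": "", "city": "Springfield"},
]
for cluster in cluster_records(records):
    print(golden_record(cluster)["name"], "<-", [r["name"] for r in cluster])
```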
The machine-learning model in D&B Connect Manage, Dun & Bradstreet’s latest data management offering, can drive cluster-centric resolution of nearly 100% of duplicates to create trusted golden records across data sets. What has been (and still is) a major undertaking for most businesses is now within reach.
Disparate source systems (data silos)
Cause: It is almost unavoidable to have many different source systems. In fact, a 2021 Dun & Bradstreet study found that the average sales and marketing tech stack had at least 10 tools in play. Today’s complex business arena practically forces them on us, and managing them as parts of one system can be a daunting task. Although these systems may not share the same processes, their data often needs to relate to other data sets. The concepts of data warehousing, data lakes, and now data meshes were conceived to make managing data from different systems possible and scalable.
Cure: The knee-jerk reaction is to establish a data lake, but it’s not enough to bring all your data into one place. Without curating, qualifying, and governing the data coming into the lake, you could easily turn it into a data swamp. On top of technically securing the flow of data through connections like APIs, you should think about mastering the data in your data lake using clustering methodologies to scale the work of relating data from disparate sources in a common environment. Creating a golden record by clustering similar entities will give you a firmer understanding of which data is overlap and which is net new. Having a match/merge engine will help you manage existing and new data sources in your data lake.
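As a simple illustration of separating overlap from net new, the sketch below matches records arriving from a new source against the golden records already mastered in the lake. The normalized match key (name plus country) is an assumption for the example; a real match/merge engine uses far richer matching logic.

```python
# A minimal sketch of sorting a new source's records into "overlap" vs. "net new"
# against existing golden records, using a simple normalized match key.

def match_key(record: dict) -> str:
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return f"{name}|{record.get('country', '').upper()}"

golden_records = {
    match_key(r): r
    for r in [
        {"name": "Acme Ltd", "country": "GB", "revenue": 12_000_000},
        {"name": "Globex Corp", "country": "US", "revenue": 48_000_000},
    ]
}

incoming = [
    {"name": "ACME LTD", "country": "gb"},  # overlaps an existing golden record
    {"name": "Initech", "country": "US"},   # net new to the data lake
]

for rec in incoming:
    key = match_key(rec)
    if key in golden_records:
        print("overlap :", rec["name"], "->", golden_records[key]["name"])
    else:
        print("net new :", rec["name"])
```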
Data decay
Cause: Of all business-to-business master data, contact data arguably decays the fastest. In some areas, you will find that your data may decay at a rate of 34% annually. That statistic can be quite daunting for data-driven organizations, which depend on data for the insights behind their decisions. The current economic situation makes paying attention to data decay even more urgent. Companies going out of business, supply chain issues, and The Great Resignation are examples that add complexity to the expected mergers, acquisitions, and divestitures the market experiences. How can you ensure your data remains relevant?
Cure: Data enrichment. You need to periodically infuse your data with a reliable external reference data source. As the saying goes, don’t throw out the baby with the bathwater: it is easy to brand your current data assets as substandard because of their poor performance or because of anecdotes from those who depend on them. Instead, work with outside or third-party sources to provide up-to-date attributes on your existing contact data. As discussed above, we are facing data decay at a rate of 34% or more per year, so you need an effective enrichment schedule paired with your organization’s threshold of data accuracy. Performing enrichment ad hoc is a disservice to your users, as it will not scale. Provide an enrichment strategy and a schedule, and communicate them to your stakeholders.
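Here is a minimal sketch of what a scheduled enrichment pass might look like. The 90-day freshness threshold and the fetch_reference_attributes helper are hypothetical placeholders: in practice the threshold comes from your organization’s accuracy requirements, and the helper would call your chosen third-party enrichment provider.

```python
# A minimal enrichment sketch: flag contact records whose last verification is
# older than the freshness threshold and refresh them from an external source.
from datetime import date, timedelta

FRESHNESS_THRESHOLD = timedelta(days=90)  # assumption: tune to your accuracy threshold

def fetch_reference_attributes(email: str) -> dict:
    # Hypothetical external lookup; in practice this calls your enrichment provider.
    return {"job_title": "Head of Procurement", "company": "Acme Ltd"}

def enrich_stale_contacts(contacts: list[dict], today: date) -> None:
    for contact in contacts:
        if today - contact["last_verified"] > FRESHNESS_THRESHOLD:
            contact.update(fetch_reference_attributes(contact["email"]))
            contact["last_verified"] = today

contacts = [
    {"email": "buyer@acme.example", "job_title": "Buyer", "last_verified": date(2022, 1, 15)},
    {"email": "cfo@globex.example", "job_title": "CFO", "last_verified": date(2022, 6, 1)},
]
enrich_stale_contacts(contacts, today=date(2022, 7, 1))
for c in contacts:
    print(c["email"], c["job_title"], c["last_verified"])
```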
Conclusion: a case for data governance
These recommendations and best practices are just pieces to a larger puzzle. There is a strong need for data governance to institute policies and adhere to data quality standards in order to stop the hemorrhage of substandard data into your data asset. The good news is that many of the proposed cures are attainable – and can be automated at scale – with AI and ML.
The recommendations above, along with understanding where, when, and how to implement them, are crucial to your data strategy. The root cause and the solution both come back to the same thing: data governance. It is a function we can no longer go without, and our increasing dependence on data proves it.