With the holidays in full swing, everyone is busy making their lists and checking them twice. But while electronics and toys routinely top the wish lists for most, the data-inspired leaders of the world have some unique desires that can’t easily be purchased from your favorite store.
Whether you’ve been naughty (an online hookup site for married people was breached by the hacking outfit The Impact Team, and the personal details of 37M users were made public, leaving many men sleeping on the couch) or nice (Data Science for Social Good, a program at the University of Chicago that connects data scientists with governments, is working to predict when officers are at risk of misconduct, with the goal of preventing incidents before they happen), chief data officers, data scientists and all data stewards want better and safer ways to do their jobs.
Instead of playing Santa and asking them to sit on my lap and tell me what they want for the holidays, I figured I’d simply share some of the top things we’ve heard on data leaders’ wish lists this year.
1. A Better Way to Find Truth in Data
Mark Twain famously popularized the saying, “There are three kinds of lies: lies, damned lies, and statistics.” One of the biggest problems we face every day is making sense of the data we have. In a perfect world the answer to all of our questions would lie smack dab in the data itself, but that’s not the case. Using data to get closer to a single version of the truth is harder than first thought. But that hasn’t stopped us from trying to draw conclusions from the data that is presented. Sometimes we rush to conclusions in the face of mounting pressure from others who demand answers.
What we really need is a source of truth to compare the data to; otherwise it is very hard to know what the truth actually is. Unfortunately, that is often an impossible goal: finding truth in a world of ambiguity is not as simple as looking up a word in the dictionary. If you think about Malaysia Airlines Flight 370, which tragically disappeared in 2014, there were several conflicting reports claiming to show where the downed airliner would be found. Those reports were based on various data sets, which essentially led to multiple versions of proposed “truth.” Until they finally found pieces of the wreckage, searchers were looking in multiple disconnected spots because that was what the “data” said. But without anything to compare it to, there was no way to know what was true or not. This is just one example of how data can be used to get an answer we all want. The same thing happens in business every day, so the takeaway here is that we need to stop rushing to form conclusions and first try to understand the character, quality and shortcomings of the data and what can be done with it. Good data scientists are data skeptics and want better ways to measure the truthfulness of data. They want a “veracity-meter,” if you will: a better method to help overcome the uncertainty and doubt often found in data.
2. A Method for Applying Structure to Unstructured Data
Unstructured data, information that is not organized in a pre-defined manner, is growing significantly, outpacing structured data. Experts generally agree that 80-85% of data is unstructured. As the amount of unstructured data continues to grow, so do the complexity and cost of attempting to discover, curate and make sense of this data. However, there are benefits when it is managed right.
This explosion of data is providing organizations with insights they were not previously privy to and may not yet fully understand. When faced with looking at data signals from numerous sources, the first inclination is to break out the parts that are understood. This is often referred to as entity extraction. Understanding those entities is a first step to drawing meaning, but the unstructured data can sometimes reveal new insights that were not previously seen through the structured data, so additional skills are needed.
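To make the idea of entity extraction a little more concrete, here is a minimal Python sketch. It is only an illustration: real entity extraction usually relies on trained language models rather than hand-written patterns, and the pattern set, the `extract_entities` function and the sample sentence are all hypothetical stand-ins rather than anything we use in production.

```python
import re

# Hypothetical patterns for a few "understood" parts of free text.
# Hand-written regexes are a stand-in for a real entity-extraction model.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "money": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def extract_entities(text):
    """Return every match for each pattern, keyed by entity label."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


# Made-up sample sentence, for illustration only.
sample = (
    "In 2015 the firm paid $1,200.50 to a vendor; "
    "contact billing@example.com for details."
)
entities = extract_entities(sample)
print(entities)
```

The structured pieces (a year, an amount, a contact address) fall out easily; everything else in the sentence is the part that still needs additional skills to interpret.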
For example, social media yields untapped opportunities to derive new insights. Social media channels that offer user ratings and narrative reviews are a treasure trove of intelligence, if you can figure out how to make sense of it all. At Dun & Bradstreet, we are building capabilities that give us some insight into the hidden meaning in unstructured text. Customer reviews provide new details on customers’ satisfaction with a business that may not previously have been visible in structured data. By understanding how to correlate negative and positive comments as well as ratings, we hope to inform future decisions about total risk and total opportunity.
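As a rough illustration of correlating review comments with ratings, the Python sketch below scores each review with a tiny word list and then measures how closely those scores track the star ratings. Everything here is an assumption made for the example: the word lists, the four reviews and the function names are invented, and a simple word count stands in for whatever text analysis a real system would use.

```python
import math

# Hypothetical word lists; a real system would use a far richer model.
POSITIVE = {"great", "friendly", "fast", "excellent", "reliable"}
NEGATIVE = {"slow", "rude", "late", "broken", "terrible"}


def sentiment_score(text):
    """Count positive words minus negative words in a review."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up (review text, star rating) pairs, for illustration only.
reviews = [
    ("Great service and friendly staff.", 5),
    ("Fast delivery, reliable product.", 4),
    ("Delivery was late.", 2),
    ("Rude staff and a broken item. Terrible!", 1),
]

scores = [sentiment_score(text) for text, _ in reviews]
ratings = [rating for _, rating in reviews]
r = pearson(scores, ratings)
print(scores, round(r, 2))
```

A coefficient near 1 would suggest the text and the ratings tell the same story; a weak or negative one would be a cue to look more closely at what the comments say that the stars do not.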
With unstructured data steadily becoming part of the equation, data leaders need to find a better way to organize the unorganized without relying on the traditional methods we have used in the past, because they won’t work on all of the data. A better process or system that could manage much or all of our unstructured data is certainly at the top of the data wish list.
3. A Global Way to Share Insights
Many countries around the world are considering legislation to ensure certain types of data stay within their borders. They do this out of security concerns, which are certainly understandable. They’re worried about cyber-terrorism and spying and simply want to maintain their sovereignty. Not surprisingly, it’s getting harder and harder to know what you may permissibly do in the global arena. While carefully controlling undesirable behaviors, we must be careful not to create “silos” of information that undermine our ability to use information.
There’s a convention in the scientific community: when you make a discovery, you publish your results in a peer-reviewed journal for the world to see. It’s a way to share knowledge to benefit the greater good. Of course, not all knowledge is shared that way. Some of it is proprietary. Data falls into that area of knowledge that is commonly not shared. But data can be very valuable to others and should be shared appropriately.
That concept of publishing data is still confusing and often debated. Open data is one example, but there are many more nuanced approaches. Sharing data globally in a permissible way requires a tremendous amount of advice and consent. The countries of the world have to mature in allowing the permissible use of data across borders in ways that do not undermine our concerns around malfeasance, but also don’t undermine the human race’s ability to move forward in using this tremendous asset that it’s creating.
4. Breeding a Generation of Analytical Thinkers
If we are going to create a better world through the power of data, we have to ensure our successors can pick up where we leave off and do things we never thought possible. As data continues to grow at an incredible rate, we’ll be faced with complex problems we can’t even conceive of right now, and we’ll need the best and brightest to tackle these new challenges. For that to happen, we must first teach the next generation of data leaders how to be analytically savvy with data, especially new types of data that have never been seen before. Research firm McKinsey has predicted that by 2018, the U.S. alone may face a 50% to 60% gap between the supply of and demand for deep analytic talent.
Today we teach our future leaders the basics of understanding statistics. For example, we teach them regression, which is based on longitudinal data sets. Those are certainly valuable skills, but they don’t teach students how to be analytically savvy with new types of data. Being able to look at data and tell a story takes years of training, and that training is just not happening at the scale we need.
High on the wish list for all data stewards – and really organizations across the globe, whether they realize it or not – is for our educational institutions to teach students to be analytical thinkers, which means becoming proficient with methods of discovering, comparing, contrasting, evaluating and synthesizing information. This type of thinking helps budding data users see information in many different dimensions, from multiple angles. These skills are instrumental in breeding the next generation of data stewards.
Does this reflect your own data wish list? I hope many of these will come true for us in 2016 and beyond. Until then, wishing you the very best for the holiday season…