Episode 5: Improving Match Accuracy and Confidence
George L'Heureux:
Hello, everyone. This is Data Talks, presented by Dun & Bradstreet. I'm your host, George L'Heureux. I'm a principal consultant for data strategy here in the advisory services team at Dun & Bradstreet. In advisory services, our team is dedicated to helping our clients to maximize the value of their relationship with Dun & Bradstreet through expert advice and consultation.
On Data Talks, I chat every episode with one of the expert advisors at Dun & Bradstreet about a topic that can help consumers of our data and services to get more value. Today's guest expert is Liz Walters. Liz is a data advisor at Dun & Bradstreet, and Liz, how long have you been with the company?
Liz Walters:
I have been with the company over 20 years. So a lot of time spent working with our customers in exactly this capacity.
George L'Heureux:
And can you tell me a little bit more about what it is that you do in this role?
Liz Walters:
Well, we help our customers make the data actionable, make it useful. Where a lot of that comes in is that in order for our customers to get the anticipated value, so they can do the scores, the metrics, the analytics, the starting point is getting their data to map to our D-U-N-S Number, because that facilitates linking our data to their data so they can do all those other wonderful things. That process we refer to as matching. And so a lot of those 20 years has really been spent helping our customers match. How do they find the right D-U-N-S Numbers for their business universe?
George L'Heureux:
Well, I think we should switch seats. What a great segue, Liz, into the topic that we wanted to talk about today, which is match accuracy, confidence, and how customers can improve their confidence in the matches that they're getting back from Dun & Bradstreet. Other experts we've had on the show have talked about identity resolution and match, so let's talk about match confidence. What is it? What does that mean?
Liz Walters:
Match confidence is D&B's way of expressing how confident we are that we found the match that you were looking for. Our match process doesn't deliver a yes or no answer. Yes, it matched. No, it didn't. The process is designed to identify what we believe to be the best candidate, and we express why we feel that it's the best candidate to you through something called a match string. It's a series of 11 different data elements with grades. And to make that more useful to our customers, we translate that into a confidence code. So how strongly do we feel that we've found the correct match for your input criteria?
George L'Heureux:
And so why is that match confidence important? Why are those grades important? What do they do for a customer?
Liz Walters:
Well, it really helps customers hone in and understand whether their match project is going in the direction that they want. To give an example and take it out of the business world, say that you're looking for your high school friend, Liz Walters, who you know lives in California. If you look for Liz Walters in California, you will find, I think, about 10 Liz Walterses in California, and one of those is probably the Liz that you're looking for, but you really don't know for sure. You have no high degree of confidence in how to move on with the data. Which one of the 10 do you focus on? What do you do next?
And so that result will be expressed to you, "Well, here's the Liz Walters we found, with a low degree of confidence," and then you can decide what additional work you want to put in to figure out whether this is the Liz that you're looking for. Versus, if you had a Liz Walters in California and an exact match on a phone number, well, now you've got a high degree of confidence that you found the right Liz Walters, and you can proceed with your next step with Liz Walters.
George L'Heureux:
And that's what we basically refer to as match stewardship, where that confidence level allows you to take certain steps automatically, or maybe they fall into another bucket where you have to do a few more manual steps to confirm that it is who you think it is. Maybe you call all those Liz Walters up and ask if they went to the same high school as you. Right?
Liz Walters:
Exactly. So it really helps. When you've got your match criteria, it helps group those inputs: these you can do an automatic action on, and some you'll want to take some manual intervention with. The match process will help you narrow down on that bucket of who you need to spend some manual time with.
George L'Heureux:
So when we think about how we improve match confidence and try and get more up into that sort of automatic action step, what are some of the recommendations that we end up making to a lot of our customers that are useful to them in terms of improvement?
Liz Walters:
Well, a very important component of any match criteria is input data quality. What data are we getting from the customer in order to match? And it's always worth a little bit of time to go through that data to make sure that you're giving high quality data to D&B.
Just as an example, we did a match once for a large manufacturer and the results came back that they said, "D&B, you clearly don't know what you're doing. You matched a third of our file to donut shops. We don't do business with donut shops." And it turned out that what was happening was that the customer was writing, "Do not use," in the customer field for records that they meant to retire, and those hadn't been cleansed out of the input file. And so, “do not use” looks a lot like “donut”, so based on other criteria, there were matches to donut shops.
So things like that, customers don't always realize ahead of time that there may be things in the input file that they really don't want to give to D&B to begin with. So if you spend a few minutes looking at the input data, you can ask, am I giving data that I really ought to be giving to D&B as part of this match process?
George L'Heureux:
Well, great. So now not only do I want donuts, but I'm going to be completely unable to focus on the rest of this. So when you get a file, is that something that you're looking for? Do you go through and try and see whether or not a customer has sent things like that? Is it something that you call out to them?
Liz Walters:
Yes, absolutely. We'll go through, and it generally doesn't take too much time, but it's good just to sift through. I call it playing the data, letting it flow through my fingers, just to catch anything odd. One of the big things that we look for is big blocks of blank data. The less information that we get to help us identify the customer, the less confident a match we'll have.
So going back to the Liz Walters example, if you're looking for Liz Walters in California without an address or a city, it just makes it that much harder to come up with a strong decision. So if we're getting a customer file and they're missing a lot of street addresses or missing a lot of cities, that's something to call out: we can still attempt to make a match based on what you're giving us, but be aware that that reduces our ability to give you a high confidence match.
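A minimal sketch of the blank-field check Liz describes, for readers who want to try it on their own input file before sending it for matching. The field names here are illustrative assumptions, not a D&B specification:

```python
from collections import Counter

def blank_counts(rows, fields=("business_name", "street", "city", "state")):
    """Count empty or whitespace-only values per field across input rows,
    so gaps that would weaken match confidence can be flagged up front."""
    counts = Counter()
    for row in rows:
        for field in fields:
            if not (row.get(field) or "").strip():
                counts[field] += 1
    return counts

# Example: two records, several missing fields
rows = [
    {"business_name": "Acme Co", "street": "", "city": "Springfield", "state": "IL"},
    {"business_name": "", "street": "1 Main St", "city": "", "state": "CA"},
]
gaps = blank_counts(rows)
```

A high count for a field like `city` or `street` is the cue to call out before matching: the match can still be attempted, but confidence will suffer.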
George L'Heureux:
That can't be the only thing that we're looking for, though.
Liz Walters:
Oh, no.
George L'Heureux:
I mean, blanks are obviously going to be a problem. What else are you looking for?
Liz Walters:
Another thing is test data. Something that I like to do is a frequency count of the various input fields, actually both a frequency count and an alphabetic sort, because both of those ways of looking at the data have a tendency to float weirdness to the top. I hesitate to say bad data, because almost always that information is useful for some purpose. Going back to our donut shop, the "do not use" was useful information for that customer, for their purpose. It just didn't help us with the match.
But if you do the frequency counts and the alpha sorts, weird things will tend to float to the top and you can see, "Wow, I really didn't recognize that I'm giving so many numeric values to D&B." Maybe that's a contract number instead of a business name, or if you do have a habit of doing ... We see things like “test”, “do not use”, “internal use only”. Words like those will float up to the top and then you can go, "Oh, wait a second. Maybe I need to take a second look at how I pulled this data to give it to D&B."
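A rough sketch of the frequency-count check described above. The list of placeholder phrases and the purely-numeric check are assumptions drawn from the examples in this conversation, not an exhaustive rule set:

```python
from collections import Counter

# Placeholder phrases mentioned in the episode; extend for your own data.
SUSPECT = {"test", "do not use", "internal use only"}

def top_values(names, n=5):
    """Most frequent raw values in a field. Repeated placeholders like
    'test' or 'do not use' float to the top of this list."""
    return Counter(v.strip().lower() for v in names).most_common(n)

def suspicious(names):
    """Values matching known placeholder phrases, or purely numeric values
    that may be a contract number sitting in a business-name field."""
    return [v for v in names
            if v.strip().lower() in SUSPECT or v.strip().isdigit()]

names = ["Acme Co", "TEST", "test", "1234567", "Do Not Use", "Acme Co"]
common = top_values(names, 2)
flagged = suspicious(names)
```

An alphabetic sort (`sorted(names)`) is the complementary view: numerics and punctuation sort ahead of letters, so the same oddities cluster at the top of the listing.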
George L'Heureux:
I think it's a really good distinction that you're drawing there that it's not necessarily bad data. It's just maybe data that while it's useful, it's in the wrong spot or lacking context. It could be confusing. I know that I've made suggestions, "Hey, let's move this do not use indicator into its own field," or, "Let's move this over into a different place."
Another thing that I see a lot of the time along those lines is “attention” or “care of”. When I see “attention” or “care of”, that indicates to me we've potentially got two different companies here, and that can really get confusing.
Liz Walters:
Yes. When information is conflated, that can be very difficult for D&B to work with, because we don't know: is that address for the company that you're really doing business with, or is that just where that company happens to be located for this engagement? And that's not going to help us find the business that you're looking for in our database.
George L'Heureux:
So all of these tactics that we're talking about, searching for blanks, looking through frequency counts, checking for "do not use" so that you don't get all donut shops, these are all tactics that are part of a broader strategy. Is there a way that you define what that overall broader strategy is?
Liz Walters:
Do you mean for the customer?
George L'Heureux:
Either for the customer or for what you're doing on behalf of the customer. What are we really trying to get to in doing all of that?
Liz Walters:
Well, in part, it really would be towards match optimization. How can we optimize your match results so that you get the most useful data at the end of the match process?
George L'Heureux:
When we do this type of thing ... I guess I should ask it this way: we're able to quantify how much better those match results look for a customer before they take these types of data quality actions versus afterwards, aren't we?
Liz Walters:
Yes. So you can compare. You can isolate some of the bad stuff, stuff that maybe you shouldn't have given to D&B, and compare and show that if you hadn't given that to us, your overall match rate would have been X percent higher. That drives home the impact of giving us information that we can't use to match.
George L'Heureux:
The other thing that strikes me about the strategy is that it results in broad categories of data weirdness, to use your word, that we can address with simple solutions. If you've got "do not use", then the solution is to strip "do not use" from that field. If you've got "test" all over the place in a field, you remove the word "test", or you deal with blanks in a similar fashion. So hopefully you can take the biggest issues and start to eliminate them from having a quality impact on your match. Is that right?
Liz Walters:
Definitely. And it can highlight for customers patterns that they really weren't aware of, so they can take separate steps. Another pattern you might see is a lot of personal names: instead of putting the business name, people are putting their contact's name. Maybe another field needs to be added to the input so that you can specify, okay, here's the business name versus the person that I tend to talk to at that company. So it helps open up the customer's eyes to different ways they may want to organize their own data.
George L'Heureux:
So customers can maybe recreate or replicate some of what you're doing by taking some of the steps that you've identified here. If they do detect problems, are there steps that they should take before they send us the data to match?
Liz Walters:
Yes, ideally. Well, again, it depends on the nature of the customer. If there's a lot of test data, you want to strip that out and not give it to us, because that's just going to muddy the waters in the results. If they're finding a lot of conflated information, maybe create another record, or clean up that record so that it isolates the one that they're really looking for. If they are interested in finding information on both of those companies, create a separate record where they clearly identify company one versus company two.
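The cleanup steps described here, dropping test records and flagging conflated "attention"/"care of" names for a manual split, can be sketched roughly as follows. The patterns and field name are illustrative assumptions, not a D&B-prescribed rule set:

```python
import re

# Placeholder phrases from the episode; a retired or test record should not
# be sent for matching at all.
PLACEHOLDER = re.compile(r"\b(test|do not use|internal use only)\b", re.I)

# Markers that two companies may be conflated in one name field.
CARE_OF = re.compile(r"\b(attn:?|attention|c/o|care of)\b", re.I)

def clean_records(records):
    """Return (keep, review): drop placeholder records outright, and route
    possibly-conflated records to a manual review bucket for splitting."""
    keep, review = [], []
    for rec in records:
        name = rec.get("business_name", "")
        if PLACEHOLDER.search(name):
            continue                  # test/retired record: don't send it
        if CARE_OF.search(name):
            review.append(rec)        # possibly two companies in one field
        else:
            keep.append(rec)
    return keep, review

recs = [
    {"business_name": "Acme Co"},
    {"business_name": "DO NOT USE - old acct"},
    {"business_name": "Widgets Inc c/o Acme Co"},
]
keep, review = clean_records(recs)
```

Records in the review bucket are the ones to split into two clearly identified records, as described above, before they go into the match file.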
George L'Heureux:
Right. And the other thing, and oh, gosh, this is always an issue and you want to make sure that you're finding the right balance here, but do you have any recommendations on how to help customers recognize when maybe the juice is no longer worth the squeeze? Where the work that they're putting into trying to improve their data for match purposes is just costing more than they're going to get out of it?
Liz Walters:
Yes. It's hard to quantify. You can't say, "When X happens," but when you realize you're spending a lot of time on a file and you're not finding any results, there is a threshold where it is time to move on. If the customer has a value indicator for how valuable these customers are, you may get to a point where you've resolved all of the customers that are truly important to you, and maybe it's just not worth it to spend additional time on the lower subset that you're not interested in. Or take a lead versus a prospect. Maybe you don't want to spend as much time on leads, because you know that if you convert that lead, it's going to go through an internal process that will clean it up and help you match it. So it just may not be worth your time to try to clean up the lead before it becomes a customer.
George L'Heureux:
That's a great point, that your process is going to be dependent on your particular situation within the company and what other resources you have at your disposal.
Liz Walters:
Yes.
George L'Heureux:
So how can Dun & Bradstreet help, as we wrap up here, Liz? You're doing this every day. Customers are listening to this or watching this. How can you help them to really get the most out of their match?
Liz Walters:
Make sure that you're giving us as much information as you can that helps us with the match, and also those value indicators that help us understand how important each record is to you. That can help us convey back to you how much effort you want to spend resolving the other records. Things like age of the account, total revenue size, or an internal anticipated future value metric. Things like that won't help us resolve the match, but they can help us stratify the results for you so that you know where you want to hone in and spend your time on the match file.
George L'Heureux:
And that all comes back to that whole cost versus benefit calculation, making sure you're spending time on things that are actually going to be valuable for you.
Liz Walters:
Exactly.
George L'Heureux:
Yeah. Well, thanks a lot, Liz. I really appreciate you taking some time today to sit down and share your expertise on this topic with me and everyone who's watching or listening.
Liz Walters:
You are very welcome. Thank you.
George L'Heureux:
Our guest expert today has been Liz Walters. She's a data advisor at Dun & Bradstreet, and this has been Data Talks. I really hope that you've enjoyed today's discussion. And if you have, I encourage you to please share it. Let a friend or colleague know about the show. And for more information about what we discussed on today's episode, please visit www.dnb.com or reach out to your company's Dun & Bradstreet specialist today. I'm George L'Heureux. Thanks for joining us. Until next time.