Data Talks, Episode 4: Why Match Accuracy Is Critical

Host: George L'Heureux, Principal Consultant, Data Strategy
Guest: Tim Petre, Data Strategy Consultant

“Match starts everything. We’ve got to make sure that we’re providing the customer with the best record that we have, and it’s working with the customer to ensure that they’re providing the highest quality input data.”

Matching, also known as Identity Resolution, is connecting your organization’s data to a trusted set of commercial reference data. This is done to help cleanse your records of faulty data (for example, a typo or wrong information) or to append additional information to a record.

The matching process does its best to ensure that the right information is associated with the right business so that an organization can strengthen confidence in its decision making. This can lead to more operational efficiencies, increased sales, reduced risk and other business use cases.

In this episode of Data Talks, Dun & Bradstreet Data Strategy Consultant Tim Petre discusses the basics of matching, and how we help our customers understand the connection between their data and Dun & Bradstreet data, which includes information on more than 400 million global businesses in our Data Cloud.

 

Episode 4: Why Match Accuracy Is Critical

George L'Heureux:
Hello, everyone. This is Data Talks, presented by Dun & Bradstreet. I'm your host George L'Heureux. I'm a Principal Consultant here on the Advisory Services Team within Dun & Bradstreet. In advisory services, our team is dedicated to helping our clients maximize the value of the relationship with D&B through expert advice and consultation. On Data Talks, I'm going to chat every episode with one of the expert advisors at D&B about a topic that can help consumers of our data and services to get more value. And today's guest expert is Tim Petre. Tim is a Data Strategy Consultant at D&B. Tim, how long have you been with Dun & Bradstreet?

Tim Petre:
43 years.

George L'Heureux:
My goodness. Tell me a little bit about what you do in your current role as a data strategy consultant.

Tim Petre:
Okay. Basically, I'm in the financial services vertical, specifically with insurance. I work directly with our sales representatives and directly with the customers, understanding what their use cases are and supporting them through education, working through specifically match files, hierarchies, just explaining our data and how they best can use it to support their business.

George L'Heureux:
And how did you come into this role at D&B over the course of your career within the company? What were the steps that kind of led you to this role over time?

Tim Petre:
I've been doing what I've been doing for about 10 years now. And most recently, these last seven have been specifically with this group. I started out as a DUNSRight™ consultant explaining our data and the DUNSRight process, which is the patented process from our global data collection all the way through analytics. So I've been pretty much in this role, like I said, for the last 10 years.

George L'Heureux:
So you and I had wanted to chat a little bit today about matching, which really is one of the core competencies that we have at Dun & Bradstreet. So, let's start with the basics. What exactly is matching?

Tim Petre:
Matching basically is what we call our identity resolution. The matching process is patented by Dun & Bradstreet. Basically, what it does is take our customers' data and match it to the best record that we have in our file. And it's based on what the customer is providing us as data input. Based on that, we'll compare it to our match reference file and provide them back either a single record that best matches what they provided or a candidate list that they can choose from.

George L'Heureux:
And those candidate lists or that best record, that includes the D-U-N-S Number, which really is kind of the key to all the information that's inside of the Dun & Bradstreet Data Cloud. Right?

Tim Petre:
That is correct. Yes.

George L'Heureux:
So in a way, matching is kind of how it all begins for most of our customers. And I would imagine that accuracy with that match is really going to be critical.

Tim Petre:
It's absolutely critical. Basically, the customer's coming to us, and depending on what their use case is, whether it's for marketing or for trade credit, they have specific reasons why they want to have that record matched to a D-U-N-S Number. And it's not just providing a D-U-N-S Number. It's providing all the other ancillary information that we provide back with that. So, it's public filings on the business. Are there trade references? How are they paying their bills? Who are the owners of the business? What are their operations? So it's all that additional information that we're able to provide through matching. And we have to ensure that we're providing back the record they're specifically looking for.

George L'Heureux:
With that being the case, with our customers relying on us to help them understand that connection between their data and our data, how can we, or rather how do we, make our customers feel comfortable that those matches are accurate?

Tim Petre:
Once we match a record, we provide back additional information on that record. So we provide back the confidence code, which is how confident we are in that match. It goes from a zero, which means we don't have anything in our file that comes close to even matching their input data, all the way up to a 10, which is saying that this is definitely the record that you want. So we provide a confidence code. We provide Match Grade strings, which explain how their input data compares to our reference data. Are they the same? Are they similar? Do they match at all? Or you didn't provide the information, but we have it, or vice versa.

Tim Petre:
And then finally, we provide back what we call match data profile codes. Those explain, for example, that you provided ABC Company, we gave you back XYZ Company as the Dun & Bradstreet record, and why that's a match. For example, you provided us a trade style, and we're providing you back the primary business name with the legal business name on the record. So it's giving additional information on why we made the match we made.
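
Editor's note: the "same, similar, different, or missing" comparison and the 0 to 10 confidence code described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the 0.8 similarity cutoff, the accept and review thresholds, and the use of Python's standard difflib for similarity. It is not Dun & Bradstreet's patented matching logic and does not reproduce the actual Match Grade values.

```python
from difflib import SequenceMatcher

# Hypothetical cutoffs, chosen only for illustration. The transcript says
# only that confidence codes run from 0 (nothing close) to 10 (definite).
ACCEPT_AT = 8
REVIEW_AT = 5


def compare_component(input_value, reference_value, similar_at=0.8):
    """Grade one field (name, city, phone, ...) of an input record
    against the same field of a reference record."""
    if not input_value or not reference_value:
        return "missing"  # one side did not supply the field
    a = input_value.strip().lower()
    b = reference_value.strip().lower()
    if a == b:
        return "same"
    # SequenceMatcher.ratio() returns a similarity score in [0, 1].
    if SequenceMatcher(None, a, b).ratio() >= similar_at:
        return "similar"
    return "different"


def triage(confidence_code):
    """Turn a 0-10 confidence code into a handling decision."""
    if not 0 <= confidence_code <= 10:
        raise ValueError("confidence code must be between 0 and 10")
    if confidence_code >= ACCEPT_AT:
        return "accept"
    if confidence_code >= REVIEW_AT:
        return "review"
    return "reject"
```

A consumer of match results might auto-accept the high-confidence records and route the middle band to a person who reviews the per-field grades; where those bands sit is a business decision that depends on the use case.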

George L'Heureux:
And I have to imagine that if we're going to ask our customers to use not only these matches that we give them, but also the tools that we give them for evaluating the matches that they get back, this is something that we've tested a lot. And I know we drink our own champagne, to use a phrase, by using this internally. Can you explain a little bit about how we use that same process internally to improve our Data Cloud?

Tim Petre:
Correct. Okay. Yes, the exact same matching logic that we use for our customers we use internally. So for example, if we have bankruptcy information, we have to ensure that we're putting that information on the right record. So we're actually using our own tools, our confidence codes, our match grade strings, our MDP codes, to ensure that we're identifying the right record. When we're applying not only bankruptcies, but suits, liens, judgments against the business, any type of derogatory information, we're ensuring that we're putting it on the right D-U-N-S Number and the right record within our file.

George L'Heureux:
So you kind of talked about it a little bit when you were discussing those Match Grade strings, and that ability to not quite get an exact match. So this system of matching that we have at Dun & Bradstreet, I know it allows for some degree of uncertainty, right? Can you explain how that fuzzy matching works?

Tim Petre:
Yes. So what we do within our match grade strings is we have values that we assign to the comparison of the input data to what's in our match reference file. And we have the ability to say whether we think it's an exact match, or if it's a similar match, for example. So if you came in as ABC company and you only want it, and we were only able to provide ABC company back to you, that would be a very limited type of match. We do standardization and normalization on our files. And not only do we do it to our own internal file, to add information and ensure that it's appended to the right D-U-N-S Number, but we also apply that same logic to the customer's input file. So we're comparing apples to apples.

George L'Heureux:
So, I think that that's a really important point, right? I mean, if we have something in our own file and, to use your example, it's ABC corporation. If we do something like we take out the periods and the extra spaces and the extra hyphens, and commas, and things like that in the names that we're storing in our database, we're applying those same types of transformations to the data that our customers provide us and ask us to match on.

Tim Petre:
Right. That's absolutely correct. Yes. We want to ensure that we're doing the same standardization and normalization process to both sides of the equation, to the input record and to what we have in our file.
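
Editor's note: the idea of applying identical cleanup to both sides before comparing can be sketched in a few lines of Python. The function name normalize_name and the exact rules (drop periods and commas, turn hyphens into spaces, collapse extra spaces, uppercase) are assumptions based on the examples George lists; the actual Dun & Bradstreet standardization rules are not described in this episode.

```python
import re


def normalize_name(name):
    """Apply one hypothetical set of cleanup rules to a business name."""
    cleaned = re.sub(r"[.,]", "", name)    # drop periods and commas
    cleaned = cleaned.replace("-", " ")    # treat hyphens as spaces
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse extra spaces
    return cleaned.upper()                 # compare case-insensitively


# The same function runs on the customer input and on the reference
# record, so both sides are compared "apples to apples".
customer_input = "A.B.C.  Corporation, Inc."
reference_record = "ABC - Corporation Inc"
same_after_cleanup = (
    normalize_name(customer_input) == normalize_name(reference_record)
)
```

The design point is that the rules live in one function: if the cleanup were applied to only one side, two spellings of the same business could still fail to compare as equal.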

George L'Heureux:
So with the size of our reference file, what are we doing to make sure that we're staying up to date, that our data are staying right, and that our customers are always going to be able to match against updated data? We have lots of processes in place to make sure that that stays good. Are you able to talk about that just a little bit?

Tim Petre:
Yeah. I mean, that's outside of the matching, that's more of our global data collection, but yes, we have over 30,000 sources of data that we use. That's constantly being updated. With that information, like I said, we use the same type of matching logic to ensure that it's matched to the correct record within our database. That's constantly refreshing our businesses. We do over 5 million updates a day to our global database of over 400 million records.

George L'Heureux:
So, that's a little bit about what we're doing to make sure that we're staying good on our side. But as you mentioned before, this is kind of a two-way street, and it depends on what the input data from the customer looks like too. What can customers do to make sure that they're getting the best results they can from a matching exercise with Dun & Bradstreet?

Tim Petre:
That's an excellent question. Basically, what we request from the customer is to provide a temporary file that we're able to take a look at and make recommendations on up front, so that we have the ability to identify areas of opportunity within the file. For example, do they have test records that they're providing, which they really shouldn't be, as part of our normal processing of the match file? How do the records look? Do they have any special characters? Are they providing the actual business name, or are they abbreviating? We would request that they spell out the full business name. Are they providing addresses? Are they providing the country, if it's a global record? With information like that, we're able to help them understand how they can improve the quality of the data.

George L'Heureux:
And you and I have even talked, and something as simple as including a phone number can improve the match results, improve the quality of the match results by a significant amount.

Tim Petre:
Yeah, a phone number can actually raise the potential of a match anywhere from 5 to 10% in some cases. But beyond that, many times the customer thinks this is the best file that they have, and there could be something like the actual address being a remittance address as opposed to the physical address of the business. Different areas within a company may have multiple files, one that would potentially have the business name with the physical address, one with the remittance address. We ask them to provide both, basically, so we can do a test and learn to understand which file is the better one to use for the overall match.
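
Editor's note: the "test and learn" comparison of two candidate files can be sketched as a simple match-rate calculation. This is a minimal illustration under stated assumptions: the function names, the acceptance threshold of 8, and the sample confidence codes are all invented for the example and are not Dun & Bradstreet results or tooling.

```python
def match_rate(confidence_codes, accept_at=8):
    """Share of records whose confidence code meets the threshold."""
    if not confidence_codes:
        return 0.0
    accepted = sum(1 for code in confidence_codes if code >= accept_at)
    return accepted / len(confidence_codes)


def better_file(results_by_file, accept_at=8):
    """Name of the candidate file with the highest match rate."""
    return max(
        results_by_file,
        key=lambda name: match_rate(results_by_file[name], accept_at),
    )


# Invented confidence codes for the same customers, matched once with
# physical addresses and once with remittance addresses.
results = {
    "physical_address_file": [10, 9, 8, 4],
    "remittance_address_file": [9, 5, 4, 3],
}
winner = better_file(results)
```

In practice the comparison would run on a sample of real records from each file, and the threshold would follow whatever confidence level the use case requires.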

George L'Heureux:
It sounds like there's always something else that we can do to try and gain a little bit more optimization out of that match process. But Tim, as we wrap up today, what would be one thing that you'd want listeners or people viewing this today to walk away with from this discussion about identity resolution and match?

Tim Petre:
Well, match starts everything. We've got to make sure that we're providing the customer with the best record that we have, and it's working with the customers to ensure that they're providing the highest quality input data, because that'll ensure that we're providing the correct record that they're looking for on the back end.

George L'Heureux:
Well, Tim, hey, I really appreciate you taking time today to meet with me, talk a little bit about the match process, how D&B helps it, how our customers can make changes to improve things themselves, and really sharing your experience of over four decades at Dun & Bradstreet with everyone watching and listening today.

Tim Petre:
Thank you for having me.

George L'Heureux:
Our guest expert today, again, has been Tim Petre, a Data Strategy Consultant here at Dun & Bradstreet. And this has been Data Talks. We hope you've enjoyed today's episode. If you have, we encourage you to please share it with friends or colleagues, let them know about the show. And for more information about what we've discussed on today's episode, visit www.dnb.com or talk to your company's D&B specialist today. I'm George L'Heureux. Thanks for joining us. Until next time.