The Power of Data Podcast
Episode 31: The Spirit Guide to Data
Guest: Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet
Interviewer: Sam Tidswell-Norrish, International CMO, Dun & Bradstreet
Welcome back. You're joined by me, Sam, today, and I have the huge pleasure of having Anthony Scriffignano, Chief Data Scientist of Dun and Bradstreet, with me. Welcome, Anthony.
Good morning, Sam. Great to be here.
Look at that grin; if only this was being televised. We're gonna go off-piste today. I've got in front of me a list of very kindly pre-prepared questions. We're not going to look at any of them. And we're going to first off talk about what happened last night. We were at QuanCon, which was an event hosted by a partner of ours, Quantexa, celebrating the power of data analytics and some of the work that they do with their partners. We are a partner of theirs. And you gave an incredible keynote speech. You're now going to have to take the 25 minutes that you presented, which by the way was 10 minutes over, and condense it into one minute. What did you say?
Well, we were talking about anomalies. And we were talking about how easy it is with all of the data that we are crushed with now in business to just wave our hands and point a tool at something and say, oh, okay, I'm gonna go find the anomalies. Without questioning what does that really mean? What would it look like? How do I define that? So the types of anomalies that we deal with in business are things like fraud, malfeasance, identity theft, trade rings, so forth. And then you get these second and third order effects like money laundering and human trafficking. Describing these things is one thing, as human beings, we understand them intuitively. But trying to find them in data is a massively complex problem that's at the edge of computer science. So we were trying to talk about the challenges of navigating that boundary.
You used a bicycle analogy, which I really liked. So you're gonna have to give a little summary of that, too.
So it's a quick story that I sometimes use to introduce the concept of analytics, and the kinds of analytics that we do in our human brain that we just take for granted, and then the difficulty of automating those sorts of things. It's a story you can really place anywhere you want. In my mind, I place it in Eastern Europe somewhere during the Second World War. A lot of changes in borders, a lot of “show me your papers” kind of behavior. There's a guard sitting at a border crossing and a guy comes down the road on a bicycle. And he's got this very large sack on the back of the bicycle. He's kind of struggling under the weight. So the guard stops him and he says, “What's in the bag?”, and the guy says it's a bag of sand. The guard unshoulders his weapon and he stares into the eyes of the guy on the bicycle. And he says, “I'm not going to ask you again. Tell me the truth. What's in the bag?” And the guy says, “I promise you, I swear it's a bag of sand”. So he has him open the bag, he stabs around in it with his bayonet, he gets a dog to sniff it, he uses a metal detector, and he can't find anything. He looks the guy in the eye and he says, “I'm memorizing your face. You might be getting away with this today, but I will never forget your face.” So every couple of weeks, maybe three weeks, two weeks, same guy on a bicycle, sometimes a large bag, sometimes a small bag, same experience over and over again. Sometimes the guard has him open the bag, sometimes he doesn't. One time he actually opened the bag and spread the sand out on the ground so he could completely look at it and inspect the bag, and then he had to help the guy scoop the sand back up and put it back in the bag and tie it up and put it on the bicycle. So this goes on, and then eventually the borders change. The war ends, they don't see each other anymore. A couple of years later, the former guard is sitting in a bar somewhere.
And in comes the guy that used to have the bags of sand, so their eyes meet. And the former guard calls him over. He says, “Come on, sit down. We're friends now. The war's over.” He introduces himself. “Let me buy you a beer.” They talk a little while. And finally the former guard smashes his beer down on the table and he says, “I have to know. I can't sleep at night. It's driving me nearly insane. What were you smuggling?” And the guy said, “Bicycles”.
So good. I love that. And that wasn't the only humorous part of last night's speech. However, the crux of this is not humorous. The crux of this is around opportunity. Yeah. There's more data being created every day. It's exponential. But it's hard to quantify. You're the man who is going to quantify it for us. Well, it's about the V's.
Yeah, I mean, this is the challenge. It's not even exponential. It's possibly hypergeometric, x to the x. It's growing at a rate that arguably can't be measured anymore. So there's all kinds of articles. In fact, that presentation yesterday started with someone who had a slide showing how many zettabytes of data there are, maybe. All of those things are estimates based on the internet, based on looking at how much storage space is being made, all kinds of ways of trying to estimate how much data there is on earth. The reality is that now we live in a world where data begets data. When you use data, you produce data. Our devices produce data, and not all of that data is shared. It's sitting in your Fitbit, it's sitting in your mobile phone in the diagnostic things that you don't share, because you click that box. All of these types of things are part of what I'll call the data sphere. So trying to understand all of this data simultaneously involves dealing with these V's of big data: volume, velocity, veracity, variety, value. Those are all of the aspects of big data that make it big. When any of those things overwhelm the best of technology to deal with them, now you have a big data problem. So it doesn't just mean having a lot of data. It might mean that it's changing faster than you can analyze it. It might mean that it's not all true. It might mean that it's not all simultaneously true. So you want to take this data and put it in your data lake, but it's changing. The minute you put it in your data lake, you kill it: you take data in motion, and you create data at rest. That's kind of a problem for certain types of analytics. So we have to get smarter about understanding the nature of it. We have to stop worrying about how many bytes of data there are. We have to start worrying about, what is this thing you call data? And what is this thing of which you speak when you say analytics? And what does it really mean?
We have to understand bias, we have to understand the character and quality of the data, the applicability of that data to the question we're trying to ask. We have to be more scientific about it.
So originally, before you said what you've just said, I would have asked how we can control it, how we can wrap our arms around the data. But I don't think that's what we need to be thinking about. We need to be thinking about how we let data grow. How do we treat it as an organism? What technologies can we apply to help us do that best?
So, the very first and most important thing is not a technology. It's a mindset. We have to be critical thinkers. We have to ask better questions. What would you have to believe in order to use this data to answer the question you're trying to ask? There are subtle biases that creep in, because we haven't learned how to ask good questions. When you do a dissertation, you often spend months trying to get that question right. You want to jump into the data and start studying something. And your advisor's saying, “Oh, hold on, we still don't have the right question here.” Of course, we don't have months in the business world to do that sort of thing. But if I ask a question like, “To what extent does the shipping data that we can see in and out of this region help us understand the impact of this political tension?”, that's a good question because it says “to what extent”. If I said, “How much of the impact can we see?”, I'm inferring that I can see it at all in that data. And that subtle difference between “to what extent” and “how much” drives me down a road to something called an a priori conclusion. I've assumed in advance that the answer is in there. The answer isn't always in there. I said last night that we have to move from a mindset of looking for needles in haystacks to looking for needles in stacks of needles. Everything has value if you understand how to use it.
Speaking as a CMO and a marketer, one of the things I like most about our brand at Dun and Bradstreet is actually the ampersand, the “and” sign. It joins a D and a B; it's very clever. You always talk about “data and”. Tell us more.
It makes me a little crazy when someone calls me a data provider. Or when somebody starts a sentence with something like, “Well, we'll just take your data, and then we'll put it over here and we'll do something with it. And we'll come back and let you know when we need more data.” You've taken me out of the equation, where I can actually help you as a spirit guide to what you might assume is in that data that may be different. As an example, you're collecting data from all over the world. Why would you assume that the whole world looks at the ontology the same way: the number of employees, the turnover, whether or not the business is in business? There are different types of bankruptcy around the world. There are different ways of counting employees. There are different requirements for things like ultimate beneficial ownership. So if I just give you the data, and you look at what's called the ontology, the data dictionary that says this field contains something called owner, you're going to decide what you think owner means. And you're going to go off and you're going to take your a priori conclusion, and you're going to look at this massive corpus of data. And maybe you'll still get to a good decision. But I can probably tell you the first three mistakes you're going to make, so why don't you let me help you make a new mistake instead of one that's already been made? So this concept of “data and” is to say: data and insights, data and the information that can be gleaned from it, data and the inference we draw, data and the bias that's implied by the environment that we work in. Once we think about how we're connecting the data to a problem, a question, a methodology, it opens up a whole new way of thinking about that data, and it becomes alive.
So you're talking about ways of thinking a lot, okay? And that's really resonating for me. I think about, just in my relatively short career, how the working world has changed, and how culture has been affected, ultimately, by technology, by changing behavioral trends, by different business models. But if it's about mindset and not about technology, let's for a moment imagine one of our listeners is the CEO of a midsize firm, 500 people, preparing to future-proof itself. That's probably a phrase you hate. And they want to start thinking about how to embrace data and analytics. For us, we have swathes of data scientists; it's in our blood. We are experts at this. How does a 500-person company begin to change mindset and build operations to be best placed for this?
So now you're talking about change leadership, which is exactly the right frame. The funny thing about the future is that it's always in the future. So if you try to future-proof, it's a great job. You know, I'm going to be a future-proofer, because the future never comes. So I always have work to do, right? If you think about where you're trying to lead your organization, and why you are trying to lead it there, and why you think that what you're doing as a leader to intervene actually connects those two things, that's called reflective leadership. That's a really powerful thing. The drivers of change leadership are people, process, technology and mindset. Of course, the technology is important, but the people and the process and the mindset are equally important. So for this midsize company, what I would do is I would sit and take inventory: what are my strengths? What are my weaknesses? The SWOT sort of thing. What am I trying to do? And maybe technology is the answer. Maybe partnering with someone who has that technology or that capability is the answer. Maybe some sort of collaboration. Often with midsize companies, it's very difficult to do anything at any kind of scale without some sort of collaboration. But a very high percentage of all strategic alliances fail, ultimately, because they didn't do the homework around these fundamentals. They get immediately focused on ROI, they get immediately focused on trying to win more than the other guy. That's not a partnership. That's sort of a frenemy that you're doing business with right now. So for a midsize firm, the pressure is even more on them now, with everything connected to everything and so much more data available. Let's say they make bicycles or, let's say, they package sand, right? Go do that, and you be you.
And then the technology part, the data part, might be something where you are collaborating with a practitioner or an organization or an enterprise for whom that's what they do. If, as you grow, as you move into that future, you're gonna start hiring some of your own data shamans, you know, whatever you want to call them (I'm not really big on titles), then great, go do that. But if you do that first, you're probably just gonna have a bunch of dissatisfied people that leave in a year because they weren't challenged.
So we've talked about the mindset shift or change leadership, which was a much more academic way of saying it. We're going down the bicycle route. So let's stick with it. Okay, I'm a bicycle vendor. 500 person company. Dun and Bradstreet does stuff on the defensive and offensive side. Give me some use cases, how can data help me sell bicycles and help me do what I do?
Are you trying to sell more bicycles into your existing market? Or are you trying to find new markets to sell your bicycles?
Bit of both.
And are you trying to sell today's bicycles or the bicycles that you're building for the future?
A bit of both.
Okay. So you have this sort of middle of the road strategy. I want it all and I want it now.
I'm not in bicycles, don't blame my strategy.
So no, that's fine. This is how I see it. I'm confirming my observation with you. So the first thing I would do is, I would say, “Tell me a little bit about your customers”. And you probably would say, “Well, I have great customer relationships and so forth”. And then I would try to tease out, do you have this many (nobody can see this, I'm just doing the big fish, little fish), “Do you have this many customers? Or this many customers? Where do those customers tend to be? Are they geographically dispersed? Are you selling in larger quantities to other people who are then selling your bicycles? Are you selling to the individual consumer?” Once I understand your customers, then I want to know a little bit about your vendors. I'm going to take the information that I have, and I'm going to start to look at your customers' vendors and your vendors' customers and your customers' vendors' customers. That's something I call the integrated value chain. I'm going to look at where you sit in that ecosystem. And then I'm going to propose that we look at that according to different scenarios. You might want to grow your business by finding more customers that look like your existing best customers. That's a good growth strategy if you're not well penetrated. If you are well penetrated, that's probably not going to get you much. So now you want to find new customers that could be better than your current best customers in some new way, in some new type of market. And that may be a market that's white space that no one's in. Or that might be a market that is saturated by competitors. So if it's saturated by competitors, now I want to try to look at their integrated value chains, and how you can enter that market either in a collaborative or a competitive way. So it's everything connected to everything. I don't want to just say, “Here's a bunch of data.
Here's a bunch of companies that buy bicycles. Why don't you use these as sales leads to see if you could sell bicycles?” You might hit, you know, you might get lucky, but you won't necessarily know why. And you won't be able to adjust your strategy as you go forward because you don't have a strategy. Spinning the wheel and hoping you get lucky is not a strategy.
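[Editor's illustration, not part of the conversation: the "integrated value chain" described above can be pictured as a graph of who sells to whom. The sketch below is a minimal Python example under stated assumptions. Every company name and relationship in it is invented, and the function names are hypothetical; it is not a description of how Dun & Bradstreet computes linkage.]

```python
from collections import defaultdict

# Hypothetical (vendor, customer) pairs. All names are invented for illustration.
SELLS_TO = [
    ("SteelCo", "BikeMaker"),
    ("TireCo", "BikeMaker"),
    ("BikeMaker", "CityCycles"),
    ("BikeMaker", "TrailShop"),
    ("HelmetCo", "CityCycles"),
    ("HelmetCo", "UrbanRide"),
    ("TireCo", "RivalBikes"),
]


def build_indexes(edges):
    """Index the edge list both ways: vendor -> customers, customer -> vendors."""
    customers_of = defaultdict(set)
    vendors_of = defaultdict(set)
    for vendor, customer in edges:
        customers_of[vendor].add(customer)
        vendors_of[customer].add(vendor)
    return customers_of, vendors_of


def hop(firms, index, exclude=frozenset()):
    """Follow one relationship step from every firm in `firms`."""
    found = set()
    for firm in firms:
        found |= index.get(firm, set())
    return found - set(exclude)


def value_chain(firm, edges):
    """Return the firm's neighbours one, two and three hops away."""
    customers_of, vendors_of = build_indexes(edges)
    me = {firm}
    customers = hop(me, customers_of)
    vendors = hop(me, vendors_of)
    customers_vendors = hop(customers, vendors_of, exclude=me)
    return {
        "customers": customers,
        "vendors": vendors,
        "customers_vendors": customers_vendors,
        "vendors_customers": hop(vendors, customers_of, exclude=me),
        "customers_vendors_customers": hop(customers_vendors, customers_of, exclude=me),
    }


chain = value_chain("BikeMaker", SELLS_TO)
for name, firms in chain.items():
    print(name, sorted(firms))
```

[In this toy data, the "vendors' customers" hop surfaces a firm that shares a tire supplier with the bicycle maker, which is the kind of competitor or white-space signal the conversation is pointing at. A flat list of sales leads would not show that relationship.]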
Yep, hope is not a strategy. Quite right. Okay. So providing intelligence on integrated value chains, creating linkage, showing relationships and understanding how to make better decisions.
All that. How hard could it be?
Easy. Well, easy for you. Not for me; I haven’t sold a bicycle.
Not easy for anyone. But that's the reality. In a digitally disrupted world, everybody can see everything. Your biggest competitor could be a really small guy that you don't even see coming. And that happens a lot, especially with midsize companies. They get disrupted by somebody: they're worrying about the big guys in front of them, and it's the little guy that they didn't see coming. There's a terrible joke. A lot of times I'll use this humor because it helps you anchor on the concept: if you're in a campsite, and a bear comes into the campsite, you don't have to outrun the bear, you have to outrun the slower campers, right? So if you want to compete as a midsized company (and most companies are either small or midsize; we think of all the big companies when we think of companies, but the vast majority of all companies in the world are smaller), if you want to compete in that kind of an ecosystem, the advantage goes to the informed. The advantage goes to those who understand why they're doing what they're doing, and do it on purpose, because then if it's not working, they're more easily able to change what they're doing because they know what they're doing.
Exactly right, knowledge is power. I'm going to stop using short cliché lines and I want to ask you a little bit about your strategy. What are some of the things that you're working on at Dun and Bradstreet that are leveraging new technologies, new concepts to help our customers?
So the big buzzword these days is AI. AI is not new. It's been around for a long time. But now we're at a convergence where there's enough data and enough compute power to start doing some of these things that were conceived of, in many cases, many years or even decades ago. So we're definitely looking at AI methods that are both supervised and unsupervised: so, based on the past, and also based on just looking at the present data and coalescing it in different ways. We look at reinforcement methods, where we have different techniques that sort of compete with each other to find the best answer. In my group we do a lot of work on problems like fraud and anomalies, where it's very hard to model because the thing you're looking for, the bad guy, for example, will change their behavior when they think they're being watched. So if you model, you're modeling how the best ones are no longer behaving. So we have to use something called a cognitive approach, where we try to converge on advice that you, the expert, are more likely to take next time, without necessarily understanding why you're doing what you're doing. So it's this whole bevy of that. At the same time, identity resolution, linkage, all of those things that you sort of rattled off just now, those are our core competencies. And we need to continue to advance those capabilities, understanding the changing impact of language, the changing impact of digital business identity, things like that. So we need to continue to advance on the front of identity resolution, and connecting things, linkage. We also need to better understand the digital world. So understanding signals, and veracity adjudication: you know, not everything is true, right? There's a lot of disruption of information. So can we find information that's been manipulated or disrupted? Those are some of the areas where we're innovating right now.
That's awesome. Thank you. Anyone who wants to know more should get in touch with you.
I look forward to that.
That goes without saying: open invite to all of our listeners. And it's a call to action. If you're sitting there and you're listening to this, and you don't know what your data strategy is, and you don't know how you're going to future-proof your organization, then you need to have a conversation with Anthony. I want to switch gears a little bit. And I want to talk about the fear of AI. And ultimately AI is just maths. It’s increasingly complicated maths. In England it's maths, not math. I've never understood that. It's an interesting one.
I guess there's more than one math.
There we go. Yeah, our math is better than yours.
I don't think that word means what you think it means.
This is true. Let's go back to AI. There's an inherent fear amongst people in society because, largely, Hollywood has told us that the I, Robot type of AI model is inevitable. Elon Musk has put an injection of fear into everyone. What's your view on the ethical side of AI, and how much do we have to fear?
Well, with great respect for Stephen Hawking and Elon Musk and some really smart people who have clearly shaped some of the dialogue here: yes, these fears are real, these risks are real. What happens to us will ultimately be determined by how much we give up to our digital overlords. I say I'll never report to a robot, but Outlook tells me to go to a meeting and I go to the meeting, right? So to some extent, I'm already taking direction from a robot. And that's okay, as long as I reserve the right to sort of usurp that direction and do something else. We fly on airplanes that have auto flight systems that are highly automated. Airplanes need to stay at a relative altitude vertically, so they don't interfere with each other, and the ground below them is moving up and down. That's a really tricky problem, right? Yes, please use computers to help you do that better. Please do that. But at the same time, we cross a line, and this is where the ethical part of it starts to come in. We cross a line where we've given up something by accident that we didn't realize we gave up. So one of the scenarios is we need a computer to make a decision very fast that you, the human, can't make in real time fast enough. Robotic surgery: which vessel do we cauterize? The nuclear war scenario, where we have to decide whether or not this is real and are we going to launch. The situation where the self-driving car is certainly going to have an accident, and it has to choose which accident. There are all these different scenarios that we play out in the future. And we say, you know, in all of those scenarios, you can't make a decision in, you know, 43 milliseconds (actually you can, to some extent), but you can't make a decision fast enough to really impact all of this information. And so we're going to let a digital agent do it for you. We're going to give up that right. And what will ultimately take us down is our complacency, if we've given up too much because it was easier.
If we give it up because we have to, and we pay attention to it, maybe it's still inevitable, but maybe we can slow it down by a hundred or two.
I want to jump on that word complacency, OK? You gave an analogy, an airplane analogy. It's a good one, we'll go with that one. Think about software upgrades. We had a perfect example of AI and software letting us down with the MAX jet software upgrade. But there was a degree of complacency with the software issue, and also with the configuration of where the engines sit, which was that there wasn't enough training for the human beings, which was a serious complacency. I guess ultimately, the question really is, what's the role of the human being? And what are our unique assets in a world of AI?
So I'm not a pilot, but I fly with them a lot. And I talk to them a lot, and I listen to podcasts by pilots. So it's like saying I slept in a Holiday Inn; at one point there was a commercial that says, “I'm not a surgeon, but I slept in a Holiday Inn.” My understanding of that very complex problem that you're talking about is that it wasn't one root cause. The plane is longer, we need different answers, we want the plane to feel like the other planes that the pilots are able to fly with the same certificate, so that we don't need a special class of pilots to only fly this one plane. So let's put this system in that makes it feel like the other plane. And then all of a sudden, some unusual condition occurs where there are two sensors that disagree with each other. And there's an optional indicator that you didn't pay for that might be on the display or might not. And there are these two little switches that you have to flip off to disable this thing that's not really helping you anymore. But you didn't know that because you didn't get the training. And so there's a concept in risk management called the proximate cause: what was the beginning of the chain of events that ultimately led to this unfortunate outcome? And the proximate cause here (and you have to be really deferential to the fact that a lot of people died, so it's really hyper-trivial to say, well, it was this one thing), it seems to me that the proximate cause goes back to a series of decisions that were made to say, ultimately, we want this thing to look and feel like something else. And maybe a question should have been, are there unintended consequences that are possible as a result of that decision that we're making right now? There's something that I refer to as the unknown met need. We often talk about known unmet needs, which are the things our customers want that we can't do yet. But the unknown met need is a dangerous thing. It's a way that you're fulfilling the needs of your customers, but maybe you don't realize it.
You don't realize they're using your ketchup bottle as a doorstop, right? You change the design of your product, you change what you do, in a subtle way, for some very good reason, in order to meet a known unmet need, and you unintentionally impact an unknown met need.
The unmet needs at Dun and Bradstreet as an area.
It's a huge issue.
Well, it's an issue but it's also an opportunity. You sit down with a client and they tell you they're using your data analysis capability in a use case you didn't know.
You just turned it into a known met need?
Well, yeah, I guess once you know about it it becomes known.
But it's when you don't know about it and you change something without talking to them. That's where this... it's not an opportunity cost. It's a win for risk. It's something that happens while you're doing something good that's bad. And you didn't see it because you didn't realize it. There's an expression that, you know, be careful: if you set out to flood a desert, you may succeed by draining an ocean. There are always consequences. And understanding consequence in these very complex connected environments is a massively challenging thing to do. And it's something that we, as what I'll just say, scientists, in air quotes here... As scientists, we're trained in critical thinking, but then life happens. You have to hurry up to get the product out, there's a deadline, the customer wants something. There are all these reasons why we don't necessarily use the critical thinking that at some point in our life we were exposed to, but we forgot to think about it. That's what's ultimately going to get us in the end.
So we're coming towards the end of the podcast. And I want to get some predictions out of you. When you've got a man this smart in front of you, you've got to try and win the lottery. So, no, I’m kidding, we're not going to talk about misuse of your brain. But we are going to talk about predictions around forward-looking trends. So one of my favorite quotes is, if you always do what you've always done, you'll always get what you've always got. And I think you talked about not having to outrun the bear. You think about the proliferation of a firm like Netflix and the demise of a firm like Blockbuster. Which industry do you think is going to undergo the next radical change through digitalization, data and analytics?
I think it's probably one of several. So I'm going to hedge my bets here a little bit.
All things financial payment are under assault right now. There's a sort of commoditization of information; arbitrage is much more difficult. It's very difficult to get any kind of competitive advantage with information financially right now when you're sort of moving money around. So I think that as cryptocurrencies, as other ways of exchanging value besides exchanging money, start to come up, the science of understanding what's going on there will matter. I sort of jokingly referred to crypto-econometrics last night, I made up a word, those kinds of things. I think disrupting the way we pay each other is one. I think another one is certainly in the medical field. We have sort of long-standing policies around clinical trials. And now we have the ability to have digital twins of, you know, humans, to some extent, and maybe we can do some sort of digital trials, right? I'd love to see that because that could accelerate the development of drugs that we desperately need. So one is in, like, all things payment, the other is in all things medical, and I think the third area that I would pay very strong attention to is all things malfeasant. We are aware of malware, which is increasingly aware malware. So just a few years ago, all you had to do was (and I'm not trying to trivialize this, it's hard) have a bunch of virus signatures, understand what all the malware is, and when anything comes into your computer, look at it compared to those signatures, have a nice day. And then when something new comes along, somebody gets burned, you build a signature for that, and nobody else will get burned. It doesn't work anymore. I mean, you still need to do that. But now we have malware that works like AIDS: it's recombinant, it can rewrite itself, it can change the way that it looks. So you won't recognize it when you see it the next time because it looks different.
We also have malware that can work like a swarm: it can communicate with other malware agents, learning how it's failing, and trying new ways of getting through your defenses. This is a whole world of hurt that we don't understand yet. And then here comes something like quantum computing or some other disruptive evolution. And now we're going to have quantum attacks. We're going to have quantum encryption, whatever that will be. It's much easier to imagine how we're going to decrypt things with that kind of technology than how we will encrypt them. So the world of all things disruptive, or things malfeasant, is extremely troubling to me right now. I don't think we're moving fast enough in that area. The medical one: please move faster. Please do awesome. The malfeasant one: for the love of all you hold dear, please pay attention. And the first one, with the payments, it's just predictable, because everything always starts with where's the money going?
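[Editor's illustration, not part of the conversation: the point about signature matching can be shown with a deliberately simplified Python sketch. It assumes a "signature" is just a SHA-256 hash of the whole payload, which is a simplification of what real scanners do, and the payload bytes and function names are invented placeholders.]

```python
import hashlib


def signature(payload: bytes) -> str:
    """Simplified 'signature': the SHA-256 digest of the full payload."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample that someone has already been burned by.
KNOWN_SAMPLE = b"hypothetical-malicious-payload-v1"

# The signature database built from previously seen samples.
KNOWN_BAD = {signature(KNOWN_SAMPLE)}


def is_known_malware(payload: bytes) -> bool:
    """Flag a payload only if its signature is already in the database."""
    return signature(payload) in KNOWN_BAD


# A variant that rewrote itself by a single byte.
mutated_sample = KNOWN_SAMPLE + b"\x00"

print(is_known_malware(KNOWN_SAMPLE))    # the exact known sample is caught
print(is_known_malware(mutated_sample))  # the one-byte variant is missed
```

[The exact sample is flagged, while the one-byte variant is not. That is the failure mode described above: a pure signature lookup only recognizes what it has already seen in exactly that form.]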
Yeah, there's a little bit of carrot, a little bit of stick in there.
There’s do good, but be warned. I want to start a petition, maybe this is the call to action. The more calls that Anthony gets, the more likely he is to move to London. Let's set the bar at 10. 10 calls and Anthony's moving to London so I can keep talking to this guy and learning. It's been incredible to have you here, Anthony.
Thank you very much, Sam.
Thank you, safe flight back to New York later on today. And we're going to get you back soon for all those people you're gonna have to meet.
I'll look forward to it. Thank you very much.