Here's a snapshot from a few of you: We all want to digitalise, but all we get is another set of huge IT projects. We want to automate, but our people are still manually searching and copy-pasting data back and forth for each important decision. We want to analyse, and we now have a large staff of very highly-paid data scientists with some great proofs of concept, but who are yet to make a real and significant impact on the day-to-day business.
Does that ring a bell? Do you, perhaps, find yourself silently nodding wisely while reflecting that, if 2019 is to be different, then most likely something needs to be done differently?
(If, on the other hand, you honestly believe that all your challenges are behind you: good for you. I genuinely salute you. I'd love to hear more from you. Lunch is on me, for sure.)
Still here? If you care to read on, here is an idea. This is not theory, but a distilled – and quite hard-earned – learning from my team's three years of intense transformation at Dun & Bradstreet. It may or may not be a silver bullet. But for us it was close to one, and one that actually worked. Maybe it could help your team too?
It goes without saying that this all hinges on how relevant Dun & Bradstreet's transformation is to your own. You may not know us, or if you do, you may only know us from a few years ago. But does one or more of the following sound familiar to your own situation?
How about coming from a background of many different companies, not really integrating for a long while, but now truly merging into one? How about a fragmented, complex, legacy tech-base with every conceivable technology from the 70's, 80's and 90's? How about rising customer demands tinged with the perception that, while you may be necessary, you are "somewhat old and slow" vs. promising new challengers?
Having had the benefit of working with some 60 different financial institutions over the last 15 years – from global giants, to Nordic leaders, to rising challengers – I believe quite a few of you share some of this context, at least when it comes to transforming.
One thing that we don't share, however, is the bottom line margin. You guys are doing well. Three years ago, we were not – neither financially, nor in terms of progress. It was the toughest of times. Sometimes though, necessity is truly the mother of invention – and a strengthener of resolve. Three years on, both financials and transformation are a world apart. We are not finished yet, but progress is good and the future is bright.
Our two-steps-forward-one-step-backward path over the past few years starts in familiar territory.
We all know that digitalisation, and not least AI, machine learning (ML) and deep-learning (DL) – please pick your preferred buzz-word – is all about data. "Data is the new oil" as they say. And there is a lot of data. Quite obviously, the trick is to get the data from all the places where it is, into the digital engines where it can be put to good use. Of course, each engine is very specific and needs a specific fuel mix – just any old data won't do.
For Dun & Bradstreet – a data company at heart – this is all about getting data from sources, via refining and matching, into analytics, through customer-specific logic, out via external APIs, and finally integrated into client systems and processes.
Five years ago (or ten, or twenty) this was done in many side-by-side "monolithic" systems. It was fine when there wasn't too much data. It was fine when the receiver was typically a human being. However, when things shifted – the need to handle more and more data, changing ever more frequently, with the receiver being a finely tuned and purpose-specific robot – it was no longer fine at all.
Three years ago, we set about building a "pipeline" of data to meet this need. The premise was simple: We would never be able to foresee all the ways that we or our clients would want to use data in the future. Nor could we stop or control the increase in data, its speed of change or its increased regulation. Hence, we needed to enable our clients to bring just the right combination of data and analytics to just where they needed it, without prior warning, and be able to benefit from it right away.
Two years ago, we made the "pipeline" work. It was, however, slow. The data flowed more like asphalt than gasoline. This was despite the fact that we had a great, very recent, technology base with great architecture (we can deploy code into production 30 times or more per day, quite a change from quarterly cycles of old), and great analysts including a new AI/ML/DL-team, and clients eager to reap the benefits.
We clearly weren't alone: It was around this time that it became apparent that the super-expensive data scientists that we all had worked so hard to recruit were spending 70 to 80% of their time getting hold of data, understanding what they got, mapping it and mapping it back. Many of our clients told us, in confidence, that they too were struggling to unleash their own data. I could only nod sympathetically.
At this point we realised that while our pipeline was good at moving data around, enabling access was only part of the challenge. As a data scientist put it at the time: "I no longer need to chase down data. But I still need to manually map it into my models. And then we still need to manually map both the data and the model into production. Because none of these systems actually understand what the data contains and what it is about".
We realised that this was true not only for our analysts, but for every technology component, every process step and every user.
Armed with this insight, we realised that alongside the (now fairly seamless) flow of data, there is another hidden, manual and ad hoc process which is to transport knowledge about the data. This was the slow part – the re-understanding, re-mapping, and re-integration of data.
This was bad, and as we moved closer to AI it became worse still: because such models keep learning, we needed to fix this for the feedback loop as well, for each decision, in real time. This challenge was at least one key reason why AI models got stuck at the proof-of-concept stage.
We tried a number of different approaches ... which all failed. Still, I can proudly say we failed reasonably fast, and learnt a lot. Really though, there is no need for you to repeat our mistakes – so for your benefit:
Without doubt, more items could be added to this list, but instead, I will fast forward one year in time.
Our solution turned out to be an ontology-based data architecture, with explicit and attached metadata, throughout the entire pipeline. Alongside each and every data element, as it travels along our pipeline, there now also travel descriptions of what the data is (ontology), how it relates to other data elements, where it comes from, how it has been processed, and how and by whom it may be used and stored without becoming non-compliant (all metadata).
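To make the idea a little more tangible, here is a minimal Python sketch of a data element that carries its own metadata along a pipeline. It is an illustration only, not Dun & Bradstreet's actual implementation: the `DataElement` class, its field names, the concept identifiers such as `company:AnnualRevenue`, and the `convert_currency` step are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class DataElement:
    """A value plus the metadata that travels with it (hypothetical schema)."""
    value: Any
    concept: str                               # what the data is (ontology term)
    related_to: tuple = ()                     # how it relates to other concepts
    source: str = "unknown"                    # where it comes from
    lineage: tuple = ()                        # how it has been processed
    allowed_purposes: frozenset = frozenset()  # how it may be used


def convert_currency(element: DataElement, rate: float, target: str) -> DataElement:
    """Example processing step: changes the value and records what it did."""
    step = f"convert_currency(rate={rate}, target={target})"
    return replace(
        element,
        value=round(element.value * rate, 2),
        lineage=element.lineage + (step,),
    )


revenue = DataElement(
    value=1_000_000.0,
    concept="company:AnnualRevenue",
    related_to=("company:LegalEntity",),
    source="national-registry",
    allowed_purposes=frozenset({"credit-assessment"}),
)
revenue_eur = convert_currency(revenue, rate=0.1, target="EUR")
print(revenue_eur.value, revenue_eur.lineage)
```

The point of the sketch is that the processing step never has to be told what the value means or where it came from. That description arrives with the data and leaves with it, extended by one lineage entry.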
If I were a consultant aiming to sell you advice or projects, I would now lapse into an extensive monologue of what that means on a deeper and more technical level (and then push either a software solution or consultants). I'm not, and I won't. I'm sure you have data architects who can explain it.
Instead, let me speak about the impact for us:
First, having data that could now be understood directly where it was to be used, by the people and machines seeking to use it, was a real enabler for us. Together with end-to-end processes and agile development we could create much more complete separation – and increasingly avoid the "huge IT projects" fallacy. It may sound pompous, but we sometimes refer to what we are achieving as the "democratisation of data". The data is now truly available for everyone to use – and use effectively without dependence on others. Yes, the total IT spend is still very significant, but it now comes with far fewer interdependency risks. Wouldn't smaller, more agile IT efforts be a good thing for you as well?
The Dun & Bradstreet equivalent of people searching for and cutting-and-pasting data in the past – the endless search for experts, documents and specifications for our own sake or for our clients' – is also finally coming to an end. No longer do you have to hunt through binders, Excel sheets and internal systems, or track down the right people. What would be the effect if your transformation troops were no longer held back by access and understanding, but could focus on the job at hand?
For example, if you are building an advanced predictive model, or automating business logic, or creating digital workflows, you can pick and choose the data you need – in full transparency. This means that the mapping, once painstakingly done by hand at every stage, can be done once only, in human-readable language, and each stage before and after can be machine-verified. Once you have built and deployed your model, you can be certain it has access to exactly the right set of data in production too; you can be certain its outputs are available to every potential user of them, and you can be certain that the "exhaust" data from later stages is there too, for your AI model's use. What would be the impact if your data scientists' work were deployable immediately and everywhere? And if the AIs' learning did not stop at initial deployment?
In fact, it gets better. It has been said that only about a third of analytical efforts generate real actionable outcomes and hence benefits, with far too many either not solving a critical and real business challenge, or not being transparent enough for the end user to dare to trust and act on them directly, or simply using data in a way that is not compliant (this last one obviously a bigger fish now with GDPR). Since the pipeline also lets the metadata flow immediately, the compliance can be directly verified, the sources going into the model are transparent, and the business owner and analyst have a common language to describe what they seek. What would happen if you were able to ensure that business owners and analysts always framed the challenge together, end-to-end?
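As a rough illustration of "map once, verify by machine", the Python sketch below has a model declare the ontology concepts it needs a single time, and then runs the same check against whatever record a pipeline stage hands over. All names here (`MODEL_INPUTS`, `bind_inputs`, the concept identifiers) are hypothetical and only meant to show the shape of the idea.

```python
# Hypothetical: the model declares, once, which ontology concepts it needs.
MODEL_INPUTS = {
    "revenue": "company:AnnualRevenue",
    "employees": "company:EmployeeCount",
}


def bind_inputs(required: dict, record: dict) -> dict:
    """Map model input names to values by concept, or fail loudly.

    `record` maps ontology concept -> value, as delivered by any stage.
    """
    missing = sorted(c for c in required.values() if c not in record)
    if missing:
        raise ValueError(f"record lacks required concepts: {missing}")
    return {name: record[concept] for name, concept in required.items()}


# The same check runs unchanged in development and in production.
dev_record = {"company:AnnualRevenue": 1_000_000.0, "company:EmployeeCount": 12}
features = bind_inputs(MODEL_INPUTS, dev_record)

try:
    # A stage that delivers too little is caught by the machine, not by a person.
    bind_inputs(MODEL_INPUTS, {"company:AnnualRevenue": 5.0})
except ValueError as err:
    problem = str(err)
```

Because the binding is expressed in terms of concepts rather than column names or file layouts, a new stage or environment only has to speak the same ontology for the model to plug in.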
If you need more or different data, that's easy too. After all, using the ontology and metadata, the APIs can, in other respects, be extremely generic. This means that additions and changes in data can typically just "flow through" without any manual handiwork – hence the speed. The time for integrating a new generic data source has gone from months to as little as days. What new data would you love to use to drive your business if you could just get it to where it's needed?
If you seek to leverage human intelligence through digitisation – by having automation do the "simple" jobs so your people can focus on the very complex ones – I have an inspiring learning too. If the human beings picking up the complex case are credibly informed about what data has already been collected, where it comes from and how it has been evaluated, they are much more likely to pick up where the machine left off, instead of re-starting with manual data collection and verification. What would be the impact if all your people (for instance in know-your-customer processes) could focus their efforts where they really make a difference, rather than on data collection?
These concepts and ideas are not new. Certainly, the hope and promises are not. Maybe it will not be everything you need. But the idea did work in practice and made a huge difference in progress and speed. If you are interested, I'd love to hear from you. If you try it, I'd love to hear your success stories, but equally any learnings. Together we'll get even better.
2019 is likely to be another year of transformation. Let's not forget that it's now more than 30 years since Robert Solow's famous quip "You can see the computer age everywhere but in the productivity statistics". Nor let us forget that it's almost as many years since Paul Krugman remarked that "Productivity isn't everything, but in the long run it is almost everything."
To me, there's enough proof that digitalisation (in all its flavours) can finally prove Mr Solow wrong, and make Mr Krugman happy. Let's make it happen in 2019, shall we?