What’s in a name?
Whilst predictive analytics is far from new, it has recently been rejuvenated by rapid advances in AI and machine learning. Because the field is young again in terms of market adoption, terminology matters. The same was true of cloud computing and big data: it took a painfully long time for market consensus to emerge about what these things were. A delaying factor here is that parties with vested interests (e.g. vendors) have an incentive to define their own terms to suit their objectives (remember the dreaded Enterprise Service Bus!).
When talking to customers, we find they are often not clear on the differences between predictive analytics and classic Management Information (MI) and Business Intelligence (BI). The terms MI and BI are often used interchangeably, and whilst Wikipedia’s definition of BI does include predictive analytics, I would argue that in practice for most organisations BI is generally relatively laggy and focused on looking in the rear-view mirror.
Predictive analytics is defined here as
“a form of advanced analytics that uses both new and historical data to forecast future activity, behavior and trends. It involves applying statistical analysis techniques, analytical queries and automated machine learning algorithms to data sets to create predictive models that place a numerical value, or score, on the likelihood of a particular event happening.”
It’s easy to assume your audience has the same world view and understanding as you do, and that the term predictive analytics means the same to them as it does to you. Surely it’s obvious – the clue is in the name? Well…no. So as practitioners we need to make sure that we take the time to establish a shared understanding with customers before proceeding too fast.
Hide and seek
Where this comes out is when we are undertaking discovery work with a client – nailing down the “exam questions” that we want to ask of their data (most likely in combination with other public data sets). We’re on the hunt for the most impactful predictive analytics questions we can answer, such as:
- What are the underlying drivers of my customer conversion rate, and how can I drive it higher?
- Which drivers have the strongest effect?
- What anomalies and outliers are there in my business data that might indicate future fraudulent behaviour?
- Which customers should I prioritise my limited resources on to reduce expensive complaints?
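To make the anomalies question above a little more concrete, here is a minimal sketch of a first-pass outlier check, using only the Python standard library. It flags values with a large modified z-score (based on the median and the median absolute deviation). The `flag_outliers` helper, the 3.5 threshold and the refund figures are all assumptions made up for illustration, and a real fraud model would go well beyond this.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    # 0.6745 rescales the MAD so scores are comparable with ordinary z-scores.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical daily refund totals; the numbers are invented.
daily_refunds = [120, 135, 128, 140, 131, 125, 990, 133]
outlier_days = flag_outliers(daily_refunds)
```

A check like this only describes the past. It becomes a predictive question once the flagged records are used as inputs or labels for a model of future behaviour.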
Time and again, we see the pent-up demand for better MI/BI coming out in our discovery work. There is a wealth of unmet descriptive reporting demand in most organisations. Managers are making gut-feel or coarse-grained rather than empirically based decisions as to how to optimise their operation. We need to take care not to fall into the trap of delivering pseudo-BI requirements via a predictive analytics initiative – not because we can’t, but because this would be compensating for other teams’ resourcing or prioritisation challenges.
Generally, we find that these gut-feel decisions are still pretty good. But commercial competitive advantage is a game of inches, so the extra few percentage points of optimisation that can be squeezed out from statistically grounded predictive models are what organisations are looking for.
So what is the process of using predictive analytics to drive the business opportunities that we decide to target from our work in the discovery phase? At a high level, the process is like this:
- Understand and characterise the data – you don’t know what you don’t know, and I’d be confident to say that there are always surprises in this early phase. Data quality issues emerge and data distributions are different to what was expected. This stage could be described as descriptive analytics (i.e. it’s building static viewpoints on historical data), which is where some confusion with BI can creep in. However, it is a means to an end – to understand the core data sets sufficiently to underpin the predictive work that follows. It also helps us build up further domain knowledge, enabling us to better understand how we can ask questions of the data.
- Data preparation and cleansing – the hard yards of machine learning!
- Build initial predictive models – there’s lots more to say here, but that’s for other posts. Just one point on this though – of course machine learning and its parent topic AI are not magic. “Traditional” statistical modelling approaches are well understood and therefore demystified, but deep learning techniques do a good impersonation of having magical properties. There is also some debate as to whether some deep learning models are really just “memorising” rather than “learning”. Whilst they can make predictions about the future or generalise to circumstances that they have not been trained on, they are still fundamentally trained on historical data. So in this sense, they rely on exactly the same data as descriptive analytics but use it for more sophisticated purposes.
- Infer from those models – strengths of relationships, correlations, further data quality issues, feature importance etc.
- Rinse and repeat – iteration is typically necessary, as this is fundamentally a value/insight discovery process, not so much a development process. The outcome is not known in detail at the start.
- Finally – use these models to drive business optimisation (sales, operations etc). This could mean deep integration of machine learning models into business systems such as feeding automatically prioritised sales opportunities into an outbound calling queue. Or it could be as simple as gaining insight into how to modify product pricing for certain customer segments to maximise margin rather than volume.
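The steps above can be sketched end to end in a few dozen lines of Python. This is a toy under stated assumptions, not a recommended implementation: the lead records are synthetic, the field names (`prior_purchases`, `days_since_contact`, `converted`) are hypothetical, and a hand-rolled logistic regression stands in for whatever modelling library a real project would use.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical lead records: every field name and value here is invented.
rng = random.Random(0)
rows = []
for _ in range(200):
    purchases = rng.randint(0, 5)
    days = rng.randint(1, 60)
    # Synthetic ground truth: past purchases help, a long silence hurts.
    p = 1 / (1 + math.exp(-(0.8 * purchases - 0.05 * days)))
    rows.append({"prior_purchases": purchases,
                 "days_since_contact": days,
                 "converted": 1 if rng.random() < p else 0})
rows[3]["days_since_contact"] = None  # simulate a data quality surprise

features = ["prior_purchases", "days_since_contact"]

# Understand and characterise: count missing values per field.
missing = {f: sum(r[f] is None for r in rows) for f in features}

# Prepare and cleanse: drop incomplete rows, then standardise each feature.
clean = [r for r in rows if all(r[f] is not None for f in features)]
stats = {f: (mean([r[f] for r in clean]), pstdev([r[f] for r in clean]))
         for f in features}


def standardise(row):
    return [(row[f] - stats[f][0]) / stats[f][1] for f in features]


X = [standardise(r) for r in clean]
y = [r["converted"] for r in clean]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


# Build an initial model: logistic regression fitted by batch gradient descent.
weights, bias = [0.0] * len(features), 0.0
for _ in range(500):
    grad_w, grad_b = [0.0] * len(features), 0.0
    for xi, yi in zip(X, y):
        err = sigmoid(bias + sum(w * x for w, x in zip(weights, xi))) - yi
        grad_b += err
        for j, x in enumerate(xi):
            grad_w[j] += err * x
    bias -= 0.5 * grad_b / len(X)
    weights = [w - 0.5 * g / len(X) for w, g in zip(weights, grad_w)]

# Infer: on standardised inputs, coefficient size hints at driver strength.
drivers = sorted(zip(features, weights), key=lambda fw: abs(fw[1]), reverse=True)


# Use: score new leads and put the most promising first in the calling queue.
def score(lead):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, standardise(lead))))


new_leads = [{"prior_purchases": 4, "days_since_contact": 3},
             {"prior_purchases": 0, "days_since_contact": 55}]
call_queue = sorted(new_leads, key=score, reverse=True)
```

The "rinse and repeat" step is deliberately absent: in practice the surprises recorded in `missing` and the ranking in `drivers` would send you back to the data before anything reached a calling queue.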
It’s up to data science practitioners to call out BI requirements that are masquerading as machine learning problems. Not every problem is a machine learning problem, and yet in a world of imperfect traditional enterprise reporting it is seductive to see them as such. By avoiding this trap, we can move faster and deliver more value – with the right teams with the right skills, using the right tools for the job.